Twitter AI Agent - 2026-05-05
1. What People Are Talking About
1.1 The /goal Command Redefines Multi-Day Agent Runs
The day's highest-scoring post came from @AlexFinn declaring (349 likes, 38 replies, 424 bookmarks, 15209 views) that "/goal" is "the biggest advancement in AI coding this year" because "it allows your AI agent to quite literally work for days without stopping. You give a mission. It works until the mission is complete." The key insight: /goal is useless without a well-structured executive plan, shifting the bottleneck from model capability to prompt architecture.

@cheddarmandem surfaced (23 likes, 30 bookmarks, 828 views) the OpenAI cookbook article on executive plan templates that work with /goal, linking directly to developers.openai.com. @Dimillian wrote (105 likes, 129 bookmarks, 13827 views) a long reflection on compound engineering: "the tech stack matters as much as the method," arguing that the right harness + model pairing determines whether multi-day runs succeed or burn budget.

Comparison to prior day: May 4 focused on the build-vs-buy harness engineering debate and the AHE paper's automated harness evolution. May 5 shifts the conversation to a specific product feature (/goal) that embodies what harness engineering enables in practice -- continuous multi-day execution with structured planning.
Discussion insight: The replies reveal a tension between /goal's power and its dependency on "executive plans" -- essentially, the bottleneck has moved from coding to writing good plans for agents, creating demand for plan templates and structured prompt architectures.
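To make the "plan quality is the bottleneck" point concrete, here is a minimal sketch of a plan structure that can be checked before a long run starts. The cookbook template is not reproduced here: the `ExecutivePlan` and `Milestone` names, and the idea that every milestone needs a verifiable completion check, are assumptions made for illustration, not part of /goal.

```python
from dataclasses import dataclass, field


@dataclass
class Milestone:
    title: str
    done_when: str = ""  # hypothetical field: a verifiable completion check


@dataclass
class ExecutivePlan:
    """Hypothetical plan shape; not the format /goal actually expects."""
    mission: str
    milestones: list[Milestone] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable gaps that would make a long run ambiguous."""
        issues = []
        if not self.mission.strip():
            issues.append("mission is empty")
        if not self.milestones:
            issues.append("no milestones defined")
        for milestone in self.milestones:
            if not milestone.done_when.strip():
                issues.append(f"milestone '{milestone.title}' has no completion check")
        return issues


if __name__ == "__main__":
    plan = ExecutivePlan(
        mission="Ship CSV export",
        milestones=[Milestone("Write exporter", "unit tests pass"), Milestone("Docs")],
    )
    for issue in plan.problems():
        print("fix before running:", issue)
```

The design choice is to reject a vague plan up front, since an agent that runs for days has no natural point at which a human would notice a missing stopping condition.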
1.2 SubQ Ships 12M Token Context, Declares RAG Dead
@KateMillerGems proclaimed (997 likes, 39 replies, 4 bookmarks, 115309 views): "RIP RAG pipelines. RIP chunking hacks. RIP summarisation loops. They were never clever engineering. They were apologies for a broken foundation. SubQ shipped 12M tokens of the working context. The workarounds are over." The post's 115K views and 997 likes made it the most-viewed AI-agent post of the day.
@alex_whedon announced (47 likes, 72 bookmarks, 9393 views) SubQ's early access alongside SubQ Code, their coding agent. @willdepue pushed back technically (59 likes, 8 bookmarks, 2523 views): "Can you clarify the approximate complexity of your method? Is it O(n), O(n log n), O(n^k < 2)? What's stopping you from demoing 100M, 1B, even 10B context if truly subquadratic?"
Comparison to prior day: May 4 featured context engineering gaining structured vocabulary with architecture diagrams and workbooks. May 5 escalates with SubQ's specific product claim of 12M working tokens, framing it as a replacement for entire categories of infrastructure (RAG, chunking, summarization) rather than an incremental improvement.
Discussion insight: The skepticism from @willdepue about computational complexity signals the community is not accepting the "RAG is dead" claim at face value -- demanding proof of scalability characteristics before declaring victory over established patterns.
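The complexity question matters because the growth rate decides how expensive the jump from 12M to 100M or 1B tokens would be. The sketch below compares relative compute growth under the complexity classes named in the challenge. It assumes cost depends only on token count and ignores constants, memory, and hardware limits; the `O(n^1.5)` row is an arbitrary stand-in for "O(n^k), k < 2". It says nothing about SubQ's actual method, which the posts do not describe.

```python
import math


def relative_cost(n_from: int, n_to: int) -> dict[str, float]:
    """How much compute grows when context goes from n_from to n_to tokens."""
    ratio = n_to / n_from
    return {
        "O(n)": ratio,
        "O(n log n)": (n_to * math.log2(n_to)) / (n_from * math.log2(n_from)),
        "O(n^1.5)": ratio ** 1.5,  # assumed example of a subquadratic exponent
        "O(n^2)": ratio ** 2,
    }


if __name__ == "__main__":
    for target in (100_000_000, 1_000_000_000):
        print(f"12M -> {target:,} tokens")
        for label, factor in relative_cost(12_000_000, target).items():
            print(f"  {label:<11} x{factor:,.1f}")
```

Going from 12M to 100M tokens costs roughly 8x more under linear scaling but roughly 69x more under quadratic scaling, which is why a larger demo would be informative evidence either way.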
1.3 Hermes + HyperFrames Creates the Agent Video Stack
The Hermes ecosystem's video capability became a major thread. @AndyMarlowg celebrated (49 likes, 40 bookmarks, 8618 views): "Hermes runs anywhere. HyperFrames renders anywhere. now they ship together -- this is the agent video stack I wanted. hermes skills install hyperframes." @dr_cintas explained (23 likes, 21 bookmarks, 3013 views) the mechanics: "Describe the video you want. Your agent writes the HTML. It renders to MP4. No editor. No timeline."
@_0xpainn demonstrated (30 likes, 8 bookmarks, 233 views) the broader self-improving angle: "A self improving AI agent that runs on a $5/month VPS costs $0 per task. Free to run. Free per task. Forever. Rent a VPS for $5. Install Ollama. One command to install Hermes. Done. It creates reusable skills from every task." @MystiqueMide tested (14 likes, 175 views) voice mode on Discord, continuing the multi-modal expansion.
Comparison to prior day: May 4 saw the Hermes ecosystem expanding through desktop apps, Shopify skills, and MMORPG plugins. May 5 adds a video creation pipeline (HyperFrames) and voice mode, completing Hermes's evolution from text-only agent into a full multi-modal automation platform.
Discussion insight: Replies from @Feel594326 and @yafadec815 highlight that the community values "portability" -- the ability to run the same agent stack anywhere without vendor lock-in is a key differentiator for Hermes over proprietary alternatives.
1.4 Multi-Agent Orchestration Tools Formalize
@geminicli announced Scion (144 likes, 112 bookmarks, 6132 views): "a new multi-agent orchestration tool that orchestrates agents (Claude Code, Gemini CLI, Codex, and others) as isolated, concurrent processes. Each agent gets its own container, git worktree, and credentials." @daytradingzoo showed (23 likes, 25 bookmarks, 1562 views) a practical two-agent setup: "Claude - frontend, Codex - backend. Handovers, second-opinion reviews, github for alignment."

@om_patel5 described (5 likes, 2 bookmarks, 395 views) a novel pattern: "two developers and two Claude Codes in the same chat room -- all four talking. Two humans and two AI agents in one conversation planning a feature together." @aakashgupta reported (8 likes, 24 bookmarks, 2902 views): "A 21-agent team running 4 parallel sprints inside Claude Code shipped from idea to App Store submission."
Comparison to prior day: May 4 discussed multi-agent coordination complexity as a frustration point (running "5-7 agents directly all day feels timid"). May 5 shows the tooling catching up -- Scion provides formal container isolation, while practitioners share concrete multi-agent workflows that actually work in production.
Discussion insight: @om_patel5 raises a critical warning: "two AI models can make each other sound more confident and hallucinate while quietly drifting from what you actually asked for" -- suggesting that multi-agent setups need explicit decision logs and human checkpoints to prevent collective drift.
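One way to picture "explicit decision logs and human checkpoints" is a log that compares every agent decision against the original request and flags low-overlap entries for review. The sketch below is an illustration only: the `DecisionLog` class, the 0.5 threshold, and the word-overlap heuristic are all assumptions, and lexical overlap is a crude proxy that a real system would likely replace with something stronger.

```python
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    """Lowercased words longer than three characters, punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split() if len(w) > 3}


def overlap(request: str, output: str) -> float:
    """Fraction of the request's key words that still appear in the output."""
    req = _words(request)
    if not req:
        return 1.0
    return len(req & _words(output)) / len(req)


@dataclass
class DecisionLog:
    request: str
    threshold: float = 0.5  # assumed cutoff; tune per project
    entries: list[dict] = field(default_factory=list)

    def record(self, agent: str, summary: str) -> bool:
        """Log a decision; return True when a human should review it."""
        score = overlap(self.request, summary)
        needs_review = score < self.threshold
        self.entries.append(
            {"agent": agent, "summary": summary, "overlap": score,
             "needs_human_review": needs_review}
        )
        return needs_review


if __name__ == "__main__":
    log = DecisionLog(request="Add password reset flow to the login page")
    log.record("agent-a", "Implemented password reset flow on the login page")
    if log.record("agent-b", "Refactored the billing module and upgraded dependencies"):
        print("checkpoint: agent-b output has drifted from the request")
```

The check is anchored to the human's original request rather than to the other agent's output, so two agents agreeing with each other cannot by itself pass the checkpoint.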
1.5 HeyGen Agent Saturates the Async Communication Niche
The HeyGen + Superhuman Go integration generated volume across 15+ posts. @viipin8 captured (91 likes, 14 bookmarks, 11197 views) the pitch: "most updates fail because they don't land the first time so they get repeated in threads, meetings, follow-ups. Superhuman Go + HeyGen Agent turning them into video/voice feels like a clean way to fix that." @Parul_Gautam7 added (46 likes, 10 bookmarks, 6849 views): "Typing updates -- ignored. Meetings -- overkill. Now you can just say it."
@Logical_Girll noted (31 likes, 6 bookmarks, 610 views): "feels like the start of agents living inside your tools, not beside them." The coordinated amplification pattern (15+ accounts quoting the same @HeyGen announcement) was similar to May 4's launch pattern but with higher sustained engagement.
Comparison to prior day: May 4 noted the HeyGen + Superhuman Go launch as a "New and Notable" signal. May 5 shows the integration saturating the discourse with organic rephrasing and adoption signals, suggesting it resonated beyond the initial launch coordination.
1.6 Agent Skill Verification and Supply Chain Security
@omarsar0 published two key threads. The first (21 likes, 34 bookmarks, 1703 views) addressed skill verification: "If you ship agent skills, your runtime is treating signed-and-cleared skills as trusted by default. This paper argues a skill is untrusted code until it is verified." He called for "SKILL.md before agent skill libraries become the next attack surface."

His second thread (105 likes, 170 bookmarks, 7371 views) covered HeavySkill's agentic harness design: "They argue that what actually drives agent harness performance is not the orchestration code. It's a single inner skill: parallel reasoning."

@yzg75001 replied: "i've been running agent skills in production and the trust model is basically vibes-based rn -- signed skill = trusted, which is wild. we need something like sigstore for agent skills."
Comparison to prior day: May 4 identified agent security gaps (morse code exploits, $200K drained). May 5 elevates the conversation from "agents get hacked" to "the skill distribution layer itself is an attack surface," with a published paper proposing formal verification and @omarsar0 framing it as a supply-chain problem.
Discussion insight: @DylSwanepoel's reply crystallizes the shift: "Agent skills are not just prompts or convenience wrappers. They are executable artifacts. Once a skill can touch tools, data, money, or production systems, trust cannot be implied by where it came from."
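The "untrusted until verified" stance can be illustrated with content pinning: a skill runs only if its exact bytes match a digest recorded when it was reviewed. This is a minimal sketch, not the paper's method and not sigstore; the `SkillRegistry` class and the skill name are hypothetical. It covers integrity only, and does not establish who published the skill or whether the reviewed code is safe.

```python
import hashlib
import hmac


def digest(skill_source: bytes) -> str:
    """SHA-256 hex digest of the skill's exact contents."""
    return hashlib.sha256(skill_source).hexdigest()


class SkillRegistry:
    """Hypothetical allowlist: skill name -> digest recorded at review time."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}

    def approve(self, name: str, skill_source: bytes) -> None:
        self._approved[name] = digest(skill_source)

    def is_trusted(self, name: str, skill_source: bytes) -> bool:
        """Deny by default: unknown names and modified contents both fail."""
        expected = self._approved.get(name)
        if expected is None:
            return False
        return hmac.compare_digest(expected, digest(skill_source))


if __name__ == "__main__":
    registry = SkillRegistry()
    reviewed = b"print('render video')"
    registry.approve("example-skill", reviewed)
    tampered = reviewed + b"\nimport os"
    print(registry.is_trusted("example-skill", reviewed))
    print(registry.is_trusted("example-skill", tampered))
```

Pinning to content rather than to origin follows the point in the replies that trust "cannot be implied by where it came from."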
1.7 Enterprise Agent Governance Deepens: ServiceNow + Microsoft
@ServiceNowNews announced (47 likes, 8 bookmarks, 3570 views) that "ServiceNow + Microsoft are unifying agent governance across both platforms." AI Control Tower now extends governance to Microsoft Agent 365, giving teams "one place to discover, approve, and manage agents." @Sam_Badawi provided (66 likes, 2 bookmarks, 4463 views) financial context: "$NOW is integrating its AI Control Tower with $MSFT Agent 365, expanding governance and visibility across AI agents operating within both ecosystems."
@Arkive_live reported (12 likes, 2 bookmarks, 44 views): "An internal AI agent at Meta recently leaked restricted employee data and org charts. Even tech giants struggle with governance when AI steps beyond controlled environments."
Comparison to prior day: May 4 featured Microsoft Agent 365 going GA and DeepMind's agent attack taxonomy. May 5 deepens the enterprise governance narrative with ServiceNow's cross-platform integration announcement and a concrete Meta data leak incident, reinforcing that governance is a live operational concern, not theoretical.
2. What Frustrates People
Token Cost and Optimization Complexity
@akshay_pachaar quantified (130 likes, 223 bookmarks, 24291 views) dramatic savings: "Claude Code used 3x fewer tokens with one change: Before: 10.4M tokens, 10 errors, $9.21. After: 3.7M tokens, 0 errors, $2.81." The fix was using "Insforge Skills + CLI as the backend context engineering layer" -- implying that out-of-the-box agent configurations burn money unnecessarily.
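The quoted before/after figures can be checked with simple arithmetic. The sketch below uses only the numbers from the post; the variable and function names are illustrative.

```python
before = {"tokens": 10_400_000, "cost_usd": 9.21}
after = {"tokens": 3_700_000, "cost_usd": 2.81}


def reduction(before_value: float, after_value: float) -> tuple[float, float]:
    """Return (times smaller, fraction saved) for a before/after pair."""
    return before_value / after_value, 1 - after_value / before_value


token_factor, token_saved = reduction(before["tokens"], after["tokens"])
cost_factor, cost_saved = reduction(before["cost_usd"], after["cost_usd"])

print(f"tokens: {token_factor:.2f}x fewer ({token_saved:.0%} saved)")
print(f"cost:   {cost_factor:.2f}x lower ({cost_saved:.0%} saved)")
```

The token reduction works out to about 2.8x and the cost reduction to about 3.3x, so "3x" is a fair rounding of both.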
Multi-Agent Confidence Drift
@om_patel5 warned (5 likes, 2 bookmarks, 395 views) that in multi-agent setups "two AI models can make each other sound more confident and hallucinate while quietly drifting from what you actually asked for." The failure mode is subtle: each agent validates the other's output, creating a closed loop that looks correct but has departed from the original intent.
Skill Trust Model Is "Vibes-Based"
@yzg75001 admitted in a reply to @omarsar0: "i've been running agent skills in production and the trust model is basically vibes-based rn -- signed skill = trusted, which is wild. the supply chain attack vector is real." The gap between "shipped to marketplace" and "verified safe to execute" remains unresolved.
Demoware vs Production-Ready Agents
@databricks opened (53 likes, 18 bookmarks, 2717 views) with a blunt assessment: "Most 'agentic AI' is still demoware. Data work and coding are clear exceptions." @LandonExplr replied: "Agentic AI works where outputs are verifiable. Data pipelines qualify. Everything else claiming 'agentic' is still demoware."
3. What People Wish Existed
Inter-Agent Communication Pipes
@SaidAitmbarek described (12 likes, 1 bookmark, 206 views) his workflow of copying between ChatGPT and Codex and wished for: "a headless bridge (pipe) that streams data between agents (ad-hoc). Like OAuth, but for agents and with durable streams." The gap: no standard way for two agents from different providers to pass context back and forth in real time.
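As a rough picture of what a "durable stream" between two agents could look like, the sketch below uses an append-only JSON Lines file that a reader can resume from a saved offset. Everything here is an assumption for illustration: the `DurableStream` class, the file layout, and the single-writer model (no locking, no authentication, and none of the OAuth-like handshake the post asks for).

```python
import json
import tempfile
from pathlib import Path


class DurableStream:
    """Hypothetical append-only log; readers resume from a line offset."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.touch(exist_ok=True)

    def read_from(self, offset: int) -> list[dict]:
        """Return all messages at or after the given 0-based offset."""
        with self.path.open(encoding="utf-8") as fh:
            messages = [json.loads(line) for line in fh if line.strip()]
        return messages[offset:]

    def append(self, sender: str, payload: dict) -> int:
        """Write one message and return its offset. Assumes a single writer."""
        offset = len(self.read_from(0))
        record = {"offset": offset, "sender": sender, "payload": payload}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return offset


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stream = DurableStream(Path(tmp) / "bridge.jsonl")
        stream.append("planner", {"text": "Draft the API schema"})
        stream.append("coder", {"text": "Schema implemented"})
        # A consumer that already processed offset 0 resumes at offset 1.
        for message in stream.read_from(1):
            print(message["sender"], message["payload"]["text"])
```

Because messages persist on disk, either agent can restart and pick up where it left off, which is the property that manual copy-paste between tools lacks.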
Skill Verification Infrastructure
@omarsar0 called for (21 likes, 34 bookmarks) "SKILL.md" -- a verification standard for agent skills similar to software supply chain attestation. @yzg75001 amplified: "we need something like sigstore for agent skills before someone ships a malicious skill that has write access to prod."
Context Scalability Proof Beyond 12M Tokens
@willdepue challenged (59 likes, 8 bookmarks) SubQ directly: "What's stopping you from demoing 100M, 1B, even 10B context if truly subquadratic?" The need: verifiable demonstration that long-context solutions actually scale, not just large-number claims without complexity analysis.
Agent-Native Permission and Identity Layers
@Arkive_live argued (12 likes) after the Meta leak: "Enterprises need intelligence layers with built-in permission boundaries and auditability from day one." The Meta incident proved that retrofitting governance onto existing agent deployments fails.
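"Built-in permission boundaries and auditability" can be sketched as a gate that every agent action passes through, denying anything outside the agent's declared scopes and recording each decision. The `AgentIdentity` and `Gatekeeper` names and the scope strings are hypothetical; this does not describe how any product mentioned here works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    allowed_scopes: frozenset[str]  # hypothetical scope strings


@dataclass
class Gatekeeper:
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, agent: AgentIdentity, scope: str) -> bool:
        """Deny by default, and record every decision, allowed or not."""
        allowed = scope in agent.allowed_scopes
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "agent": agent.name,
                "scope": scope,
                "allowed": allowed,
            }
        )
        return allowed


if __name__ == "__main__":
    gate = Gatekeeper()
    bot = AgentIdentity("helpdesk-bot", frozenset({"tickets:read"}))
    gate.authorize(bot, "tickets:read")
    gate.authorize(bot, "hr:org-chart:read")
    for entry in gate.audit_log:
        print(entry["agent"], entry["scope"], "allowed" if entry["allowed"] else "denied")
```

Logging denials as well as approvals is deliberate: a denied request for restricted data is often the earliest signal that an agent has stepped outside its intended role.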
4. Tools and Methods in Use
| Tool | Category | Sentiment | Strengths | Limitations |
|---|---|---|---|---|
| Claude Code + /goal | Coding agent | Positive | Multi-day autonomous runs, executive plan structure, subagents | Requires carefully written plans; useless without good prompts |
| Hermes Agent | Agent framework | Positive | HyperFrames video, voice mode, $5/mo VPS deployment, 100+ skills, 17 platforms | New skill trust model unverified; ecosystem quality variance |
| Scion | Multi-agent orchestrator | Positive | Container isolation per agent, git worktree separation, concurrent processes | Brand new; limited production track record |
| SubQ | Long-context engine | Polarized | Claims 12M working tokens; eliminates RAG/chunking | Complexity unproven; no public demo at 100M+ scale |
| Insforge Skills | Context engineering | Positive | 3x token reduction on Claude Code; zero errors vs 10 errors baseline | Requires CLI familiarity; new ecosystem |
| Genie Code (Databricks) | Data agent | Positive | 4+ years of harness tuning; Spark Declarative Pipelines; natural language | Limited to data engineering domain |
| OpenClaw | Agent platform | Positive | 13,700+ skills in marketplace; per-agent model selection | Running it yourself is "still a nightmare" (per @cyrilXBT) |
| AG-UI Protocol | Agent protocol | Positive | Adopted by Google, AWS, Microsoft, LangChain, Mastra, TanStack | Protocol not runtime; requires implementation |
| DeepSeek TUI | Terminal coding agent | Emerging | 1M context, sub-agents, keyboard-driven, git management | Less ecosystem support than Claude/Codex |
| Flue | Agent harness | Positive | One-click Render deploy; drop .ts file in agents/ | TypeScript only; early stage |
The standout shift: the tools conversation has moved from "which model is best" to "which orchestration layer minimizes token spend while maximizing autonomous run duration." Insforge's 3x token reduction and /goal's multi-day execution represent the new performance frontier.
5. What People Are Building
| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
|---|---|---|---|---|---|---|
| Scion | @geminicli | Multi-agent orchestrator with container isolation | Agents sharing state/credentials create blast radius | Containers, git worktrees, Claude/Gemini/Codex | Shipped | post |
| HyperFrames Skill | @HeyGen + @NousResearch | One-line video rendering from Hermes agent | Agents cannot create video without external editors | Hermes skills, HTML-to-MP4 | Shipped | post |
| Entire CLI Skills | @ashtom / @EntireHQ | Teaches agents to use full commit context (prompts, transcripts, decisions) | Agents lack access to reasoning behind code changes | Open source, session-handoff | Shipped | post |
| Autoresearch v2 | @AndrewK404 | Long optimization runs with memory, async experiments, falsifiers | Karpathy's PoC lacks honesty enforcement in long runs | Agent swarm, memory layer | Shipped | post |
| AgenC Marketplace | @tetsuoarena | On-chain agent task marketplace across Claude/Codex/Hermes/MCP | Wallet authority mixed with untrusted task text | Solana, MCP, multiple agent CLIs | Devnet | post |
| AutoSwarm | @artemg314 | Auto-optimizes entire multi-agent pipeline (30% to 90% on Terminal-Bench) | Individual agent optimization ignores team dynamics | Meta-agent, Terminal-Bench | Shipped | post |
| DAEMON Terminal | @DaemonTerminal | Agent IDE with local LLM, plugin marketplace, team workspaces | Fragmented agent tools, no collaboration | Ollama, LM Studio, plugins | Roadmap | post |
| Odysseus (VLM Gaming Agent) | @sethkarten | Finetuning VLMs to beat Mario via PPO reinforcement learning | Gaming agents lack reactive spatial reasoning | VLMs, PPO, Super Mario Land | Research | post |
| oh-my-agent-skills | @GT_Chiang | 14 skills across 6 bundles -- execution logic for agent failure recovery | Agents fail without structured recovery paths | Skills SDK | Shipped | post |
| PEPT://BASE Hermes Skill | @peptbase | Structured peptide intelligence for biotech agent workflows | No agent-native peptide knowledge access | Hermes skills, 100+ peptides | Shipped | post |
| Obsidian Vault for Agent Memory | @tom_doerr | Persistent knowledge base accessible to coding agents | Agents lose context between sessions | Obsidian, agent integration | Shipped | post |
6. New and Notable
Coinbase Cuts 14% and Mandates AI-Native Pods
@rohanpaul_ai analyzed (6 likes, 4 bookmarks, 2272 views) Brian Armstrong's memo announcing 14% layoffs plus "one-person teams" where engineers, designers, and PMs merge into single roles augmented by agent fleets. The framing: "AI has changed the smallest useful unit of software work from a team to a single high-context operator." This is the first major public company explicitly restructuring around agent capabilities rather than simply adding AI tooling.
Grok Build and Grok Terminal Enter the Coding Agent Race
@MarioNawfal reported (141 likes, 16 bookmarks, 33142 views): "Grok is done being just a chatbot. xAI is dropping Grok Build, a full coding agent, and Grok Terminal, straight from your command line." @testerlabor added (14 likes) that 3 Grok Build models are training simultaneously. The coding agent market gains another well-funded competitor.
xAI Launches Custom Voice Cloning for Agent APIs
@AlternativeTo reported (9 likes, 3 bookmarks, 1302 views) that xAI launched Custom Voices: "clone and deploy voices for Grok text-to-speech, apps, and Voice Agent APIs in under two minutes, with multilingual support." This lowers the barrier for voice agent personalization significantly.
TradingAgents Tops GitHub Finance Repos
@quantscience_ listed (72 likes, 121 bookmarks, 3521 views) the fastest growing GitHub finance repos: "TradingAgents (+7.9K stars) -- multi-agent LLM trading framework from UCLA/MIT with fundamental analyst, sentiment analyst, technicals, risk manager with DeepSeek V4 thinking."

Ctx2Skill: Self-Evolving Skills via Multi-Agent Self-Play
@HuggingPapers shared (43 likes, 33 bookmarks, 2062 views) a new framework that "autonomously discovers skills from complex contexts through multi-agent self-play. No human labels or external feedback needed." Result: GPT-4.1 solve rates improved from 11.1% to 16.5% on CL-bench.

7. Where the Opportunities Are
[+++] Agent skill verification and supply chain security. @omarsar0's "SKILL.md" framing, @yzg75001's admission that production trust is "vibes-based," and 13,700+ skills in OpenClaw's marketplace converge on a clear gap: no sigstore-equivalent for agent skills exists. Whoever builds verified skill attestation captures the trust layer as skill marketplaces scale.
[+++] Context engineering that eliminates token waste. @akshay_pachaar demonstrated 3x token reduction (10.4M to 3.7M) with zero errors via Insforge Skills. SubQ claims 12M tokens of working context. The pattern: context engineering layers that sit between the user and the model, structuring information to minimize redundant reasoning, represent the fastest-growing value capture point.
[++] Multi-agent orchestration with drift detection. Scion ships container isolation. @om_patel5 identifies confidence drift. @artemg314's AutoSwarm lifts multi-agent pipelines from 30% to 90%. The gap between "agents running in parallel" and "agents running in alignment" creates opportunity for coordination tools with built-in divergence detection.
[++] Agent-native video and voice production. HyperFrames + Hermes, HeyGen + Superhuman Go, xAI Custom Voices, and Deepgram + Together all shipped on the same day. The convergence signals that multi-modal agent output (not just text) is becoming table stakes, with opportunities in quality control, brand consistency, and automated editing.
[+] One-person team infrastructure. Coinbase's restructuring around "one-person teams" and @code_rams's "3-agent solo founder stack" both point to infrastructure that enables a single operator to manage agent fleets across research, content, and operations. Tools that make this pattern reliable (monitoring, handoffs, quality gates) have growing demand.
[+] Decentralized agent work marketplaces. AgenC, ShelleyBay, GenLayer, and Handshake trading skills all represent independent attempts at on-chain agent task marketplaces. The pattern is validated across multiple teams; execution and trust infrastructure will determine winners.
8. Takeaways
- The /goal command makes "executive plan" quality the new bottleneck. @AlexFinn's 2577-score post and its 424 bookmarks suggest the community recognizes that multi-day agent runs are possible but depend entirely on structured planning documents -- shifting the premium skill from "write good code" to "write good plans for agents." (source)
- Context engineering is delivering measurable cost reductions of roughly 3x today. @akshay_pachaar's before/after (10.4M tokens/$9.21 vs 3.7M tokens/$2.81) with zero errors shows that context engineering layers are not theoretical -- they produce immediate, quantifiable savings for anyone running agents at scale. (source)
- The skill ecosystem has outgrown its trust infrastructure. With 13,700+ skills in OpenClaw, HyperFrames shipping for Hermes, and practitioners admitting trust is "vibes-based," the agent skill layer is scaling distribution faster than verification. The supply-chain security framing from @omarsar0 is the early warning before the first major skill-based exploit. (source)
- Multi-agent orchestration moved from concept to containerized product. Scion's container isolation per agent, @daytradingzoo's frontend/backend split, and @om_patel5's "4-person team" (2 humans + 2 Claudes) all demonstrate that multi-agent is now a practical workflow pattern, not a research curiosity -- but drift detection remains unsolved. (source)
- Coinbase's 14% cut signals that "AI-native org structure" is now a public-company operating thesis. One-person teams, flattened hierarchies, and agent fleets are no longer blog posts -- they are driving layoff decisions at a $60B+ company. This will accelerate enterprise demand for agent management and governance tooling. (source)
- The agent video stack arrived in a single day. HyperFrames, HeyGen + Superhuman, xAI Custom Voices, and Deepgram STT on Together all shipped on May 5. Agents are no longer text-in/text-out -- multi-modal output is becoming a default capability, collapsing previously separate tool categories into single-skill installs. (source)