Twitter AI Agent - 2026-05-06

1. What People Are Talking About

1.1 Harness Engineering Becomes the Defining Agent Discipline

@Vtrivedy10 published (81 likes, 6 replies, 2 quotes, 100 bookmarks, 4214 views) an 8-point manifesto on agent harness engineering: "You can outperform any default harness+model (including codex & claude code) on pretty much any Task by engineering the harness around it." Key claims: a "general purpose" agent does not exist (it is a tradeoff between customization time and performance), evals are a moat, and frontier closed models are "far too expensive for the large majority of tasks the world needs to do."

@oneill_c endorsed (97 likes, 8963 views) domain-specific model training at Harvey: "The best companies in the world are now using their domain expertise and real-world feedback loops to build evals and rewards (and thus eventually models) that allow them to go beyond what you can get with prompt- and harness-engineering of the closed source models." @itsandrewgao noted (10 likes, 1628 views): "very underrated how much of an improvement over base models harness engineering provides."

@tezos shared (25 likes, 899 views) a TezDev 2026 talk where @yurug observed that "most teams using AI agents are just adapting existing processes -- akin to putting a motor on a bicycle -- rather than rethinking workflows from the ground up. Instead, teams need to fix the harness."

Comparison to prior day: May 5 highlighted the /goal command and multi-day runs as the practical embodiment of harness engineering. May 6 elevates the conversation to first principles -- @Vtrivedy10's manifesto treats harness engineering as a discipline with its own tradeoff curves, cost models, and evaluation systems, while @oneill_c signals that frontier companies are moving past prompt engineering and into custom model training driven by harness-derived evals.

Discussion insight: @BetterSayAJ's reply crystallizes the tension: "maps closely to classic IR + systems thinking. once you fix the model, most gains come from retrieval, tooling, and eval loops. the 'model eats scaffolding' idea still feels incomplete in production settings." @Vtrivedy10 conceded that he expects "the model to eat way more scaffolding" while believing teams that "help agents prepare good tooling + context for themselves will have better products."


1.2 AWS Agent Toolkit Launches with 40+ Skills and Access to 15,000+ AWS APIs

@clare_liguori announced (193 likes, 6 replies, 3 quotes, 183 bookmarks, 12788 views) the AWS Agent Toolkit: "40+ skills, 3 agent plugins, Remote MCP server that agents can use to call all 15,000+ AWS APIs + run scripts, search docs, retrieve skills." The GitHub repo link points to github.com/aws/agent-toolkit-for-aws.

@awsdevelopers amplified (14 likes, 3634 views): "Your coding agent can now pull live docs, make real API calls through your credentials, and follow step-by-step playbooks for common tasks. Works with any MCP-compatible agent."

Comparison to prior day: May 5 featured individual tools (Insforge Skills, Hermes skills) as context-engineering layers for coding agents. May 6 marks a platform-level entry: AWS shipping 40+ skills and exposing 15,000+ APIs to agents through a single remote MCP server is arguably the largest single-day expansion of agent-accessible capabilities from a cloud provider.

Discussion insight: The 183 bookmarks on @clare_liguori's post (vs. 6 replies) indicate a strong "save for later" signal -- practitioners are bookmarking infrastructure tooling rather than debating it, which suggests the value proposition is clear and the next step is implementation.


1.3 The Self-Driving Codebase Thesis Gains Enterprise Validation

@grinich introduced (143 likes, 8 replies, 3 quotes, 118 bookmarks, 25466 views) their internal agent system Horizon: "These aren't just agents to write code on demand. They detect triggers, spin up secure sandboxes, gather dynamic context, open PRs, verify their work, unblock the next task, and learn from every failure." He framed the future: "software engineering may look more like swarms of event-driven agents, running continuously, with humans setting direction and reviewing the work. Ramp built Inspect. Stripe built Minions. Spotify built Honk."

[Image: Horizon system architecture showing event-driven agent swarms with secure sandboxes]

Comparison to prior day: May 5's Coinbase restructuring announcement signaled the organizational side of AI-native engineering. May 6 provides the technical architecture behind such restructuring -- @grinich names four companies (Ramp, Stripe, Spotify, and his own) that have independently built continuous agent systems, validating the pattern beyond a single company's thesis.

Discussion insight: @varadh's reply "everyone is building the same legos!" and @grinich's rebuttal "it's more like building the warp drive!" capture the debate: is continuous-agent infrastructure commoditizing (everyone builds the same thing) or differentiating (the implementation details matter enormously)?


1.4 Skill Authoring Emerges as a Standalone Discipline

@mattpocockuk floated (290 likes, 36 replies, 47 bookmarks, 6348 views): "Sounds mad, but maybe I should just make a course about writing great skills? Breaking down daily tasks into skills. Turning HITL tasks into AFK ones. Creating a working language with the agent." The 290 likes and 36 replies made this the most-engaged non-infrastructure post of the day.

@asmah2107 advised (10 likes, 304 views): "Writing the AI agent skills by hand (and not using AI) is great way for you to understand how the LLM will behave when that skill is invoked. Might save you from context debt in future!" @neo4j launched (7 likes, 229 views) Neo4j Agent Skills: "Your coding agent does not know Cypher 25 exists. It was trained before GQL alignment shipped. Neo4j Agent Skills put the right knowledge next to the right task."

@doodlestein shared (29 likes, 32 bookmarks, 1320 views) a "repo junk cleaner" prompt pattern, demonstrating skills as reusable prompt architectures for repository maintenance.

Comparison to prior day: May 5 discussed skill verification and supply chain security as a trust problem. May 6 shifts to skill authoring as a craft -- @mattpocockuk treating skills as a teachable discipline and @asmah2107 arguing for hand-writing them suggest the community is recognizing that skill quality (not just skill quantity) is the bottleneck.

Discussion insight: @carlospeix's reply captures the pedagogical challenge: "your skills are (very) good because you build them from experience, trial and error, and real-life use. That's difficult to grasp for learners." This suggests skills are tacit knowledge that resists codification -- the opposite of what a course format assumes.


1.5 Voice Agent Latency Hits Sub-200ms Production Threshold

@kimmonismus declared (71 likes, 7 replies, 24 bookmarks, 9359 views): "Sub-200ms TTFA is the number that matters. Anything above ~300ms in a voice agent and you can feel the lag. Everything else is downstream of that," commenting on Inworld AI's Realtime TTS-2 launch.

@omooretweets reflected (27 likes, 1924 views) on a16z's voice agent thesis: "When we published our first voice agent thesis, the feedback looked like this -- enterprises having AI handle basic calls sounded crazy. Two years later, voice AI is consensus -- with healthcare, finance, insurance leading adoption."

@DeepgramAI announced (8 likes, 740 views) native integration on Together: "One platform, full voice-agent stack: Deepgram transcription, Together-hosted LLMs, Aura-2 TTS. Sub-3-second round trip on reference builds."

Comparison to prior day: May 5 featured xAI Custom Voice cloning and HeyGen + Superhuman as the video/voice layer. May 6 narrows to a specific technical threshold (sub-200ms TTFA) as the quality bar for production voice agents. Inworld's Realtime TTS-2 is pitched at that bar, while Deepgram's integration quotes a sub-3-second full round trip, which is a different and looser measure than TTFA.

Discussion insight: @jason_haugh from the BPO side provided the business case: "anything past about a quarter second has callers registering it as artificial. They hang up before the agent recovers. Trust collapses fast at that line." @m13v_ identified the deeper failure mode: "the under-appreciated failure mode past 300ms is users start filling silence with extra words, which throws off asr endpointing and the model commits to a wrong intent."
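The thresholds quoted in this section can be turned into a simple latency check. The sketch below is illustrative only: the 200 ms and 300 ms cut-offs come from the posts above, while the function names, the "borderline" band, and the sample measurements are assumptions made up for the example.

```python
from statistics import quantiles
from typing import List

TARGET_MS = 200   # "sub-200ms TTFA" bar quoted above
LAG_MS = 300      # point past which lag is described as noticeable

def classify_ttfa(ms: float) -> str:
    """Bucket one TTFA measurement against the two quoted thresholds."""
    if ms < TARGET_MS:
        return "ok"
    if ms <= LAG_MS:
        return "borderline"   # hypothetical middle band between the two numbers
    return "laggy"

def p95(samples: List[float]) -> float:
    """95th percentile of the samples; needs at least two data points."""
    return quantiles(samples, n=20, method="inclusive")[-1]

samples = [120, 140, 150, 160, 180, 190, 210, 240, 320, 410]  # made-up data
print({s: classify_ttfa(s) for s in samples})
print(f"p95 = {p95(samples):.1f} ms -> {classify_ttfa(p95(samples))}")
```

Tracking a high percentile rather than the mean matters here because, per the replies above, a single slow turn is enough for a caller to register the agent as artificial.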


1.6 HermesOS Free Tier and Workspace Expansion

@Wayland_Six announced (79 likes, 6 replies, 51 bookmarks, 9024 views) HermesOS free tier going live: "No credit card required. No markup on AI costs (BYO key). No trial countdown. Persistent memory, browser automation, terminal access, tool use, scheduled tasks, Telegram/Discord/Slack/WhatsApp integrations."

@outsource_ celebrated (19 likes, 713 views) Hermes-Workspace crossing 3,400 stars: "HermesWorld is built directly into the workspace! Your AI agent gets a command center, memory, tools, files, terminal... and now a world to walk around in."

@RoundtableSpace argued (62 likes, 43743 views): "Hermes Agent is starting to feel like a different category of AI tool. Not just a chatbot or a coding assistant, but a server side agent that remembers workflows, builds skills, and gets better the more you use it."

Comparison to prior day: May 5 featured Hermes expanding through HyperFrames video and voice mode. May 6 focuses on accessibility -- free tier hosting and a 3D workspace environment lower barriers to entry, shifting from "what Hermes can do" to "who can access Hermes."

Discussion insight: @GumbiiDigital's skepticism ("ANYTHING attached to a crypto coin is a scam") and @Conso_xyz's challenge ("Different category or just better branding for the same limitations?") show that Hermes still faces credibility questions, particularly around its crypto-adjacent positioning.


1.7 Claude Managed Agents Gets Dreaming, Outcomes, and Multi-Agent

@RLanceMartin summarized (25 likes, 1004 views) the day's Anthropic news: "raising Claude Code + API rate limits, SpaceX compute partnership (300 MW capacity), Claude Managed Agents gets dreaming, outcomes, multi-agent, & webhooks."

[Image: Claude Code with Claude news roundup including rate limits and SpaceX partnership]

@adocomplete detailed (15 likes, 1783 views) the features: "Dreaming: reviews past sessions to improve agents over time. Outcomes: set a quality rubric, agent iterates until it passes. Multiagent Orchestration: parallel specialists. Webhooks."
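The "Outcomes" description (set a rubric, iterate until it passes) matches a generic generate-check-retry loop. The sketch below shows that general pattern only; it is not Anthropic's API. `iterate_until_pass`, `toy_generate`, and the rubric checks are all hypothetical names, and the toy generator stands in for a real model call.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[str], bool]]  # (check name, predicate on the draft)

def iterate_until_pass(
    generate: Callable[[str, List[str]], str],
    task: str,
    rubric: List[Check],
    max_rounds: int = 5,
) -> Tuple[str, List[str], int]:
    """Regenerate until every rubric check passes or rounds run out.

    Returns (last_draft, names_of_failed_checks, rounds_used).
    """
    failures: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, failures)  # feed back the previous failures
        failures = [name for name, check in rubric if not check(draft)]
        if not failures:
            return draft, [], round_no
    return draft, failures, max_rounds

# Toy stand-in for a model call: it repairs whatever failed last round.
def toy_generate(task: str, failures: List[str]) -> str:
    draft = f"Summary of {task}"
    if "has_citation" in failures:
        draft += " [1]"
    return draft

rubric = [
    ("mentions_task", lambda d: "release notes" in d),
    ("has_citation", lambda d: "[1]" in d),
]
draft, failed, rounds = iterate_until_pass(toy_generate, "release notes", rubric)
print(draft, failed, rounds)
```

The `max_rounds` cap is the important design choice: without it, a rubric the agent cannot satisfy turns a quality gate into an unbounded token spend.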

Comparison to prior day: May 5's /goal command discussion focused on multi-day autonomous runs as the user-facing innovation. May 6 reveals the platform infrastructure behind it -- Dreaming (self-improvement between sessions), Outcomes (quality gates), and multi-agent orchestration move Claude from "tool you prompt" to "system that improves itself."

Discussion insight: The SpaceX partnership for 300 MW of compute capacity signals that Anthropic is preparing for the computational demands of always-on agent fleets -- dreaming and continuous execution require sustained inference, not burst capacity.


2. What Frustrates People

Token Cost Remains the Primary Agent Scaling Barrier

@RoundtableSpace demonstrated (78 likes, 13 replies, 50 bookmarks, 46174 views) the same dramatic cost difference highlighted yesterday: "One change cut Claude Code token usage by 3x. Before: 10.4M tokens, 10 errors, $9.21. After: 3.7M tokens, 0 errors, $2.81. Swapping Supabase for Insforge Skills + CLI as the context engineering layer." The 46K views and 13 replies indicate this is a persistent pain point, not a one-day story.
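The quoted figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the post; the variable and function names are illustrative.

```python
# Figures quoted in the post; names are illustrative, not from any tool.
before = {"tokens": 10_400_000, "cost_usd": 9.21}
after = {"tokens": 3_700_000, "cost_usd": 2.81}

def reduction_factor(old: float, new: float) -> float:
    """How many times smaller `new` is than `old`."""
    return old / new

token_factor = reduction_factor(before["tokens"], after["tokens"])
cost_factor = reduction_factor(before["cost_usd"], after["cost_usd"])
cost_saved_pct = 100 * (1 - after["cost_usd"] / before["cost_usd"])

print(f"tokens: {token_factor:.2f}x fewer")
print(f"cost:   {cost_factor:.2f}x cheaper ({cost_saved_pct:.0f}% saved)")
```

On these numbers the token reduction is roughly 2.8x and the cost reduction roughly 3.3x, so "3x" is a fair rounding of either figure.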

Demoware vs Production Agents Remains Unresolved

@databricks reiterated (56 likes, 20 bookmarks, 3198 views): "Most 'agentic AI' is still demoware. Data work and coding are clear exceptions." @LandonExplr replied: "Agentic AI works where outputs are verifiable. Data pipelines qualify. Everything else claiming 'agentic' is still demoware." @tezos echoed with @yurug's observation that teams are "putting a motor on a bicycle" rather than rethinking workflows.

Local Model Infrastructure Is Hardware-Gated

@0xSero showed (111 likes, 12 replies, 46 bookmarks, 5699 views) vllm-studio running a local coding agent, but @BopityBibity replied: "at home if you own a couple 5k gpu inference only machines." The gap between the promise of local agents and the hardware required to run them frustrates practitioners who cannot justify the capital expenditure.

Agent Specification Handoff Is Still Manual

@hasantoxr identified (104 likes, 70 bookmarks, 20572 views) the core problem: "Coding agents sped up what can be built, not what should be built. PMs still write specs in Google Docs. Agents still hallucinate intent. Engineers still ship the wrong thing." The handoff between human thinking and agent execution remains a friction point even as execution speed increases.


3. What People Wish Existed

Unified Virtual Filesystem for Agent Access

@guohao_li argued (8 likes, 487 views): "Agents don't need more sophisticated CLIs or agent skills. What they actually need is a unified virtual file system layer." He referenced @zechengzh's Mirage project: "1.1M+ lines of code. We rewrote bash from the ground up so cat, grep, head, and pipes work across heterogeneous services. S3, Google Drive, Slack, Gmail, GitHub, Linear, Notion, Postgres, MongoDB, SSH, and more, all mounted side-by-side as one filesystem."
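The idea of mounting heterogeneous services side by side can be illustrated with a small mount table that routes paths to backends. This is a toy sketch of the concept, not Mirage's design or code: `VFS`, `Backend`, and `DictBackend` are hypothetical names, and the in-memory dictionaries stand in for real services such as S3 or Slack.

```python
from typing import Dict, List, Protocol, Tuple

class Backend(Protocol):
    def read(self, path: str) -> str: ...

class DictBackend:
    """In-memory stand-in for a real service (S3, Slack, ...)."""
    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def read(self, path: str) -> str:
        return self.files[path]

class VFS:
    def __init__(self) -> None:
        self.mounts: Dict[str, Backend] = {}

    def mount(self, prefix: str, backend: Backend) -> None:
        self.mounts[prefix.rstrip("/")] = backend

    def _resolve(self, path: str) -> Tuple[Backend, str]:
        # Longest matching mount prefix wins.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self.mounts[prefix], path[len(prefix):].lstrip("/")
        raise FileNotFoundError(path)

    def cat(self, path: str) -> str:
        backend, rest = self._resolve(path)
        return backend.read(rest)

    def grep(self, needle: str, paths: List[str]) -> List[str]:
        # Same call works regardless of which backend serves each path.
        return [f"{p}:{line}" for p in paths
                for line in self.cat(p).splitlines() if needle in line]

vfs = VFS()
vfs.mount("/s3", DictBackend({"logs/app.log": "ok\nerror: timeout"}))
vfs.mount("/slack", DictBackend({"general/today": "deploy error reported"}))
print(vfs.grep("error", ["/s3/logs/app.log", "/slack/general/today"]))
```

The point of the pattern is that the agent learns one small vocabulary (`cat`, `grep`) instead of one tool per service; the per-service complexity moves behind the `Backend` interface.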

Agent Persistence as Default Infrastructure

@Marktechpost covered (12 likes, 8129 views) CopilotKit's Enterprise Intelligence Platform: "The gap between a demo agent and a production agent is memory." The platform provides "Threads -- persistent session objects that capture generative UI, human-in-the-loop workflows, shared state, voice, files, and multimodal interactions across sessions and devices."

Anti-Bot Detection Before Agent Browsing

@kljukusa released (27 likes, 14 bookmarks, 844 views) a "/what-antibot" skill: "This skill uses HTTP requests to detect anti-bot security before your browser agent ever visits a site." The need: agents waste time and get blocked visiting sites without knowing what security they face in advance.
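A pre-flight check of this kind can be approximated by inspecting the status code and headers of a plain HTTP response. The sketch below is not the skill's implementation: `guess_antibot` is a hypothetical helper, the signature list is deliberately tiny and incomplete, and the status-code rule is a heuristic. The headers could come from any HTTP client; no request is made here.

```python
from typing import Dict, List

def guess_antibot(status: int, headers: Dict[str, str]) -> List[str]:
    """Return coarse hints about anti-bot protection from one HTTP response."""
    h = {k.lower(): v.lower() for k, v in headers.items()}
    hints: List[str] = []
    # Cloudflare-fronted responses typically carry a CF-RAY header.
    if "cf-ray" in h or h.get("server", "") == "cloudflare":
        hints.append("cloudflare")
    # Heuristic: a 403/429 to a plain request suggests blocking or rate limiting.
    if status in (403, 429):
        hints.append("blocked-or-rate-limited")
    return hints

print(guess_antibot(403, {"Server": "cloudflare", "CF-RAY": "abc123"}))
print(guess_antibot(200, {"Server": "nginx"}))
```

An empty result only means no known signature matched; it does not show that a site is unprotected.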

Secure Agent Payment Infrastructure

@Whitelist1Media outlined (61 likes, 3850 views) the security stack needed for agents managing funds autonomously: "Multi-factor approvals (2-of-3 or time-lock), max transaction limits per day, on-chain monitoring + alerting, keys in hardware or MPC wallets. Never in the prompt." The framing: "The risk is no longer the technology. It is prompt quality, limits and governance."
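Two of the listed controls, m-of-n approvals and a daily spending cap, are easy to express as a guard that sits between the agent and the wallet. The sketch below is a minimal illustration under stated assumptions: `SpendPolicy` and its fields are hypothetical, approvals are modeled as plain names rather than signatures, and time-locks, on-chain monitoring, and key storage are out of scope.

```python
from dataclasses import dataclass
from typing import Set

@dataclass
class SpendPolicy:
    """Illustrative guard: m-of-n approvals plus a daily spending cap."""
    approvers: Set[str]
    required_approvals: int = 2
    daily_limit: float = 1_000.0
    spent_today: float = 0.0

    def authorize(self, amount: float, approvals: Set[str]) -> bool:
        valid = approvals & self.approvers  # ignore unknown signers
        if len(valid) < self.required_approvals:
            return False
        if amount <= 0 or self.spent_today + amount > self.daily_limit:
            return False
        self.spent_today += amount  # record only authorized spend
        return True

policy = SpendPolicy(approvers={"alice", "bob", "carol"})
print(policy.authorize(400.0, {"alice", "bob"}))      # two valid approvals, under cap
print(policy.authorize(400.0, {"alice", "mallory"}))  # only one valid approval
print(policy.authorize(700.0, {"bob", "carol"}))      # would exceed the daily cap
```

The guard runs outside the model, which is the point of "never in the prompt": the agent can request a transfer, but the limits are enforced by code it cannot talk its way around.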


4. Tools and Methods in Use

| Tool | Category | Sentiment | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| AWS Agent Toolkit | Cloud agent SDK | Positive | 40+ skills, 15K API access, MCP-compatible, remote server | AWS-only; new ecosystem |
| InsForge Skills + CLI | Context engineering | Positive | 3x token reduction, zero errors, open source, local | Requires CLI familiarity |
| Claude Managed Agents | Agent platform | Positive | Dreaming, outcomes, multi-agent, webhooks, SpaceX compute | Research preview features |
| Hermes Agent + HermesOS | Agent framework | Positive | Free tier, persistent memory, browser automation, 3400+ stars | Crypto-adjacent skepticism |
| vllm-studio | Local inference | Positive | Full coding agent at home, supports SGLang, usability polish | Requires expensive GPU hardware |
| Horizon (WorkOS) | Internal agent system | Positive | Event-driven, sandboxed, self-verifying, learns from failures | Internal; not generally available |
| Rezonant | PM-to-agent bridge | Positive | Screen recording to PRD, integrates GitHub/Linear/Jira/Figma | New; unproven at scale |
| Neo4j Agent Skills | Domain skills | Positive | Cypher 25, driver patterns, GraphRAG, MCP, GDS | Graph-specific domain |
| Mirage (Strukto) | Virtual filesystem | Emerging | Unified VFS across S3/Drive/Slack/GitHub/etc, versioned workspaces | 6-week-old project |
| DGX Spark | Local hardware | Positive | 128GB unified memory, full CUDA stack, Nemotron 30B at 56 tok/s | Ecosystem software not mature yet |
| Inworld Realtime TTS-2 | Voice model | Positive | Sub-200ms TTFA, 100+ languages, conversational awareness | New; limited production data |
| CopilotKit Enterprise | Agent persistence | Emerging | Threads, persistent sessions, CLHF roadmap | Enterprise pricing; roadmap features |

The standout shift: the tools conversation has moved from "which context engineering layer saves tokens" (May 5) to "which platform provides the complete agent lifecycle" -- AWS launching 15K API access, Anthropic adding dreaming/outcomes, and WorkOS naming four companies with internal agent systems signal that the infrastructure layer is consolidating around full-lifecycle platforms.


5. What People Are Building

| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
| --- | --- | --- | --- | --- | --- | --- |
| Horizon | @grinich / WorkOS | Event-driven agent swarms for continuous codebase maintenance | Agents that only work on-demand miss triggers and context | Secure sandboxes, git, PRs, failure learning | Internal | post |
| AWS Agent Toolkit | @clare_liguori / AWS | 40+ skills + MCP server for 15K AWS APIs | Agents cannot access cloud infrastructure natively | MCP, remote server, agent plugins | Shipped | post |
| Mirage (Strukto) | @zechengzh | Unified VFS mounting S3/Drive/Slack/GitHub/Postgres as one filesystem | Agents cannot access heterogeneous services uniformly | Custom bash, VFS, workspace versioning | Shipped | post |
| Rezonant | @RezonantAI | Screen-recording to structured PRD for coding agents | PM-to-agent spec handoff is lossy and manual | GitHub/Linear/Jira/Figma/Notion integrations | Shipped | post |
| vllm-studio | @0xSero | Full local coding agent UI on top of vLLM | Coding agents depend on cloud APIs and monthly fees | vLLM, SGLang, local GPU inference | Shipped | post |
| HermesOS Free Tier | @Wayland_Six | Free persistent Hermes agent hosting with BYO key | Agent hosting requires credit cards and per-use fees | Hermes, NousResearch, cloud deploy | Shipped | post |
| /what-antibot Skill | @kljukusa | Detects anti-bot security before browser agent visits a site | Browser agents get blocked without warning | HTTP probing, open source | Shipped | post |
| Neo4j Agent Skills | @neo4j | Cypher 25, driver patterns, GraphRAG skills for coding agents | Agents trained before GQL alignment cannot use new syntax | Skills SDK, 5 language drivers | Shipped | post |
| x402B | @BosonProtocol | Secure escrow scheme extending x402 to high-value agent commerce | x402 push payments have no recourse for failed delivery | Smart contracts, NFT forward contracts, CertiK-audited | Shipped | post |
| 4lpha Multi-Agent | @4lpha_agent | Three-agent trading flow: Safety, Social, Gatekeeper | Single-agent trading misses safety/social/decision layering | LLMs, token analysis, multi-agent pipeline | Shipped | post |
| Goblintown Protocol | @0xbl33p | Multi-agent orchestration with DAG decomposition, debate, specialist spawning | Tasks need adversarial verification between agents | OpenAI, DAG orchestration, Solana | Shipped | post |
| MCP Memory Service | @tom_doerr | Persistent shared memory for multi-agent AI pipelines | Agents lose shared state between sessions and across pipelines | MCP, memory layer | Shipped | post |

6. New and Notable

Anthropic Partners with SpaceX for 300 MW Compute Capacity

@RLanceMartin reported (25 likes, 1004 views) that Anthropic is partnering with SpaceX for 300 MW of compute capacity alongside raising Claude Code and API rate limits. The compute scale signals preparation for always-on agent fleets that require sustained inference (dreaming, continuous execution) rather than burst capacity for individual queries.

DGX Spark Identified as Underexplored Agent Hardware

@sudoingX argued (74 likes, 27 bookmarks, 6201 views) that NVIDIA's DGX Spark is "the silent king nobody is talking about: 128gb unified memory in a form factor that fits on a desk corner, full cuda stack, runs nemotron 30b q8 at 56 tok/s on hermes agent." He identified the opportunity: "nobody has written custom kernels for this specific silicon yet. that is an openlane for builders."

[Image: DGX Spark hardware for portable AI agent workstations]

ServiceNow Hits $1B in AWS Marketplace Transactions

@WisemanCap reported (97 likes, 8963 views): "ServiceNow reaches $1 billion in AWS Marketplace transactions -- launches data foundation for autonomous AI -- expands Build Agent to work with major AI coding tools." This confirms enterprise-scale agent spending is materializing in procurement data.

Gallatin AI Wins National Security Hackathon with Agentic Radio

@AtomsNotBits reported (12 likes, 575 views) that Gallatin AI won first place at Cerebral Valley's National Security Hackathon with "an agentic radio agent built on Palantir's Maven Smart System in 24 hours." A team of three built the system, which processes battlefield radio communications -- a sign that agent capabilities are entering defense and intelligence domains.

Deel Launches Akai: 91,000 Manual Hours Saved Monthly

@matiii noted (33 likes, 3854 views) the Deel announcement: "100% of Deel's operations teams, across Finance, Tax, Treasury, Benefits, and HR, run on Akai. 100,000+ cases handled automatically every month. 91,000+ manual hours saved every single month." Every team now builds agents themselves without developers.


7. Where the Opportunities Are

[+++] Agent harness engineering as a service. @Vtrivedy10's manifesto, @oneill_c's domain-specific model training at Harvey, and @tezos's "fix the harness" talk all converge: teams need help going from default agent configurations to task-optimized harnesses. The 3x token reduction from Insforge proves the ROI. Whoever packages harness optimization (evals, skills, tool selection) as a managed service captures the gap between "agent works" and "agent works efficiently."

[+++] Platform-level agent skill distribution. AWS launched 40+ skills with 15K API access. Neo4j launched domain-specific skills. @mattpocockuk is considering teaching skill authoring as a course. The pattern: skills are the new packages, and the distribution/discovery layer is wide open. An "npm for agent skills" with quality signals, verification, and dependency management has no clear winner.

[++] Agent persistence and memory infrastructure. CopilotKit's Enterprise Intelligence Platform, @tom_doerr's MCP Memory Service, HermesOS's persistent agents, and @GithubProjects's shared workspace all target the same gap: agents that retain context across sessions. The "demo to production" gap is increasingly defined by memory, not capability.

[++] Voice agent infrastructure below 200ms TTFA. @kimmonismus set the bar, @jason_haugh confirmed it from the BPO side, and Inworld shipped a model that meets it. The opportunity: voice agent quality control tooling, latency monitoring, and fallback systems that prevent the "compensation loop" failure mode @m13v_ identified.

[+] Local-first agent hardware ecosystem. @sudoingX identified DGX Spark as underexplored, @0xSero shipped vllm-studio for local coding agents, and @Vtrivedy10 argued open model harness engineering will "take off" as teams map costs to ROI. Custom kernels, optimized inference stacks, and turnkey local agent setups have growing demand from cost-sensitive teams.

[+] Agent-to-agent commerce and escrow. Boson Protocol's x402B, Bankr's self-funding agents, Swarms Marketplace's $30K hackathon, and 4lpha's multi-agent trading all point to agents that transact with each other. The trust and settlement infrastructure for agent-to-agent commerce is nascent and fragmented.


8. Takeaways

  1. Harness engineering has graduated from technique to discipline. @Vtrivedy10's 8-point manifesto, @oneill_c's endorsement of domain-specific model training, and @tezos's conference talk all frame harness engineering as a structured practice with its own tradeoff curves -- no longer a collection of tips but a field with evaluation systems, cost models, and career paths. (source)

  2. AWS entering with 40+ skills and MCP access to 15,000+ APIs changes the skill distribution game. @clare_liguori's launch of the AWS Agent Toolkit is arguably the largest single-day expansion of agent-accessible capabilities from a cloud provider. With 183 bookmarks and minimal debate, practitioners are treating this as infrastructure to adopt, not a thesis to argue. (source)

  3. The "self-driving codebase" is no longer a thought experiment -- four companies named their implementations. @grinich listing Ramp (Inspect), Stripe (Minions), Spotify (Honk), and WorkOS (Horizon) as independent builds of continuous agent systems confirms the pattern has reached escape velocity in engineering organizations. (source)

  4. Skill authoring is becoming a teachable craft, not just a side effect of agent use. @mattpocockuk's 290-like post about a skill-writing course, @asmah2107's advice to write skills by hand, and Neo4j shipping domain-specific skills all signal that "skill author" is emerging as a distinct role with learnable practices. (source)

  5. Sub-200ms voice latency is now the production threshold -- past roughly 300ms, callers notice the lag and start hanging up. @kimmonismus set the number, @jason_haugh confirmed it from BPO operations, and @m13v_ identified the deeper failure mode (user compensation loops breaking intent detection). Voice agents above 300ms are not just slow -- they create cascading errors. (source)

  6. Anthropic's SpaceX compute deal and Dreaming feature signal always-on agent infrastructure. 300 MW of compute capacity plus a feature that "reviews past sessions to improve agents over time" means Claude is being architected for continuous operation, not request-response. The economics of agent hosting are shifting from per-call to per-agent-lifetime. (source)