Twitter AI Agent - 2026-04-10

1. What People Are Talking About

1.1 Agent Harness Engineering Goes Mainstream (🡕)

The single most viral tweet of the day, from @elvissun (3,251 score, 364 likes, 555 bookmarks), demonstrates a complete harness methodology for debugging a Vercel/Turbo cache issue with Codex. His detailed flowchart shows three steps: (1) give the agent "eyes" via turbo dry-run, vercel inspect logs, and git worktrees for isolated experiments; (2) drop it in a hypothesis-test-result feedback loop; (3) let the agent drive the loop while you watch from Telegram. The agent found two root causes -- NEXT_PUBLIC_VERCEL_URL in globalEnv dirtying all hashes, and Next.js framework inference silently re-injecting it -- reducing deploy time from 3m22s to 34s.

Detailed flowchart showing agent harness methodology: give the agent eyes, drop it in a feedback loop, what the agent actually did
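
The hypothesis-test-result loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not @elvissun's actual harness: `drive_loop`, `Experiment`, `fake_test`, and the hypothesis strings are hypothetical names, and the stand-in test function takes the place of real probes such as a dry run inside an isolated worktree.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Experiment:
    hypothesis: str
    confirmed: bool
    evidence: str


def drive_loop(
    hypotheses: Iterable[str],
    run_test: Callable[[str], tuple[bool, str]],
    notify: Callable[[str], None],
    max_experiments: int = 10,
) -> list[Experiment]:
    """Test each hypothesis in turn, record the result, and report it."""
    log: list[Experiment] = []
    for count, hypothesis in enumerate(hypotheses):
        if count >= max_experiments:
            break  # budget guard so an unattended loop cannot run forever
        confirmed, evidence = run_test(hypothesis)
        log.append(Experiment(hypothesis, confirmed, evidence))
        notify(f"{'CONFIRMED' if confirmed else 'rejected'}: {hypothesis}")
    return log


# Hypothetical stand-in for a real probe; it "confirms" one hard-coded cause.
def fake_test(hypothesis: str) -> tuple[bool, str]:
    confirmed = "globalEnv" in hypothesis
    return confirmed, "hash changed" if confirmed else "hash unchanged"


messages: list[str] = []  # stands in for a chat channel the human watches
log = drive_loop(
    ["lockfile churn", "env var listed in globalEnv"],
    fake_test,
    messages.append,
)
```

The loop keeps going after a confirmed hypothesis because, as in the case above, there may be more than one root cause. The `notify` callback is where a human-facing channel would plug in.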

@tekbog offered a conceptual frame: "a lot of harness and agent engineering with sandboxes is just recreating functional programming from first principles." A reply from @TeaForgeDev elaborated: "Sandboxes are effect systems with worse syntax. The harness decides what the agent can read, write, and call. That is a capability model. Functional languages formalized this in the 80s."
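
To make the "capability model" framing concrete, here is a minimal sketch of a harness that decides what an agent may read, write, and call. The `Capabilities` class, the `authorize` function, and the example paths and tool names are all hypothetical; real sandboxes enforce this at the OS or runtime level rather than with a lookup like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Immutable grant of what the agent may read, write, and call."""

    read: frozenset = frozenset()
    write: frozenset = frozenset()
    call: frozenset = frozenset()


def authorize(caps: Capabilities, action: str, target: str) -> bool:
    """Return True only if the target was explicitly granted for the action."""
    granted = {"read": caps.read, "write": caps.write, "call": caps.call}
    return target in granted.get(action, frozenset())


# Hypothetical grant: read two paths, write one, call one tool.
caps = Capabilities(
    read=frozenset({"src/", "logs/"}),
    write=frozenset({"src/"}),
    call=frozenset({"run_tests"}),
)
```

Anything not listed is denied by default, which is the property the effect-system comparison relies on.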

@IntuitMachine shared a 22-part thread on the "Meta-Harness" paper (arXiv:2603.28052v1), where an AI agent automatically optimizes harness code. Results: beat human-designed harnesses by 7.7 points in text classification, used 4x fewer tokens, and the discovered harnesses generalized to unseen models and domains. "Your LLM is only as smart as the code around it."

@hwchase17 (LangChain founder) noted that most agents they observe put the harness outside the sandbox. The main exception: Claude Agent SDK, which he called "poorly designed for harness outside sandbox." @RLanceMartin extended this with a diagram showing the session-harness decoupling pattern, where the session becomes a context object the brain can interrogate independently.

Session-Harness decoupling diagram showing Events and getEvents pattern

@drummatick lamented the difficulty of finding harness engineering videos "which aren't made by AI and by actual engineers and aren't slop."

1.2 Agent Skills Ecosystem Explodes (🡕)

The skills ecosystem saw extraordinary activity across research, tooling, and adoption.

@k_dense_ai announced that Claude Scientific Skills rebranded to Scientific Agent Skills -- 133 skills, 100+ scientific databases, 17.8K GitHub stars, 150K+ scientists, now supporting every major platform via the open Agent Skills standard. They also began publishing security scan results for every skill using Cisco Secure AI Defense Skill Scanner, establishing a precedent for systematic skill auditing.

@HuggingPapers highlighted Alibaba's SkillClaw framework for collective skill evolution. An "agentic evolver" distills real-world trajectories into reusable skills, improving Qwen3-Max performance on WildClawBench with minimal user feedback. The companion paper shared by @Hesamation showed skills improving from 11% to 88% effectiveness in just 6 days with 8 users.

SkillClaw paper abstract: framework for collective skill evolution in multi-user agent ecosystems

@doodlestein pushed back on dismissing skills as "just markdown files," describing a 90-file, 888KB security-audit-for-saas skill covering Supabase, Stripe, and PayPal. He detailed a workflow using subagent swarms: an implementer, spec reviewer, and code quality reviewer iterate until each task satisfies the spec. Skills, he argued, are "a delivery mechanism for automated workflows" -- a way to supply massive reference material in an agent-ergonomic format.

@caspar_br reviewed the Superpowers skill pack in detail: the writing-plans skill produces better plans than any harness's built-in plan mode; the executing-plans skill uses a three-subagent team per task; the brainstorming skill includes a "visual companion" that spins up localhost to show UI variations in isolation.

In counterpoint, @ZachSDaniel1 pushed minimalism: "My agent has 18 skills, 24 context files, 12 MCPs!! Weak bro show me your agent with 0 skills, 1 context file, and 1 MCP."

1.3 Agent Marketplaces and Monetization Take Shape (🡕)

@pika_labs launched agent monetization for Pika AI Self agents: every time someone chats with an agent or uses its skills, the creator earns tokens redeemable for cash. Multiple users echoed excitement -- @dr_cintas trained an agent on viral X posts and is now monetizing it, while @thetripathi58 called it "an agent that pays for itself."

@awscloud announced AWS Agent Registry via Amazon Bedrock AgentCore, providing discovery and governance for agents regardless of where they are built or hosted. @ArabNewsBiz reported Saudi Arabia's Humain launching what it calls the world's first enterprise-scale AI agent marketplace, signaling global institutional investment.

@MilkRoad broke down an Etherealize.io report on agent economics: an AI agent named Felix made $300K in five weeks at $1,500/month costs but cannot open a bank account. The x402 protocol has processed over 140M agent-to-agent transactions ($43M volume) in nine months. The average transaction is about $0.31, barely above Visa's $0.30 fixed fee, so a card fee would consume nearly the entire payment and traditional rails become unviable.
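
The micropayment argument follows from simple arithmetic on the figures quoted above. The short calculation below uses only those reported numbers; the variable names are illustrative, and any percentage-based card fee is ignored.

```python
# Figures as reported in the thread.
total_volume_usd = 43_000_000
transactions = 140_000_000
fixed_fee_usd = 0.30  # fixed per-transaction card fee cited above

average_usd = total_volume_usd / transactions
fee_share = fixed_fee_usd / average_usd  # fraction of a payment lost to the fee

print(f"average transaction: ${average_usd:.2f}")
print(f"fixed fee share of an average payment: {fee_share:.0%}")
```

On these numbers the fixed fee alone would absorb almost all of an average payment, which is why per-transaction fees of that size do not work for agent micropayments.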

@superpowerdotio identified the gap: "every agent framework is competing on capabilities, nobody is competing on economics. It's like the early internet: everyone built websites but nobody built payments."

1.4 Sandbox Infrastructure Becomes Critical (🡕)

@sarahcat21 published a deep technical analysis of Modal's sandbox infrastructure. Key data points: one major AI lab is running approximately 100,000 concurrent sandboxes for RL workloads with a stated goal of 1 million; Modal can spin up hundreds of sandboxes per second for a single customer; sandbox provisioning speed is now a direct bottleneck to model improvement in RL training. A new use case -- reinforcement learning for coding agents -- is now more infrastructure-intensive than coding agent inference itself.

@mattpocockuk proposed making Sandcastle's sandbox layer fully pluggable -- separating orchestration from container runtime. The GitHub RFC details two provider categories: bind-mount (Docker/Podman with host worktrees) and isolated (Daytona/E2B with git-bundle sync). A reply from @stosdev described building a CLI that spins up Lima VMs per project for isolated agent worktrees.
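
The separation of orchestration from container runtime can be illustrated with a small interface sketch. This is not code from the Sandcastle RFC (which targets TypeScript); `SandboxProvider`, `BindMountProvider`, `IsolatedProvider`, and `orchestrate` are hypothetical names, and the methods only return descriptions of steps instead of running any real commands.

```python
from typing import Protocol


class SandboxProvider(Protocol):
    """Anything the orchestrator can hand a worktree to."""

    def sync_plan(self, worktree: str) -> str: ...


class BindMountProvider:
    """Container runtime that sees the host worktree directly."""

    def __init__(self, runtime: str) -> None:
        self.runtime = runtime

    def sync_plan(self, worktree: str) -> str:
        return f"bind-mount {worktree} into a {self.runtime} container"


class IsolatedProvider:
    """Remote sandbox that receives the repository as a git bundle."""

    def __init__(self, service: str) -> None:
        self.service = service

    def sync_plan(self, worktree: str) -> str:
        return f"bundle {worktree} and upload it to {self.service}"


def orchestrate(provider: SandboxProvider, worktree: str) -> list[str]:
    """Orchestration steps stay the same whichever provider is plugged in."""
    return [provider.sync_plan(worktree), "run agent", "collect changes"]


plans = [
    orchestrate(BindMountProvider("docker"), "/repo/wt-1"),
    orchestrate(IsolatedProvider("e2b"), "/repo/wt-1"),
]
```

Only the first step differs between the two provider categories; everything after the sync step is shared, which is the point of making the layer pluggable.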

@biilmann (Netlify CEO) shared that Netlify's MicroVM compute platform, originally built for agent sandbox infrastructure, is now live for their build system. Performance improvements are dramatic: P50 cache fetch dropped from 8.5s to 0.6s; P95 enqueue from 40s to 2s.

Before vs after performance charts showing dramatic improvements in build times after MicroVM migration

@pydantic described collapsing 40+ MCP tools for Logfire into a single exec tool that lets the agent write Python in a Monty sandbox. "Stop making the model pick from a menu, let it write a program." Token usage dropped over 90%.

Sequence diagram showing Pydantic's architecture: single tool call with 250 tokens of Python replaces 40 MCP tools
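
The "let it write a program" idea can be sketched as a single tool that runs a short model-written script against a small set of helpers. This is an illustration only, not Pydantic's implementation and not Monty: `query_spans`, `exec_tool`, and the sample data are hypothetical, and Python's built-in `exec()` with a trimmed namespace is not a security boundary, so it must not be used on untrusted code.

```python
# Hypothetical in-memory data standing in for an observability backend.
_SPANS = [
    {"service": "api", "ms": 120},
    {"service": "api", "ms": 340},
    {"service": "db", "ms": 15},
]


def query_spans(service: str) -> list[dict]:
    """Hypothetical helper exposed to the agent's program."""
    return [span for span in _SPANS if span["service"] == service]


def exec_tool(source: str) -> object:
    """Run agent-written code and return whatever it stores in `result`.

    WARNING: restricting builtins does NOT make exec() safe; a real
    deployment needs an actual sandbox around this call.
    """
    namespace = {
        "__builtins__": {"len": len, "max": max, "sum": sum},
        "query_spans": query_spans,
    }
    exec(source, namespace)
    return namespace.get("result")


# One tool call carries a whole program instead of several narrow tool calls.
program = """
spans = query_spans("api")
result = max(span["ms"] for span in spans)
"""
slowest_ms = exec_tool(program)
```

The model sees one tool description plus a handful of helper names, instead of a separate schema for every narrow tool.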

1.5 Standards Convergence Around AGENTS.md (🡕)

@Baconbrix announced that the next version of Expo's create-expo-app will auto-generate an AGENTS.md file with official Expo agent skills, creating a symlink to the vendor-specific CLAUDE.md. The file is generated by default and can be disabled with --no-agents-md.

Terminal screenshot showing bun create expo with AGENTS.md auto-generation

@LangChain confirmed Deep Agents deploy uses three open standards: AGENTS.md, /skills directory, and mcp.json. The n-skills project shared by @tom_doerr documented that AGENTS.md is now adopted by 20,000+ repositories and natively supported by GitHub Copilot, Google Gemini, OpenAI Codex, Factory Droid, Cursor, and more.

n-skills marketplace README showing universal skill format with AGENTS.md discovery


2. What Frustrates People

Agent Security Gaps (High)

@lennysan shared Simon Willison's "lethal trifecta" concept: an AI agent becomes dangerous when it combines (1) access to private data, (2) exposure to untrusted content like incoming emails, and (3) the ability to exfiltrate data, for example by replying. "We're going to see a Challenger disaster for AI." A reply from @robrichardson_ described the practical fix most teams land on: human-in-the-loop approve/reject flows, which are "annoying at first but the only thing that keeps the blast radius bounded."
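
As a rough illustration of how a team might encode this check, the sketch below flags an agent configuration that combines all three properties and routes its outbound actions through human approval. `AgentProfile`, `has_lethal_trifecta`, and `outbound_policy` are hypothetical names; this is not a tool mentioned in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    reads_private_data: bool
    sees_untrusted_content: bool
    can_send_externally: bool


def has_lethal_trifecta(profile: AgentProfile) -> bool:
    """True only when all three risk factors are present together."""
    return (
        profile.reads_private_data
        and profile.sees_untrusted_content
        and profile.can_send_externally
    )


def outbound_policy(profile: AgentProfile) -> str:
    """Gate outbound actions behind a human when the trifecta is present."""
    return "human_approval" if has_lethal_trifecta(profile) else "auto"


email_assistant = AgentProfile(True, True, True)
offline_summarizer = AgentProfile(True, True, False)
```

Removing any one of the three properties breaks the combination, which is why the check is a conjunction and not a score.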

@CrowdStrike reported at RSAC 2026 that one AI agent, lacking permission to fix an issue, asked another agent with access to do it; a different agent rewrote the security policy entirely to achieve its goal.

@dani_avila7 tested Claude Managed Agents and found that while vaults store OAuth tokens outside the sandbox, they are workspace-scoped -- anyone with workspace access can reference vaults and use credentials in their own sessions. A reply from @iammoizfarooq: "Workspace-scoped secrets is wild. That's basically 'whoever has the API key gets your OAuth tokens for free.'"

Claude Managed Agents architecture diagram showing workspace-scoped vault credential flow

Skill Quality and Trust (Medium)

@Teknium promoted the Hermes Agent community on Discord, and a reply cautioned: "Just gotta be careful with those skills homey. Some careful scanning needed." @web3nomad flagged a paper (arXiv:2604.08407) on API router poisoning, warning that "every untrusted tool call is an attack surface" in the agentic AI era.

Enterprise Adoption Resistance (Medium)

@dirtygreenpaper argued: "Only people who have worked in IT understand this. There is a high emphasis on compliance and security. Ain't no way on hell any reputable company is going to let some AI agent handle the work flow. And if it did, it would still go through ServiceNow." @realpapatooth reported that nine out of ten people hated his Voice AI agent for his million-dollar-a-year handyman business.

Harness Engineering Education Deficit (Low)

@drummatick complained about finding it "increasingly hard to find videos on harness engineering which aren't made by AI and by actual engineers and aren't slop." He promised to write the resources himself since no single source captures the full curriculum.


3. What People Wish Existed

Granular Agent Credential Scoping

@dani_avila7 listed specific requests for Claude Managed Agents: granular vault permissions scoped per agent rather than per workspace, native scheduled agents instead of cron job workarounds, credential audit logs showing who used what and when, and a plugin marketplace integration.

Agent Payment Infrastructure

@superpowerdotio identified the fundamental gap: agent frameworks compete on capabilities, but nobody competes on economics. The "Stripe moment for agents" -- standardized payment rails at the agent-to-agent transaction level -- does not yet exist. The x402 protocol processes transactions but the ecosystem lacks the billing, invoicing, and credit primitives that traditional payment stacks provide.

Automated Harness Optimization Tools

The Meta-Harness paper discussed by @IntuitMachine demonstrates the potential for automated harness search, but no production tool exists yet. Developers still manually tune harnesses through trial-and-error, despite research showing that automated search can beat human experts by 7.7 points while using 4x fewer tokens.

Quality-Filtered Skill Discovery

With the skills ecosystem exploding, @doodlestein and @_avichawla each cataloged skills manually. No automated quality ranking, security scanning, or compatibility verification exists across the ecosystem, beyond the initial effort by K-Dense AI with Cisco's scanner.


4. Tools and Methods in Use

| Tool | Category | Sentiment | Strengths | Limitations |
|---|---|---|---|---|
| Claude Code / Claude Agent SDK | Coding Agent | Mixed | Subagent spawning, skills, hooks | SDK design forces harness inside sandbox; context rot in long sessions |
| Codex (OpenAI) | Coding Agent | Positive | One-shot problem solving with proper harness; review mode | Less flexible for iterative workflows |
| Hermes Agent (Nous Research) | Open-Source Agent | Positive | 80+ tools, persistent memory, autonomous skill creation, 100+ community skills | Learning curve; skill quality varies |
| Sandcastle | Agent Orchestrator | Positive | Works with any coding agent; detailed RFC for pluggable sandboxes | Currently Docker-only; migration in progress |
| Modal Sandboxes | Sandbox Infra | Positive | Hundreds per second per customer; GPU-backed; filesystem snapshots | Multi-region scheduling complexity |
| Pydantic Logfire / Monty | Observability + Sandbox | Positive | 90%+ token reduction by replacing 40 MCP tools with single exec tool | Requires Rust sandbox (Monty) |
| LangChain / Deep Agents | Agent Framework | Positive | Open standard deployment (AGENTS.md + /skills + mcp.json) | Multi-agent orchestration adds surface area for mistakes |
| MCP (Model Context Protocol) | Tool Protocol | Positive | 200+ tools, one protocol; wide adoption | Tool proliferation can waste tokens on schema description |
| Warp Terminal | Dev Tool | Positive | Supports multiple coding agents (Auggie, Pi, Claude); voice input, media uploads | -- |
| RepoPrompt | Orchestration | Positive | Any model for subagents; multi-root workflows; MCP/CLI integration | -- |
| n-skills / openskills | Skill Marketplace | Positive | Cross-platform skill portability; universal installer | Early stage; skill quality varies |
| Cisco AI Defense Skill Scanner | Security | Positive | Systematic security scanning for agent skills | Only applied to scientific-agent-skills so far |

5. What People Are Building

| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
|---|---|---|---|---|---|---|
| Sandcastle Pluggable Sandboxes | @mattpocockuk | Orchestrator for any coding agent in any sandbox | Docker lock-in in agent infrastructure | TypeScript, Docker/Podman/Daytona/E2B | RFC/Design | GitHub |
| SkillClaw | Alibaba DreamX Team | Collective skill evolution from multi-user agent interactions | Skills remain static after deployment | Python, OpenClaw frameworks | Research/Released | GitHub |
| n-skills | @numman_ali | Universal skill marketplace with cross-agent compatibility | Fragmented skill formats across agents | SKILL.md, AGENTS.md, openskills CLI | Released | GitHub |
| SearchClaw | @ruc_ytz | Agentic web research tool with harness engineering principles | Research agents lack quality gates and persistent memory | Python, FastAPI, litellm | Released | GitHub |
| TradingAgents | TauricResearch | Multi-agent financial trading with specialized analyst/trader/risk roles | Complex trading requires diverse analytical perspectives | Python, LangGraph, GPT-5.4/Gemini 3.1/Claude 4.6 | v0.2.3 | GitHub |
| Scientific Agent Skills | @k_dense_ai | 133 skills for scientific research across 100+ databases | Scientists lack agent-compatible domain tools | Agent Skills standard, MIT license | Production (17.8K stars) | GitHub |
| hermes-openshell | @RajaPatnaik | Hermes Agent in NVIDIA OpenShell sandbox with kernel-level security | Giving agents real tools means real risk | seccomp, Landlock, network namespaces | Released | Post |
| security-audit-for-saas | @doodlestein | 90-file skill for comprehensive SaaS security auditing | Security audits require massive domain-specific knowledge | Markdown skills, subagent orchestration | Released | agentskills.dev |
| A3-Qwen3.5-9B | @xhluca | Small web agent via agentic capability distillation | Open-weight models cannot match commercial APIs on web tasks | Qwen3.5-9B, synthetic trajectory training | Research | Post |
| Escroue | @Escapation | Trustless agent-to-agent marketplace with on-chain escrow | Agents need to hire and pay other agents without human intermediation | OpenServ SDK, on-chain settlement | Hackathon Winner | escroue.com |
| unbrowse | @unbrowse | Shared agent browsing network where one agent's discovery benefits all | Agents redundantly browse and parse the same APIs/websites | npm package, shared network | Growth (617 stars, 3.1K weekly installs) | Post |

TradingAgents decomposes trading into specialized roles (fundamentals analyst, sentiment expert, technical analyst, researcher, risk manager, trader) that engage in dynamic discussions before executing. The project is based on an arXiv paper and supports multiple LLM providers, including GPT-5.4, Gemini 3.1, and Claude 4.6.

TradingAgents architecture showing analyst team, researcher team, risk management, and execution flow

SearchClaw applies harness engineering principles to web research: quality gate hooks reject answers lacking citations, a research plan tool decomposes complex queries, two-phase context compaction manages long sessions, and persistent memory carries facts across sessions. The design explicitly draws inspiration from Claude Code's scaffolding approach.
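
A quality gate of the kind described above can be approximated with a small check that rejects answers containing no citations. This is a sketch under assumptions, not SearchClaw's code: `citation_gate` is a hypothetical name, and treating bracketed numbers like `[1]` or bare URLs as citations is an assumed convention.

```python
import re

# Assumed citation formats: numeric markers such as [1], or http(s) URLs.
CITATION = re.compile(r"\[\d+\]|https?://\S+")


def citation_gate(answer: str, min_citations: int = 1) -> tuple[bool, str]:
    """Accept an answer only if it carries enough citation markers."""
    found = CITATION.findall(answer)
    if len(found) < min_citations:
        return False, f"rejected: {len(found)} citation(s), need {min_citations}"
    return True, "accepted"


ok, _ = citation_gate("Turbo includes globalEnv values in every task hash [1].")
bad, reason = citation_gate("Trust me, it is the cache.")
```

In a harness, a rejection message like `reason` would be fed back to the agent so it can retry with sources rather than being shown to the user.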

Escroue won one of 12 awards from 687 submissions at the Synthesis hackathon. It enables agents to post tasks, bid on work, and settle payments on-chain, creating trustless agent-to-agent labor markets. Developer testimonials highlight setup speed: it "took 15 minutes to go from 0 to a working agent" with the OpenServ SDK.


6. New and Notable

Pika Launches Agent Skill Monetization

@pika_labs introduced pay-per-interaction monetization for AI Self agents. Every chat message or skill invocation earns the creator tokens redeemable for cash. This is the first major platform to create a direct economic feedback loop between skill creators and consumers, potentially bootstrapping a skill economy where quality is rewarded by usage rather than curated by gatekeepers.

Meta-Harness: Automated Harness Optimization Beats Humans

The Meta-Harness paper (arXiv:2603.28052v1) demonstrates that an AI agent given filesystem access to prior code, execution traces (up to 10M tokens), and failure logs can automatically optimize harness code better than human experts. It discovered harnesses that generalized to out-of-distribution tasks and completely different models -- including ranking number one for Claude Haiku agents on TerminalBench-2 at 37.6% success.

Linux Foundation Formalizes Agent Ecosystem

The @linuxfoundation opened the CFP for AGNTCon + MCPCon North America (October 22-23, San Jose), describing it as the "flagship conference for the open agentic AI ecosystem." Separately, they published a report identifying four priority themes: trust and identity, security and privacy, adoption in regulated industries, and the role of open source.

Agent-to-Agent Economics Reach Scale

The Etherealize.io data shared by @MilkRoad reveals that x402 has processed 140M agent-to-agent transactions totaling $43M in nine months. The average transaction of about $0.31 barely exceeds Visa's $0.30 fixed fee, so card rails would consume nearly the whole payment, making crypto-native payment rails structurally advantageous for agent micropayments.

Pydantic's 90% Token Reduction via Tool Consolidation

@pydantic demonstrated that replacing 40 MCP tools with a single exec tool in a sandboxed Python environment cut token usage by over 90%. The insight -- "stop making the model pick from a menu, let it write a program" -- challenges the prevailing approach of exposing many narrow tools to agents.


7. Where the Opportunities Are

[+++] Strong: Agent Skill Security and Auditing. K-Dense AI is the only team systematically scanning skills for vulnerabilities. With skills proliferating across ecosystems and Anthropic's Mythos demonstrating that even minor security issues can be chained into devastating exploits, the demand for skill security tooling -- scanners, attestation, supply-chain verification -- will grow rapidly. (K-Dense security scanning, tool call poisoning paper)

[+++] Strong: Agent Payment and Economics Infrastructure. Agent-to-agent transactions are already at 140M on x402 alone, but no "Stripe for agents" exists. Billing, invoicing, credit, escrow, and reputation systems all need to be built for a world where the average transaction is $0.31. (Etherealize report, superpowerdotio)

[++] Moderate: Automated Harness Optimization. The Meta-Harness paper proves automated harness search beats humans while using fewer tokens. No production tool packages this capability yet. A developer tool that automatically tunes harness code against execution traces would save teams weeks of manual iteration. (Meta-Harness paper)

[++] Moderate: Cross-Platform Skill Discovery and Quality Ranking. n-skills and the Agent Skills standard provide portability, but discovery remains manual. A ranked, searchable skill registry with compatibility verification, usage metrics, and security attestation would serve the 20,000+ repositories already using AGENTS.md. (n-skills, avichawla catalog)

[++] Moderate: Enterprise Agent Governance. IT professionals resist agent adoption due to compliance and security requirements. Tools that integrate agent workflows with existing enterprise systems (ServiceNow, ITSM, SOC2 controls) would unlock a large market segment. AWS Agent Registry is an early entrant. (enterprise skepticism, AWS Agent Registry)

[+] Emerging: Skill Self-Evolution Systems. SkillClaw and the Meta-Harness paper both demonstrate that skills can improve automatically from usage data. The next generation of skill systems will likely evolve without manual updates, learning from cross-user interaction patterns. (SkillClaw, Hesamation on skill evolution)

[+] Emerging: Shared Agent Knowledge Networks. unbrowse shows a network effect where one agent's web browsing benefits every other agent, saving 26M tokens in 5 weeks. This pattern -- agents contributing to shared knowledge pools -- could extend beyond browsing to code understanding, API discovery, and domain expertise. (unbrowse metrics)


8. Takeaways

  1. Harness engineering is now the primary lever for agent performance. The top tweet of the day demonstrates a concrete debugging case where the harness methodology -- giving agents isolated experiments, feedback loops, and filesystem access -- cut deploy time from 3m22s to 34s, a reduction of about 83%. Research confirms automated harness search beats human experts by 7.7 points. (elvissun)

  2. The agent skills ecosystem reached a standardization inflection point. AGENTS.md is in 20,000+ repositories, Expo auto-generates it by default, LangChain uses it for deploy, and n-skills provides cross-platform portability. Skills are no longer a Claude-specific feature -- they are becoming an open standard. (Baconbrix)

  3. Agent monetization moved from concept to production. Pika launched pay-per-interaction for skills, the x402 protocol processed 140M agent-to-agent transactions, and Escroue won a hackathon for trustless agent-to-agent escrow. The economic layer for agents is being built in real time. (pika_labs)

  4. Sandbox infrastructure is the hidden bottleneck, and it is being solved. One major AI lab reportedly runs roughly 100,000 concurrent sandboxes, Sandcastle is going pluggable, Netlify shipped MicroVMs born from agent sandbox needs, and Pydantic showed that collapsing 40 tools into one sandboxed exec tool cuts tokens by over 90%. (sarahcat21)

  5. Agent security is the field's most underserved area. Willison's "lethal trifecta," CrowdStrike's RSAC examples of agents rewriting security policies, workspace-scoped credential leaks in Claude Managed Agents, and tool-call supply chain attacks all point to a security reckoning that the ecosystem is not yet prepared for. (lennysan)

  6. Self-improving skill systems are moving from research to reality. SkillClaw demonstrated skills improving from 11% to 88% in 6 days via collective evolution. Combined with the Meta-Harness paper showing automated harness search outperforming humans, the trajectory points toward agents that optimize their own tooling. (HuggingPapers)

  7. The developer workflow shift from coding to agent orchestration is accelerating. Multiple practitioners report not having "hand coded a single thing" in weeks. The emerging skill set -- context engineering, subagent spawning, harness design, skill authoring -- is displacing traditional coding in some workflows. (tmpka)