Twitter AI Agent - 2026-05-11

1. What People Are Talking About

1.1 Harness engineering is hardening into an operating discipline 🡒

The largest cluster of posts argued that the hard part of agent work is now the system around the model: caching, routing, evals, observability, search, and context assembly. Compared with May 10, the language stayed similar, but the May 11 posts were more operational and less slogan-driven.

@akshay_pachaar published the strongest checklist of the day, explicitly putting harness engineering ahead of prompt engineering and naming prompt caching, semantic caching, KV-cache management, structured-output fallbacks, evals, cost attribution, guardrails, observability, routing, and fine-tuning tradeoffs as the core skill set (post link). A reply from @0xNeoArch sharpened the same point by adding prompt-injection defense, distillation, and knowing when not to use an LLM at all.
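One item on that checklist, structured-output fallbacks, can be illustrated with a minimal sketch. It assumes each model is a plain callable that returns text; the names `parse_structured` and `call_with_fallback` and the `{"status": "unparsed"}` default are hypothetical and do not come from the post.

```python
import json
import re
from typing import Any, Callable, Dict, List, Optional, Set


def parse_structured(raw: str, required: Set[str]) -> Optional[Dict[str, Any]]:
    """Return a dict containing all required keys, or None if nothing parses."""
    candidates = [raw]
    # Fallback: pull the outermost {...} span out of surrounding chatter.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and required.issubset(data):
            return data
    return None


def call_with_fallback(
    models: List[Callable[[str], str]], prompt: str, required: Set[str]
) -> Dict[str, Any]:
    """Try each model in order; return the first usable answer or a safe default."""
    for model in models:
        parsed = parse_structured(model(prompt), required)
        if parsed is not None:
            return parsed
    # Hypothetical sentinel so callers can route to a human or a retry queue.
    return {"status": "unparsed"}
```

Ordering the `models` list from cheapest to most capable would also make this a simple routing layer, though that ordering is a design choice and not something the post prescribes.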

@aiDotEngineer narrowed that stack further by arguing that context engineering is "about 80% agentic search," then pointing people to a workshop on when shell tools, semantic search, general-purpose query execution, and skills each break (post link). @DataScienceDojo visualized the same mental model as a context pipeline that merges prompt, memory, vector search, web search, and a reviewer loop before and after the model call (post link).

Diagram showing context inputs, web search, merged context, model output, and reviewer loop as the core pipeline for agent quality
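The pipeline in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline from the post: `Source`, `assemble_context`, and `run_with_review` are hypothetical names, the budget is counted in characters for simplicity, and the reviewer here only runs after the model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Source:
    name: str                          # e.g. "memory", "vector", "web"
    fetch: Callable[[str], List[str]]  # query -> list of snippets


def assemble_context(query: str, sources: List[Source], budget_chars: int) -> str:
    """Merge snippets from every source, de-duplicated, within a size budget."""
    seen, parts, used = set(), [], 0
    for source in sources:
        for snippet in source.fetch(query):
            line = f"[{source.name}] {snippet}"
            if snippet in seen or used + len(line) > budget_chars:
                continue  # skip duplicates and anything that would blow the budget
            seen.add(snippet)
            parts.append(line)
            used += len(line)
    return "\n".join(parts)


def run_with_review(
    query: str,
    sources: List[Source],
    model: Callable[[str, str], str],
    reviewer: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 2,
    budget_chars: int = 2000,
) -> str:
    """Call the model, then let a reviewer accept the answer or request a retry."""
    context = assemble_context(query, sources, budget_chars)
    answer = model(query, context)
    for _ in range(max_rounds):
        ok, feedback = reviewer(query, answer)
        if ok:
            break
        # Feed the reviewer's note back in as extra context for the next attempt.
        answer = model(query, context + "\n[review] " + feedback)
    return answer
```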

@hasantoxr turned the phrase into a concrete tool bundle by recommending Harness, PostHog, GrowthBook, Chaos Mesh, and Kubescape as repos to study (post link). The linked repos reinforce the systems angle: Harness presents itself as an open-source development platform with code hosting, pipelines, Gitspaces, and registries; PostHog bundles analytics, session replay, feature flags, experiments, and LLM analytics; and GrowthBook combines feature flags, experimentation, analytics, and an MCP server.

Discussion insight: The most useful replies did not disagree with the harness framing; they extended it. Their point was that classic engineering concerns like search quality, injection defense, routing, and cost control still dominate, just in a probabilistic runtime.

Comparison to prior day: May 10 established harness engineering as a shared label. May 11 moved that label closer to an implementation checklist and a shortlist of repos and tools to study.

1.2 Agent operators want thinner surfaces, tighter scope, and live traces 🡕

A second theme was that people want agents to work in smaller, more legible environments. The common ask was not more raw autonomy; it was clearer contracts, narrower working sets, and direct evidence of what the agent is doing right now.

@ctatedev argued that fully agent-written frontends may work better if builders start from index.html, browser primitives, Web Components, and strict conventions for routing, rendering, state changes, and data handling instead of a heavyweight framework (post link). The replies added two concrete directions: @bytecrafter_1 said the harder contract is often between agents rather than UI primitives, and @noname_oni pointed to the W3C UI Specification Schema Community Group as a way to make UI structure machine-readable. The adjacent Arrow framework page fit the same instinct by stressing no build step, three core functions, and docs small enough for an agent to hold in context.

@pvncher made the scope problem concrete: opening a root with multiple unrelated projects and worktrees hurts coding agents because the model treats too much of the tree as relevant (post link). In replies, he described a case where 12 accessible worktrees left the model with 13 copies of each file, which wasted tokens and slowed search.
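One way to narrow the working set is to drop every path that belongs to another worktree before the agent indexes anything. The sketch below assumes the text of `git worktree list --porcelain` has already been captured and that paths are POSIX-style; `parse_worktrees` and `in_scope` are hypothetical helpers, not part of any tool named in the post.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def parse_worktrees(porcelain: str) -> List[str]:
    """Extract worktree paths from `git worktree list --porcelain` output."""
    prefix = "worktree "
    return [
        line[len(prefix):]
        for line in porcelain.splitlines()
        if line.startswith(prefix)
    ]


def in_scope(path: str, active_root: str, worktrees: Iterable[str]) -> bool:
    """True if path is under active_root and not inside another worktree."""
    p = PurePosixPath(path)
    root = PurePosixPath(active_root)
    if p != root and root not in p.parents:
        return False  # outside the project the agent was asked to work on
    for wt in worktrees:
        w = PurePosixPath(wt)
        # A worktree nested under the active root is still a duplicate checkout.
        if w != root and (w == p or w in p.parents):
            return False
    return True
```

Filtering a file listing with `in_scope` before search or embedding keeps one copy of each file visible to the agent.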

@zaimiri supplied the clearest observability example by praising Hermes Agent for showing which skills and tools are firing in real time, after repeated experiences of OpenClaw appearing idle for minutes and then failing silently (post link).

Runtime trace showing skills, memory, todo updates, file reads, and writes emitted live while an agent works
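A live trace of this kind can be approximated by emitting one event per tool or skill call at the moment it starts and finishes, instead of logging only at the end. The `Tracer` class below is a hypothetical sketch and does not reflect how Hermes Agent implements its trace.

```python
import json
import time
from typing import Any, Callable, Optional


class Tracer:
    """Emit one JSON line per event as soon as it happens."""

    def __init__(
        self,
        sink: Optional[Callable[[str], None]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Default sink prints and flushes so the operator sees events immediately.
        self.sink = sink or (lambda line: print(line, flush=True))
        self.clock = clock
        self.start = clock()

    def emit(self, kind: str, name: str, **detail: Any) -> None:
        record = {"t": round(self.clock() - self.start, 3), "kind": kind, "name": name}
        record.update(detail)
        self.sink(json.dumps(record))

    def traced(self, kind: str, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        """Run fn(*args), emitting start, ok, or error events around it."""
        self.emit(kind, name, status="start")
        try:
            result = fn(*args)
        except Exception as exc:
            # Surface the failure in the trace instead of failing silently.
            self.emit(kind, name, status="error", error=repr(exc))
            raise
        self.emit(kind, name, status="ok")
        return result
```

Because the start event is written before the call runs, an agent that hangs still shows the operator which step it is stuck on.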

@RhysSullivan extended the same theme into supply-chain control with a prompt that tells a coding agent to configure a three-day minimum release age for package installs, exempt workspace scopes, and verify the exact package-manager setting before writing config (post link).
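The policy itself is simple to state in code. The sketch below only models the rule (a three-day minimum age with exempt scopes) and is not any package manager's configuration; real tools expose this under their own setting names and units, which is why the prompt asks the agent to verify the exact setting first. `is_installable` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

MIN_RELEASE_AGE = timedelta(days=3)  # the three-day window from the prompt


def is_installable(
    name: str,
    published_at: datetime,
    now: datetime,
    exempt_scopes: Iterable[str] = (),
    min_age: timedelta = MIN_RELEASE_AGE,
) -> bool:
    """Allow a version only if it is old enough or belongs to an exempt scope."""
    # Workspace scopes (e.g. "@acme") are trusted and skip the age gate.
    if any(name.startswith(scope + "/") for scope in exempt_scopes):
        return True
    return now - published_at >= min_age
```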

Discussion insight: The replies repeatedly reframed "agent-friendly" as "debuggable." Users want protocols, ignore boundaries, and real-time traces that let them inspect a system, not just wait for a blob of output.

Comparison to prior day: May 10 emphasized thinner frontends and better scaffolding. May 11 added working-set discipline, live execution traces, and package-policy controls.

1.3 Skills are becoming portable assets with their own lifecycle 🡕

The skill conversation also got more concrete. Instead of only talking about marketplaces, builders showed how skills can be installed, synced, extracted from experience, versioned, and reused across agents.

@brian_lovin shared notion-skills, a tool that treats a Notion database like an app store for skills and syncs selected skills down to local agent folders for Claude, Codex, and others (post link). The public repo says it supports install, sync, publish, feed, feedback, and audit flows, and that the same skill can be symlinked into multiple agent CLIs. In replies, Brian said agents write most of his skills and that Claude Code and Notion can already talk through MCP plus CLI.
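The symlink idea can be sketched in a few lines. This is not the notion-skills implementation: `link_skill` is a hypothetical helper, and the agent folder layout is an assumption to check against each CLI's documentation.

```python
from pathlib import Path
from typing import Iterable, List


def link_skill(skill_dir: Path, agent_skill_roots: Iterable[Path]) -> List[Path]:
    """Symlink one skill directory into each agent's skills folder."""
    target = skill_dir.resolve()
    links = []
    for root in agent_skill_roots:
        root.mkdir(parents=True, exist_ok=True)
        link = root / skill_dir.name
        if link.is_symlink():
            if link.resolve() == target:
                links.append(link)  # already linked, nothing to do
                continue
            link.unlink()  # stale link pointing somewhere else
        elif link.exists():
            # Never overwrite a real directory the user may have edited by hand.
            raise FileExistsError(f"{link} exists and is not a symlink")
        link.symlink_to(target, target_is_directory=True)
        links.append(link)
    return links
```

Since every agent folder points at the same directory, an edit to the stored `SKILL.md` is visible to all of them without a second sync step.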

@tom_doerr highlighted AutoSkill, which the repo describes as experience-driven lifelong learning: extracting reusable skills from real interactions, archived conversations, documents, and trajectories, then evolving them through merge and version updates (post link). The README also notes a newly added local skill manager for triage, similar-skill search, and decisions such as discard, improve, merge, and create.

AutoSkill diagram showing query rewriting, skill retrieval, skill extraction, merging, and management decisions in a closed evolution loop
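The four management decisions can be pictured as a triage function over a candidate skill and the existing library. The sketch below is an interpretation, not AutoSkill's algorithm: token-overlap similarity, the thresholds, and the meaning assigned to each decision are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two skill descriptions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def triage(
    candidate: str,
    existing: Dict[str, str],
    min_tokens: int = 4,      # assumed: shorter candidates are not reusable
    merge_at: float = 0.8,    # assumed threshold for near-duplicates
    improve_at: float = 0.4,  # assumed threshold for partial overlap
) -> Tuple[str, Optional[str]]:
    """Return (decision, matched_skill_name) for a candidate skill."""
    if len(candidate.split()) < min_tokens:
        return ("discard", None)
    best_name, best_score = None, 0.0
    for name, text in existing.items():
        score = jaccard(candidate, text)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= merge_at:
        return ("merge", best_name)    # near-duplicate: fold into the existing skill
    if best_score >= improve_at:
        return ("improve", best_name)  # related: update the existing skill's version
    return ("create", None)            # novel: add as a new skill
```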

Discussion insight: The value proposition is shifting from "find more skills" to "keep the right skills current." The strongest evidence centered on sync, feedback, versioning, and extraction from real use rather than simple catalog size.

Comparison to prior day: May 10 surfaced marketplaces and catalogs. May 11 made skill operations look more like package management and lifecycle management.

1.4 Multi-agent memory and machine-payable rails are shipping as products 🡕

The final strong theme was productization. Several posts moved beyond theory and showed concrete interfaces, deployment metrics, and payment rails for agents that persist, collaborate, or transact.

@owenbjennings said Mongoose is nearly ready as a cloud multi-agent layer where "mongeese" share context from web, calendar, Slack, email, and docs, debate with each other, and compound memory on top of Goose OSS (post link). The attached shell screenshot makes that claim tangible by showing summon/build/skills/sentries/settings commands in one operator surface.

Mongoose shell interface showing summon, build, skill, sentry, and settings commands for coordinating a pack of agents

@vercel_dev pointed to Superset, a multi-agent IDE that the linked Vercel write-up says runs up to 12 agents in parallel with 1,000 to 1,400 deployments per week, roughly 600 preview deployments per day, and about 30-second average build times (post link; blog). The product page says each agent runs in its own isolated git worktree and works with any CLI agent.

@circle launched Circle Agent Stack as financial infrastructure for agents, built around Agent Wallets, an Agent Marketplace, and Circle CLI for repeatable financial actions under permissions and guardrails (post link). Circle's docs say the stack lets agents hold and transact tokens, discover and pay for x402 services, and operate within built-in compliance guardrails. In parallel, @graphprotocol said the Subgraph Gateway now accepts x402 payments for onchain-data queries, and The Graph's guide says agents can pay in USDC over HTTP with no API keys, accounts, or sessions (post link).

Circle marketplace view listing agent-accessible services, endpoint counts, and per-request prices
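The pay-per-request pattern described above can be sketched as a request, a 402 response carrying a price, a budget check, and a retry with proof of payment. The code below is a toy model only: it does not implement the x402 wire format, the header names `x-demo-price-micro-usdc` and `x-demo-payment` are placeholders, and `Budget`, `fetch_paid`, and the `sign` callable are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)
Send = Callable[[str, Dict[str, str]], Response]


@dataclass
class Budget:
    """Spending guardrail, tracked in millionths of a USDC to avoid float drift."""
    remaining_micro_usdc: int

    def authorize(self, price: int) -> bool:
        if price > self.remaining_micro_usdc:
            return False
        self.remaining_micro_usdc -= price
        return True


def fetch_paid(
    send: Send, url: str, budget: Budget, sign: Callable[[str, int], str]
) -> Tuple[int, str]:
    """Request url; if the server answers 402, pay within budget and retry once."""
    status, headers, body = send(url, {})
    if status != 402:
        return status, body
    price = int(headers["x-demo-price-micro-usdc"])  # placeholder header name
    if not budget.authorize(price):
        raise PermissionError(f"price {price} exceeds remaining budget")
    # The budget is debited before the retry; a real client would reconcile
    # against a settlement receipt if the retry fails.
    status, _, body = send(url, {"x-demo-payment": sign(url, price)})
    return status, body
```

No API key or session appears anywhere in the exchange; the only state the client holds is its budget and its ability to sign a payment.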

Discussion insight: The replies around Circle focused on the orchestration layer and CLI more than the wallet itself. That suggests the market sees repeatable, policy-bound actions as the real wedge, not just agent-held balances.

Comparison to prior day: May 10 had more discussion of governance and identity in principle. May 11 added published docs, a pricing surface, and operating metrics from shipping products.

2. What Frustrates People

Harness complexity keeps swallowing the model win

The most common frustration was that better models do not remove the need for stronger systems engineering. @akshay_pachaar listed caching, evals, routing, observability, and fallbacks as required skills, while @aiDotEngineer reduced context engineering to search and tool choice. @dair_ai added a research-backed failure mode: the linked PwC paper says goal clarification loses most of its value after the first 10% of execution, input clarification stays helpful longer, and late clarification can do worse than never asking at all. Severity: High. The workaround today is more harness work, not less.

Over-scoped workspaces and silent runtimes still waste time

People are still hitting failure modes that have nothing to do with model IQ. @pvncher described agents getting confused by roots with many unrelated projects and duplicate worktrees, and in replies said one setup had 13 copies of every file available to the model. @zaimiri complained that some agents appear idle for six minutes and only later reveal they did nothing. Severity: High. Current coping behavior is to narrow scope, add ignore rules, and demand live traces.

Safety and policy controls still lag autonomy

As agents take on more actions, operators want sharper defaults. @RhysSullivan framed package release-age controls as an obvious, overdue safeguard against supply-chain attacks. @circle emphasized permissions and guardrails for financial actions, and Circle's docs present compliance and spending controls as part of the product. Severity: High. The workaround remains manual hardening and after-the-fact policy layers.

3. What People Wish Existed

Thin, agent-first UI and runtime conventions

@ctatedev asked for browser-native frontend conventions that agents can route and mutate reliably, while replies asked for systematized component definitions and pointed at the W3C UI Specification Schema Community Group. The demand is practical, not aspirational: people want smaller, more machine-readable surfaces than today's default framework stacks. Opportunity: direct.

Skill lifecycle tooling that compounds over time

The strongest need in the skills cluster is not just discovery. @brian_lovin wants shared install/sync/publish/feedback flows, and AutoSkill pushes toward automatic extraction, merging, and versioned evolution from actual interactions. What people seem to want is a package manager plus changelog plus feedback loop for agent capability. Opportunity: direct.

Persistent memory that stays useful without losing operator trust

@owenbjennings pitched persistent shared context across work and communication surfaces, while OpenHuman, a local-first personal agent from tinyhumansai, promises local SQLite-backed memory trees, an Obsidian-compatible vault, 118+ integrations, and 20-minute auto-fetch loops. The need is both practical and emotional: people want agents that remember enough to stay useful, but in a form they can inspect and control. Opportunity: direct.

Credential-light payment and service discovery for agents

Circle Agent Stack and The Graph's x402 Subgraph Gateway describe the same missing layer from two sides: agents need a standard way to discover services and pay per request without storing long-lived credentials. The Graph's guide is explicit that x402 access works with no API keys, accounts, or sessions, which makes it well suited to autonomous or short-lived processes. Opportunity: competitive.

4. Tools and Methods in Use

| Tool | Category | Sentiment | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Harness engineering / context engineering | Method | (+) | Gives builders a shared frame for caching, evals, routing, observability, guardrails, and cost attribution | Still broad enough that different people mean different layers by it |
| Agentic search | Retrieval method | (+) | Treats search and tool choice as the center of context assembly, not an afterthought | Tool selection can fail badly when the runtime picks the wrong abstraction |
| Browser primitives + Web Components + Arrow | Frontend method | (+/-) | Smaller surface area, no-build options, and simpler conventions that fit agent-written UIs | Pushes more contract design onto the builder; replies say protocol boundaries still matter more |
| Scoped roots / worktree filtering | Workspace method | (+) | Cuts token waste, duplicate-file confusion, and search slowdown in coding agents | Harder when work spans tightly coupled repos |
| Real-time tool and skill traces | Observability | (+) | Makes agents debuggable while they run instead of only after a timeout | Evidence is still ecosystem-specific and early |
| Notion Skills | Skill distribution | (+) | Shared store, selective installs, two-way sync, publishing, feedback, and audits across multiple agents | Early-stage tool tied to Notion as the source of truth |
| AutoSkill | Skill evolution | (+) | Extracts reusable skills from interactions, documents, and trajectories, then merges and versions them | Still research-leaning and dependent on good triage of what is truly reusable |
| Circle Agent Stack | Agent finance infrastructure | (+) | Adds wallets, marketplace discovery, CLI actions, and guardrails for repeatable money movement | Ecosystem-specific and early compared with general developer tooling |
| The Graph x402 gateway | Agent API access | (+) | Per-query USDC over HTTP with no API keys, accounts, or sessions | Best fit for x402-compatible ecosystems rather than universal service access |
| Clarification timing framework | Agent-eval method | (+) | Gives concrete guidance on when clarifying questions still help long-horizon agents | Research result, not an off-the-shelf runtime feature |

Summary: Positive sentiment concentrated around tools that make agents easier to scope, inspect, and reuse. The clearest migration path is away from prompt-only thinking and toward context assembly, skill packaging, runtime visibility, and policy controls. Competitive pressure is splitting between open builder tooling on one side and managed payment/governance layers on the other.

5. What People Are Building

| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
| --- | --- | --- | --- | --- | --- | --- |
| Notion Skills | @brian_lovin | Uses Notion as a shared skill store that syncs selected skills into local agent folders | Skill files fragment across machines, folders, and teammates | Notion, Node.js, symlinked `SKILL.md` installs | Alpha | repo, post |
| AutoSkill | ECNU-ICALK | Extracts reusable skills from interactions, docs, and trajectories, then evolves them | Good agent behaviors disappear after one-off sessions | `SKILL.md`, offline extraction, local skill manager, versioned skill updates | Alpha | repo, post |
| Mongoose | @owenbjennings | Cloud multi-agent orchestration with shared context and compounding memory | Assistants usually lose cross-surface context and team continuity | Goose OSS, shared-context orchestration, persistent memory | Alpha | post |
| Superset | Superset | Multi-agent IDE with isolated worktrees and live preview infrastructure | Parallel coding agents serialize when branches and environments queue up | AI SDK, AI Gateway, Blob, Fluid Compute, git worktrees | Shipped | site, blog, post |
| Circle Agent Stack | Circle | Gives agents wallets, marketplace discovery, and CLI-driven financial actions | Agents need machine-payable services and guarded money movement | Agent Wallets, Circle CLI, x402 services, USDC | Beta | docs, post |
| OpenHuman | tinyhumansai | Local-first personal agent with memory trees, Obsidian vault, integrations, and model routing | Most agents start cold and keep context fragmented across apps | SQLite, Markdown/Obsidian, OAuth integrations, TokenJuice, optional Ollama | Beta | repo, post |
| Voice ordering agent | @EstebanSuarez | Voice workflow that builds a cart, sends an order, captures email, and sends a receipt | Shows how voice agents can complete structured actions instead of just chatting | v0, Grok Voice Think Fast 1.0, Resend, six custom function tools | Alpha | post |

The projects cluster into three recurring patterns. One group is building agent infrastructure that compounds: skills, memory, orchestration, and worktree isolation. Another group is trying to make agents financially capable with wallets, per-request pricing, and service discovery. A third group is using those primitives to show narrower applied workflows, such as voice ordering, where the point is reliable completion rather than conversation quality.

6. New and Notable

Recursive delegation is becoming a training target

@gneubig highlighted the new Recursive Agent Optimization paper, which trains agents to spawn and coordinate recursive sub-agents rather than treating delegation as a hand-written inference trick (post link; paper). The abstract says recursive agents trained this way can scale to tasks beyond a single context window and reduce wall-clock time relative to single-agent systems.

Clarification timing now has a quantitative curve

@dair_ai surfaced a PwC paper on clarification timing for long-horizon agents (post link; paper). The paper reports 84 task variants and 6,000+ runs, and says goal clarification loses most of its value after 10% of execution, input clarification stays useful much longer, and past the midpoint any clarification can underperform never asking.
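Those findings translate into a simple gating rule for an agent runtime. The sketch below turns the reported 10% and midpoint figures into hard cutoffs, which is a simplification of a curve and not the paper's method; `should_ask` and its parameters are hypothetical.

```python
def should_ask(
    kind: str,
    progress: float,
    goal_cutoff: float = 0.10,  # goal clarification loses most value after 10%
    hard_cutoff: float = 0.50,  # past the midpoint, asking can do worse than not asking
) -> bool:
    """Decide whether a clarifying question is still worth asking.

    kind is "goal" (what the task is) or "input" (details needed to do it);
    progress is the fraction of execution completed, from 0.0 to 1.0.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0.0 and 1.0")
    if kind not in ("goal", "input"):
        raise ValueError(f"unknown clarification kind: {kind!r}")
    if progress > hard_cutoff:
        return False
    if kind == "goal":
        return progress <= goal_cutoff
    return True  # input clarification stays useful up to the hard cutoff
```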

Package-age policy is entering the coding-agent playbook

The minimum-release-age post from @RhysSullivan stands out because it turns a general supply-chain concern into a concrete operator habit for coding agents. The notable part is not the joke replies; it is that people increasingly expect agents to configure, verify, and enforce these hardening rules for them.

7. Where the Opportunities Are

[+++] Agent runtime control planes — The strongest evidence spans scoped roots, live tool traces, supply-chain policy, and financial guardrails. Builders want agents that can be supervised and constrained as much as they want agents that can act.

[+++] Skill lifecycle infrastructure — Notion Skills and AutoSkill both point to a missing layer for installing, syncing, reviewing, versioning, and automatically extracting reusable capability.

[++] Persistent memory systems with operator visibility — Mongoose and OpenHuman show demand for agents that warm up fast and remember across tools, but in a form users can inspect and steer.

[++] Credential-light service discovery and payments — Circle Agent Stack and The Graph x402 flow both point to a growing market for pay-per-use agent infrastructure without long-lived keys.

[+] Agent-first application conventions — Thin frontend stacks, UI-spec efforts, and no-build frameworks suggest an emerging opportunity to design application surfaces specifically for machine authorship and review.

8. Takeaways

The agent conversation stayed infrastructure-first. The day's highest-signal posts were about harnesses, search, traces, skills, and control surfaces rather than raw model comparisons. (source)
Legibility is becoming part of product quality. Posts about thinner frontends, scoped roots, real-time traces, and package-age controls all point to the same standard: users want to see and constrain what the agent is doing. (source)
Skills are being treated less like prompts and more like software artifacts. Install, sync, publish, feedback, extraction, and versioning all showed up as first-class behaviors in the strongest skill-related items. (source)
Productized multi-agent and payment rails are no longer hypothetical. Superset, Mongoose, Circle Agent Stack, and The Graph all supplied concrete interfaces or operating metrics instead of just category rhetoric. (source)