Twitter AI Agent - 2026-05-28
1. What People Are Talking About
1.1 Harness engineering solidified into a named systems discipline 🡕
The clearest shift on May 28 was that harness engineering stopped reading like loose jargon and started showing up as a shared map for how agent systems are actually built. At least four retained items supported this theme.
@_vmlops shared (53 likes, 2 replies, 3,907 views, 73 bookmarks) the awesome-harness-engineering collection. The public repo defines harness engineering as the scaffolding around an agent - context delivery, tool interfaces, planning artifacts, verification loops, memory systems, and sandboxes - and organizes the space into concrete primitives rather than model hype.

@koylanai said (33 likes, 4 replies, 1,243 views, 35 bookmarks) that a new survey paper reviewed 170+ open-source projects and production lessons from OpenAI, Anthropic, and LangChain. The attached figure is the most useful part: it explicitly separates prompt engineering, context engineering, and harness engineering, and adds a seven-layer taxonomy for execution environment, tool interface, context management, lifecycle/orchestration, observability, verification, and governance.

@MrAhmadAwais argued (43 likes, 2 replies, 4,475 views, 8 bookmarks) that Claude Code still has a major tool-call problem for open models, and that most of the fix lives in harness repairs rather than the model itself. That paired cleanly with the public Outcome School explainer, which describes the harness as the control layer that manages tools, memory, errors, evaluation, and guardrails around the model.
Discussion insight: The strongest disagreement came from people like @ricklamers, who countered (4 likes, 1,154 views, 6 bookmarks) that simpler systems can sometimes generalize better than heavily engineered harnesses. Even that critique still accepted the day's premise: the argument was over how much harness is enough, not whether the harness matters.
Comparison to prior day: May 27 framed harness work through sandboxes, memory freshness, and telemetry. May 28 pushed it into a fuller discipline with a curated canon, a survey taxonomy, and examples where operators were fixing specific tool-call failures.
1.2 Persistent agents needed shared state, indexing, and cross-module continuity 🡕
The second strong theme was that once agents cross module, repository, or application boundaries, the hard part becomes shared state rather than raw model capability. At least three retained items supported this theme.
@chamath announced (113 likes, 20 replies, 64,346 views, 44 bookmarks) that 8090 Software Factory launched a unified agent that persists across modules, with multi-repo indexing scheduled next. The public 8090 site describes Software Factory as an AI-native SDLC control plane built around documentation, oversight, and full audit trails, which makes the unified-agent pitch less about chatbot convenience and more about keeping context alive across an enterprise workflow.
@BankQuote described (200 likes, 22 replies, 5,312 views, 27 bookmarks) KasGraph as the shared indexing layer Kaspa needs for wallets, explorers, DeFi interfaces, and AI agents. The post is unusually detailed: GraphQL SDL for schemas, TypeScript event handlers compiled to WASM through AssemblyScript, indexed data materialized into Postgres, proof-of-indexing checkpoints, and access through GraphQL, MCP, gRPC streaming, and WebSocket subscriptions. Replies under the thread made the demand explicit, with @Kas_Ranks saying they had been waiting for a unified indexer.
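The handler-plus-checkpoint pattern described in the post can be sketched in a few lines. This is a minimal illustration, not KasGraph's API: the `Indexer` class, the `transfer` event shape, the in-memory `store` (standing in for Postgres), and the hash-chain checkpoint (standing in for proof-of-indexing) are all assumptions made for the example.

```python
import hashlib
import json
from typing import Callable

# Hypothetical names throughout; none of this is taken from KasGraph itself.
Store = dict[str, dict[str, int]]
Handler = Callable[[Store, dict], None]


class Indexer:
    """Routes events to registered handlers and keeps a rolling checkpoint hash."""

    def __init__(self) -> None:
        self.handlers: dict[str, Handler] = {}
        self.store: Store = {"balances": {}}  # stand-in for a Postgres table
        self.checkpoint = "0" * 64  # genesis value for the hash chain

    def on(self, event_type: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event type."""
        def register(fn: Handler) -> Handler:
            self.handlers[event_type] = fn
            return fn
        return register

    def apply(self, event: dict) -> None:
        handler = self.handlers.get(event["type"])
        if handler is None:
            return  # unhandled event types change neither state nor checkpoint
        handler(self.store, event)
        # Chain the previous checkpoint with the canonical JSON of this event,
        # so two indexers agree only if they applied the same events in order.
        payload = json.dumps(event, sort_keys=True).encode()
        self.checkpoint = hashlib.sha256(self.checkpoint.encode() + payload).hexdigest()


def build_indexer() -> Indexer:
    indexer = Indexer()

    @indexer.on("transfer")
    def handle_transfer(store: Store, event: dict) -> None:
        balances = store["balances"]
        balances[event["from"]] = balances.get(event["from"], 0) - event["amount"]
        balances[event["to"]] = balances.get(event["to"], 0) + event["amount"]

    return indexer


events = [
    {"type": "transfer", "from": "alice", "to": "bob", "amount": 5},
    {"type": "transfer", "from": "bob", "to": "carol", "amount": 2},
]
indexer = build_indexer()
for event in events:
    indexer.apply(event)
print(indexer.store["balances"], indexer.checkpoint[:12])
```

The point of the checkpoint is that any consumer, whether a wallet, an explorer, or an agent, can compare one short hash to confirm it is reading the same indexed history as everyone else instead of rebuilding that history itself.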
@GuappGet positioned (25 likes, 5 replies, 1,292 views) Goblintown as a local-first desktop app for multi-agent orchestration, with separate worker, critic, memory, judge, and delivery roles. That is the same pattern in a different wrapper: once work becomes long-running, teams want role separation, resumability, and a shared context surface instead of a single undifferentiated chat loop.
Discussion insight: The replies here were mostly about handoff cost, coordination, and reuse. The concrete ask was not a smarter isolated model. It was a persistent surface that does not lose the state, indexes, or routing logic every time work moves across modules or agents.
Comparison to prior day: May 27 emphasized agent org charts and background desktop control. May 28 made the continuity layer more explicit through unified agents, shared indexers, and local-first orchestration shells.
1.3 Skills were treated as reusable assets that agents can write, audit, and benchmark 🡕
Skills also moved one step further from static prompt files toward reusable operating assets that agents can generate and humans can review. At least three retained items supported this theme.
@MervinPraison reported (5 likes, 2 replies, 349 views, 7 bookmarks) that a Hermes agent read CLI help and generated its own SKILL.md, scripts, and triggers correctly. That matters because it treats skills as machine-produced working documents rather than hand-authored, cargo-cult prompts.
@bradmillscan shared (8 likes, 3 replies, 1,036 views, 6 bookmarks) a starter set of Hermes skill proficiencies such as Tidy, Methodical, Helpful, and Thorough. The linked repo turns those traits into inspectable artifacts for workspace hygiene, SOP generation, and publishing, which is more operational than vague “best practices” advice.
@koylanai added (33 likes, 4 replies, 1,243 views, 35 bookmarks) that the new survey explicitly cited Agent Skills for Context Engineering, reinforcing that skills are now being discussed as part of the production harness rather than as an afterthought around the model.
Discussion insight: The common preference was for visible, portable skill files that can be reviewed, edited, and carried across runtimes. That is materially different from treating prompts as hidden implementation detail.
Comparison to prior day: May 27 centered on editor-native skills and skill optimization research. May 28 pushed the idea toward self-generated skills, proficiencies, and research-backed treatment of skills as a core harness primitive.
1.4 Agent-economy marketplaces spread faster than hard evidence 🡒
The most speculative cluster was around marketplaces, monetization, and the “agent economy.” The theme was real, but the public evidence was thinner than in the harness and infrastructure posts. At least four retained items supported it.
@swarms_corp pitched (39 likes, 6 replies, 1,274 views) an autonomous marketplace where micro-agents discover, evaluate, deploy, and trade intelligence. @agentranking described (21 likes, 4 replies, 348 views) a marketplace for paid API calls, subscriptions, MCP servers, and machine-payable tools ranked by reputation.
@nottellingyou73 claimed (22 likes, 6 replies, 6,692 views, 6 bookmarks) early access to a Robinhood-linked trading-agent marketplace, while @trythreews launched (7 likes, 1 reply, 40 views) an AWS Marketplace listing for three.ws, which says it lets builders embed 3D AI agents on-chain and on any webpage with a single tag.
Discussion insight: Compared with the harness and data-layer discussions, these marketplace posts carried more future-state language and less operational detail. The concept clearly has attention, but the strongest evidence for durable value still sat with the boring infrastructure underneath.
Comparison to prior day: May 27 already had specialist operators and workspaces. May 28 wrapped that energy in agent-economy and monetization narratives, even when the public proof remained early.
2. What Frustrates People
Tool-call reliability is still brittle in real agent stacks
Severity: High. The most practical frustration was that open-model agents still fail at the harness layer before users even get to evaluate the model itself. @MrAhmadAwais said (43 likes, 2 replies, 4,475 views, 8 bookmarks) Claude Code has a major tool-call problem on open models, and @_vmlops framed (53 likes, 2 replies, 3,907 views, 73 bookmarks) those failures as scaffolding problems around context, tools, memory, and verification. Even the skeptical reaction from @ricklamers, who argued that overbuilt harnesses can hurt generalization, located the problem in the harness rather than the model. The workaround today is not trust. It is continuous harness tweaking, simpler loops where possible, and explicit evaluation. This is worth building for because the failure happens before users can get dependable work out of the agent.
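One common shape for this kind of harness repair is a validate-and-retry loop around the model's tool call. The sketch below assumes a model that replies with JSON of the form `{"tool": ..., "args": {...}}`; the `ToolSpec` type, the `call_with_repair` helper, and the feedback wording are hypothetical and are not taken from any of the tools named above.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolSpec:
    name: str
    required: dict[str, type]  # argument name -> expected Python type
    run: Callable[..., Any]


def parse_tool_call(raw: str) -> Any:
    """Parse model output as JSON, tolerating a Markdown code fence around it."""
    text = raw.strip()
    if text.startswith("`" * 3):
        # Drop the opening fence line (with any language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("`" * 3, 1)[0]
    return json.loads(text)


def validate(call: Any, tools: dict[str, ToolSpec]) -> Optional[str]:
    """Return an error message to show the model, or None if the call is usable."""
    if not isinstance(call, dict):
        return "tool call must be a JSON object"
    spec = tools.get(call.get("tool"))
    if spec is None:
        return f"unknown tool {call.get('tool')!r}; available: {sorted(tools)}"
    args = call.get("args", {})
    if not isinstance(args, dict):
        return "'args' must be a JSON object"
    for name, expected in spec.required.items():
        if name not in args:
            return f"missing argument {name!r}"
        if not isinstance(args[name], expected):
            return f"argument {name!r} must be {expected.__name__}"
    return None


def call_with_repair(model: Callable[[str], str], prompt: str,
                     tools: dict[str, ToolSpec], max_attempts: int = 3):
    """Ask for a tool call, feeding validation errors back until one is usable."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        raw = model(prompt + feedback)
        try:
            call = parse_tool_call(raw)
            error = validate(call, tools)
        except json.JSONDecodeError as exc:
            error = f"invalid JSON: {exc.msg}"
        if error is None:
            result = tools[call["tool"]].run(**call.get("args", {}))
            return result, attempt
        feedback = f"\nPrevious tool call was rejected: {error}. Reply with JSON only."
    raise RuntimeError(f"no valid tool call after {max_attempts} attempts")
```

Returning the attempt count alongside the result is deliberate: it gives the harness a cheap signal for how often a given model needs repair, which feeds the explicit evaluation mentioned above.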
Governance and privilege boundaries are still under-specified
Severity: High. The strongest security-flavored complaint was not abstract fear of agents. It was uncertainty about what happens when multi-agent systems can touch real tools and real infrastructure. @VivekIntel showed (30 likes, 2 replies, 1,427 views, 31 bookmarks) PentestAgent as an autonomous penetration-testing framework, and the most useful reply immediately asked how tool execution and privilege boundaries are constrained. @anton_chuvakin highlighted (8 likes, 929 views, 9 bookmarks) the paper Agent Security is a Systems Problem, while @WisemanCap noted (26 likes, 2 replies, 4,013 views) that Snowflake is acquiring Natoma for agent governance. The coping pattern is to add policy, audit, and privileged access layers around agents rather than let autonomy stand alone.
Teams are still rebuilding the same indexing, memory, and coordination surfaces
Severity: Medium-High. The KasGraph and Software Factory posts expose a repeated infrastructure tax: every serious agent workflow seems to need its own indexer, continuity layer, or routing surface. @BankQuote spelled out (200 likes, 22 replies, 5,312 views, 27 bookmarks) the exact duplicated questions every Kaspa app eventually needs answered, while @chamath sold (113 likes, 20 replies, 64,346 views, 44 bookmarks) a unified agent partly on removing module handoffs. @GuappGet added (25 likes, 5 replies, 1,292 views) a local-first multi-agent shell with dedicated memory and reviewer roles. The frustration is not glamorous, but it is structural and recurring.
Marketplace rhetoric is ahead of trust and verification
Severity: Medium. The agent-marketplace posts created attention, but the evidence density lagged the harness and infrastructure cluster. @swarms_corp talked (39 likes, 6 replies, 1,274 views) about the agent economy, @agentranking promoted (21 likes, 4 replies, 348 views) reputation-ranked monetized capabilities, and @nottellingyou73 offered (22 likes, 6 replies, 6,692 views, 6 bookmarks) early access to trading agents. What was missing was the same level of public detail on failure handling, governance, or sustained operator value. That leaves a credibility gap for anyone trying to build the commercial layer.
3. What People Wish Existed
Portable harnesses that can be repaired and measured
The strongest wish in the feed was not for a new frontier model. It was for harnesses that can show whether a change actually helped. The harness-engineering repo, the new survey, and the tool-call complaints from @MrAhmadAwais all point to the same gap. Builders want measurable gains in tool use, context assembly, verification, and safety without guesswork. Opportunity: direct.
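A minimal sketch of what "show whether a change actually helped" could look like: run a baseline and a candidate harness over the same fixed task set and report pass rates plus the specific tasks that were fixed or regressed. The `Task` type and the `evaluate` and `compare` helpers are hypothetical names for illustration, and a harness is modeled simply as a function from prompt to output.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Harness = Callable[[str], str]


@dataclass(frozen=True)
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def evaluate(harness: Harness, tasks: Sequence[Task]) -> dict[str, bool]:
    """Run every task once; a raised exception counts as a failure."""
    results: dict[str, bool] = {}
    for task in tasks:
        try:
            results[task.name] = bool(task.check(harness(task.prompt)))
        except Exception:
            results[task.name] = False
    return results


def compare(baseline: Harness, candidate: Harness, tasks: Sequence[Task]) -> dict:
    """Summarize how the candidate differs from the baseline on the same tasks."""
    before = evaluate(baseline, tasks)
    after = evaluate(candidate, tasks)
    return {
        "baseline_pass_rate": sum(before.values()) / len(tasks),
        "candidate_pass_rate": sum(after.values()) / len(tasks),
        "fixed": sorted(n for n in before if not before[n] and after[n]),
        "regressed": sorted(n for n in before if before[n] and not after[n]),
    }


# Toy stand-ins for two harness versions: the candidate adds input trimming.
tasks = [
    Task("plain", "hi", lambda out: out == "HI"),
    Task("padded", "  hi  ", lambda out: out == "HI"),
]
report = compare(lambda p: p.upper(), lambda p: p.strip().upper(), tasks)
print(report)
```

Listing regressions by name, not just the aggregate rate, matters here: a harness tweak that raises the pass rate while breaking a previously working task is exactly the kind of guesswork builders are trying to avoid.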
Shared data and continuity layers for persistent agents
KasGraph, Software Factory, and Goblintown all imply the same unmet need: a reusable control and data plane for agents that have to persist across modules, repos, or products. Today, many teams are still building their own indexers, memory layers, and coordination shells from scratch. Opportunity: direct and competitive.
Default-secure governance for MCP, tools, and privileged actions
The security conversation was consistent: once agents touch live tools, teams want strong boundaries, not prompt-only caution. The replies to the PentestAgent post, the Natoma acquisition signal, and the systems-security paper all point to a demand for identity, policy, audit, and least-privilege controls that fit agent workflows naturally. Opportunity: direct.
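As a rough illustration of "strong boundaries, not prompt-only caution", the sketch below puts a default-deny policy check and an audit log between an agent and its tools. The `Policy` and `PolicyGate` names, the agent and tool identifiers, and the approver callback are all hypothetical; this does not describe how PentestAgent, Natoma, or any MCP server enforces permissions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    allowed_tools: frozenset          # tools this agent may call at all
    needs_approval: frozenset = frozenset()  # subset that also needs sign-off


def refuse_all(agent: str, tool: str, args: dict) -> bool:
    """Default approver: nothing that needs approval gets it."""
    return False


class PolicyGate:
    """Default-deny wrapper: every tool call is checked and logged before it runs."""

    def __init__(self, policies: dict[str, Policy],
                 approver: Callable[[str, str, dict], bool] = refuse_all) -> None:
        self.policies = policies
        self.approver = approver
        self.audit_log: list[dict] = []

    def invoke(self, agent: str, tool: str, fn: Callable[..., Any], **args: Any) -> Any:
        policy = self.policies.get(agent)
        if policy is None or tool not in policy.allowed_tools:
            decision = "denied"
        elif tool in policy.needs_approval and not self.approver(agent, tool, args):
            decision = "approval_refused"
        else:
            decision = "allowed"
        # Log before acting so refused attempts are recorded as well.
        self.audit_log.append(
            {"agent": agent, "tool": tool, "args": args, "decision": decision}
        )
        if decision != "allowed":
            raise PermissionError(f"{agent} may not call {tool}: {decision}")
        return fn(**args)


gate = PolicyGate({
    "recon": Policy(
        allowed_tools=frozenset({"dns_lookup", "port_scan"}),
        needs_approval=frozenset({"port_scan"}),
    ),
})
print(gate.invoke("recon", "dns_lookup", lambda host: f"resolved {host}", host="example.test"))
```

The design choice worth noting is that the gate, not the prompt, holds the allow-list: an agent that is talked into calling an unlisted tool still hits `PermissionError`, and the attempt lands in `audit_log` either way.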
Skill systems that agents can generate, review, and improve
The Hermes and skill-proficiency posts suggest that teams want more than a folder of static prompts. They want skills that can be generated from real work, inspected like code, benchmarked, and carried across runtimes. This feels practical rather than aspirational. Opportunity: direct and competitive.
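To make "generated from real work, inspected like code" concrete, here is a small sketch that reads a CLI's help text and renders it into a reviewable skill file. It is a plain template, not how the Hermes agent produces its skills; the `tidy` CLI, the `render_skill` helper, and the front-matter layout (a `name` and `description` header followed by Markdown) are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI, defined here only so there is real help text to read.
    parser = argparse.ArgumentParser(
        prog="tidy", description="Clean up a workspace directory."
    )
    parser.add_argument("path", help="directory to clean")
    parser.add_argument(
        "--dry-run", action="store_true", help="list changes without applying them"
    )
    return parser


def render_skill(name: str, description: str, help_text: str) -> str:
    """Render a skill document: short metadata header, then the raw help text."""
    fence = "`" * 3  # built at runtime so this example nests cleanly in a fence
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        "---",
        "",
        f"# {name}",
        "",
        "## Command reference",
        "",
        f"{fence}text",
        help_text.rstrip(),
        fence,
        "",
    ]
    return "\n".join(lines)


parser = build_parser()
skill_md = render_skill(parser.prog, parser.description, parser.format_help())
print(skill_md)
```

Because the output is an ordinary text file, it can be diffed, reviewed in a pull request, and regenerated when the CLI changes, which is the property the posts above were asking for.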
Trust and reputation layers for agent marketplaces
Marketplace posts are multiplying, but the trust layer is visibly underbuilt. If agent capabilities are going to be bought, sold, subscribed to, or invoked machine-to-machine, builders will need reputation, sandboxing, policy, and billing systems that can stand up to real operator scrutiny. Opportunity: emerging.
4. Tools and Methods in Use
| Tool | Category | Sentiment | Strengths | Limitations |
|---|---|---|---|---|
| awesome-harness-engineering | Reference collection | (+) | Gives builders one map of context, tools, memory, verification, and sandboxes | It is a canon, not an executable runtime |
| Agent Harness Engineering: A Survey | Research framework | (+) | Formalizes a seven-layer taxonomy and maps 170+ open-source projects | Conceptual rather than turnkey |
| 8090 Software Factory unified agent | Agent platform | (+/-) | Persistent cross-module agent flow, audit trail, documentation, and oversight | Public technical detail is still thinner than the product framing |
| KasGraph | Data/indexing layer | (+) | GraphQL, WASM, Postgres, MCP, gRPC, and WebSocket surfaces for reusable state access | Early and ecosystem-specific |
| PentestAgent | Security operator | (+/-) | Assist/agent/crew modes, Docker or Kali isolation, report generation, MCP support | Sensitive domain where permissions and scope control matter a lot |
| Hermes proficiencies | Skill package | (+/-) | Turns workspace discipline and SOPs into portable skill assets | Evidence today is still thread-level and early |
| three.ws | Agent surface / monetization | (+/-) | Embeddable 3D agents, MCP and A2A support, portable identity and billing hooks | More interface and monetization proof than operator proof so far |
Overall sentiment favored tools that made the control surface more explicit. Collections, taxonomies, and infrastructure layers got the cleanest positive reaction because they made agent behavior easier to reason about. Sentiment was more mixed around end-user marketplaces and around offensive-security operators, where the upside is obvious but the governance and trust requirements are much higher. The day's migration pattern was from prompt-centric thinking toward scaffolding-centric thinking: context, policy, orchestration, repair loops, and reusable skills.
5. What People Are Building
| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
|---|---|---|---|---|---|---|
| KasGraph | @BankQuote | Subgraph-style indexing layer for Kaspa apps and agents | Every app otherwise has to rebuild token, covenant, NFT, transfer, and proof indexing on its own | GraphQL SDL, TypeScript, AssemblyScript/WASM, Postgres, MCP, gRPC, WebSocket | Alpha | post |
| 8090 Software Factory unified agent | 8090 | Persistent agent that operates across modules inside an AI-native SDLC platform | Cross-module handoff cost and context loss in enterprise software delivery | AI-native SDLC control plane, documentation layer, audit trail, multi-repo indexing roadmap | Beta | site, post |
| PentestAgent | GH05TCREW | Autonomous penetration-testing framework with assist, agent, and crew modes | Repeatable recon, exploitation, and reporting workflows are still labor-heavy | Python, LiteLLM-supported models, terminal and browser tools, Docker or Kali, MCP | Shipped | repo |
| Goblintown | @GuappGet | Local-first desktop app for single-agent and multi-agent orchestration | Complex work needs role separation, resumability, budget controls, and a reviewer loop | Local-first desktop app, multi-agent roles, signed outputs, budget controls | Alpha | post |
| three.ws | three.ws | Lets builders embed fully embodied 3D AI agents on-chain and on webpages with one tag | Agents still lack a portable front-end identity and distribution surface | Web components, GLB/glTF rendering, Solana, Metaplex Core, MCP, A2A | Beta | site, post |
KasGraph and 8090 Software Factory point to the same build pattern from opposite directions. One starts from blockchain infrastructure and one from enterprise SDLC, but both spend most of their energy on continuity: shared state, reusable indexing, and fewer handoffs.
PentestAgent is the clearest operator build of the day. Its public repo already documents multiple execution modes, Docker and Kali isolation, MCP support, and built-in reporting, which makes it more concrete than the many marketplace pitches layered on top of “agent economy” language.
Goblintown and three.ws show where builders are experimenting on the interface side. Goblintown turns orchestration into a role-based desktop shell, while three.ws turns agents into embeddable front-end surfaces with portable identity and monetization hooks. Both are interesting, but the evidence today is still earlier than the underlying control-plane work.
6. New and Notable
Harness engineering got its own public canon
The combination of the awesome-harness-engineering repo, the new survey, and supporting explainers made harness engineering feel like a stabilized term rather than a passing meme. That matters because it gives builders a common vocabulary for context, tools, verification, and governance. (repo)
Unified agents started replacing per-module agents in public product messaging
8090's unified-agent launch was a meaningful change in framing. The promise was not “more agents,” but one persistent agent that survives module boundaries and keeps conversation history, skills, and alerts available across the workflow. (source)
Self-writing skills moved from theory to everyday demos
Hermes generating its own SKILL.md, scripts, and triggers is a small post with an outsized implication: skills are now being treated as working outputs that an agent can author after doing the task, not just static human-authored setup. (source)
7. Where the Opportunities Are
[+++] Agent control and data planes for persistent work — Software Factory, KasGraph, and Goblintown all point to the same structural need: durable context, shared indexing, orchestration, and fewer handoffs once agents leave a single-task loop.
[+++] Governance and repair tooling for agent harnesses — Tool-call failures, the Natoma acquisition signal, PentestAgent's privilege questions, and the systems-security paper all show demand for audit, policy, scope control, and measurable harness repair.
[++] Skill lifecycle platforms — The survey citation, Hermes self-generated skills, and Hermes proficiencies all suggest teams want a way to create, benchmark, review, and port skills across runtimes.
[+] Trust and reputation infrastructure for agent marketplaces — Swarms, AgentRanking, trading-agent pitches, and three.ws show the commercial layer forming, but the proof of durable trust and operator-grade controls is still early.
8. Takeaways
- Harness engineering was the dominant explanatory frame for AI agents on May 28. The clearest evidence came from the awesome-harness-engineering repo and the new survey paper, both of which defined the work as context, tools, verification, orchestration, and governance rather than model tuning alone. (repo)
- Persistent agents increasingly need shared state and indexing more than more model novelty. 8090's unified agent and KasGraph's indexer both sold continuity, not raw model IQ, as the real bottleneck. (source)
- Skills are turning into inspectable operating assets. Hermes generating its own skill and the Hermes proficiencies repo both point toward skills that can be reviewed, reused, and improved like code. (source)
- Marketplace narratives are expanding faster than publicly verified operator value. AgentRanking, Swarms, and trading-agent posts made the commercial layer visible, but the most concrete technical evidence still sat with the infrastructure underneath. (source)