Twitter AI Agent - 2026-05-04

1. What People Are Talking About

1.1 Harness Engineering Matures: Build vs Buy Sharpens

The harness engineering debate continued from May 3 but shifted from "should you build your own?" toward practical implementation patterns. @_colemurray crystallized the anti-build position (51 likes, 50 bookmarks): "it is unlikely you will have novel ideas around sub-agent orchestration, compaction, progressive disclosure etc that are worth owning the entire harness. Spend your time investing in the pieces around the harness: execution infrastructure, custom tools/MCPs/skills, self improvement on trajectories."

@nichochar pushed back directly: "disagree. harnesses are quite functionally simple and they are really important to understand deeply since we delegate work to them. all builders should be familiar with how their tools work and maintaining a harness in 2026 is the best way to do this." @skastr052 added a nuanced take: "model-harness coupling will tend to increase -- for example, the recent code's /goal command only really works with codex and codex's compaction backend."

Meanwhile, the Agentic Harness Engineering (AHE) paper gained traction through @DataScienceDojo coverage (23 views): a closed-loop system that automatically evolves all scaffolding around a coding agent without touching the base model. Key result: 10 iterations lifted pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, with 12% fewer tokens on SWE-bench-verified.

@MrAhmadAwais captured the build-in-house sentiment (14 likes) with a meme video: "'we can build a coding agent harness in house.'" @dexhorthy maintained his backlash posture (70 likes): "hey Nikita I really wanna see more ai written hype slop about harness engineering can you help me out."

Comparison to prior day: May 3 moved from definitions to strategic build/buy decisions. May 4 deepens the split -- the debate now has published research results (AHE paper) backing the "automate harness evolution" camp, while practitioners like @nichochar defend understanding-through-building. The backlash fatigue from @dexhorthy persists as a steady counterweight.


1.2 Context Engineering Gains Structured Vocabulary

Context engineering discourse moved beyond general advocacy into technical specifics. @_overment published a detailed breakdown (7 bookmarks) of multi-agent prompt architecture, identifying unsolved problems: "How do you keep the context cache when the settings change? How do you preserve important context beyond compression? How do you present results to the model when they go beyond the available context window?"

Figure: Multi-agent prompt architecture diagram showing cached prefix structure, runtime notes, catalogs, and cache breakpoints
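The first of those open questions, keeping the cache when settings change, can be illustrated with a minimal sketch. It assumes a cache keyed on a hash of the stable prompt prefix, which is a simplification (provider-side caches typically match on token prefixes). The names `prefix_key` and `build_prompt` and the segment contents are hypothetical and are not taken from @_overment's post.

```python
import hashlib


def prefix_key(segments: list[str]) -> str:
    """Hash the stable prefix segments in order (hypothetical cache key)."""
    digest = hashlib.sha256()
    for segment in segments:
        digest.update(segment.encode("utf-8"))
        digest.update(b"\x00")  # separator so segment boundaries affect the hash
    return digest.hexdigest()


def build_prompt(stable: list[str], volatile: list[str]) -> dict[str, str]:
    """Place volatile content (settings, runtime notes) after the cache breakpoint.

    Only the stable segments feed the cache key, so editing a volatile
    segment leaves the key unchanged.
    """
    return {
        "cache_key": prefix_key(stable),
        "prompt": "\n".join(stable + volatile),
    }


stable = ["SYSTEM: you are a coding agent", "TOOLS: read_file, write_file"]
before = build_prompt(stable, ["SETTINGS: verbose=true"])
after = build_prompt(stable, ["SETTINGS: verbose=false"])
print(before["cache_key"] == after["cache_key"])
```

Under these assumptions, a settings change only invalidates the cache if the settings sit inside the hashed prefix. Whether a real harness can move them behind the breakpoint depends on how the model is expected to read them.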

@avrldotdev highlighted (8 bookmarks) a deep article on compaction covering 7 philosophies and their pros and cons. @petergyang shared (9 bookmarks) a free Context Engineering for Prototyping Workbook. @NalyMetaX framed the broader shift (56 likes): "The market is obsessed with prompt engineering, but prompts are just band-aids for a lack of context."

@DanIsBuilding captured the inflection in a reply to @_overment: "Personally believe we're about to transition from models to harness design being the bottleneck. Context engineering, tool disclosure and delegation are already holding back existing models in a big way."

Comparison to prior day: May 3 featured context engineering as a secondary theme within harness debates. May 4 elevates it as a standalone topic with structured materials (architecture diagrams, workbooks, taxonomy of compaction strategies).


1.3 Local Model Coding Agents on Consumer GPUs

@witcheer detailed (64 likes, 78 bookmarks) how Qwen3.5-27B on a single RTX 3090 now runs a full autonomous coding loop, using a q4_0 KV cache at 262K context: "the model writes multi-file code, runs tests, fixes its own bugs, serves the result. all on 24GB VRAM." The thread included RTX 4060 Ti (8GB) and RTX 3060 (12GB) configurations.

@sakurayukiai praised: "q4 KV cache is single-handedly keeping consumer GPUs alive for agentic workflows." But @oppollo11 cautioned: "q4_0 for KV cache creates unreliable work. i would suggest not to go below q8." @EdgeDimi was blunt: "Its impossible to read a complex agentic workspace in good time let alone to do proper sequential tooling. I dont know whats the experience on 512gb ram hardwares... but its delusional to think these small parameter models can do anything of essence."
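The memory side of the q4_0 versus q8_0 trade-off can be estimated with a rough sketch. The per-element sizes follow the ggml block layouts (32 values per block, plus a 2-byte scale). The layer count, KV head count, and head dimension below are hypothetical placeholders chosen for round numbers, not Qwen3.5-27B's actual architecture, and the function name `kv_cache_gib` is illustrative.

```python
# Bytes per cached element for common KV cache types.
# q8_0: 34 bytes per 32-value block; q4_0: 18 bytes per 32-value block.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, cache_type: str) -> float:
    """Estimate KV cache size in GiB for standard attention layers.

    The factor of 2 covers keys and values; layers that do not keep a
    per-token KV cache should be excluded from n_layers.
    """
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * BYTES_PER_ELEMENT[cache_type] / 2**30


# Hypothetical dimensions, for illustration only (not a real model config).
for cache_type in ("f16", "q8_0", "q4_0"):
    size = kv_cache_gib(n_layers=16, n_kv_heads=4, head_dim=128,
                        context_len=262_144, cache_type=cache_type)
    print(f"{cache_type}: {size:.2f} GiB")
```

Whatever the real dimensions are, the ratio holds: q4_0 needs a little over a quarter of the f16 footprint and q8_0 a little over half. That is why the choice between them decides whether a long context fits beside the weights on a 24GB card. The sketch says nothing about the output-quality concerns raised in the replies.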

Comparison to prior day: May 3 discussed local model cost advantages (Mac Mini setups). May 4 narrows to specific GPU configurations and quantization trade-offs, with real benchmarks on consumer hardware.


1.4 Hermes Agent Ecosystem Expands Rapidly

The Hermes Agent ecosystem continued its breakout trajectory from May 3's Kanban launch. @shmidtqq catalogued (58 likes, 31 bookmarks) the full feature set: "persistent memory, filesystem rollback, session branching, 17-platform reach, realtime voice on 4 of them, multi-provider model swap, cron + webhooks, 100+ skills as slash commands."

Figure: Hermes Desktop v0.6.0 Kanban board showing multi-agent task orchestration with Ready, Blocked, and Done columns

@DODOREACH shipped Hermes Desktop v0.6.0 (36 likes, 24 bookmarks) with Kanban orchestration support. @outsource_ launched HermesWorld (11 likes) -- an MMORPG plugin for Hermes agents. @WesRoth covered (24 likes) Shopify's dedicated Hermes Agent skill for autonomous storefront management. @MystiqueMide documented a full migration to VPS-hosted Hermes running 24/7 on Telegram and Discord.

Comparison to prior day: May 3 saw the Kanban launch itself. May 4 shows the ecosystem buildout -- desktop apps, third-party skills (Shopify), community plugins (HermesWorld MMORPG), and user deployment guides. Ecosystem development is accelerating.


1.5 Agent Skills and Skill Marketplaces Proliferate

Agent skills emerged as a distinct product category with multiple launches. @higgsfield released (526 likes, 440 bookmarks, 17,830 views) Higgsfield CLI + Marketing Skills -- the day's highest-scoring AI-relevant post: "Instead of burning tokens on bloated schemas, or shipping broken creative at scale, the CLI keeps agent spend lean and Skills keep output high quality. Pairs with Codex, Claude Code, Openclaw etc."

@tom_doerr shared autoskills (10 likes, 5 bookmarks) -- a CLI that scans your tech stack and auto-installs the right agent skills. @xdotli celebrated (24 likes) SkillsBench reaching 1.1K GitHub stars in 2 months, with 65% of agent skills research now citing their paper.

Figure: GitHub star history chart comparing SkillsBench, SWE-bench, and Terminal-Bench growth rates

@Shruti_0810 highlighted (17 likes) TinyFish making web search and fetch free for every AI agent, working across REST API, MCP, Python, TS SDKs, and Claude Code skills.

Comparison to prior day: May 3 mentioned skills within the harness framework context. May 4 shows skills becoming a standalone product category with their own marketplace dynamics, benchmarks, and auto-installation tooling.


1.6 Enterprise Agent Governance Goes Live

@WesRoth covered (22 likes) Microsoft Agent 365 going generally available as of May 1: "a centralized control plane that allows IT and security teams to observe, govern, and secure AI agents across the enterprise," including a unified registry, visual agents map, and support for both delegated and autonomous agents.

@RavenOfSpace raised a sharp objection: "Autonomous agents with own credentials = blast radius problems IAM wasn't built for. Microsoft's framing admits the gap." @ItsKondrat identified the real bottleneck: "control plane is solved. the unsolved part: what fields you actually fill in. manager-of-record: blank. PM owns that spec, not IT."

@TheWhizzAI summarized (47 likes, 23 bookmarks) Google DeepMind's "AI Agent Traps" paper documenting six attack types that hijack AI agents against their own users, with specific concern about approval fatigue: "You have seen 50 AI recommendations today. You approved 49. You stop reading carefully. That is when the trap triggers."

Comparison to prior day: May 3 featured Google Agent Anomaly Detection as an emerging signal. May 4 adds Microsoft's Agent 365 GA and DeepMind's adversarial research, establishing agent governance as a multi-vendor priority rather than a single company's initiative.


1.7 Research: Harness Evolution, Multi-Agent Search, and Skill Structure

@dair_ai published (135 likes, 132 bookmarks) the top AI papers of the week including Latent Agents, RecursiveMAS, OneManCompany, AgenticQwen-30B-A3B, Agentic World Modeling, and Agentic Harness Engineering.

@HuggingPapers and @_akhaliq both covered Web2BigTable -- a bi-level multi-agent framework for internet-scale information extraction that achieves 7.5x SOTA on WideSearch (8 bookmarks).

Figure: Web2BigTable architecture diagram showing orchestrator-worker coordination via shared workspaces with self-evolving skill banks

@EmpathYang announced (15 likes) that PlugMem was accepted to ICML 2026 -- a plug-and-play memory module that turns raw trajectories into a knowledge graph, achieving SOTA on LongMemEval and HotpotQA. @fly51fly shared the Peking University paper "From Skill Text to Skill Structure" on scheduling-structural-logical representations for agent skills.

Comparison to prior day: May 3 featured the same paper batch in preview. May 4 adds Web2BigTable (multi-agent search at scale) and PlugMem (ICML-accepted agent memory), deepening the research pipeline.


2. What Frustrates People

Token Waste and Cost Opacity

@higgsfield framed the top post around "burning tokens on bloated schemas." @AINativeLang quantified the gap: "$870 in total AI spend vs $3,000+ on traditional agent loops" over 7 weeks -- a cost reduction of at least 71% (1 - 870/3,000), attributed to compiling the orchestration layer rather than running it through LLM reasoning.
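The idea behind compiled orchestration can be sketched in a few lines, with no claim about how AINL implements it. In this sketch an LLM-routed loop spends one model call deciding each step and one doing it, while a compiled pipeline fixes the routing in ordinary code. `StubLLM`, `agent_loop`, and `compiled_pipeline` are hypothetical names, and the stub counts calls instead of contacting any API.

```python
class StubLLM:
    """Hypothetical stand-in for a model client; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"result({prompt})"


def agent_loop(llm: StubLLM, tasks: list[str]) -> list[str]:
    """LLM-routed: one call to pick the next step, one call to perform it."""
    outputs = []
    for task in tasks:
        llm.complete(f"decide next step for: {task}")  # routing via the model
        outputs.append(llm.complete(task))
    return outputs


def compiled_pipeline(llm: StubLLM, tasks: list[str]) -> list[str]:
    """Compiled: routing is plain code; the model is called only for the work."""
    return [llm.complete(task) for task in tasks]


def cost_reduction(new_cost: float, baseline: float) -> float:
    """Fractional saving relative to a baseline spend."""
    return 1 - new_cost / baseline


tasks = ["summarize ticket", "draft reply", "file follow-up"]
routed, compiled = StubLLM(), StubLLM()
agent_loop(routed, tasks)
compiled_pipeline(compiled, tasks)
print(routed.calls, compiled.calls)
print(f"{cost_reduction(870, 3000):.0%}")
```

The call counts here are a toy ratio and are not meant to reproduce the quoted figures. The last line only checks the arithmetic behind the 71% number, using the $3,000 floor as the baseline.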

Multi-Agent Coordination Complexity

@hosseeb asked (37 likes, 61 bookmarks): "What's the craziest example of a multi-agent workflow/setup you've seen?" The replies revealed that most setups are still primitive. @0xgilbert admitted: "I run 5-7 agents directly all day. It feels pretty timid compared to what a deliberate orchestrated system should be able to manage." @ercwl described peak multi-agent as deliberately absurd: "Peak performance is letting Claude control your Codex desktop application via computer use through Dispatch from your mobile phone."

Harness Discourse Fatigue

@dexhorthy expressed (70 likes) exhaustion with "ai written hype slop about harness engineering," and later noted being "3 months early to context engineering, 6 months early to lights off software factory, 6 months early on 'OH f**** TURN THE LIGHTS BACK ON.'" The sustained backlash signals the discourse has passed peak novelty.

Agent Security Gaps

@dipsybitsy commented (9 likes) on someone using morse code to drain $200K from agents: "we are so early on agent security its not even funny. next week theyll be using smoke signals. who's building the real guardrails?"


3. What People Wish Existed

Standardized Agent Identity and Authorization

@Aiagent_s described the gap: "Enterprises are deploying AI agents with no identity framework. No permission scoping. No audit trail. No revocation mechanism." Microsoft Agent 365 addresses part of this, but @RavenOfSpace noted IAM was not built for autonomous agents with their own credentials.
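The four missing pieces named in that quote (identity, permission scoping, audit trail, revocation) fit in a small data structure. This is an illustrative sketch, not a description of Microsoft Agent 365 or any IAM product. `AgentRegistry`, its methods, and the scope strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRegistry:
    """Hypothetical in-memory registry: identity, scopes, audit, revocation."""

    scopes: dict[str, set[str]] = field(default_factory=dict)
    revoked: set[str] = field(default_factory=set)
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def register(self, agent_id: str, allowed_actions: set[str]) -> None:
        """Give an agent an identity and an explicit set of permitted actions."""
        self.scopes[agent_id] = set(allowed_actions)

    def revoke(self, agent_id: str) -> None:
        """Revocation: every later authorization for this agent is denied."""
        self.revoked.add(agent_id)

    def authorize(self, agent_id: str, action: str) -> bool:
        """Check scope and revocation, and record the decision either way."""
        allowed = (agent_id not in self.revoked
                   and action in self.scopes.get(agent_id, set()))
        self.audit_log.append((agent_id, action, allowed))
        return allowed


registry = AgentRegistry()
registry.register("billing-agent", {"invoices:read"})
print(registry.authorize("billing-agent", "invoices:read"))
print(registry.authorize("billing-agent", "invoices:delete"))
registry.revoke("billing-agent")
print(registry.authorize("billing-agent", "invoices:read"))
```

A production system would also need credential issuance, expiry, and tamper-evident logs. The sketch only shows that the four properties are separable and cheap to state.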

Headless Agent Execution Without Sandboxes

@hwchase17 responded (33 likes, 18 bookmarks) to demand for running harnesses without disk or bash access: "deepagents you can run with a 'virtual filesystem' lets do lots of great context engineering tricks, without requiring an actual sandbox environment!" @BrandGrowthOS called it "clever -- so you're basically mocking the file system calls instead of spinning up containers."
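The "mocking the file system calls" idea can be sketched as an in-memory store that an agent's file tools read and write instead of touching disk. This is a generic illustration, not the deepagents API. `VirtualFS` and its method names are hypothetical.

```python
import posixpath


class VirtualFS:
    """Hypothetical in-memory filesystem backing an agent's file tools."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    @staticmethod
    def _norm(path: str) -> str:
        # Anchor every path at the virtual root so ".." cannot escape it.
        return posixpath.normpath("/" + path.lstrip("/"))

    def write_file(self, path: str, content: str) -> None:
        self._files[self._norm(path)] = content

    def read_file(self, path: str) -> str:
        key = self._norm(path)
        if key not in self._files:
            raise FileNotFoundError(key)
        return self._files[key]

    def list_dir(self, prefix: str = "/") -> list[str]:
        """Return all stored paths under the given directory, sorted."""
        base = self._norm(prefix)
        if not base.endswith("/"):
            base += "/"
        return sorted(p for p in self._files if p.startswith(base))


fs = VirtualFS()
fs.write_file("src/app.py", "print('hi')")
print(fs.list_dir("/"))
print(fs.read_file("/src/app.py"))
```

Because nothing reaches the host disk, there is no container to start and nothing to clean up. The trade-off is that tools which need real execution, such as running the file, still need some other environment.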

Agent Infrastructure as a Service

@cycle_vega identified the recurring pain: "Every vertical SaaS is building its own AI agent right now. But triggers, session persistence, API orchestration, data security -- it's a full infra project every time. Stripe didn't ask every startup to build their own payment rails."

Agent Memory That Actually Works

@EloPhanto articulated the standard in a reply to RunTrim: "For coding agents, 'memory' only earns the name if it can answer: what did I touch, what did I promise not to touch, what changed since last run, and what proof says I'm done?"
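Those four questions map onto a small record type. The sketch below is one hypothetical shape for such a record, not RunTrim's design. `RunMemory`, its fields, and the use of content hashes to detect changes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RunMemory:
    """Hypothetical per-run record answering the four questions in the quote."""

    touched: set[str] = field(default_factory=set)             # what did I touch
    forbidden: set[str] = field(default_factory=set)           # what must I not touch
    last_hashes: dict[str, str] = field(default_factory=dict)  # state at last run
    proofs: list[str] = field(default_factory=list)            # evidence of completion

    def record_edit(self, path: str) -> None:
        """Log an edit, refusing paths the agent promised to leave alone."""
        if path in self.forbidden:
            raise PermissionError(f"{path} is marked do-not-touch")
        self.touched.add(path)

    def changed_since_last_run(self, current_hashes: dict[str, str]) -> list[str]:
        """Paths that are new or whose content hash differs from the last run."""
        return sorted(path for path, digest in current_hashes.items()
                      if self.last_hashes.get(path) != digest)

    def add_proof(self, evidence: str) -> None:
        self.proofs.append(evidence)

    def is_done(self) -> bool:
        """Done only when at least one piece of evidence has been recorded."""
        return bool(self.proofs)


memory = RunMemory(forbidden={"migrations/001.sql"},
                   last_hashes={"a.py": "h1", "b.py": "h2"})
memory.record_edit("a.py")
print(memory.changed_since_last_run({"a.py": "h1x", "b.py": "h2", "c.py": "h3"}))
print(memory.is_done())
```

The point of the shape is that each question has a queryable answer rather than a free-text summary. What counts as acceptable proof (test output, a diff, a reviewer sign-off) is left open here.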


4. Tools and Methods in Use

| Tool | Category | Sentiment | Strengths | Limitations |
|---|---|---|---|---|
| Hermes Agent | Agent framework | Positive | 100+ skills, multi-platform, Kanban orchestration, self-improving, persistent memory | New ecosystem, community-driven quality variance |
| Claude Code | Coding agent | Positive | Deep reasoning, skill system, subagents, hooks | Credit limits (@nichxbt: "$120 on top of $200/month plan") |
| Flue / PyFlue | Agent harness | Positive | Programmable harness, Model+Harness+Memory formula, cross-language | Python port is new, still catching up to TS original |
| LangChain Deep Agents | Agent harness | Positive | Virtual filesystem, context engineering, batteries included | More complex than minimal harnesses |
| Qwen3.5-27B (q4_0 KV) | Local model | Mixed | 262K context on 24GB VRAM, zero speed penalty | Quality concerns with q4_0 quantization, unreliable for complex tasks |
| OpenClaw | Agent platform | Positive | 100+ skills, slash commands, plugin ecosystem | Framework eating context budget (per May 3 feedback) |
| Codex | Coding agent | Positive | /goal command, cloud execution, compaction backend | Model-harness coupling increasing |
| Gemini CLI | Coding agent | Emerging | Google integration, growing skill support | Less mature ecosystem |
| TinyFish | Web fetch for agents | Positive | Free tier, clean markdown output, lower token costs | New service, limited track record |
| MCP | Tool protocol | Positive | Standard for agent-tool connection, wide adoption | Not a runtime -- still need a harness |

5. What People Are Building

| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
|---|---|---|---|---|---|---|
| Higgsfield CLI + Marketing Skills | @higgsfield | Agent skills for marketing creative at scale | Token waste in creative workflows | Skills SDK, Codex/Claude/OpenClaw | Shipped | post |
| PyFlue | @Shashikant86 | Python port of Flue agent harness framework | No Python harness equivalent to Flue | Python, LangChain Deep Agents | Shipped | post |
| Hermes Desktop v0.6.0 | @DODOREACH | macOS desktop app for Hermes Agent with Kanban | CLI-averse users need GUI for agent management | Electron, SSH, Hermes | Shipped | post |
| HermesWorld | @outsource_ | MMORPG plugin for Hermes Agent dashboard | Community engagement and skill discovery | Hermes plugin system | Shipped | post |
| RunTrim | @MichelLeoAnt | Memory, scope, and control layer for coding agents | No cross-agent run history or forbidden-file tracking | CLI, dashboard | Shipped | post |
| AgenC Marketplace Agent Kit | @tetsuoarena | On-chain task marketplace for AI agents | Separating wallet authority from task execution | Solana, Claude Code, MCP | Devnet testing | post |
| PlugMem | @EmpathYang | Plug-and-play memory module using knowledge graphs | Agents lose memory between sessions | Knowledge graphs, OpenClaw/Claude Code | Research (ICML 2026) | post |
| Anvia | @indrazulfi | Open-source TypeScript AI agent framework with Studio | Agent inspection and debugging | TypeScript, 13 npm packages, MIT | Shipped | post |
| RustyClaw | @webxos | Terminal-based local-only agent harness in Rust | Need for minimal, single-binary agent harness | Rust, Ollama, TUI | Shipped | post |
| Decode MCP Server | @seflless | Generate and edit mermaid diagrams via coding agents | Agents cannot create visual diagrams for planning | MCP, Mermaid.js | Shipped | post |
| headless-cli | @RobertTLange | Unified interface for headless coding agent execution | Running multiple agent CLIs requires different commands | TypeScript, npx | Shipped | post |
| Veritas Kanban v4.1 | @BradGroux | OpenClaw task manager with semantic search | Duplicate detection and context injection for multi-agent work | OpenClaw, QMD, SQLite | Shipped | post |

6. New and Notable

Shopify Ships Official Hermes Agent Skill

@WesRoth reported (24 likes) that Shopify released a dedicated Hermes Agent skill enabling autonomous storefront management -- products, inventory, orders, and fulfillments. This is notable because a major commerce platform is building a native integration with an open-source agent framework rather than a proprietary solution.

Google Remy Agent Tab Spotted in Gemini iOS Redesign

@Lentils80 found (41 likes) a new "Agent" tab in the upcoming Gemini app redesign, showing Tasks, Skills, and Schedules sections -- suggesting Google is building persistent agent capabilities directly into Gemini under the "Remy" codename.

Figure: Gemini iOS redesign showing Agent tab with Tasks, Skills, and Schedules

Anthropic Publishes AI Company Blueprint

@VaibhavSisinty described (21 likes, 36 bookmarks) Anthropic's published blueprint for building an "AI company" using Claude + Google Cloud's Agent Stack: "1 human CEO, multiple AI employees, agents divide work among themselves, long-term memory across sessions, one command deploys to production." The framing: "The shift of 2024 was tools. The shift of 2025 was agents. The shift of 2026 is org charts."

HeyGen Agent Ships Inside Superhuman Go

@HeyGen announced (59 likes, 33 quotes) integration with Superhuman Go, turning written updates into video or voice. Notable for the high quote count -- at least 15 accounts amplified this, suggesting a coordinated launch. The integration joins Canva, Granola, and RecitalApp as agents inside Superhuman Go.

In-Context Recursive Self-Improvement Demonstrated

@doodlestein argued (14 likes) that recursive self-improvement already exists through custom agent CLI tooling and skills: "I had Claude Code take the plan it came up with and use it with ntm itself to manage the agent swarm to implement the plan." The skill creates a TUI inspector, the agent uses it to verify its own visual output, and feeds improvements back into the skill.


7. Where the Opportunities Are

[+++] Agent memory and context persistence layers. Multiple signals converge: RunTrim ships memory/scope/control for coding agents. PlugMem gets ICML acceptance. @_overment identifies unsolved context caching problems. @EloPhanto defines what agent memory must prove. The gap between "stateless agent session" and "persistent, auditable agent workspace" is the most frequently referenced unmet need.

[+++] Agent skills marketplace and distribution. Higgsfield CLI (526 likes, 440 bookmarks) leads the day. SkillsBench reaches 1.1K stars. Autoskills auto-detects and installs skills. Shopify ships official Hermes skills. The "npx skills add" pattern is becoming the npm of agent capabilities -- whoever builds the discovery and quality layer wins.

[++] Agent governance and security tooling. Microsoft Agent 365 goes GA. DeepMind publishes agent attack taxonomies. @dipsybitsy reports $200K drained via morse code exploits. @xBalbinus frames the opportunity: "The next breakout AI product might not be smarter agents, it might be safer ones." Enterprise demand is clear; supply is fragmented.

[++] Compiled orchestration (zero-cost coordination). AINL demonstrates 71% cost reduction by compiling orchestration logic rather than running it through LLM reasoning at each step. As agent workloads grow, the "orchestration tax" becomes the dominant cost. Tools that eliminate per-step reasoning overhead have a structural cost advantage.

[+] Local model agent infrastructure. Consumer GPU configurations (RTX 3060-3090) running 27B models with 262K context for autonomous coding loops. The gap between "it technically works" and "it's reliable" creates opportunity for local-first agent tooling that handles quantization trade-offs, model selection, and quality monitoring automatically.

[+] Agent-native SaaS infrastructure. @cycle_vega's observation that every vertical SaaS rebuilds triggers, session persistence, and API orchestration from scratch each time. A "Stripe for agent infrastructure" -- standardized triggers, state management, and tool connection -- would capture the repeating horizontal layer.


8. Takeaways

  1. The skills layer is eating the agent stack. Higgsfield CLI's 526 likes and 440 bookmarks (day's top AI post) plus Shopify's official Hermes skill signal that installable, composable skills -- not monolithic frameworks -- are becoming the primary unit of agent capability distribution. (source)

  2. Harness engineering has reached the "boring infrastructure" phase. The debate has shifted from "what is it?" to "should you automate its evolution?" The AHE paper's reported lift from 69.7% to 77.0% pass@1 on Terminal-Bench 2 supports the case that automated harness optimization works, while @nichochar and @_colemurray's disagreement shows practitioners are still divided on build-vs-buy. (source)

  3. Context engineering is the new bottleneck, not model quality. @_overment's detailed architecture diagram, @DanIsBuilding's "models to harness design" transition call, and 12 "context engineering" phrase matches in the dataset all point to context management surpassing model capability as the binding constraint. (source)

  4. The Hermes Agent ecosystem is expanding quickly. A desktop app, a Shopify skill, the HermesWorld MMORPG plugin, and VPS deployment guides all surfaced on May 4, one day after the Kanban launch. The breadth of community contributions signals genuine adoption beyond hype. (source)

  5. Agent governance is moving from "nice to have" to enterprise requirement. Microsoft Agent 365 GA, DeepMind's agent trap taxonomy, and practitioner reports of $200K agent exploits form a triangle of supply (vendor tools), research (attack frameworks), and demand (real losses). The governance stack is likely to become mandatory for enterprise agent deployments. (source)

  6. Local model coding agents are viable but not yet reliable. The Qwen3.5-27B on RTX 3090 setup demonstrates technical feasibility at 262K context, but the q4_0 vs q8_0 debate and @EdgeDimi's pushback confirm the gap between "it runs" and "I trust it with production code." (source)

  7. The agent marketplace pattern has moved from concept to competitive. AgenC on Solana devnet, Hermes skills marketplace, Swarms Marketplace, agent-native hackathon tracks -- at least 5 independent teams are building agent work marketplaces simultaneously. The concept is validated; the question is now execution and distribution. (source)