Skip to content

Twitter AI Agent - 2026-05-01

1. What People Are Talking About

1.1 Flue Ships the First Agent Harness Framework πŸ‘•

The day's highest-signal item was @FredKSchott launching Flue (999 likes, 1,255 bookmarks, 102,567 views) -- a TypeScript framework for building agents around a built-in agent harness. Flue is positioned as "Claude Code, but 100% headless and programmable" with no assumption of a human operator. Unlike AI SDKs, Flue is a runtime-agnostic framework: write once, build, and deploy agents anywhere (Node.js, Cloudflare, GitHub Actions, GitLab CI/CD). First-class concepts include sessions, subagents, built-in sandboxes, and skills defined in Markdown. The framework originated from powering AI workflows inside the Astro GitHub repo.

Flue code example showing agents/triage.ts with session.skill(), session.prompt(), and session.shell() APIs demonstrating structured output and sandbox execution

@FredKSchott clarified the positioning: "Framework vs. SDK. Kind of like how Next.js/Astro is built on top of React, Flue is built on top of pi-agent-core to power our harness."

Discussion insight: @LeoTava8 captured the appeal: "The industry's obsession with prompting inside loops ignores the lessons we learned from infrastructure. We need explicit contract layers and proper stateful orchestration. Flue looks like exactly the right primitive -- separating the 'what' from the 'how'."

Comparison to prior day: April 30 saw Cursor publish its internal harness methodology and the community converging on harness engineering as a discipline. May 1 delivers the first dedicated open-source framework built around that thesis, moving from methodology to deployable artifact.


1.2 Microsoft Agent 365 Reaches General Availability πŸ‘•

@satyanadella announced (367 likes, 22,451 views) that Agent 365 is now generally available as a single control plane to observe, govern, and secure agents and their interactions across the enterprise. The system extends existing identity, security, governance, and management workflows to every AI agent -- including agents built with Microsoft AI and third-party ecosystem agents. New previews include observability for agents operating with their own credentials, shadow AI discovery via Microsoft Defender and Intune, and Windows 365 for Agents (managed sandboxed environments).

Discussion insight: @GroverLovesh provided practitioner context: "Most agent failures I've debugged in the last 60 days were identity/permission issues, not model issues. Microsoft is solving the boring half. The boring half is bigger. Indie agent stacks shipping without per-agent identity hit a wall at the first compliance review."

Comparison to prior day: April 30 focused on agent framework maturation in the developer space. May 1 shows the enterprise governance layer catching up -- the "boring half" that unlocks production deployment at institutional scale.


1.3 Harness Engineering Solidifies as the Core Discipline πŸ‘’

Multiple high-signal items reinforced harness engineering as the dominant paradigm:

@Vtrivedy10 explained (81 likes, 68 bookmarks) how LangChain's Deep Agents are built on create_agent, a single primitive supporting filesystem tools, bash, compaction, subagents, skills, memory, and hooks. He emphasized the primitive's extensibility as the foundation for all Deep Agents engineering.

LangChain documentation showing create_agent with ReAct loop diagram and API surface including Model, Tools, System prompt, Memory, Middleware

@Vtrivedy10 separately demonstrated that a single steering instruction with GPT-5.5 produces a 12% change in Terminal Bench Score, providing quantified evidence that "Prompt and Harness Engineering still matter A TON today."

@dntyk analyzed (33 likes, 25 bookmarks) the AHE paper showing auto-evolved harnesses reaching 77.0% on Terminal-Bench 2 (vs 71.9% for hand-designed Codex-CLI), with the key finding that "gains come from tools, middleware, and long-term memory. System prompt alone regresses."

AHE paper title: Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses by Lin et al

@davidfowl (116 likes, 15,537 views) captured the practitioner mood: "I'm gearing up to build my own agent orchestration system. Are we all doing this now?? What stage of grief is this?" Reply from @stackbenchdev: "every harness encodes the dev's theory of how their agents should run. That's why portable templates haven't crystallized yet."

Comparison to prior day: April 30 saw Cursor publish its methodology and convergent patterns identified. May 1 adds empirical proof (AHE paper benchmarks, 12% swing from single instruction) and practitioner frustration (Fowl building his own) -- the discipline is validated but the tooling gap persists.


1.4 Google COSMO Leak Reveals Android Agent OS πŸ‘•

@minchoi reported (133 likes, 52 bookmarks, 8,758 views) that Google leaked and then removed "COSMO" -- a comprehensive agent system for Android featuring local Gemini Nano, screen access, voice match, recall, browser agent, and deep research capabilities.

Eight Android screenshots of Google COSMO showing settings with Fulfillment Model options (Hybrid, P1 Only, Nano Only), Ambient Awareness toggles, and Skills discovery including Browser Agent, Deep Research, Recall, and Document Writer

The screenshots reveal a skill-based architecture with categories (Productivity, Information, Conversation, AI), a hybrid fulfillment model (cloud P1 when online, Nano offline), ambient awareness features (screen context, audio, interactions), and voice match for user authentication.

Discussion insight: @AdityaKTech summarized the sentiment: "Your phone is finally becoming the 'JARVIS' we were promised years ago."

Comparison to prior day: No prior coverage. This is the first evidence of Google building a full agent OS into Android with hybrid inference and extensible skills, positioning against Apple's on-device intelligence strategy.


1.5 Cross-Agent Skill Distribution Emerges as Category πŸ‘•

Three independent projects addressed skill portability across agents:

@aidenybai launched (89 likes, 61 bookmarks) agent-install, an open-source tool to install skills and MCPs across 52 coding agents via API or CLI.

agent-install code showing skill.add() and mcp.add() APIs targeting multiple agents like cursor and claude-code, with equivalent CLI commands

@tiangolo (FastAPI creator) released (37 likes, 24 bookmarks) library-skills -- a tool that scans project dependencies and installs the AI skills bundled with libraries (like FastAPI). Skills are symlinked so library updates propagate automatically. Supports Python and Node.js, with a --claude flag for Claude Code compatibility.

@ElevenLabsDevs shipped (116 likes, 78 bookmarks) the Voice Changer Skill installable via npx skills add elevenlabs/skills, demonstrating capability vendors distributing through the skills ecosystem.

Discussion insight: @mylifcc identified the convergence challenge: "~/.claude/skills/SKILL.md is my Claude Code setup; Codex and Cursor use different paths. Curious whether agent-install converges those into one manifest or just copies files at path level."

Comparison to prior day: April 30's frustration was skill quantity outpacing quality. May 1 shifts focus to skill distribution infrastructure -- how skills reach agents across heterogeneous toolchains.


1.6 Multi-Agent Architecture Advances in Research and Practice πŸ‘’

@omarsar0 shared (40 likes, 39 bookmarks) the RecursiveMAS paper (arXiv:2604.25917, UIUC/Stanford/NVIDIA/MIT) proposing agents collaborate through recursive computation in shared latent space instead of text messages. Results: 8.3% accuracy gain, 1.2-2.4x speedup, and 34.6-75.6% token reduction across 9 benchmarks.

RecursiveMAS paper showing scaling law charts across MATH500, AIME2025, GPQA-D, and Code Gen, plus collaboration pattern comparisons (Mixture, Deliberation, Distillation styles)

@aakashgupta described a 21-agent dev team inside Claude Code that shipped a hockey rules app to TestFlight in 2 hours 13 minutes: "The spec is now the bottleneck. Clarity of what you actually want determines everything downstream."

Discussion insight: @haowjy raised a key question about RecursiveMAS: "I wonder if it's possible to retrofit this for completely different models? The main strength of multi-agent systems is the ability to have models trained on completely different techniques." @FiftyOne_50_ raised the control concern: "Latent agent coordination may cut tokens. It also hides more of the path."

Comparison to prior day: April 30 featured multi-agent patterns in hackathon winners (observation + verification). May 1 advances the theory (latent-space recursion replacing text coordination) and practice (21-agent teams shipping apps).


2. What Frustrates People

Everyone Building Their Own Orchestration -- Severity: High

@davidfowl (Microsoft .NET architect) expressed (116 likes, 46 replies, 15,537 views) the core frustration: "I'm gearing up to build my own agent orchestration system. Are we all doing this now?? What stage of grief is this?" Replies confirmed the pattern is widespread. @stackbenchdev: "every harness encodes the dev's theory of how their agents should run. That's why portable templates haven't crystallized yet -- the ones in the wild all look different because the workflows do." @buildwithparas: "bargaining, acceptance is when you stop calling it temporary."

The frustration spans the stack: the AHE paper shows auto-evolved harnesses outperform hand-designed ones, yet practitioners must still build from scratch because no framework captures the full harness surface area. Flue is the first attempt to address this at the framework level.

Agent Identity and Permission Failures -- Severity: High

@GroverLovesh reported: "Most agent failures I've debugged in the last 60 days were identity/permission issues, not model issues. Indie agent stacks shipping without per-agent identity hit a wall at the first compliance review." Microsoft's Agent 365 GA is the first enterprise answer, but leaves indie and cross-cloud deployments unsolved. @OfficialTravlad noted: "step outside GCP and the identity disappears. No portability, no accountability."

Harness Configuration Sensitivity -- Severity: Medium

@Vtrivedy10 demonstrated that a single tool_choice: "none" setting in GPT-5.5 injects a steering instruction that produces a 12% swing in Terminal Bench Score. @ValsAI confirmed OpenAI injects the instruction "in a way that tools: [] does not." Practitioners cannot predict how minor configuration changes interact with model internals, making harness tuning fragile and opaque.


3. What People Wish Existed

Portable Agent Orchestration Framework

@davidfowl and 46 replies confirm: developers want orchestration they do not have to build themselves. The gap is a framework where model choice is configuration, workflows are declarative, and the harness captures the full surface (tools, middleware, memory, subagents). Flue is the first dedicated attempt but is explicitly early. Practitioners are building bespoke solutions because nothing captures their workflow theory portably.

Urgency: High -- Opportunity: direct

Cross-Agent Skill Manifest Standard

@mylifcc identified the gap: "~/.claude/skills/SKILL.md is my Claude Code setup; Codex and Cursor use different paths. Curious whether agent-install converges those into one manifest or just copies files at path level. MCP stdio vs sse is the trickiest cross-agent piece." Three independent projects (agent-install, library-skills, skills.sh) all solve distribution but no standard manifest exists across the 52+ agents.

Urgency: High -- Opportunity: infrastructure

Transparent Harness-Model Interaction Debugging

The 12% Terminal Bench swing from a single steering instruction and the AHE paper's ablation results reveal that practitioners cannot observe how harness configurations interact with models. @InsiderPresider asked: "create_agent is definitely a solid primitive but I wonder if the abstraction layer is hiding too much complexity for production agent safety." What is needed: observability into how harness choices (tools, middleware, prompts) affect model behavior at evaluation time.

Urgency: Medium -- Opportunity: tooling


4. Tools and Methods in Use

Tool Category Sentiment Strengths Limitations
Flue Agent framework (+) First harness framework; runtime-agnostic; built-in sandbox; sessions/subagents first-class Brand new; early
LangChain create_agent / Deep Agents Agent framework (+) Extensible primitive; model-specific profiles; 10-20pt benchmark gains Requires significant extension for production
Cursor Harness Agent runtime (+) Published methodology; telemetry-driven optimization; model-specific tuning Proprietary product
Agent 365 Enterprise governance (+) Single control plane; identity/security/governance for all agents; shadow AI discovery Microsoft ecosystem-centric
DeepSeek-V4-Pro LLM (+) First open-weight model matching frontier in agentic coding; 1M context; CSA+HCA 10% KV cache Via Fireworks only currently
agent-install Skill distribution (+) 52 agents supported; API + CLI; open source No standard manifest yet
library-skills Skill distribution (+) Skills ship with libraries; auto-update via symlinks; Python + Node.js New; limited library adoption
Hermes Agent Coding/general agent (+) Telegram interface; VPS deployment; custom skills; multi-model via OpenRouter Crypto community overlap
ElevenLabs Skills Voice agent (+) Voice changer preserving emotion/timing; installable via skills.sh Voice domain only
Pi (coding agent) Agent runtime (+) Lightweight; works with DeepSeek-V4-Pro out of box Less featured than Claude Code
TradingAgents Multi-agent finance (+/-) 59K stars; full trading desk architecture; backtesting; Docker "Agent orchestration doesn't buy edge" per critics

DeepSeek-V4-Pro represents a notable shift: @omarsar0 reported it works in a basic harness "without any special configuration" -- the first time an open-weight model can be "plugged into a basic harness like Pi, and it just works." The hybrid CSA+HCA attention cuts KV cache to 10% and inference FLOPs by 4x at 1M context, making the agent loop "actually fast and cheap enough to run in practice."


5. What People Are Building

Project Who built it What it does Problem it solves Stack Stage Links
Flue @FredKSchott Agent harness framework with built-in sandbox, sessions, subagents No framework for agent harnesses TypeScript, pi-agent-core Alpha post
agent-install @aidenybai Install skills and MCPs across 52 coding agents Cross-agent skill fragmentation TypeScript, CLI/API Shipped post
library-skills @tiangolo Bundled library skills that auto-update with dependencies Agents using outdated library APIs Python, Node.js, symlinks Shipped GitHub
TradingAgents @sharbel Multi-agent trading desk (analysts, trader, risk, PM) Single-model trading underperformance LangGraph, multi-LLM, Docker Shipped post
User Support Triage Skill @doodlestein 85-file, 912KB universal support triage skill for 14 services Manual support queue management for SaaS Claude Code skills, subagents Shipped post
Personal AI Agent on VPS @thegreatola Personal agent: markets, trading, coding, content Multiple subscription costs Hermes, OpenRouter, Telegram, VPS Shipped post
HermesAgent SWARM v2.1 @outsource_ Multi-agent control with orchestrator, kanban, reports Single-agent limitations Hermes Agent, orchestrator chat Shipped post
Voice Changer Skill @ElevenLabsDevs Voice transformation preserving performance/emotion/timing No voice cloning as agent skill ElevenLabs, skills.sh Shipped post

The user-support-triage skill by @doodlestein represents the upper bound of skill complexity: 85 files, 912KB, with runbooks for GDPR DSARs, billing deep-dives, data loss, hostile users, and security disclosures. It includes a universal adapter contract normalizing 14 different support providers, subagents (draft-bundler, onboarding-cartographer, voice-analyst, correlator), and a safety architecture requiring owner approval before any customer-facing communication. The skill treats support as "an evidence pipeline, risk router, owner-approved communications system, product-intelligence engine, and compounding operational memory."

File tree showing 92-item user-support-triage skill structure with references, runbooks, scripts, and subagents totaling 912KB across 85 documents

@thegreatola documented a practical personal agent architecture: Hermes as reasoning layer, OpenRouter as model gateway ($5-15/month), Telegram as interface, custom skills loaded by context, running on a VPS 24/7. Reports replacing separate ChatGPT + Claude + Cursor + Perplexity subscriptions with a single API bill where "the agent picks the right model per task." First-month trading returns of approximately $1K.


6. New and Notable

Flue: First Dedicated Agent Harness Framework

Flue (1,255 bookmarks) is the first framework built around the agent harness concept rather than bolting it onto an SDK. From the Astro/Next.js creator, it positions sessions, subagents, and sandboxes as first-class framework primitives rather than SDK utilities. The "flue build" and "flue run" commands mirror web framework DX. If harness engineering is the discipline, Flue is the first attempt at making it a deployable, standard framework.

Signal strength: [+++]

Google COSMO Reveals Full Agent OS for Android

The leaked COSMO screenshots (52 bookmarks) show Google building a comprehensive agent operating system into Android: hybrid fulfillment (cloud P1 + local Nano), ambient awareness (screen context, audio, interactions, voice match), extensible skills with categories, and an agent architecture that treats the phone as a persistent agent runtime. This is Google's answer to the on-device AI agent question.

Signal strength: [+++]

Microsoft Agent 365 Reaches GA as Enterprise Agent Control Plane

Agent 365 addresses the "boring half" that blocks enterprise agent adoption: per-agent identity, governance, compliance, and shadow AI discovery. The timing confirms enterprise demand is now production-ready, not experimental.

Signal strength: [++]

RecursiveMAS Proposes Latent-Space Agent Coordination

The RecursiveMAS paper (arXiv:2604.25917) from UIUC/Stanford/NVIDIA/MIT introduces agents passing latent state instead of text messages, reducing token usage by 34.6-75.6% while improving accuracy by 8.3%. If agent-to-agent communication is the next bottleneck, latent-space recursion offers a path to scale collaboration without paying a token tax.

Signal strength: [++]

DeepSeek-V4-Pro Matches Frontier Models in Agentic Coding

@omarsar0 reported (33 bookmarks) that DeepSeek-V4-Pro is "the first open-weight model that genuinely feels like a Codex or Claude Code experience" -- working out of the box in a basic harness without special configuration. The hybrid CSA+HCA attention design enables 1M-token context with 10% KV cache overhead.

Signal strength: [+]


7. Where the Opportunities Are

[+++] Agent harness framework that captures the full orchestration surface -- @davidfowl (46 replies), the AHE paper benchmarks, and Flue's launch all confirm: practitioners need a framework, not an SDK, for agent harnesses. The gap between Flue (early, TypeScript-only) and what production teams need (multi-language, model-portable, observable) is wide. The first framework that achieves Cursor-level harness quality in an open, portable package captures the frustrated "building my own" crowd.

[+++] Cross-agent skill distribution standard -- Three independent projects (agent-install at 52 agents, library-skills with auto-updating symlinks, skills.sh ecosystem) validate demand for skill portability. But no standard manifest exists. The first team to ship a spec that Claude Code, Cursor, Codex, Hermes, and others all read natively becomes the package manager for agent skills.

[++] Agent identity and governance outside walled gardens -- Agent 365 solves governance inside Microsoft. @OfficialTravlad and @GroverLovesh confirm the gap: portable agent identity that travels across clouds, indie stacks, and protocols. Google spent $750M on agent identity within GCP. The cross-cloud, protocol-level solution remains open.

[++] Harness observability and debugging tools -- The 12% benchmark swing from a single steering instruction and the AHE ablation results (gains from tools/middleware/memory, not prompts) reveal that practitioners cannot predict or debug harness-model interactions. Purpose-built observability for harness engineering -- showing how configuration choices affect model behavior -- is an unserved need.

[+] Personal agent infrastructure replacing subscriptions -- @thegreatola demonstrated replacing $200+/month in AI subscriptions with a $5-15/month personal agent on a VPS. The pattern (Hermes + OpenRouter + Telegram + custom skills) is reproducible but requires significant setup. Productizing this into one-click personal agent deployment is an emerging opportunity.


8. Takeaways

  1. Flue (1,255 bookmarks, 102K views) ships the first agent harness framework, establishing "framework vs SDK" as the next frontier in agent infrastructure. From the Astro creator, it positions sessions, subagents, and sandboxes as first-class primitives with flue build and flue run CLI -- applying web framework patterns to agent development. (source)

  2. Microsoft Agent 365 reaches GA as the first enterprise control plane for agent identity, security, and governance, confirming that "most agent failures are identity/permission issues, not model issues." Extends existing security workflows to cover all agents including those with their own credentials, shadow AI discovery, and Windows 365 sandboxed environments. (source)

  3. A single harness steering instruction causes a 12% Terminal Bench swing, and the AHE paper proves auto-evolved harnesses outperform hand-designed ones by 5+ points, definitively establishing that harness engineering surpasses prompt engineering. The ablation shows gains come from tools, middleware, and long-term memory -- not system prompts. (source, source)

  4. Google's leaked COSMO reveals a full agent OS for Android: hybrid cloud/local inference, ambient awareness, voice match, extensible skills, and a browser agent -- positioning the phone as a persistent agent runtime. The skill-based architecture with Productivity/Information/Conversation categories mirrors the coding agent skill pattern adapted for mobile. (source)

  5. Cross-agent skill distribution crystallizes as a category with three independent launches (agent-install for 52 agents, library-skills with auto-updating symlinks, ElevenLabs via skills.sh), but no standard manifest exists across the ecosystem. The fragmentation of skill paths (~/.claude/skills vs ~/.agents/skills vs agent-specific locations) is the binding constraint. (source, source)

  6. DeepSeek-V4-Pro is reported as the first open-weight model that works in a basic agent harness "without any special configuration," matching Claude and Codex experience at a fraction of the cost with 1M-token context. The hybrid attention design cutting KV cache to 10% makes the agentic loop "fast and cheap enough to run in practice" -- shifting the open-weight competitive frontier. (source)