Twitter AI Agent - 2026-04-30
1. What People Are Talking About
1.1 Hermes Curator Ships Automatic Skill Lifecycle Management
The day's most technically significant release came from @Teknium, who introduced Hermes Curator (333 likes, 211 bookmarks, 17,585 views) -- a built-in system that automatically consolidates and prunes agent-created skills. The Curator tracks usage frequency, runs weekly (configurable), and applies a two-phase process: deterministic transitions (skills unused for 30 days become stale; 90 days triggers archival) followed by an LLM review pass that consolidates overlapping skills or converts overly specific ones into references for broader skills. It never auto-deletes, never touches externally installed or pinned skills, and the worst outcome is recoverable archival.
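
The deterministic first phase described above can be sketched in a few lines of Python. This is an illustrative sketch, not Hermes code: the `Skill` dataclass, the `SkillState` names, and `apply_deterministic_transitions` are hypothetical. Only the 30-day and 90-day thresholds and the exemption for pinned or externally installed skills come from the announcement. The LLM review pass is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List


class SkillState(Enum):
    ACTIVE = "active"
    STALE = "stale"
    ARCHIVED = "archived"  # recoverable; nothing is ever deleted


@dataclass
class Skill:
    name: str
    last_used: datetime
    pinned: bool = False    # pinned skills are never touched
    external: bool = False  # externally installed skills are never touched
    state: SkillState = SkillState.ACTIVE


# Thresholds quoted in the announcement.
STALE_AFTER = timedelta(days=30)
ARCHIVE_AFTER = timedelta(days=90)


def apply_deterministic_transitions(skills: List[Skill], now: datetime) -> List[Skill]:
    """Phase 1 only: age-based state changes, no LLM involved."""
    for skill in skills:
        if skill.pinned or skill.external:
            continue  # exempt from curation
        idle = now - skill.last_used
        if idle >= ARCHIVE_AFTER:
            skill.state = SkillState.ARCHIVED
        elif idle >= STALE_AFTER:
            skill.state = SkillState.STALE
    return skills
```

A weekly job would call `apply_deterministic_transitions` first and hand only the surviving skills to the review pass, so the cheap rules run before any model call.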

@NousResearch also announced (546 likes, 274 bookmarks) that Hermes Agent can now use pretext for DOM-free text layout, and separately promoted (193 likes, 97 bookmarks) the Hermes Agent Creative Hackathon ending Sunday with skills for Manim, TouchDesigner, and ComfyUI.
Discussion insight: @PaulGugAI captured the practitioner dilemma: "Holy! That looks great. Add it to the list of the 1000 skills that Hermes has that I'm not yet using." @LyraSongstress asked the key question -- "how does it decide what to consolidate vs archive?" -- and @Teknium confirmed: "By comparing against umbrella classes of skills."
Comparison to prior day: April 29's top frustration was that coding agents don't codify feedback for reuse. April 30's Curator is the first shipped answer to the inverse problem: once skills accumulate, how do you prevent bloat? This directly addresses the skill quality gap identified in the SR-Agents paper.
1.2 Cursor Publishes Agent Harness Engineering Methodology
@cursor_ai published a detailed blog on how their agent harness makes models faster, smarter, and more token-efficient. @jediahkatz elaborated (17 likes) on the six layers: orchestration, context, routing, transport, state, and execution, arguing that "there's a misconception that first-party harnesses from the labs will always outperform."
@Vtrivedy10 connected (46 likes, 48 bookmarks) the Cursor blog to convergent patterns across the industry: tuning different models with bespoke tools/prompts, using offline+online evals, working backwards from agent goals, and treating the context window as "a sacred boundary where computation happens."
@ghumare64 provided (36 likes, 55 bookmarks) the deepest practitioner synthesis, analyzing Tony Gentilcore's Glean post which frames the harness as "a distributed context management system." His key insight: PTC sandbox, subagents, compaction, and search-first skill discovery are all instances of the same primitive -- "a process that registers a function, exposes it by ID, isolates its execution context, and surfaces only the result back to the caller."
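
The shared primitive in that quote (register a function, expose it by ID, isolate its execution context, surface only the result) can be illustrated with a small registry. This is a sketch under stated assumptions: `FunctionRegistry` and `search_notes` are hypothetical names, and "isolation" here is approximated by deep-copying the arguments, which is far weaker than a real sandbox or subagent boundary.

```python
import copy
import uuid
from typing import Any, Callable, Dict, List


class FunctionRegistry:
    """Registers callables, exposes them by opaque ID, returns only results."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register(self, fn: Callable[..., Any]) -> str:
        fn_id = uuid.uuid4().hex  # callers only ever see this ID
        self._functions[fn_id] = fn
        return fn_id

    def call(self, fn_id: str, **kwargs: Any) -> Any:
        fn = self._functions[fn_id]
        # Assumption: isolation is modeled as a private copy of the inputs,
        # so the callee's intermediate state never leaks back to the caller.
        private_kwargs = copy.deepcopy(kwargs)
        return fn(**private_kwargs)


def search_notes(query: str, notes: List[str]) -> int:
    """Hypothetical tool: mutates its own working copy, returns a count."""
    notes.append("scratch entry")  # stays inside the isolated copy
    return sum(1 for note in notes if query in note)
```

The caller gets back a single value (here, a hit count) rather than the callee's working state, which is the property the quote attributes to sandboxes, subagents, and compaction alike.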
Discussion insight: @yoheinakajima offered (51 likes) a contrarian frame: "developers talk about agent harnesses like it's back-end architecture... but what if it's more like personal/organizational decision making processes and org charts, meant to be reflected upon and adjusted constantly." @psr_ai cautioned (41 likes): "LLMs are non-deterministic by nature. The engineering around LLMs should be given importance rather than over optimizing for context."
Comparison to prior day: April 29 produced the AHE paper formalizing self-improving harnesses. April 30 sees Cursor release its internal harness methodology publicly, making the discipline concrete and reproducible for practitioners.
1.3 Context Engineering Playbook Goes Viral via Karpathy Endorsement
@Av1dlive shared (767 likes, 1,917 bookmarks, 120,045 views) a video playbook on context engineering, tool design, orchestrator-subagent patterns, evals, and the harness mindset, framing it as the path to becoming a "100x agentic engineer." The post quotes Karpathy: "10x engineers are normal. Real agentic engineers are 100x." Multiple accounts amplified the same video: @RoundtableSpace reposted (51 likes, 50,017 views), and @DivyanshT91162 created a summary thread.
@tom_doerr shared (24 likes, 35 bookmarks) NeoLabHQ's context-engineering-kit GitHub repo for advanced context engineering in coding agents.
Discussion insight: @Jmoon_174 identified a practical gap: "past 70%, the model starts dropping context silently. You don't know which rules got skimmed." @sandraaasol argued that "context engineering, proper tool design, and strict evals" is the only stack that survives a model swap.
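
One way to act on the "past 70%" observation is to track context utilization and flag it before the window fills. The sketch below is a rough illustration: the 0.70 threshold is taken from the quoted post rather than from any measurement, the four-characters-per-token estimate is a crude assumption, and the function names are hypothetical. A real harness would use the model's own tokenizer.

```python
from typing import List


def context_utilization(
    segments: List[str],
    window_tokens: int,
    chars_per_token: float = 4.0,  # assumption: crude average, not a tokenizer
) -> float:
    """Estimate the fraction of the context window already consumed."""
    estimated_tokens = sum(len(segment) for segment in segments) / chars_per_token
    return estimated_tokens / window_tokens


def over_budget(
    segments: List[str],
    window_tokens: int,
    threshold: float = 0.70,  # figure from the quoted post, unverified
) -> bool:
    """True when the estimated utilization exceeds the warning threshold."""
    return context_utilization(segments, window_tokens) > threshold
```

A harness could call `over_budget` before each turn and trigger compaction or a subagent handoff when it returns `True`.
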
Comparison to prior day: April 29 focused on context as an academic discipline (papers, workshops). April 30 packages it as a watchable playbook, reaching a far wider audience via the Karpathy signal boost.
1.4 Editframe Launches Agent-Native Video Format
@yudDIDit announced (226 likes, 213 bookmarks, 29,827 views) that Editframe has emerged from stealth with an agent-native video format: HTML/CSS to MP4, built for coding agents. The stack is framework-agnostic (HTML + CSS, works with React), uses true browser rendering (DOM + Canvas), offers cloud streaming previews and API rendering, and provides Lego-style bricks for custom editors. Install via npm create @editframe@latest, then prompt Claude Code, Cursor, or Codex to get working video or interactive GUIs.
Discussion insight: The positioning is notable -- not a video editor with AI features, but a video format designed from scratch for agent consumption and production. This addresses a gap: agents can generate code, images, and text, but video has remained manual.
Comparison to prior day: No prior coverage. This is a new entrant targeting the "agents need media" gap.
1.5 Claude Code Hackathon Winners Demonstrate Multi-Agent Architecture Patterns
@ClaudeDevs announced (295 likes, 115 bookmarks) results from the "Built with Opus 4.7" hackathon with 500 participants worldwide. Winners demonstrated distinct agent architecture patterns:
- MedKit (1st place): A Managed Agent plays the patient, observes, and grades the trainee, with prompts doubling as citation allowlists
- Wrench Board (2nd place): A Managed Agent with 4-layer memory and ~36 tools that re-orients across sessions by reading its own notebook
- Maieutic (3rd place): Students write the spec before code, then explain the diff, surfacing whether they understand their own changes

Discussion insight: The winning patterns all emphasize observation and verification over raw generation -- consistent with the verification-first thesis from Fowler's endorsement on April 29.
1.6 Cline Rewrites From Ground Up With SDK and Plugin Architecture
@cline announced (51 likes) a complete rewrite: "We spent the last two months rewriting Cline from the ground up." The original architecture was tightly coupled to IDE semantics. The new version builds an SDK with plugin architecture for providers, models, LSPs, code search, and themes, then rebuilds both CLI and extension on top. Beta offers $20 in credits plus a bounty program for contributors.
Discussion insight: @Aqib__786Ai noted: "Big rebuilds like this are where agents usually either become 'demo tools' or real platforms -- plugin architecture + decoupling from IDE semantics is the right direction." The rewrite mirrors Cursor's own trajectory of separating agent runtime from editor UI.
1.7 Codex Becomes a General Work Surface Beyond Coding
@aakashgupta analyzed (10 likes, 9 bookmarks) OpenAI's Codex update which added a role picker spanning Engineering, Product, Finance, Marketing, Sales, Operations, Design, Data Science, and Student. His thesis: "The coding tool just became the work tool. This is OpenAI building a harness layer on top of their own model... The harness is becoming the actual product."
@MindTheGapMTG pushed back: "Role picker is a UI for what we do with markdown files. Each agent gets a constraint file scoping it to one domain. Difference: our files encode 500 lines of production failures as rules. A dropdown can't capture 'never touch billing on Thursdays.'"
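
The pushback is easier to see when such a rule is written as a predicate over more than one input. The sketch below is purely illustrative: the `Rule` type, the `billing/` path prefix, and `violations` are hypothetical, and the only thing taken from the post is the example rule itself. It is not how @MindTheGapMTG's constraint files work, which are described as markdown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Rule:
    description: str
    forbids: Callable[[str, date], bool]  # (target path, current date) -> blocked?


# Assumption: billing code lives under a "billing/" prefix.
RULES: List[Rule] = [
    Rule(
        description="never touch billing on Thursdays",
        forbids=lambda path, today: path.startswith("billing/")
        and today.weekday() == 3,  # Monday is 0, so Thursday is 3
    ),
]


def violations(path: str, today: date, rules: List[Rule] = RULES) -> List[str]:
    """Return the descriptions of every rule that blocks this edit."""
    return [rule.description for rule in rules if rule.forbids(path, today)]
```

A role dropdown selects one value from a fixed set, whereas this rule depends jointly on the file being edited and the day of the week.
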
Comparison to prior day: April 29's Cursor SDK moved agents from IDE to infrastructure. April 30's Codex moves from coding to all knowledge work, confirming the convergent thesis that agents are becoming the interface layer.
2. What Frustrates People
AI Voice Agents Replacing Support Teams Without Disclosure -- Severity: High
@AbhinavXJ reported (98 likes, 4,302 views) that IndiaMart replaced their entire customer support with AI agents, costing thousands of jobs. The core frustration: "as a customer I would never want to talk with an AI if I have an issue, I need a human to answer me. AI can never be held accountable." @HrideshMg shared a firsthand experience: "got called by a hindi speaking woman from shiprocket... it took me until mid-conversation to realize I was talking to a fucking AI. There were absolutely no disclaimers." @AbuKhadeejah offered the builder perspective: "I'm building AI agent calling solution for a gym client here in mumbai and he's saving 40k per month on outbound calls."
TypeScript Agent Framework Fatigue -- Severity: Medium
@samuelcolvin asked (21 likes, 26 replies): "What's the best (least bad, most trendy) Agent Framework right now? Vercel AI, Mastra, Langpain-js?" Responses revealed frustration: @MindTheGapMTG runs "12 production agents: no framework. Each is a Claude Code session with a CLAUDE.md constraint file." @foundanand: "AI SDK by Vercel sucks. Breaks so much... so many updates it's crazy." @Shoeboom: "tanstack/ai has niceties but it's in alpha. Mastra if your usecase fits; otherwise it's too opinionated." No consensus emerged.
Skill Quantity Outpaces Skill Quality -- Severity: Medium
@aiedge_ promoted (40 likes, 62 bookmarks) SkillsMP claiming "over a MILLION agent skills." Immediate pushback: @rugbist_: "a million skills but how many of them are actually useful gotta dig through a lot of junk to find the gold." @coralflavorcom: "a million skills and none of them will think for themselves like an unfiltered LLM can." The quantity-vs-quality tension mirrors April 29's SR-Agents finding about indiscriminate skill loading.
3. What People Wish Existed
Automatic Skill Curation That Scales Beyond One Agent
The Hermes Curator addresses single-agent skill bloat, but the broader problem remains: across marketplaces with 1M+ skills, there is no quality scoring, no relevance gating, and no cross-agent curation standard. @rugbist_ and @nonStopEon both asked how skill quality is maintained as marketplaces grow. The Curator's heuristics (usage frequency + LLM review) work locally but have no cross-ecosystem equivalent.
Urgency: High -- Opportunity: infrastructure
Agent Framework That Survives Model Swaps
@samuelcolvin's question and @sandraaasol's post point the same way: practitioners want an agent framework where "context engineering, proper tool design, and strict evals" form the portable layer, and model choice becomes a config swap. Current options (Vercel AI SDK, Mastra, LangGraph) all couple too tightly to specific patterns or break too often.
Urgency: High -- Opportunity: direct
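
A minimal sketch of "model choice as configuration" follows. Everything here is hypothetical and exists only to show the shape: `ModelClient`, `EchoModel`, `MODEL_FACTORIES`, and the `model-a`/`model-b` keys stand in for real provider clients, which are deliberately not modeled. Context assembly and the eval check sit outside the model-specific code, so they carry over unchanged when the model key changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class ModelClient(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in for a provider client; it just echoes the prompt with a tag."""
    tag: str

    def complete(self, prompt: str) -> str:
        return f"{self.tag}:{prompt}"


# Assumption: each key maps to a factory for one provider-specific client.
MODEL_FACTORIES: Dict[str, Callable[[], ModelClient]] = {
    "model-a": lambda: EchoModel("A"),
    "model-b": lambda: EchoModel("B"),
}


@dataclass
class AgentConfig:
    model: str           # the only field that changes on a model swap
    system_context: str  # portable context layer


def build_prompt(config: AgentConfig, task: str) -> str:
    """Portable context assembly, independent of the chosen model."""
    return f"{config.system_context}\n\nTask: {task}"


def run_agent(config: AgentConfig, task: str) -> str:
    client = MODEL_FACTORIES[config.model]()
    return client.complete(build_prompt(config, task))


def passes_eval(config: AgentConfig, task: str, must_contain: str) -> bool:
    """Portable eval: the same check runs against any configured model."""
    return must_contain in run_agent(config, task)
```

Swapping `AgentConfig(model="model-a", ...)` for `model="model-b"` changes which client is constructed and nothing else, which is the property the practitioners quoted above are asking for.
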
Personalized Skill Evolution From Agent Sessions
@HenryYe19352122 demonstrated VibeLens: turns agent sessions into personalized productivity tips, recommends and creates tailored skills, evolves them as habits change. This directly targets April 29's feedback-to-skill gap but is pre-traction (9 likes). The need is confirmed; the solution space is open.
Urgency: Medium -- Opportunity: product
4. Tools and Methods in Use
| Tool | Category | Sentiment | Strengths | Limitations |
|---|---|---|---|---|
| Hermes Agent + Curator | Coding/general agent | Positive | Skill lifecycle management; pretext support; creative hackathon ecosystem; tips for cost reduction | Skill count overwhelming for new users |
| Cursor SDK + Harness | Agent runtime | Positive | Published harness methodology; six-layer architecture; model-specific tuning | New; no community harness contributions yet |
| Claude Code | Coding agent | Positive | Hackathon ecosystem (500 participants); Claude Security public beta; skills growing | Enterprise-only for security features |
| Cline (rewrite) | Coding agent | Cautious | SDK-first; plugin architecture; decoupled from IDE | Beta; breaking changes expected |
| Codex | General work agent | Mixed | Role picker; general work surface beyond coding | "A dropdown can't capture production rules" |
| LiveKit | Voice agent infra | Positive | Structured data collection; Tasks/TaskGroups SDK; JSON output | Voice-domain specific |
| Editframe | Agent video format | Positive | HTML/CSS to MP4; framework agnostic; browser rendering | Brand new; no ecosystem yet |
| ElevenLabs (Stripe) | Voice/TTS | Positive | One-line Stripe Projects integration | TTS only |
| SkillsMP | Skills marketplace | Mixed | 1M+ skills listed; multi-agent support | Quality curation absent |
| context-engineering-kit | Dev toolkit | Positive | Open-source; advanced patterns | Early stage |
5. What People Are Building
| Project | Who | What it does | Problem it solves | Stack | Stage | Links |
|---|---|---|---|---|---|---|
| Hermes Curator | @Teknium | Automatic skill consolidation and pruning via usage analytics + LLM review | Skill bloat from self-improvement loops | Hermes Agent, config.yaml | Shipped | post |
| Editframe | @yudDIDit | Agent-native video format: HTML/CSS to MP4 | Agents cannot produce/edit video | HTML, CSS, React, DOM, Canvas | Shipped | post |
| SpatialMemory2 | @stash_pomichter | Multimodal latent-space memory for robot agents | Video/lidar/odometry too large for context | Latent embeddings, spatial search | Shipped | post |
| Cline SDK Rewrite | @cline | Plugin-based agent SDK decoupled from IDE | Original was tightly coupled to VS Code semantics | TypeScript, plugin arch | Beta | post |
| OCR-Memory | @dair_ai | Visual-modality memory for long-horizon agents | Summarization loses procedural detail | Image rendering, locate-and-transcribe | Research | post |
| TradingAgents | @quantscience_ | Multi-agent LLM trading framework mirroring hedge fund dynamics | Single-model trading underperforms | Python, multi-agent, fundamental/sentiment/technical analysts | Shipped | post |
| Kumo Coding Agent Skills | @jure | Skills that turn coding agents into predictive model experts | Agents lack domain knowledge for ML pipelines | Kumo SDK, Claude Code, Codex | Shipped | post |
| Sandcastle | @RoundtableSpace | Local coding agent orchestration library in TypeScript | Multiple agents stepping on each other | TypeScript, multi-agent | Shipped | post |
| tmux-IDE 2.0 | @ThijsVerreck | Turn any project into autonomous multi-agent IDE via YAML | Multi-agent terminal setup complexity | npm, tmux, YAML config | Shipped | post |
| VibeLens | @HenryYe19352122 | Personalized agent skill recommendations from session patterns | Agents repeat mistakes / forget preferences | Cross-agent session analysis | Alpha | post |
| Ramp Inspect | @zachbruggeman | In-house coding agent now writing ~70% of merged PRs | Manual code review bottleneck | Internal agent platform | Shipped | post |
| GBrain | @garrytan | Retrieval and memory layer for Hermes/OpenClaw with 75K+ markdown files | Agent memory at scale | MIT, GitHub | Shipped | post |
6. New and Notable
Hermes Curator Introduces Automatic Skill Lifecycle Management
The Curator (211 bookmarks) is the first production answer to skill accumulation -- the problem where self-improving agents create dozens of narrow near-duplicates that pollute context. The two-phase approach (deterministic staleness transitions + LLM-driven consolidation) creates a maintainable equilibrium between skill creation and skill pruning without requiring human intervention.
Signal strength: [+++]
Cursor Publishes Internal Harness Engineering Methodology
The blog post makes Cursor's harness approach reproducible: model-specific tool/prompt tuning, offline+online evals, dogfooding, and treating harness changes as measurable experiments. Combined with yesterday's SDK release, Cursor is now the most transparent agent infrastructure provider in terms of methodology disclosure.
Signal strength: [+++]
OCR-Memory Paper Proposes Visual Modality for Agent Memory
The OCR-Memory paper (arXiv:2604.26622) renders agent trajectories as images with visual anchors, then retrieves via locate-and-transcribe. SOTA on Mind2Web and AppWorld under strict context limits. The approach eliminates summarization-induced information loss while keeping token costs flat via thumbnail compression of older memories.
Signal strength: [++]
Ramp's Inspect Agent Now Writes 70% of Merged PRs
@zachbruggeman reported that Ramp's internal coding agent grew from 30% to ~70% of all merged PRs, extending beyond engineering teams. This is the strongest production-scale evidence of agent code contribution rates at a major fintech company.
Signal strength: [++]
Claude Security Enters Public Beta for Enterprise
@claudeai (231 likes) announced Claude Security in public beta for Enterprise customers -- scheduled scans, directory-level targeting, CSV/Markdown exports, webhook notifications, and persistent dismissals. Hundreds of organizations have used it since February's research preview, "catching issues existing scanners had missed."
Signal strength: [+]
Sakana AI and SMBC Deploy Multi-Agent System for Corporate Banking
@hardmaru announced (41 likes) a multi-agent system built with SMBC (one of Japan's largest banks) for corporate strategy proposals, reducing a one-to-two week workflow to hours. This is enterprise multi-agent deployment at institutional banking scale.
Signal strength: [+]
7. Where the Opportunities Are
[+++] Skill quality infrastructure at marketplace scale -- With SkillsMP claiming 1M+ skills and the Hermes ecosystem growing fast, the gap between skill availability and skill quality is widening. Hermes Curator solves the single-agent case. The marketplace case -- quality scoring, relevance gating, compatibility verification, and cross-agent curation -- remains entirely open. The first team to build "search quality" for skills captures the middleware layer of the agentic economy.
[+++] Model-portable agent framework -- @samuelcolvin's question and the practitioner responses suggest that no TypeScript framework yet satisfies production teams. The gap is a framework where model choice is configuration, not architecture. Cursor's published six-layer model (orchestration, context, routing, transport, state, execution) provides the blueprint. The first open-source implementation that achieves model portability without sacrificing reliability captures the frustrated "no framework" crowd running 12 agents on CLAUDE.md files.
[++] Agent-native media production -- Editframe proves the category exists: video formats designed for agent authoring. The same logic applies to audio, interactive content, and presentation formats. Current creative tools assume human operators; agent-native creative formats that compose with existing skills pipelines represent a greenfield market.
[++] Cross-agent session intelligence -- VibeLens demonstrates the concept: analyze patterns across agent sessions to recommend skills, surface repeated mistakes, and evolve preferences. As practitioners run multiple agents (Hermes, Claude Code, Cursor, Codex), the meta-layer that observes all sessions and synthesizes actionable patterns becomes valuable infrastructure.
[+] Voice agent accountability and disclosure standards -- IndiaMart's silent replacement of human support and Shiprocket's undisclosed AI calls surface a regulatory and trust gap. The opportunity is disclosure/compliance infrastructure for voice agents: verification watermarks, real-time "you're speaking to AI" disclosures, and accountability logs that satisfy emerging regulation.
8. Takeaways
- Hermes Curator (211 bookmarks) ships the first production solution to agent skill bloat: automatic usage tracking, staleness transitions, and LLM-driven consolidation that prevents self-improvement loops from polluting context. This directly answers the feedback-to-skill pipeline gap identified on April 29 -- not by improving skill creation, but by making skill maintenance automatic. (source)
- Cursor publicly documents its six-layer agent harness methodology (orchestration, context, routing, transport, state, execution), making harness engineering reproducible beyond their own product. Combined with yesterday's SDK and the AHE paper, harness engineering has moved from tribal knowledge to published discipline in 48 hours. (source, source)
- The context engineering playbook video reached 120K views and 1,917 bookmarks via Karpathy endorsement, establishing context engineering + tool design + orchestrator-subagent + evals as the canonical "100x agentic engineer" stack. The practitioner consensus is hardening: this stack survives model swaps; framework choices do not. (source)
- Ramp's Inspect coding agent grew from 30% to ~70% of all merged PRs, the strongest production evidence that agent-written code can become the majority of a company's output. This shifts the question from "can agents write production code?" to "what does engineering look like when agents write most of it?" (source)
- The skills ecosystem enters its quality crisis: SkillsMP claims 1M+ skills while practitioners report "a lot of junk to dig through," and the Hermes ecosystem's own Curator exists because self-improvement loops create unsustainable accumulation. Quality infrastructure -- not quantity -- is now the binding constraint on skills adoption. (source, source)
- Codex, Cline, and Cursor all ship major platform moves on the same day: Codex expands beyond coding to general work, Cline rewrites with an SDK-first plugin architecture, and Cursor publishes its harness internals. The coding agent market is simultaneously broadening (Codex to all roles), deepening (Cursor harness transparency), and restructuring (Cline modular rewrite). (source, source)