Twitter AI Agent - 2026-06-08

1. What People Are Talking About

1.1 Loop and harness engineering became the working language for reliable coding agents

The loudest June 8 shift was away from generic prompt talk and toward explicit loop, harness, and verification design. Five retained items supported this theme, and they described the same stack from different angles: long planning phases, ADRs, evals, traces, and reusable loop structures.

@KingBootoshi said (112 likes, 11 replies, 6,578 views, 179 bookmarks) that GPT-5.5 and Opus 4.8 made his old subagent-heavy workflow obsolete. He now spends most of his time aligning intent, then lets one Codex agent read the raw code plus ADRs and asks it to create a master PRD, run goal mode, dogfood the app, and rerun E2E tests. Replies added that he still keeps Opus or Claude Code for frontend work and reserves Codex for critical backend changes.

@eng_khairallah1 shared (50 likes, 19 replies, 5,859 views, 70 bookmarks) Learn Harness Engineering, describing it as a free masterclass. The repository says the course ships 12 lectures, 6 projects, a harness-creator skill, and 14 language translations, while the attached diagram distills OpenAI, Anthropic, and ThoughtWorks harness ideas into one operator-friendly summary.

Hand-drawn harness engineering diagram comparing OpenAI, Anthropic, and ThoughtWorks approaches and listing shared principles

@adxtyahq added (64 likes, 3 replies, 1,223 views, 69 bookmarks) a resource stack that linked Chip Huyen, OpenAI eval guides, Anthropic's context-engineering article, and Claude cookbook material. The attached image matters because it shows the community's working loop as a system design (model, tool use, environment, memory, guardrails, and evaluation) rather than another prompt tip.

AI agent loop diagram showing model, tools, environment, memory, guardrails, and evaluation
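The six components named in that diagram can be wired together in a few lines. The sketch below is a minimal illustration, not anything taken from the linked resources: `LoopState`, `run_loop`, and the stub model and tool are hypothetical names, and the "model" is a plain function standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LoopState:
    goal: str
    memory: List[str] = field(default_factory=list)  # observations from earlier steps
    done: bool = False


def run_loop(
    state: LoopState,
    model: Callable[[str, List[str]], Dict[str, str]],
    tools: Dict[str, Callable[[str], str]],
    guardrail: Callable[[Dict[str, str]], bool],
    evaluate: Callable[[List[str]], bool],
    max_steps: int = 10,
) -> LoopState:
    """Run model -> guardrail -> tool -> memory -> evaluation until done or out of steps."""
    for _ in range(max_steps):
        action = model(state.goal, state.memory)  # model proposes the next action
        if not guardrail(action):  # guardrails can veto before anything runs
            state.memory.append(f"blocked: {action.get('tool')}")
            continue
        observation = tools[action["tool"]](action["input"])  # act on the environment
        state.memory.append(observation)
        if evaluate(state.memory):  # evaluation decides when the goal is met
            state.done = True
            break
    return state


def fake_model(goal: str, memory: List[str]) -> Dict[str, str]:
    # Hypothetical stand-in for an LLM call: always asks to run the tests.
    return {"tool": "run_tests", "input": goal}


tools = {"run_tests": lambda arg: f"tests passed for {arg}"}
state = run_loop(
    LoopState(goal="fix login bug"),
    model=fake_model,
    tools=tools,
    guardrail=lambda action: action.get("tool") in tools,
    evaluate=lambda memory: any("tests passed" in m for m in memory),
)
print(state.done, state.memory)
```

The `max_steps` bound is the simplest possible constraint on the loop; a real harness would likely add budgets, trace logging, and richer evaluation.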

@IntuitMachine summarized (33 likes, 1 reply, 2,778 views, 33 bookmarks) the new ladder as context engineering, then harness engineering, then intent engineering, then loop engineering, pointing to @steipete's tweet that engineers should be designing loops that prompt agents. @Vtrivedy10 argued (32 likes, 3 replies, 2,751 views, 16 bookmarks) that long-horizon improvement is fundamentally a data-mining problem built on traces, eval creation, and data curation.
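One way to read the trace-mining argument is as a small data pipeline: collect run traces, keep the failures, and turn recurring failures into eval cases. The sketch below assumes a hypothetical `Trace` record and `to_eval_cases` helper; neither comes from the posts above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trace:
    task: str
    steps: List[str]
    passed: bool


def to_eval_cases(traces: List[Trace]) -> List[Dict[str, object]]:
    """Group failed traces by task and rank tasks by how often they failed."""
    cases: Dict[str, Dict[str, object]] = {}
    for trace in traces:
        if trace.passed:
            continue  # only failures become eval candidates in this sketch
        case = cases.setdefault(trace.task, {"task": trace.task, "failures": 0})
        case["failures"] += 1
    # Most frequently failing tasks first; ties keep first-seen order.
    return sorted(cases.values(), key=lambda c: -c["failures"])


traces = [
    Trace("migrate db", ["plan", "apply"], passed=False),
    Trace("migrate db", ["plan"], passed=False),
    Trace("add endpoint", ["edit", "test"], passed=True),
    Trace("fix flaky test", ["rerun"], passed=False),
]
cases = to_eval_cases(traces)
print(cases)
```

Curation in practice would involve more than counting, but the shape is the same: traces in, a ranked eval backlog out.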

Discussion insight: The useful nuance was not “prompt better.” It was “design the loop, keep the agent inside explicit constraints, and mine traces when it drifts.”

Comparison to prior day: June 7 already had a public harness course and a default improvement recipe. June 8 added explicit operator playbooks, loop-engineering jargon, and stronger emphasis on ADRs, traces, and eval-backed iteration.

1.2 Model routing and local swarms became a practical cost and performance strategy

The second cluster treated model choice as a routing problem instead of a loyalty contest. Four retained items supported this theme, and all four focused on how teams actually mix frontier models, cheaper models, local workers, and approval modes in production-like loops.

@levie said (118 likes, 31 replies, 30,808 views, 93 bookmarks) that the market is splitting between frontier intelligence for high-end work and much cheaper models for high-volume workloads. The quoted post from @brian_armstrong made the thesis sharper by estimating that 80 percent of workloads move to models that are 99 percent cheaper, while replies added that routing only works if instructions and task structure also adapt to the model.
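As a rough check on what that estimate implies, the arithmetic can be worked through directly. The function name `blended_cost` and the simplifying assumption that every workload costs the same on a frontier model are illustrative, not from the quoted post.

```python
def blended_cost(share_moved: float, discount: float, baseline: float = 1.0) -> float:
    """Total cost relative to running every workload on the frontier model.

    Assumes (hypothetically) that all workloads have equal baseline cost.
    """
    cheap_unit_cost = baseline * (1.0 - discount)
    return (1.0 - share_moved) * baseline + share_moved * cheap_unit_cost


cost = blended_cost(share_moved=0.80, discount=0.99)
savings = 1.0 - cost
print(f"blended cost: {cost:.3f}, savings: {savings:.1%}")
```

Under those assumptions the blended bill is about 0.208 of the all-frontier baseline, a saving of roughly 79 percent, with the remaining 20 percent of frontier workloads accounting for almost all of the spend.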

@KimiDevs announced (346 likes, 25 replies, 11,343 views, 67 bookmarks) a major Kimi Code upgrade: one-line install, video input, plugin hooks, ACP editor support, and custom workflows. The GitHub README confirms MCP configuration, built-in coder, explore, and plan subagents, and marketplace installs, while one reply supplied the day’s most concrete caution by saying the agent looped on a prompt it could not resolve.

@DeRonin_ said (47 likes, 14 replies, 1,355 views, 14 bookmarks) that Kimi Work changes solo-agency economics because he can route roughly 90 percent of work to Kimi 2.6, keep Opus or GPT-5 for high-stakes architecture and security, and leave cleanup to local models. He was reacting to @Kimi_Moonshot's official announcement of a desktop agent with up to 300 parallel workers, browser use, and a running memory diary.
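The split described here amounts to a hand-written routing rule. The sketch below shows one possible shape for it; the `Task` fields, the `Tier` names, and the `route` function are hypothetical, and the rule is a manual heuristic rather than anything Kimi Work provides.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FRONTIER = "frontier"  # e.g. high-stakes architecture and security work
    CHEAP = "cheap"        # bulk of everyday tasks
    LOCAL = "local"        # cleanup handled by local models


@dataclass(frozen=True)
class Task:
    description: str
    high_stakes: bool = False
    cleanup: bool = False


def route(task: Task) -> Tier:
    """Pick a model tier; high-stakes work always wins over other flags."""
    if task.high_stakes:
        return Tier.FRONTIER
    if task.cleanup:
        return Tier.LOCAL
    return Tier.CHEAP


tasks = [
    Task("review auth design", high_stakes=True),
    Task("write CRUD handlers"),
    Task("reformat imports", cleanup=True),
]
for task in tasks:
    print(task.description, "->", route(task).value)
```

The flags are set by a human in this sketch, which is exactly the hand-tuned part that operators in the thread described.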

@reach_vb recommended (53 likes, 13 replies, 1,992 views, 13 bookmarks) switching Codex to “Approve for me” by default for long-running tasks. Claude Code's public permission-mode docs show why that pattern resonates across coding agents: `acceptEdits` and other looser modes reduce prompt fatigue without removing protected-path safeguards or escalation points.
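The middle-ground idea, auto-approving routine edits while still escalating on protected paths, can be illustrated with a small policy function. This is a generic sketch and not the Codex or Claude Code implementation; `decide`, `Decision`, and the `PROTECTED_PREFIXES` list are hypothetical.

```python
from enum import Enum
from pathlib import PurePosixPath


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"  # escalate to the human


# Hypothetical protected top-level paths; a real tool defines its own list.
PROTECTED_PREFIXES = (".git", ".env", "secrets")


def decide(tool: str, path: str, auto_approve_edits: bool) -> Decision:
    """Return ALLOW or ASK for a single tool call on a repo-relative path."""
    parts = PurePosixPath(path).parts
    if parts and parts[0] in PROTECTED_PREFIXES:
        return Decision.ASK  # protected paths always escalate
    if tool == "read":
        return Decision.ALLOW
    if tool == "edit" and auto_approve_edits:
        return Decision.ALLOW  # looser mode: edits no longer prompt
    return Decision.ASK  # everything else (e.g. shell) still prompts


print(decide("edit", "src/app.py", auto_approve_edits=True))
print(decide("edit", ".git/config", auto_approve_edits=True))
print(decide("shell", "src/app.py", auto_approve_edits=True))
```

The point of the design is that loosening one flag removes most prompts without touching the protected-path check, which runs first regardless of mode.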

Discussion insight: The feed’s cost conversation was no longer “which model wins.” It was “which model plans, which one executes, how long can it run unattended, and where do humans still step in.”

Comparison to prior day: June 7 had cost-optimization recipes and a few planner-worker discussions. June 8 broadened that into a more explicit frontier-versus-cheap routing thesis and shipped local-swarm products.

1.3 Skills and operating layers moved into mainstream surfaces

The third cluster was about packaging context, capability, and control into reusable surfaces instead of building yet another chat shell. Five retained items supported this theme, spanning Slack, deployment marketplaces, workflow templates, declarative runtimes, and IDE-native skills.

@_avichawla argued (16 likes, 5 replies, 1,079 views, 16 bookmarks) that terminal-only coding agents stop at the boundary of the repo and miss the production alerts, PR history, and Slack decisions that actually explain incidents. The public CodeRabbit Agent for Slack docs confirm the product is designed to investigate codebases, generate coding plans, open PRs, and retain a scoped knowledge base across repositories, tickets, monitoring, and docs.

@gmi_cloud launched (59 likes, 27 replies, 50 quotes, 7,527 views) GMI Agent Box as “a complete infrastructure stack for production-ready AI agents.” The public overview and product page show the same packaging move: ready-to-use agents, deploy-and-list flows for builders, compute and model access, and marketplace trust badges.

@tonbistudio shared (24 likes, 1 reply, 1,750 views, 29 bookmarks) the open-source Hermes multi-agent workflow, whose README lays out a concrete path from intake and dedup through parallel research, routing, one human gate, and final fulfillment. @TheTuringPost described (11 likes, 3 replies, 1,064 views, 17 bookmarks) OpenProse as a “logical English” layer for reusable agent programs; the project’s own docs say Markdown contracts declare the desired world state and Reactor reconciles it while leaving durable receipts.
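The pipeline shape described for the Hermes workflow (intake and dedup, parallel research, routing, one human gate, fulfillment) can be sketched generically. The code below is not taken from the Hermes repository; every function name, the queue names, and the approval callback are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def dedup(requests: List[str]) -> List[str]:
    """Drop repeated requests, ignoring case and surrounding whitespace."""
    seen = set()
    unique = []
    for request in requests:
        key = request.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(request.strip())
    return unique


def research(request: str) -> Dict[str, str]:
    # Placeholder for an agent call that gathers context on one request.
    return {"request": request, "notes": f"findings for {request}"}


def pick_queue(item: Dict[str, str]) -> str:
    # Placeholder routing rule based on a keyword.
    return "billing" if "invoice" in item["request"].lower() else "general"


def run_pipeline(
    requests: List[str], approve: Callable[[Dict[str, str]], bool]
) -> List[Dict[str, str]]:
    unique = dedup(requests)  # intake and dedup
    with ThreadPoolExecutor(max_workers=4) as pool:
        researched = list(pool.map(research, unique))  # parallel research, order preserved
    fulfilled = []
    for item in researched:
        item["queue"] = pick_queue(item)  # routing
        if approve(item):  # the single human gate
            fulfilled.append(item)  # final fulfillment
    return fulfilled


results = run_pipeline(
    ["Refund invoice 12", "refund invoice 12 ", "Reset password"],
    approve=lambda item: item["queue"] != "billing",  # stand-in for a human decision
)
print(results)
```

Keeping the gate as one callback makes the human checkpoint explicit and easy to swap for a real review step.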

@luka_bernardi said (50 likes, 4 replies, 2,166 views, 15 bookmarks) that Xcode now ships agent skills for SwiftUI and can export them to any agent. Apple’s same-day newsroom post and Paul Hudson’s SwiftUI Agent Skill show the same direction: skills are becoming portable, installable best-practice bundles rather than one-off prompt lore.

Discussion insight: The common denominator was structured context packaging—skills, scopes, contracts, and deployment metadata—not another general agent demo.

Comparison to prior day: June 7’s skills chatter centered on standalone managers and debuggers. June 8 pushed skills into Xcode, Slack, workflow templates, and deployment stacks.

2. What Frustrates People

Approval fatigue and manual oversight still cap agent throughput

Severity: High. @KingBootoshi said (112 likes, 11 replies, 6,578 views, 179 bookmarks) that the discussion phase is the most crucial part of critical work and that a good run can still take one to two hours, which means misalignment is expensive. @reach_vb recommended (53 likes, 13 replies, 1,992 views, 13 bookmarks) “Approve for me” because approving every tool call breaks flow, and Claude Code's public permission-mode docs confirm why: looser modes exist specifically to reduce prompt fatigue while keeping protected paths and escalation. People cope by front-loading plans, writing ADRs, and loosening permissions just enough to let long runs continue. This gap looks worth building for because the demand is not for reckless autonomy; it is for better middle-ground controls.

Context and memory gaps still break long-horizon continuity

Severity: High. @_avichawla argued (16 likes, 5 replies, 1,079 views, 16 bookmarks) that a coding agent living only in the terminal cannot see the production alert, the PR that caused it, or the Slack thread where the team decided not to touch retry logic. @Vtrivedy10 said (32 likes, 3 replies, 2,751 views, 16 bookmarks) that continual learning and long-horizon improvement are really a data-mining problem built on traces, eval creation, and data curation, while @Kimi_Moonshot's announcement for Kimi Work sold a “running diary” as a core feature rather than polish. Current workarounds are ADRs, Slack-native knowledge bases, memory browsers, and trace review. This looks worth building for because the feed kept treating continuity as missing infrastructure, not as a solved model feature.

Model routing still relies on hand-built heuristics

Severity: Medium. @levie said (118 likes, 31 replies, 30,808 views, 93 bookmarks) that the valuable new layer is routing workloads to the right model family, and replies added that task structure has to adapt to the model too. @DeRonin_ said (47 likes, 14 replies, 1,355 views, 14 bookmarks) he now routes about 90 percent of work to Kimi 2.6, keeps Opus or GPT-5 for high-stakes jobs, and leaves cleanup to local models, while @KingBootoshi said (112 likes, 11 replies, 6,578 views, 179 bookmarks) he still uses Opus or Claude Code for frontend work but Codex for critical backend changes. Even the upbeat Kimi Code launch had a reply saying the agent looped on a prompt. The workaround is a manual frontier-versus-cheap routing playbook, which suggests the economics are real but the control logic is still mostly hand-tuned.

3. What People Wish Existed

A durable control plane for planning, memory, and review

This was a practical need, not a vague aspiration. @KingBootoshi said (112 likes, 11 replies, 6,578 views, 179 bookmarks) that long runs only work after a detailed discussion phase, ADR alignment, and explicit testing goals, while @_avichawla argued (16 likes, 5 replies, 1,079 views, 16 bookmarks) that agents need access to repo history, incident context, and prior team decisions rather than terminal-only context. @Vtrivedy10 said (32 likes, 3 replies, 2,751 views, 16 bookmarks) traces are the lifeblood of continual learning. People clearly want a layer that holds the plan, remembers what happened, and knows when to escalate to a human. Opportunity: direct.

Portable skills and workflow contracts across agent surfaces

This need was both practical and competitive. @luka_bernardi said (50 likes, 4 replies, 2,166 views, 15 bookmarks) that Xcode now ships SwiftUI agent skills and can export them to any agent, and the public Agent Skills overview positions skills as a standardized way to add capabilities across Claude Code, Copilot, Cursor, VS Code, Gemini CLI, and more. @TheTuringPost described (11 likes, 3 replies, 1,064 views, 17 bookmarks) OpenProse as reusable workflow code rather than disposable prompts, and @tonbistudio shared (24 likes, 1 reply, 1,750 views, 29 bookmarks) a Hermes template that already externalizes routing and the human gate. The ask is clear: skills and workflow logic should travel with the work instead of being rebuilt per tool. Opportunity: competitive.

Automatic routing and budgeting across model families

This need was urgent and highly operational. @levie said (118 likes, 31 replies, 30,808 views, 93 bookmarks) that routing workloads to the right model family is becoming one of the hard problems in AI agents, while @DeRonin_ said (47 likes, 14 replies, 1,355 views, 14 bookmarks) he is already hand-splitting work across Kimi 2.6, Opus or GPT-5, and local cleanup models. @reach_vb recommended (53 likes, 13 replies, 1,992 views, 13 bookmarks) a middle-ground permission mode because unattended runs need budget and control policies as much as they need model selection. The open request is for a routing layer that chooses model, budget, and approval posture automatically without degrading quality. Opportunity: direct.

4. Tools and Methods in Use

| Tool | Category | Sentiment | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Harness and loop engineering | Method | (+) | Turns planning, verification, traces, and evals into an explicit operating loop | Needs ADRs, datasets, and more setup before the agent can run well |
| Codex mega-thread + goal mode | Coding workflow | (+) | Keeps full context in one thread, encourages dogfooding and E2E validation | Long discussion phases and multi-hour runs still require patience |
| Kimi Code | Coding agent | (+/-) | One-line install, video input, ACP editor support, MCP config, plugins, subagents | Replies still reported looping failures on some prompts |
| Kimi Work | Desktop swarm runtime | (+/-) | Parallel local workers, browser use, memory diary, strong cost story for volume tasks | Quality and routing gains still depend on manual role splits and oversight |
| Claude Code permission modes (`acceptEdits`) | Permission control | (+) | Reduces approval fatigue while preserving protected paths and escalation | Still requires explicit mode selection and trust calibration per session |
| CodeRabbit Agent for Slack | Ops / SDLC agent | (+) | Brings repos, tickets, docs, monitoring, and memory into one scoped agent surface | Requires Slack-centric workflow design and approved integrations |
| GMI Agent Box | Deployment / marketplace | (+) | Lets teams access, deploy, list, and monetize agents with compute and model infra | Adoption still depends on marketplace trust, badges, and hosting choices |
| Hermes multi-agent workflow | Orchestration template | (+) | Concrete intake, research, routing, human gate, and delivery pipeline | Template only; users still have to adapt config, auth, and domain logic |
| OpenProse | Workflow language | (+/-) | Reviewable Markdown contracts, reusable programs, durable receipts | Some practitioners still doubt how durable contracts stay under model drift |
| SwiftUI / agent skills format | Skill packaging | (+) | Packages reusable best practices that can move between Xcode and other agents | Standards and distribution are still fragmented across surfaces |

The overall satisfaction spectrum was pragmatic rather than devotional. People were enthusiastic about tools that package structure (loops, skills, scopes, contracts, or deployment metadata) rather than promising one perfect autonomous agent. The clearest migration pattern was to keep premium models on planning and high-stakes work, push cheaper or local workers onto bounded execution, move missing operational context into Slack or marketplace layers, and loosen permission prompts just enough to keep long tasks moving. That pattern draws on @levie (118 likes, 31 replies, 30,808 views, 93 bookmarks), @DeRonin_ (47 likes, 14 replies, 1,355 views, 14 bookmarks), @reach_vb (53 likes, 13 replies, 1,992 views, 13 bookmarks), the CodeRabbit docs, and the GMI Agent Box overview. The competitive battle is moving away from best model and toward best control layer.

5. What People Are Building

| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
| --- | --- | --- | --- | --- | --- | --- |
| Kimi Code | @KimiDevs | Open-source coding agent with CLI, ACP editor support, video input, plugins, and subagents | Reduces setup friction for multimodal coding and custom workflow automation | TypeScript, CLI/TUI, ACP, MCP, Kimi models | Shipped | repo, site, tweet |
| Kimi Work | @Kimi_Moonshot | Desktop agent that runs large parallel swarms with browser use and a memory diary | Gives solo operators and small teams a local-friendly way to run broad research and execution loops | Desktop app, browser automation, swarm orchestration, memory system | Shipped | page, tweet |
| GMI Agent Box | @gmi_cloud | Platform to access, deploy, list, and monetize production-ready agents | Packages compute, model access, hosting, and distribution for agent builders | Docker, 200+ model access, regional compute, marketplace | Shipped | docs, page, tweet |
| CodeRabbit Agent for Slack | CodeRabbit | Slack-native engineering agent that investigates, plans, and opens PRs with persistent context | Bridges repo work with incidents, tickets, docs, and team memory | Slack, GitHub, Jira, Notion, Datadog, scoped knowledge base | Beta | docs, tweet |
| Hermes multi-agent workflow | @tonbistudio | Reusable autonomous triage pipeline with routing and one human approval gate | Gives teams a concrete orchestration skeleton instead of ad hoc multi-agent prompts | Python, YAML `triage.yaml`, Hermes Kanban | Shipped | repo, tweet |
| OpenProse | openprose | Declarative Markdown-contract language for AI sessions, paired with Reactor receipts and reconciliation | Turns fragile prompt workflows into reusable, reviewable agent programs | Markdown `.prose.md`, Reactor runtime, multi-harness support | Alpha | repo, site, tweet |
| Xcode SwiftUI agent skills | @luka_bernardi | SwiftUI-focused skills that can be exported from Xcode to other agents | Packages Apple-platform best practices into reusable agent guidance | Xcode, SwiftUI, agent skills | Shipped | Apple, tweet |

Kimi Code and Kimi Work show the most visible two-surface build pattern of the day. One pushes multimodal coding into the terminal and IDE, while the other pushes broad browser work and memory-backed swarms onto the desktop. Together they show builders trying to split high-context coding from high-volume parallel execution rather than forcing one interface to do both.

GMI Agent Box and CodeRabbit Agent attack a different gap: operational glue. GMI packages compute, models, hosting, and listing into one launch surface, while CodeRabbit packages repo history, tickets, monitoring, and team memory into one Slack-native agent. The attached CodeRabbit diagram is informative because it shows the full thread-to-plan-to-PR flow rather than a generic chatbot screen.

CodeRabbit Slack agent diagram showing investigation, planning, and PR creation from a single Slack thread

Hermes multi-agent workflow, OpenProse, and Xcode SwiftUI skills all externalize expertise into reusable artifacts: YAML triage configs, Markdown contracts, and installable skills. That repeated build pattern matters because it matches the day’s dominant conversation—codify the structure first, then let agents execute inside it.

6. New and Notable

Anthropic turned agent education into a free course catalog

@ameliahazelai said (70 likes, 43 replies, 3,985 views, 27 bookmarks) that Anthropic had released 13 free courses, and the attached screenshot showed Claude 101, Claude Code in Action, and other Academy entries. Anthropic’s public Learn page confirms new Academy courses focused on AI fluency, API development, MCP, and Claude Code. This matters because education itself became part of the product surface on June 8, not just a side resource.

Screenshot of Anthropic Learn showing free Academy courses including Claude 101 and Claude Code in Action

Apple pushed agent skills into Xcode and out to the wider agent ecosystem

@luka_bernardi said (50 likes, 4 replies, 2,166 views, 15 bookmarks) that Xcode now ships SwiftUI agent skills and can export them for use with other agents. Apple’s June 8 newsroom post and the public SwiftUI Agent Skill repo make that claim concrete: best-practice bundles are becoming installable artifacts rather than local prompt recipes.

“Loop engineering” escaped niche jargon and became shared shorthand

@gauthampai joked (566 likes, 7 replies, 91,086 views, 24 bookmarks) that LinkedIn would turn harness engineering into “loop engineering.” The term also appeared in @IntuitMachine's framework post (33 likes, 1 reply, 2,778 views, 33 bookmarks) and in @shannholmberg's explainer (12 likes, 4 replies, 258 views, 8 bookmarks) on open versus closed looping. That combination of jokes, diagrams, and operator definitions made the phrase itself a signal.

7. Where the Opportunities Are

[+++] Agent control planes for long-running coding work — The strongest evidence came from multiple sections at once: @KingBootoshi said (112 likes, 11 replies, 6,578 views, 179 bookmarks) he needs long discussion phases, ADRs, and dogfooding before trusting long runs; @reach_vb recommended (53 likes, 13 replies, 1,992 views, 13 bookmarks) a middle-ground approval mode; @_avichawla argued (16 likes, 5 replies, 1,079 views, 16 bookmarks) that serious agents need repo, monitoring, and ticket context; and @Vtrivedy10 said (32 likes, 3 replies, 2,751 views, 16 bookmarks) traces are the engine of improvement. That combination points to one product gap: systems that combine planning, memory, routing, approvals, and trace review in one place.

[++] Portable skills and workflow artifacts — @luka_bernardi said (50 likes, 4 replies, 2,166 views, 15 bookmarks) Xcode skills can export to any agent, the public Agent Skills overview frames skills as a cross-agent standard, and both Hermes multi-agent workflow and OpenProse turn agent logic into reusable files. This is moderate because the need is obvious and the artifacts exist, but the ecosystem is still fragmented.

[++] Mixed-model routing and local-swarm operations — @levie said (118 likes, 31 replies, 30,808 views, 93 bookmarks) routing across model families is one of the new hard problems, @DeRonin_ said (47 likes, 14 replies, 1,355 views, 14 bookmarks) he is already manually splitting work across Kimi, frontier models, and local cleanup, and Kimi Code plus Kimi Work show product demand on both the IDE and desktop sides. This is moderate because the savings are visible, but operators still handcraft the policy.

[+] Agent deployment and distribution infrastructure — @gmi_cloud launched (59 likes, 27 replies, 50 quotes, 7,527 views) Agent Box as hosted compute plus marketplace, and @KimiDevs announced (346 likes, 25 replies, 11,343 views, 67 bookmarks) a more distribution-ready Kimi Code with plugins and ACP support. This is emerging because builders clearly want packaging and distribution, but trust, badges, and operational governance still look early.

8. Takeaways

Planning and verification are the human jobs people trust most right now. @KingBootoshi said (112 likes, 11 replies, 6,578 views, 179 bookmarks) his highest-leverage work is the discussion phase, not typing code, and Learn Harness Engineering packages that mindset into a public course. (source)
The feed is moving from prompt engineering to loop design. @IntuitMachine summarized (33 likes, 1 reply, 2,778 views, 33 bookmarks) the ladder from context to harness to intent to loop engineering, while @gauthampai joked (566 likes, 7 replies, 91,086 views, 24 bookmarks) that the phrase itself is now meme-ready. (source)
Mixed-model routing is becoming a first-class operating problem. @levie said (118 likes, 31 replies, 30,808 views, 93 bookmarks) the routing layer is where value accrues, and @DeRonin_ said (47 likes, 14 replies, 1,355 views, 14 bookmarks) he is already splitting work across Kimi, frontier models, and local cleanup. (source)
The next agent products are context packages, not just new chat shells. @_avichawla argued (16 likes, 5 replies, 1,079 views, 16 bookmarks) that serious engineering agents need monitoring and ticket context, @gmi_cloud launched (59 likes, 27 replies, 50 quotes, 7,527 views) Agent Box as deploy-and-distribute infrastructure, and @tonbistudio shared (24 likes, 1 reply, 1,750 views, 29 bookmarks) a reusable orchestration template. (source)
Education and skills are becoming part of the platform race. @ameliahazelai said (70 likes, 43 replies, 3,985 views, 27 bookmarks) Anthropic released a large free course catalog, and @luka_bernardi said (50 likes, 4 replies, 2,166 views, 15 bookmarks) Xcode now ships exportable SwiftUI agent skills. (source)