Twitter AI Coding - 2026-05-15

1. What People Are Talking About

1.1 Antigravity is still culturally important, but operational trust is weak

Antigravity remained one of the most discussed AI coding products on May 15, but the conversation was dominated by contradictory evidence: screenshots showed a live multi-model picker and Google scheduled an I/O session around it, while users treated the absence of recent visible updates as a sign the product might be fading. Three high-signal items supported the theme, and the replies pushed beyond vibes into specific complaints about limits, model fit, and communication.

@jahirsheikh8 asked (296 likes, 90 replies, 165,073 views, 28 bookmarks) whether Google was about to shut Antigravity down. The attached screenshot matters because it shows the product still exposing Gemini 3.1 Pro (High/Low), Gemini 3 Flash, Claude Sonnet 4.6, Claude Opus 4.6, and GPT-OSS 120B rather than a dead or disabled interface.

[Image: Antigravity model picker showing Gemini 3.1 Pro and Flash, Claude Sonnet 4.6, Claude Opus 4.6, and GPT-OSS 120B in a live menu]

@HarshithLucky3 posted (227 likes, 17 replies, 12,410 views, 14 bookmarks) a sharper version of the same concern: the visible changelog was still at version 1.23.2 from Apr 16, 2026 and only listed bug fixes for MCP loading and workspace-specific settings. In the same tweet, they explicitly asked the Google AI Studio team to release a dedicated desktop and mobile app “like codex,” turning a vague complaint into a concrete product request.

[Image: Antigravity changelog showing version 1.23.2 dated Apr 16, 2026 with only MCP-loading and workspace-settings bug fixes]

@FlutterDev promoted (38 likes, 1 reply, 1,604 views, 9 bookmarks) a Google I/O session whose landing page says “Vibe coding is an important skill for developers in 2026” and promises to show how Google teams use agentic tools like Google Antigravity. That makes the day’s dominant Antigravity question less about whether the product exists and more about whether Google is communicating its roadmap clearly enough.

Discussion insight: Replies to the shutdown thread added the most actionable evidence. One respondent said the warning signs started when Google “nerfed the limits” and pushed users onto Gemini 3.1 Pro, while another said Gemini 3.1 was “decent” but not strong enough for heavy coding and that they might “go full local with Qwen.”

Comparison to prior day: On May 14, the main contradiction was between ambitious demos and a stale changelog. On May 15 that contradiction hardened: the stale changelog was still the screenshot people used as evidence, but Google itself was simultaneously advertising an Antigravity session for I/O.

1.2 GitHub's answer to AI coding is getting more structured and more expensive

GitHub-adjacent discussion shifted away from simple “who has the best model” comparisons and toward workflow structure, internal consolidation, and billing exposure. Four items supported the theme: a widely shared report that Microsoft was moving internal developers toward Copilot CLI, screenshots showing what token billing could look like in practice, GitHub's own spec-driven workflow repo, and a user report that Copilot App quality differed sharply between Windows and macOS.

@Cointelegraph reported (137 likes, 40 replies, 16,070 views, 12 bookmarks) that Microsoft was cancelling most internal Claude Code licenses and shifting thousands of developers to GitHub Copilot CLI, citing The Verge. The replies complicated the claim: one user said Claude still worked better for many .NET scenarios, while another pointed out that GitHub Copilot can also use Claude, reframing the issue as harness and platform preference rather than a pure model switch.

@edzitron argued (52 likes, 2 replies, 3,758 views, 5 bookmarks) that Copilot's June 1 move to token-based billing exposes the economics that many AI products normally hide. The attached screenshots are the key evidence: one explains that most AI subscriptions do not charge end users on actual token burn, while the other lists sample Copilot token totals ranging from $23.81 to $5,851.77 for users paying about $10 to $54 per month.

[Image: Screenshots from Ed Zitron's thread showing Copilot's planned move to token-based billing and example monthly token burns ranging from tens to thousands of dollars]

@techNmak highlighted (4 likes, 2 replies, 209 views, 8 bookmarks) GitHub's Spec Kit, and the linked repository describes an open source toolkit for “Spec-Driven Development” where specifications become executable through commands such as /speckit.specify, /speckit.plan, /speckit.tasks, and /speckit.implement. The README screenshot matters because it shows GitHub packaging an explicit alternative to open-ended vibe prompting.

[Image: Spec Kit README header describing an open source toolkit for executable specs rather than ad hoc vibe coding]

@ibuildthecloud said (15 likes, 5 replies, 1,367 views, 2 bookmarks) that the Copilot app looked bad on Windows compared with a macOS influencer demo, and a reply added that Copilot's recent releases had been strong but that new rates could make users much more token-sensitive. That is a smaller post, but it captures the day's broader tension: GitHub is shipping more workflow surface area while users become more sensitive to both platform quality and cost visibility.

Discussion insight: The replies to the Microsoft-shift and Copilot-app posts kept returning to the same point: model access alone is not enough. Users care about the harness around the model, whether costs stay legible, and whether the experience is equally good on the platform they actually use.

Comparison to prior day: May 14 was still mostly about whether GitHub could restore category credibility. By May 15, the conversation had become more operational: internal migration claims, concrete token-burn screenshots, and a GitHub-backed repo for spec-driven workflows.

1.3 Coding agents are becoming remote-control systems, but the control plane is still rough

A third theme was the rapid spread of “remote supervision” patterns: people steering coding agents from phones, converging coding products into ChatGPT, and extending computer-use features across multiple agent stacks. Three items supported the theme, and the strongest discussion was about whether the new surfaces were reliable enough for everyday work.

@mark_k announced (45 likes, 8 replies, 1,430 views, 1 bookmark) that Codex Mobile lets people watch active work, review diffs, approve commands, switch models, and steer Codex from a phone while sessions continue on a real development machine. The screenshot shows a concrete control surface rather than a generic launch card.

[Image: Codex mobile screen showing projects and connected development machines inside the ChatGPT app]

@haider1 replied to the same launch (13 likes, 3 replies, 931 views) by calling it a rushed release: chats loaded slowly, prompts failed with `Swift.CancellationError`, and the app sometimes reported the paired computer as offline. A reply supplied a workaround instead of a defense: Codex CLI in tmux plus SSH from Moshi on iPhone was described as more reliable because it avoids pairing and survives network changes better.

@mark_k also reported (31 likes, 3 replies, 1,183 views, 4 bookmarks) that OpenAI had reorganized around bringing ChatGPT and Codex closer together, with Greg Brockman taking over all products and Codex head Thibault Sottiaux moving to core product and platform. In parallel, @orca_build said (12 likes, 1 reply, 756 views, 2 bookmarks) its Computer Use feature now works across macOS, Windows, and Linux for Pi, Codex, OpenCode, Claude Code, and more, suggesting that control surfaces are spreading beyond a single vendor's stack.

Discussion insight: The mobile posts were positive about the idea and negative about the polish. Users liked the ability to review diffs and approve commands away from the desk, but the most useful response of the day was still an SSH-based workaround rather than praise for the native preview.

Comparison to prior day: On May 14, distribution across mobile and plugins was the story. On May 15, the discussion moved from “this exists” to “this is useful, but here is how you actually keep it working.”

2. What Frustrates People

Antigravity's roadmap is opaque enough to generate its own rumor cycle

The strongest frustration on the day was uncertainty about whether Antigravity is still being actively developed. @jahirsheikh8 asked (296 likes, 90 replies, 165,073 views, 28 bookmarks) if Google was shutting it down, and @HarshithLucky3 followed (227 likes, 17 replies, 12,410 views, 14 bookmarks) with a changelog screenshot still showing Apr 16, 2026 as the latest visible update. The replies made the pain more specific: one user said limits had been cut and people were being pushed onto Gemini 3.1 Pro, while another said Gemini 3.1 was not tuned well enough for heavy coding and considered switching to local Qwen. Severity: High. Worth building for: yes, because the demand signal is partly for product capability and partly for roadmap trust.

Usage-based AI coding economics are becoming harder to ignore

GitHub billing was the clearest example. @edzitron posted (52 likes, 2 replies, 3,758 views, 5 bookmarks) that Copilot would move to token-based billing on June 1, and the screenshots in the thread listed sample token burns as high as $5,851.77 per month for a user paying around $39 per month. The same cost sensitivity appears in a smaller but concrete Claude Code example: @dsiroker showed (1 like, 2 replies, 251 views, 3 bookmarks) that Opus 4.7 fast mode was billed at $30 input and $150 output per Mtok, and a reply immediately shifted to operational risk by asking how five simultaneous teammate agents avoid editing the same files. Severity: High. Worth building for: yes, especially for budget controls, observability, and safer parallel execution.
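
The figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming the fast-mode rates cited in the post ($30 per million input tokens, $150 per million output tokens) and the highest sample token burn ($5,851.77) set against a roughly $39 monthly plan. The function names and the session token counts are hypothetical and chosen only for illustration; they do not come from any vendor's billing API.

```python
def token_cost_usd(input_tokens, output_tokens,
                   input_rate_per_mtok=30.0, output_rate_per_mtok=150.0):
    """Return the USD cost of a session at per-million-token (Mtok) rates.

    The default rates are the fast-mode prices quoted in the post;
    treat them as assumptions, not as a published price list.
    """
    input_cost = (input_tokens / 1_000_000) * input_rate_per_mtok
    output_cost = (output_tokens / 1_000_000) * output_rate_per_mtok
    return input_cost + output_cost


def burn_to_subscription_ratio(token_burn_usd, subscription_usd):
    """Return how many times the metered token burn exceeds the flat fee."""
    return token_burn_usd / subscription_usd


# Hypothetical session: 2M input tokens and 0.5M output tokens.
session_cost = token_cost_usd(2_000_000, 500_000)

# Highest sample burn from the screenshots against a ~$39 plan.
ratio = burn_to_subscription_ratio(5851.77, 39.0)

print(f"session cost: ${session_cost:.2f}")
print(f"burn is about {ratio:.0f}x the subscription price")
```

Under these assumptions the hypothetical session costs $135, and the largest sample burn is roughly 150 times the flat subscription price, which is the gap that makes budget controls and observability worth building.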

Remote-control coding from mobile is useful, but the preview quality is still shaky

The Codex mobile launch attracted both demand and bug reports. @mark_k described (45 likes, 8 replies, 1,430 views, 1 bookmark) a phone workflow for watching active work, reviewing diffs, approving commands, and switching models, and one reply called that “actually useful, not just a demo feature.” But @haider1 reported (13 likes, 3 replies, 931 views) that the preview was unstable: slow chats, `Swift.CancellationError`, and intermittent “computer is offline” failures. The best workaround in the thread was not another app feature; it was Codex CLI in tmux plus SSH from Moshi on iPhone. Severity: Medium-High. Worth building for: yes.

Agent UX quality still varies by platform

A smaller but telling complaint came from @ibuildthecloud, who said (15 likes, 5 replies, 1,367 views, 2 bookmarks) the Copilot app looked bad on Windows compared with a macOS demo. A reply on the same post said Copilot's recent releases had been strong but warned that new pricing would make users more token-sensitive. That pairing matters because it shows two kinds of friction arriving together: interface inconsistency and cost awareness. Severity: Medium. Worth building for: yes, if a product can offer consistent cross-platform behavior without hiding cost tradeoffs.

3. What People Wish Existed

A dedicated Google AI Studio desktop and mobile app

This need was stated directly. @HarshithLucky3 asked (227 likes, 17 replies, 12,410 views, 14 bookmarks) Google AI Studio to ship a dedicated desktop and mobile app “like codex,” tying the request to dissatisfaction with Antigravity's visible update cadence. It is a practical need rather than an aspirational one: the user is not asking for a new category, just a clearer and more reliable Google-owned surface for coding workflows. Opportunity: direct.

Reliable phone-based supervision for coding agents

The Codex mobile posts show that people want to manage coding sessions away from the desk, but also that today's implementations are not yet dependable. @mark_k framed (45 likes, 8 replies, 1,430 views, 1 bookmark) mobile review and approval as immediately useful, while @haider1 documented (13 likes, 3 replies, 931 views) failed prompts and offline pairing in the preview. The reply recommending tmux plus SSH from iPhone shows that people will assemble this capability themselves if native products do not make it reliable. Opportunity: direct.

Clear cost controls for heavy agent use

The GitHub billing screenshots and Claude Code fast-mode pricing example point to the same practical wish: users want long-running agent workflows without discovering the cost after the fact. @edzitron showed (52 likes, 2 replies, 3,758 views, 5 bookmarks) sample Copilot token totals ranging from tens to thousands of dollars, and @tom_doerr linked (6 likes, 599 views, 11 bookmarks) a local dashboard for visualizing Claude Code sessions, timelines, and costs. The need is practical and urgent, with partial answers appearing from independent builders before platform vendors have fully normalized the controls. Opportunity: direct.

Better coordination for multi-agent parallel work

@dsiroker showed (1 like, 2 replies, 251 views, 3 bookmarks) Claude Code fast mode running with “5 simultaneous teammate agents,” and the first reply immediately asked the missing systems question: what happens when they edit the same files, do they get separate branches, or is merging manual? That is a practical workflow need, not hype language, and there was no strong evidence in the sample that the current tools solve it cleanly. Opportunity: emerging.
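
The question raised in that reply can be made concrete with a small pre-flight check. The sketch below is not how Claude Code or any other tool coordinates agents; it assumes, hypothetically, that each agent declares the files it plans to edit before work starts, and it reports any pair of agents whose plans overlap. The agent names, file paths, and the `find_file_conflicts` helper are all illustrative.

```python
from itertools import combinations


def find_file_conflicts(planned_edits):
    """Return overlapping file plans between every pair of agents.

    planned_edits maps an agent name to the file paths it intends to edit.
    The result maps (agent_a, agent_b) to the sorted list of shared paths;
    pairs with no overlap are omitted.
    """
    conflicts = {}
    for agent_a, agent_b in combinations(sorted(planned_edits), 2):
        shared = set(planned_edits[agent_a]) & set(planned_edits[agent_b])
        if shared:
            conflicts[(agent_a, agent_b)] = sorted(shared)
    return conflicts


# Hypothetical plans for three parallel agents.
planned = {
    "agent-1": ["src/api.py", "src/models.py"],
    "agent-2": ["src/models.py", "tests/test_models.py"],
    "agent-3": ["docs/README.md"],
}

for pair, files in find_file_conflicts(planned).items():
    print(f"{pair[0]} and {pair[1]} both plan to edit: {', '.join(files)}")
```

A check like this only flags overlap in advance. Isolating each agent on its own branch or Git worktree would still leave the merge step to resolve, which is the part the reply was asking about.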

4. Tools and Methods in Use

| Tool | Category | Sentiment | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Google Antigravity | Agent-first IDE | (+/-) | Live multi-model picker; still important enough for a dedicated Google I/O session; positioned as a free or low-friction alternative in comparison posts | Visible update cadence is unclear; replies complain about reduced limits and Gemini 3.1 being weaker for heavy coding |
| GitHub Copilot CLI | CLI coding assistant | (+/-) | Internal-Microsoft migration chatter gave it credibility as a first-party harness; at least one user said it solved bugs Claude Code struggled with | Token-based billing is becoming central to the product conversation; replies note that model access alone does not settle workflow preference |
| GitHub Copilot App | Agent workspace | (+/-) | Technical-preview framing centers on a full development lifecycle in one app | One tester reported the Windows experience looked much worse than macOS, and replies tied adoption to rate sensitivity |
| Spec Kit | Spec-driven workflow toolkit | (+) | Turns specs into executable workflows with slash commands, task generation, and issue integration across 30+ agent integrations | No strong complaints surfaced in today's sample, but the workflow is explicitly a reaction against ad hoc prompting rather than a drop-in replacement for it |
| Codex Mobile | Mobile coding-agent companion | (+/-) | Lets users review diffs, approve commands, switch models, and supervise running sessions from a phone | Preview reports mention slow chats, `Swift.CancellationError`, and paired machines going offline |
| Claude Code | CLI coding agent | (+/-) | Still treated as a strong coding baseline; supports fast mode and multi-agent teammate workflows | Fast mode carries premium token pricing, and replies question how parallel agents avoid file conflicts |
| Orca Computer Use | Computer-use layer | (+) | Extends desktop-app control across macOS, Windows, Linux, and multiple agent stacks including Codex and Claude Code | No strong user complaints surfaced in the sampled discussion; evidence today was launch-oriented rather than field reports |
| goal | MCP / workflow control | (+) | Adds persistent objectives, auto-continuation, turn budgets, and completion audits to agent sessions | Full auto-continuation is Claude-only; Cursor and OpenCode rely on MCP tools without hook-based continuation |
| clibib | Citation utility / agent skill | (+) | Fetches BibTeX from DOI, PMID, arXiv, URL, ISBN, or title search and works across multiple agents | README warns that title-based search is less reliable than DOI or URL input |
| Claude Code Karma | Local observability dashboard | (+) | Local-first dashboard for sessions, timelines, costs, tools, agents, and analytics using FastAPI, SQLite, and SvelteKit | README warns that Claude Code only keeps about 30 days of local session data, so older history disappears |

Overall satisfaction was fragmented by workflow layer rather than by one universal winner. Antigravity still drew attention because of product ambition, but trust was weak; GitHub's tools drew interest because of structure and first-party integration, but pricing became harder to ignore; and OpenAI's mobile/ChatGPT convergence pushed supervision beyond the desktop, even as users reported bugs. The clearest workarounds were local Qwen as an escape hatch from hosted limits, tmux plus SSH from iPhone instead of the Codex mobile preview, and third-party control-plane tools like goal and Claude Code Karma to add visibility or persistence that the base tools do not yet provide.

5. What People Are Building

| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
| --- | --- | --- | --- | --- | --- | --- |
| Spec Kit | GitHub | Open source toolkit for spec-driven development with executable specifications and slash-command workflows | Replaces ad hoc vibe prompting with a structured path from requirements to implementation | `specify-cli`, slash commands, GitHub Issues integration, 30+ agent integrations | Shipped | tweet, repo |
| Claude Code Karma | Jayant Devkar | Local-first dashboard for Claude Code sessions, timelines, costs, tools, agents, and analytics | Gives developers observability into agent usage and spend without sending data to a third party | FastAPI, SQLite, SvelteKit | Shipped | tweet, repo |
| goal | @secemp9 | Standalone MCP and slash-command system that ports Codex-style persistent goals to other agents | Keeps long-running agent work moving with budgets, status, and completion checks | MCP server, Claude hooks, Cursor/OpenCode integrations via `uvx` | Shipped | tweet, repo |
| clibib | @deliprao | Python CLI and agent skill for fetching BibTeX from DOI, PMID, arXiv, URL, ISBN, or title search | Removes citation lookup friction inside research-heavy coding or writing workflows | Python CLI, agent skill, CrossRef/Zotero-backed lookups | Shipped | tweet, repo |
| Orca Computer Use | @orca_build | Cross-platform computer-use feature for agents controlling desktop apps and emulators | Lets developers automate UI testing and real-app workflows across multiple agent stacks | macOS, Windows, Linux computer-use layer for Pi, Codex, OpenCode, Claude Code, and more | Shipped | tweet |

Spec Kit was the most consequential build signal because it came from GitHub and explicitly positioned structure as the answer to unpredictable vibe coding. Claude Code Karma, goal, and clibib point to the same second-order pattern: builders are no longer only making coding agents; they are building the surrounding control plane for costs, persistence, and research utilities. Orca extends that pattern into execution by turning “computer use” into a cross-agent capability rather than a single-vendor feature. The repeated trigger is not a lack of models; it is a lack of workflow scaffolding around them.

6. New and Notable

GitHub turned anti-vibe-coding sentiment into an official workflow repo

The Spec Kit post from @techNmak and the linked repository mattered because GitHub was not just defending Copilot with another model or UI announcement. It was publishing an open-source workflow that says specifications should become executable, with commands for constitution, spec creation, planning, tasking, and implementation. That is a notable shift from “prompt better” toward “formalize the work before the agent runs.”

Copilot billing debate moved from theory to screenshots

@edzitron posted (52 likes, 2 replies, 3,758 views, 5 bookmarks) screenshots of Copilot's upcoming token-based billing transition alongside example monthly token totals. The notable part was not just that billing changes are coming; it was that the screenshots made hidden consumption legible enough to become a discussion artifact on their own.

OpenAI is folding coding workflows closer to ChatGPT

The strongest public evidence was a pair of posts from @mark_k: one describing Codex Mobile as a phone-based control surface, and one describing the internal OpenAI reorg that pulls ChatGPT and Codex closer together. Together they point to one product direction: coding is becoming another mode inside the broader ChatGPT surface, not a separate desktop-only workflow.

7. Where the Opportunities Are

[+++] Agent control planes for cost, state, and completion — Evidence spans Copilot token-billing screenshots, Claude Code fast-mode pricing, Claude Code Karma's local analytics dashboard, and goal's persistent objective system. The strongest opening is not another base model, but software that makes long-running agent work legible, budgeted, resumable, and auditable.

[++] Reliable cross-surface supervision — Codex Mobile, the request for a Google desktop/mobile coding app, and Orca's cross-platform computer use all point to the same need: developers want to start, monitor, and approve coding work across laptop, phone, and desktop apps without losing state. The opportunity is moderate because the demand is clear, but today's implementations still look preview-grade.

[+] Multi-agent coordination and merge safety — The five-teammate Claude Code post and immediate reply about file conflicts show a smaller but real gap around branch isolation, conflict avoidance, and wrap-up when many agents work at once. The signal is emerging rather than dominant, but it is tightly coupled to how agent workflows are actually scaling.

8. Takeaways

Antigravity's problem is no longer only product capability; it is product trust. Users shared a live model picker and a Google I/O session page on the same day they treated a month-old changelog screenshot as evidence of neglect.
GitHub's AI coding story is getting more process-heavy and more cost-visible at the same time. The feed combined a report of Microsoft shifting internal developers toward Copilot CLI, screenshots of Copilot token burn, and a GitHub-backed repo for spec-driven development.
Remote control is becoming a first-class coding workflow, but reliability is lagging demand. Codex Mobile made phone-based diff review and approval concrete, yet the most practical response in the thread was still a tmux plus SSH workaround because the preview was unstable.
The builder energy on this date was concentrated around agent infrastructure, not just new agents. Spec Kit, goal, clibib, Claude Code Karma, and Orca all focused on structure, persistence, citations, visibility, or execution around existing agent workflows.