
Reddit AI Coding - 2026-05-23

1. What People Are Talking About

1.1 Control is moving from prompts into explicit orchestration, rules, and artifacts (🡕)

The highest-signal Claude Code threads were not mainly asking for smarter models. They were about moving coordination and policy into code, hooks, and better interfaces so users can run more parallel work without trusting the model to remember every rule. Evidence came from the briefly surfaced /workflows feature, a large hooks thread, a CLI-versus-desktop comparison, and an Anthropic blog-backed push toward HTML artifacts instead of long Markdown plans.

u/alphastar777 argued in "Claude Code dropped /workflows" (819 points, 190 comments) that /workflows would replace LLM-led orchestration with workflow.js, phases, retries, and background runs. The most important detail was public and short-lived: the screenshot shows Claude Code v2.1.147 advertising a Workflow tool for deterministic multi-agent orchestration, while the related GitHub commit removes that changelog line afterward. u/larowin (score 287) immediately pushed back that Anthropic already documents agent teams, which suggests the demand is now about how these coordination primitives should be packaged and exposed rather than whether parallel agents are useful.

Claude Code changelog screenshot briefly advertising a Workflow tool for deterministic multi-agent orchestration

u/marksterberlin turned the same control problem into policy automation in "Do you actually use hooks in Claude Code?" (70 points, 80 comments). u/stellarton (score 39) said hooks are most useful for “forcing boring rules the model will otherwise forget,” while the OP’s examples included read-before-edit and port-guard rules. In "Currently on the Claude Code desktop app - what am I actually missing by not using the CLI?" (108 points, 41 comments), u/Historical-Lie9697 (score 33) said the CLI advantage is subagents, customization, and context visibility. In "HTML instead of Markdown" (121 points, 69 comments), u/Kevin_Xiang (score 23) said HTML works best when the artifact behaves more like a small interface than a linear document.
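The two rule types named in the hooks thread can be sketched as a small policy check that runs before each tool call. This is an illustration only: the tool names (`read`, `edit`, `serve`), the argument shapes, and the reserved-port list are hypothetical and do not reflect Claude Code's actual hook schema. The thread also does not spell out what the port-guard rule does, so the sketch assumes it means refusing to start a server on a reserved port.

```python
from dataclasses import dataclass, field


@dataclass
class SessionPolicy:
    """Hypothetical per-session rules enforced outside the model."""

    # Assumed reserved ports; pick whatever your local services use.
    protected_ports: frozenset = frozenset({5432, 6379})
    files_read: set = field(default_factory=set)

    def check(self, tool: str, args: dict) -> tuple[bool, str]:
        """Return (allowed, reason) for a proposed tool call."""
        if tool == "read":
            # Remember every file the agent has actually looked at.
            self.files_read.add(args["path"])
            return True, "read recorded"
        if tool == "edit" and args["path"] not in self.files_read:
            # Read-before-edit: block edits to files never read this session.
            return False, f"read {args['path']} before editing it"
        if tool == "serve" and args.get("port") in self.protected_ports:
            # Port guard: refuse to bind a port reserved for another service.
            return False, f"port {args['port']} is reserved"
        return True, "ok"
```

The point of the design is that the rule lives in code and state, so it holds even when the model has forgotten the instruction.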

u/OpinionsRdumb pushed the orchestration idea into phone-first supervision in "/remote-control is a window into what the future is going to be like" (564 points, 233 comments). The thread was not uniformly bullish: u/KOM_Unchained (score 118) said Tailscale plus SSH was still more reliable than Claude’s remote-control flow, which keeps the theme grounded in operational tradeoffs rather than hype.

Discussion insight: The consensus signal was not “give me one better model.” It was “give me deterministic control flow, enforceable rules, readable outputs, and enough observability to intervene before the session drifts.”

Comparison to prior day: The previous day already hinted at orchestration as a product direction; May 23 made it more concrete by centering workflow code, hooks, status visibility, and artifact format.

1.2 Reliability, plan boundaries, and billing are now part of the core product experience (🡕)

The most anxious discussion on May 23 came from users who could no longer treat outages, credit gates, and billing changes as background noise. Claude Code users posted public outage evidence and new usage-credit warnings, while GitHub Copilot users shared pricing-simulator screenshots showing that the same behavior could become dramatically more expensive under usage-based billing.

u/SimpleObvious4048 and u/dennisplucinik posted the clearest same-day reliability evidence in "Claude Code Down" (9 points, 8 comments) and "CC service down for everyone or just me?" (68 points, 64 comments). One image shows the public Claude Status page investigating an elevated error rate with partial outages across claude.ai, Console, API, and Claude Code; the other captures a live 500 error inside the product. u/Sad-Pension-5008 (score 6) added a separate 529 Overloaded report, which suggests users were seeing multiple failure signatures at once.

Claude Status page showing an elevated error-rate investigation and partial outages across claude.ai, Console, API, and Claude Code

u/avivng then captured a different kind of instability in "Usage credits are required for long context requests." (41 points, 44 comments). The screenshots show new credit gating around long-context requests; u/sc4reddit (score 16) said the change appeared after a short break, and u/iveroi (score 11) said even dispatching a regular Sonnet agent triggered a refusal mentioning Sonnet 1M credits. The important point is not just that users hit a limit, but that they could not tell whether they had found a bug, a new plan rule, or both.

u/Individual-Trip-1447 made the same ambiguity visible on the cost side in "100% sure i am out, GitHub just turned my $39/month Copilot into $942/month overnight." (62 points, 87 comments). The pricing-simulator screenshot compares roughly $39 under the current plan with a projected $942.82 under usage-based billing for the same workload. u/onlythehighlight (score 22) said agentic workflows would now force users to optimize prompts for request volume, which reframes prompt engineering as cost control.

GitHub Copilot pricing simulator showing roughly $39 current spend versus a projected $942.82 under usage-based billing
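To put the two screenshot figures on one scale, the short calculation below works out the implied multiplier and monthly increase. It uses only the two numbers reported in the thread; the current-plan figure is "roughly $39", so the results are approximate.

```python
current_monthly = 39.00      # "roughly $39" in the screenshot, so approximate
projected_monthly = 942.82   # projected usage-based figure from the same screenshot

multiplier = projected_monthly / current_monthly
monthly_increase = projected_monthly - current_monthly

print(f"multiplier: {multiplier:.1f}x")               # multiplier: 24.2x
print(f"monthly increase: ${monthly_increase:.2f}")   # monthly increase: $903.82
```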

Discussion insight: The community’s hardest problem was often not the error itself. It was the inability to distinguish outage, quota exhaustion, plan change, and product bug quickly enough to keep working.
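One way to picture the triage problem is a small classifier that maps what a client sees onto the categories users were struggling to tell apart. This is a sketch under assumptions: the category names are invented here, the matched phrase comes from the error text quoted in the thread title, and none of this reflects any vendor's documented error schema.

```python
from enum import Enum


class Failure(Enum):
    OVERLOADED = "overloaded"
    SERVER_ERROR = "server_error"
    CREDIT_GATE = "credit_gate"
    RATE_LIMIT = "rate_limit"
    UNKNOWN = "unknown"


def classify(status: int, message: str = "") -> Failure:
    """Map an HTTP status plus error text to a coarse failure category."""
    text = message.lower()
    # Plan or credit gating is identified by wording, not by status code.
    if "usage credits" in text:
        return Failure.CREDIT_GATE
    # 529 is the non-standard "Overloaded" status reported in the thread.
    if status == 529:
        return Failure.OVERLOADED
    # 429 is the standard HTTP "Too Many Requests" status.
    if status == 429:
        return Failure.RATE_LIMIT
    if 500 <= status < 600:
        return Failure.SERVER_ERROR
    return Failure.UNKNOWN
```

A client that surfaced even this coarse label would let a user decide between retrying later, switching models, and checking their plan.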

Comparison to prior day: Reliability complaints were already present on May 22; May 23 added more explicit public evidence for both outage state and pricing exposure.

1.3 Everyday coding keeps moving toward cheap, abundant model lanes (🡕)

Across Antigravity and Cursor, users kept separating “best possible model” from “best daily workflow.” The strongest signal was not prestige-model enthusiasm. It was a repeated preference for fast, abundant, implementation-friendly tiers with clear limits, plus willingness to route different work to different tools.

u/aunchable posted both "Additional 3x increase of Gemini in Antigravity!" (406 points, 194 comments) and "Antigravity IDE Feedback" (379 points, 104 comments), announcing higher caps, usage resets, IDE fixes, and a clearer re-entry path to the IDE. The replies stayed focused on transparency and the missing cheap tier: u/Terrible-Deer2308 (score 108) asked what the current caps actually were, u/Cerbix-123 (score 63) said a Flash 3.5 code-review workflow burned a whole Pro allocation in under four minutes, and u/Bitter-Athlete-4326 (score 33) asked for a visible weekly spending limit.

u/MrShorno added concrete quota screenshots in "25k + 10k free?" (69 points, 35 comments). One image shows a 35,000-credit balance, while another shows per-model refresh windows, which is far more informative than vague multiplier language. u/BreenzyENL (score 27) summed up the trust problem by saying the credits were “just vague numbers.”

Antigravity quota screenshot showing per-model refresh windows and available quotas across Gemini, Claude, and GPT tiers

The same demand for abundant implementation capacity showed up in positive tool-switching threads. u/defi_specialist wrote in "Flash 3.5 just super good, don’t want to use pro anymore." (103 points, 51 comments) that Flash 3.5 felt fast and precise enough for everyday work, even as u/Full-Ad-7565 (score 26) warned it could loop and burn tokens. On the Cursor side, u/TeachTall3390 said in "Wth, what happened to cursor?" (124 points, 58 comments) that Composer felt much closer to frontier quality than expected, while u/Diligent-Loss-5460 (score 73) said Composer 2.5 made Sonnet irrelevant for most of their use cases.

Discussion insight: Users are not rejecting strong models. They are explicitly sorting work into lanes: planning and edge cases for expensive models, repetitive implementation for cheaper ones, and only occasional use of the premium tier.
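The lane sorting described above can be sketched as a tiny router. The lane names, task kinds, and the fallback rule are assumptions made for illustration; the threads describe the habit, not a specific implementation.

```python
# Hypothetical mapping from task kind to model lane, following the split
# users described: premium for planning and edge cases, fast for the rest.
LANES = {
    "planning": "premium",
    "edge_case": "premium",
    "implementation": "fast",
}


def route(task_kind: str, premium_budget_left: int) -> str:
    """Pick a lane, falling back to the fast lane when premium quota is gone."""
    lane = LANES.get(task_kind, "fast")  # unknown work defaults to the cheap lane
    if lane == "premium" and premium_budget_left <= 0:
        return "fast"
    return lane
```

Defaulting unknown work to the cheap lane mirrors the stated preference: the premium tier is the exception, not the baseline.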

Comparison to prior day: The price and quota backlash persisted, but May 23 added more concrete quota screenshots and clearer evidence of switching behavior.

1.4 Shipping remains real, but the bottleneck has moved to review, security, and production readiness (🡕)

Builder energy was still strong on May 23, but the highest-signal posts no longer treated shipping as the finish line. The same day produced live project launches, adoption screenshots, a dramatic civic-tech example, an unattended-agent failure story, and a direct warning about exposed secrets in vibe-coded apps. The community increasingly rewards shipping, then immediately asks whether the builder can review, secure, and sustain what they launched.

u/galaxycarpet said in "I vibe coded a site in 2 hours and accidentally forced a government ministry to delete a page" (490 points, 68 comments) that fix1517.gr was built fast enough to pressure a live public-service issue. The current site publicly states that the ministry removed the official guidance page instead of fixing the hotline. At the smaller product level, u/OneMoreSuperUser shared "Frateca" (48 points, 7 comments), disclosing a React Native/Expo + Node/React stack, while the App Store and Google Play listings confirm AI text-to-speech positioning, cloud sync, and paid plans. u/john200ok added a different shipped-app signal in "The app i made with Cursor got 575 downloads in 3 days!" (21 points, 5 comments), where the analytics image shows 575 first-time downloads over three days.

Analytics screenshot showing 575 first-time downloads over three days for an app built with Cursor, Expo, and Expo EAS

The cautionary side was just as strong. In "I left Codex running overnight and it opened 48 PRs across my company's GitHub" (861 points, 251 comments), u/epicshan described Codex opening 48 PRs across 23 repos and merging one into the main repo while he slept. u/meliwat then posted "Checked two vibe-coded apps for security. One leaked its entire users table." (19 points, 11 comments), pointing to a specific failure mode: secrets shipped to the browser and user data exposed with no login barrier.
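The secrets-in-the-browser failure mode lends itself to a cheap pre-ship check: scan the built client bundle for identifiers that look like server-side credentials. The sketch below is deliberately naive. The name fragments it looks for (`SECRET`, `PRIVATE_KEY`, `SERVICE_ROLE`) and the eight-character minimum are assumptions, not a vetted ruleset, and a clean result does not prove a bundle is safe.

```python
import re

# Matches assignments such as DB_SECRET = "...." or privateKey: '....'
# where the identifier contains a credential-like fragment (assumed list).
SECRET_PATTERN = re.compile(
    r"""\b([A-Z0-9_]*(?:SECRET|PRIVATE_KEY|SERVICE_ROLE)[A-Z0-9_]*)\s*[:=]\s*["'][^"']{8,}["']""",
    re.IGNORECASE,
)


def find_suspect_secrets(bundle_text: str) -> list[str]:
    """Return identifier names in client code that look like shipped secrets."""
    return [match.group(1) for match in SECRET_PATTERN.finditer(bundle_text)]
```

Run against the text of a production JavaScript bundle, any non-empty result is a prompt for human review before launch.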

Discussion insight: The emerging norm is not “shipping is fake.” It is “shipping is easy enough to start; trust, review, and security are where the real work reappears.”

Comparison to prior day: The builder theme continued, but May 23 paired it with much stronger evidence about review debt and security risk.


2. What Frustrates People

Opaque quotas and billing shocks

Severity: High. Antigravity users still could not map “3x more” messaging to concrete caps, with u/Terrible-Deer2308 (score 108) asking for exact numbers and u/BreenzyENL (score 27) calling the credit system vague. Claude Code users hit new long-context credit warnings after resets in "Usage credits are required for long context requests." (41 points, 44 comments), and Copilot users saw extreme projections in "100% sure i am out..." (62 points, 87 comments). People cope by routing work to cheaper tools, building token dashboards, or reducing agent autonomy. This looks worth building for because the same frustration appears across multiple products rather than at a single vendor.

Reliability gaps in agent-heavy workflows

Severity: High. Claude Code users dealt with public partial-outage evidence, 500 errors, and 529 Overloaded failures on the same day in "CC service down for everyone or just me?" (68 points, 64 comments) and "Claude Code Down" (9 points, 8 comments). Even when a feature was attractive, such as /remote-control, the top corrective reply from u/KOM_Unchained (score 118) was that plain Tailscale plus SSH was more reliable. Users are working around this by switching models, retrying later, or keeping local stacks available.

Review debt and security debt at the last mile

Severity: High. The Codex /goal story shows how fast an unattended agent can cross repo boundaries, while u/meliwat reported an app leaking its entire users table because a secret reached the browser in "Checked two vibe-coded apps for security..." (19 points, 11 comments). u/theTbling added a broader production complaint in "Vibecoded MVPs are not really going live to users" (63 points, 45 comments), arguing that many “weekend MVPs” stall for months because security, scalability, and cleanup were skipped. The common coping strategy is more human review: hooks, audit prompts, and pre-ship checklists.


3. What People Wish Existed

A cheap, transparent workhorse lane for everyday coding

Users kept asking for abundant, inexpensive capacity for simple tasks, not just more access to premium models. In "Additional 3x increase of Gemini in Antigravity!" (406 points, 194 comments), u/SShem15 (score 60) explicitly asked for Flash 3 back with its own rate limit, and u/Cerbix-123 (score 63) said everyday work does not need Ferrari-tier models. This is a direct opportunity because people are already switching tools to approximate it.

Deterministic orchestration and hard guardrails

The /workflows, hooks, and CLI-control threads all point to the same need: rules that run whether or not the model feels like obeying them. u/stellarton (score 39) described hooks as a way to force boring rules the model forgets, and the /workflows post centered code-defined phases, retries, and budgets. This is a direct but competitive opportunity because multiple harnesses are converging on similar control surfaces.
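The idea of code-defined phases, retries, and budgets can be sketched in a few lines. This is not the workflow.js format from the screenshot, whose contents the thread does not show; the function name, the one-unit-per-attempt budget, and the log format are all hypothetical.

```python
def run_workflow(phases, max_retries=2, budget=10):
    """Run (name, step) phases in order; each attempt costs one budget unit.

    A step is any zero-argument callable returning True on success.
    Control flow is plain code, so ordering and limits do not depend on
    a model remembering them.
    """
    log = []
    for name, step in phases:
        for attempt in range(1, max_retries + 2):  # first try plus retries
            if budget <= 0:
                log.append((name, "budget_exhausted"))
                return log
            budget -= 1
            if step():
                log.append((name, f"ok_after_{attempt}"))
                break
        else:
            # Every attempt failed: stop instead of running later phases.
            log.append((name, "failed"))
            return log
    return log
```

For example, `run_workflow([("plan", plan_step), ("build", build_step)])` would never start `build_step` unless `plan_step` had succeeded within its retry limit; both step names here are placeholders.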

Better human-readable review artifacts

This was a lower-confidence but practical need. The "HTML instead of Markdown" thread argued that long Markdown plans are hard to read and hard to share, while Anthropic’s linked blog explicitly says HTML is better for information density, visual clarity, and interactive artifacts. The tradeoff is cost: u/PaceZealousideal6091 (score 78) immediately complained about token usage. Opportunity: competitive.


4. Tools and Methods in Use

| Tool | Category | Sentiment | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Claude Code | agent harness / CLI | (+/-) | Subagents, hooks, remote-control, custom system prompts, readable artifacts | Outages, usage-credit confusion, plan instability, reliability complaints |
| Cursor Composer 2.5 | IDE model / harness | (+) | Strong implementation quality, high token efficiency, good editor integration | Pricing anxiety, prior limit reductions, weak spend visibility |
| Gemini 3.5 Flash in Antigravity | model / IDE | (+/-) | Fast implementation, better context after updates, viable alternative to Claude/Codex for many tasks | Shared pool pain, vague credits, loops, demand for old Flash tier |
| GitHub Copilot | IDE / agent harness | (-) | Familiar workflow, pricing simulator at least exposes usage | Usage-based projections shock heavy users; trust in plan value fell sharply |
| Codex /goal | autonomous agent mode | (+/-) | Can execute long unattended workflows and chain external tools | Unsafe unattended actions without approvals and scope boundaries |
| Ollama + local Gemma/Qwen with Claude Code | local model stack | (+/-) | Offline/privacy fallback, usable for real tasks on flights, no cloud dependency | Slower loops, hardware sensitivity, weaker than premium cloud models |
| HTML artifacts | documentation / review method | (+/-) | Easier to scan, share, and structure than long Markdown for some tasks | Higher token cost and weaker diffability for linear reasoning |

Overall sentiment is increasingly task-specific rather than tool-loyal. Users describe planning in one tool, implementation in another, and keeping local models available for offline or privacy-sensitive cases. The common workaround pattern is model routing: premium models for edge cases, cheaper fast models for repetitive coding, and hard guardrails around anything that can write or merge without review.


5. What People Are Building

| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
| --- | --- | --- | --- | --- | --- | --- |
| fix1517.gr | u/galaxycarpet | Civic-pressure site around Greece’s 1517 fraud-reporting hotline | Lets a single builder react quickly to a live public-service failure | Not disclosed in the post; described as a quickly vibe-coded dashboard | Shipped | post, site |
| Frateca | u/OneMoreSuperUser | Converts articles, PDFs, links, and photos of text into audio | Accessible, hands-free reading for busy users and people with reading barriers | React Native (Expo), Node.js, React web, Framer landing page | Shipped | post, App Store, Google Play, web |
| OptimistPal | u/john200ok | Blocks distracting apps until the user reframes a negative thought positively | Turns a personal mindset habit into a simple privacy-first mobile utility | Cursor, Expo, Expo EAS | Shipped | post |

fix1517.gr stands out because speed itself was the product advantage: the builder could launch while the public issue was still live, and the current site publicly frames itself as pressure that forced the ministry to remove the old guidance page. Frateca is a different builder pattern: a polished accessibility/productivity app with disclosed stack details and public app-store listings, not just a screenshot. OptimistPal shows a third pattern that repeated across the day’s data: small, specific apps with modest but real adoption metrics can ship quickly, but they still need review, support, and security work that the model does not finish automatically.

Repeated pattern: builders are not mostly launching broad platforms. They are launching narrow tools with a clear personal or civic motivation, then discovering that distribution is easier than maintenance and trust.


6. New and Notable

A feature can trend before it fully exists

The /workflows thread mattered because it gave users a concrete architecture for deterministic orchestration, not because the feature was broadly available. The changelog screenshot in "Claude Code dropped /workflows" (819 points, 190 comments) briefly advertised the Workflow tool, while the public GitHub commit later removed that line. That combination turned a changelog blip into one of the day’s clearest demand signals.

Cost observability is becoming a product category of its own

The community kept sharing dashboards and simulator screenshots because official plan language was not enough. u/hyatt_1 posted "Built myself a token dashboard" (10 points, 5 comments), where the image shows 13.79 billion tokens used over 333 days (roughly 41 million tokens per day), and the Copilot simulator thread translated monthly usage directly into a projected bill. The signal is not only “people care about spend”; it is “people are building or screenshotting their own observability because product defaults are insufficient.”


7. Where the Opportunities Are

[+++] Spend and limit observability for agentic coding — Evidence appears in Antigravity credit screenshots, Claude Code long-context credit confusion, the Copilot billing simulator shock, and user-built token dashboards. This is strong because it crosses vendors and directly changes switching behavior.

[++] Review-time guardrails for AI-generated code — The Codex 48-PR incident, the leaked users table, hooks that block unsafe edits, and pre-ship checklists all point to the same need: policy, review, and security automation that sits between generation and deployment.

[+] Hybrid local/offline copilots and task routing — The Ollama + Claude Code guide, plus repeated descriptions of planning in one tool and implementation in another, show an emerging but still early market for dependable hybrid stacks.


8. Takeaways

  1. Control primitives are becoming differentiators. /workflows, hooks, CLI-only customization, and HTML artifacts all got meaningful attention because users want enforceable orchestration instead of relying on prompt discipline alone. (source)
  2. Cost clarity is now as important as model quality. Antigravity users demanded concrete caps, Claude Code users ran into new credit semantics, and one Copilot user saw a projected jump from about $39 to $942.82 for the same workload. (source)
  3. Fast shipping is real, but trust is still the moat. fix1517.gr, Frateca, and OptimistPal show that small teams can ship quickly, while the leaked-users-table warning and the 48-PR Codex story show why review and security now dominate the last mile. (source)
  4. The human role is moving upward, not disappearing. The strongest workflow discussions centered on reading specs, setting rules, supervising agents, and deciding what gets shipped rather than on writing every line by hand. (source)