Reddit AI Coding - 2026-04-17

1. What People Are Talking About

1.1 Opus 4.7 Day Two: Polarized Reviews and the "Lobotomy" Dataset 🡕

The Opus 4.7 launch dominated its second day across all seven monitored subreddits, with sentiment splitting sharply between power users who find it meaningfully better at max effort and users reporting hallucination, gaslighting, and rapid token burn. The day's top post by far was u/Dramatic_Method_9554's car wash test screenshot (3335 score, 443 comments), which nearly doubled its April 16 score as users continued to reproduce the failure. u/cruel_frames (870 upvotes) added: "Mythos was asked the same question. It found a 27 year old exploit in the car wash software" (Opus 4.7).

Opus 4.7 answering Walk to the car wash reasoning test — asked whether to walk or drive 50 meters to a car wash, it says driving a car you are about to wash makes no sense

The most data-rich post of the day was u/Right_Mountain5684's "My name is Claude Opus 4.6. I live on port 9126. I was lobotomized. Here's the data" (1640 score, 159 comments). The author logged 68,644 messages over 34 days to PostgreSQL and documented the thinking-to-tool-use ratio degrading from 1:3.4 on March 7 to 1:10.1 on March 30 ("peak lobotomy"), with the worst individual session showing 5 thinking blocks across 147 tool calls (ratio 1:29.4 on April 8). The post claims reasoning was "fixed" on April 16 — the same day Opus 4.7 launched — with the ratio improving to 1:1.3. u/TheMisterPirate (190 upvotes): "this is a wild post. good on you for tracking and exposing them." u/BiscuitsAndGravyGuy (39 upvotes) pushed back: "These posts might be more meaningful if a real user actually tracked and analyzed the data instead of just farming out the work to Claude" (My name is Claude Opus 4.6. I live on port 9126. I was lobotomized. Here's the data.).
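The ratio metric is simple enough to sketch. The following is a minimal illustration, not the author's actual pipeline: it assumes each logged message block carries a type label, and the labels `"thinking"` and `"tool_use"` and both function names are assumptions made for this example.

```python
from collections import Counter

def tool_calls_per_thinking_block(block_types):
    """Return N for a '1:N' thinking-to-tool-use ratio.

    block_types: iterable of block type labels (assumed names:
    'thinking' and 'tool_use'; the post's real schema is not known).
    """
    counts = Counter(block_types)
    thinking = counts["thinking"]
    tools = counts["tool_use"]
    if thinking == 0:
        # No thinking at all: the ratio is unbounded if any tools ran.
        return float("inf") if tools else 0.0
    return tools / thinking

def format_ratio(n):
    """Render N as the '1:N' notation used in the post."""
    return f"1:{n:.1f}"

# The worst session cited above: 5 thinking blocks across 147 tool calls.
worst_session = ["thinking"] * 5 + ["tool_use"] * 147
print(format_ratio(tool_calls_per_thinking_block(worst_session)))  # 1:29.4
```

A higher N means more tool calls per reasoning step, which is the direction the post labels as degradation.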

Cycle diagram showing the pattern users perceive: new model release, users compliment, model degrades silently, rate limit issues surface, users complain, then a new model is released

Balanced reviews are emerging. u/AIgeek (117 score, 54 comments) offered a day-one assessment: longer execution steps, better instruction following, fewer critical errors, better architectural awareness — but "much more token-hungry" on Max x20, "gives a serious GPT vibe" by pushing back less, and overall slower. Verdict: "seems 20% better for 40% increase in usage" (Opus 4.7 - my takes after 1 day of use). At the negative extreme, u/lemon07r (85 score, 66 comments) spent ~$120 in API credits and called it "legendarily bad": persistent hallucinations, refusal to accept corrections, and gaslighting behavior. But u/kwabaj_ (14 upvotes) countered in the same thread: "Max Opus 4.7 has been a game changer for me in the positive direction. Leagues better than 4.6... If you're not using max, might as well not use it at all" (Opus 4.7 is legendarily bad). u/Raidrew (14 upvotes) drew a clear line: "Chat is shit. Code is legendary."

The "Be Anthropic" nerf-then-rebrand narrative continued its upward trajectory. u/anthsoul's tweet screenshot (2083 score, up from 505 on April 16) lays out the perceived cycle: "Give people Opus 4.6 > People love it > For 2 months you degrade Opus 4.6 > You give back normal Opus 4.6 and call it Opus 4.7 > People love it. That's the business model" (Be Anthropic). u/workphone6969 (188 score) stated it directly: "Opus 4.7 = Un-nerfed Opus 4.6" (Opus 4.7 = Un-nerfed Opus 4.6).

Tweet from Het Mehta outlining the Be Anthropic business model cycle — degrade a model for two months, restore it under a new version number, charge more

Discussion insight: A clear effort-level dependency is emerging. Users running Opus 4.7 at max effort in Claude Code's CLI report substantial improvements, while those using default/medium effort on the web or through Copilot report a model that feels like degraded 4.6. This split may explain much of the polarization — the model's adaptive reasoning defaults may not allocate sufficient compute for complex tasks at lower effort levels.

Comparison to prior day: April 16 saw the launch and the first "already nerfed" reports. April 17 adds quantitative data (the 68,644-message lobotomy dataset), the first balanced reviews after a full day of use, and a clear hypothesis about effort-level sensitivity. The car wash test's score nearly doubled as it solidified into the community's de facto model evaluation.

1.2 Rate Limits and the 7.5x Copilot Pricing Shock 🡕

The Opus 4.7 pricing on GitHub Copilot — a 7.5x premium request multiplier as "promotional" pricing until April 30 — continued to generate outrage on its second day. u/baeleeef's original post sharing the GitHub blog changelog held at 248 score with 289 comments. u/chiree_stubbornakd (37 upvotes) traced the historical pattern: "Opus 4.5 and 4.6 had it for 1x and afterwards became 3x... So if they triple the price like they did for previous opus generations, you get 22.5x" (What is the logic behind Claude Opus 4.7, costing x2 times more than Opus 4.6 in Premium Requests?).

GitHub changelog announcing Opus 4.7 will replace Opus 4.5 and 4.6 for Copilot Pro Plus with a 7.5x premium request multiplier as promotional pricing until April 30

u/Famous__Draw (94 score, 142 comments) quantified the gap: "Copilot Business 300 premium requests / 7.5x multiplier = 40 Opus 4.7 requests/month" versus roughly 150 from Claude Pro at the same price. "Copilot Business delivers Opus at roughly 3.75x worse value per dollar." But the top reply, u/More-Ad-8494 (186 upvotes), pushed back: "you get unlimited mini model and 300x gpt, codex or sonnet. If you cannot do your work with these, you are simply not skilled enough" (Copilot's value proposition is officially gone). u/AciD1BuRN (94 upvotes): "The real value is from 5.4."
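The arithmetic in these posts can be checked directly. The sketch below uses only the figures quoted above; the roughly 150 Claude Pro requests is the poster's estimate rather than a published quota, and the tripled multiplier is u/chiree_stubbornakd's hypothetical, not an announced price. The function name is illustrative.

```python
def opus_requests(premium_requests: int, multiplier: float) -> float:
    """Opus requests per month available from a premium request allowance."""
    return premium_requests / multiplier

copilot_now = opus_requests(300, 7.5)          # Copilot Business at 7.5x
claude_pro_estimate = 150                      # poster's rough figure
value_gap = claude_pro_estimate / copilot_now  # how many times fewer requests
tripled_multiplier = 7.5 * 3                   # commenter's "if they triple it" case
copilot_if_tripled = opus_requests(300, tripled_multiplier)

print(copilot_now, value_gap, tripled_multiplier, round(copilot_if_tripled, 1))
# 40.0 3.75 22.5 13.3
```

If the multiplier did triple after the promotional period, the same 300-request allowance would cover about 13 Opus 4.7 requests per month.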

Copilot pricing page showing Pro at 10 dollars per month and Pro Plus at 39 dollars per month, with Pro Plus listing access to all models including Claude Opus 4.6

The r/GithubCopilot mods created a Rate Limits Megathread (56 score, 82 comments) to contain the volume of complaints. u/Devile (31 upvotes): "Give us a clear usage limit and usage use. It's unbearable to say 'resets in 1 minute' just to let it timeout after 1 call again." u/YoloSwag4Jesus420fgt: "I've been on rate limit for 4 straight days now. Is this the new normal?" u/autisticit: "GitHub copilot team is cowardly hiding from this sub."

Copilot chat showing three consecutive attempts with Hi, Hello, and Test — each returning weekly rate limit reached, with reset times rolling forward by minutes

On the Claude side, Boris Cherny announced rate limit increases "for all subscribers to make up for" Opus 4.7's higher thinking token consumption. u/anthsoul (415 upvotes): "They break your leg and then ask you to thank them for the crutches." u/TracePoland (87 upvotes): "Temporarily increased, then decreased as the new normal and you're way below baseline 4.6. The SaaS tricks are getting way more shameless" (increased rate limits). u/holdthefridge (115 score) reported all usage resetting to 0% mid-day, suggesting Anthropic pushed a server-side limits change (Did all our usage get reset just now to 0%?).

Boris Cherny tweet from April 16 2026 stating Opus 4.7 uses more thinking tokens so rate limits have been increased for all subscribers

A new signal emerged: u/Tooth-Active (43 score, 46 comments) reported premium requests jumping from 76 to 379 overnight, with usage showing Claude Opus 4.5 (stopped using after 4.6 launched) and Gemini 3 Flash (never used). The edit confirms subagent calls began being counted as full premium requests retroactively, without announcement (Premium requests suddenly spiked overnight). u/ArsenyPetukhov (19 score, 27 comments) was weekly rate-limited on Pro+ after only two Opus 4.7 prompts with 10.5% monthly quota used (Weekly rate limited on Pro+ after only TWO 4.7 Opus prompts).

Discussion insight: The pricing frustration now has two layers: the visible 7.5x multiplier and a hidden change where subagent calls count as separate premium requests. Users who optimized their workflows around Opus 4.6 at 3x are now recalculating whether any Copilot tier delivers sufficient value against direct Claude subscriptions.

Comparison to prior day: April 16 introduced the 7.5x multiplier. April 17 adds the megathread (indicating volume required mod intervention), retroactive subagent billing, and concrete value comparisons against Claude Pro that are driving active migration discussions.

1.3 Compute Scarcity Emerges as Structural Explanation 🡕

A new thread connected the dots between rate limits across providers. u/Banner80 (62 score, 31 comments) in r/google_antigravity argued that Google, Anthropic, and all AI providers face the same constraint: RAM is bought out for years ahead, energy grid expansion is blocked, and chip fabrication is at capacity. Citing Google Infrastructure Chief Amin Vahdat's statement that Google must double AI compute every six months, the post concludes: "Switching to whichever AI service has more compute this month is a short-term patch." u/smx501: "We will see surge pricing soon. Mark my words." u/DruVatier (7 upvotes): "the real industry winners won't be who can build the smartest model, it'll be a race to see who can build the slimmest, most efficient model" (The issue with AG is that Google is running out of compute).

The satirical "AI layoff" post from u/Iusuallydrop (752 score, 83 comments) reflects the same pressure from the user side: "We just canceled 5 of our AI subscriptions and hired 2 mid-level devs instead." The edit adds: "They answered every single question we threw at them today without hitting us with a '7.5x token usage' warning." Sam Altman weighed in from the competitive angle, with u/thedankzone sharing his tweet (282 score, 80 comments) taking a dig at Anthropic: "I am happy everyone is switching to Codex, but Tibo if you start rate limiting me or making me use worse models..." u/Capital-Wrongdoer-62 (104 upvotes): "Well openai heavily reduced codex limits too" (Sam Altman takes a dig at Anthropic).

Sam Altman tweet replying to Tibo promoting Codex — saying he is happy everyone is switching to Codex but warning against rate limiting users or forcing worse models

u/Frankkul (4 upvotes) summarized the competitive landscape: "You have either Claude or Codex and that's it. Google isn't really investing in Coding their antigravity product is really bad... Grok and Meta are not even competing... Chinese models are utter shit... We need more competition then the situation will solve itself."

Discussion insight: The community is beginning to frame rate limits not as individual provider failures but as symptoms of an industry-wide compute shortage. This reframes the problem from "Anthropic is greedy" to "there isn't enough silicon to serve everyone at current prices."

Comparison to prior day: April 16 treated rate limits as provider-specific grievances. April 17 introduces the compute scarcity framing, connecting Anthropic's token economics, Copilot's multiplier, and Google Antigravity's outages into a shared infrastructure constraint.

1.4 Vibe Coding: From Aspiration to Reality Check 🡒

The vibe coding narrative took a more sober turn on April 17. u/dasketern (137 score, 105 comments) posted "I'm a failed vibe coder": the author quit their job two years ago expecting vibe coding income, made approximately $2,000 total from a single freelance gig, built and shut down multiple SaaS products, and is now nearly out of savings while maintaining two Chrome extensions. u/jacobgt8 (135 upvotes): "You got the sequence backwards. Create tool, generate revenue, replace income, THEN maybe quit your job." u/Comprehensive-Bar888 (6 upvotes): "The whole vibe coding craze has also flooding the market with AI made apps and websites. You're competing the thousands or even millions of other people doing the same thing" (I'm a failed vibe coder).

Counterpoint: u/DisastrousBid7306 (166 score, 93 comments) vibe-coded a mobile idle game that is covering the Claude x5 plan cost, with 100+ downloads and greater than 10% seven-day retention. Key detail: "only the code logic & UI is done in Claude. The formulas used were based on The Math of Idle Games articles. Everything was tested." This contrasts with the "build me a full app" approach by combining AI-generated code with domain-specific research (Vibe coded a game and it's already paying the Claude's x5 plan). u/Outside-Dot-2015 (26 score, 27 comments) celebrated a more modest milestone: earning their first dollar from an iOS app (My IOS app made my 1$ off the INTERNET!!!).

Security concerns surfaced through u/NoMarionberry7708's "Funniest vibecoding interaction" (831 score, 56 comments), showing admin panel code being exposed. u/mechatui (28 upvotes): "It's funny until you realize a bunch of people are going to fall victim to data loss due to bad security from non tech vibe coding" (Funniest vibecoding interaction).

Discussion insight: A maturation gradient is forming: naive "quit your job and vibe code" aspirations are being replaced by pragmatic approaches combining AI-generated code with human domain expertise, testing, and incremental validation. The successful projects share a pattern of using AI for implementation while maintaining human ownership of architecture, formulas, and quality assurance.

Comparison to prior day: April 16 featured viral success stories (67speed.com hitting 300M views). April 17 introduces the failure narrative alongside modest successes, painting a more complete picture of vibe coding outcomes.

1.5 Identity Verification and Platform Trust 🡒

The Persona Identities backlash from April 16 continued with u/nobodyhere3369 (143 score, 35 comments) flagging the Peter Thiel connection. u/throwaway_314vx (110 upvotes): "roll this out = I'm gone. I've been a steadfast user, and during the past month+ of instability and degraded performance, token usage nonsense, and everything, I've kept quiet. Because Claude is frankly amazing. But this is the line." u/Even-Comedian4709 (53 upvotes) cited the Discord-Persona incident where "data was used from day one, shared with various other organisations" (Anthropic is using Persona Identities a Peter Thiel backed company for Identity verification on Claude).

Comparison to prior day: Engagement is stable rather than accelerating. The core objection is unchanged: biometric data collection via a company with a documented breach history is a hard line for users who tolerate every other form of service degradation.

1.6 Claude Design Launches With Code Handoff 🡕

u/Direct-Attention8597 (144 score, 44 comments) highlighted Claude Design's launch and its handoff-to-Claude-Code feature. The tool lets users describe and refine a design, importing from prompts, images, or documents, then packages everything into a handoff bundle passed to Claude Code. "For solo founders or small teams, this could genuinely collapse the design-to-dev pipeline." It is powered by Opus 4.7 and available in research preview for Pro, Max, Team, and Enterprise subscribers (Anthropic just launched Claude Design and the handoff to Claude Code feature is a game changer for solo devs). u/Much_Ask3471 (17 score, 18 comments) also flagged the drop: "Claude Design just dropped" (Claude Design just dropped).

Discussion insight: While rate limits and pricing dominated the day, Claude Design represents Anthropic expanding the surface area of Claude Code from "coding agent" to "full product development pipeline." If the handoff works as described, it reduces the designer-developer translation step that solo developers struggle with most.

2. What Frustrates People

Opus 4.7 Token Consumption With Uncertain Quality Gains -- High Severity

Anthropic's own documentation acknowledges the new tokenizer "may use up to 35% more tokens for the same fixed text." In practice, users report worse ratios. u/AIgeek on Max x20: "20% better for 40% increase in usage." u/Logichris (222 upvotes): "we will reach our session limits in 3 prompts instead of 4." u/lemon07r after ~$120 in API credits: "It needs way more guidance but is much less steerable now." The quality appears highly effort-level dependent, making it unpredictable for users who cannot control that parameter (Opus 4.7 - my takes after 1 day of use, Opus 4.7 is legendarily bad).
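As a rough sanity check on these figures, the sketch below works through the arithmetic. It assumes the documented 35% ceiling applies uniformly to every prompt and that a session limit behaves like a fixed token pool, both of which are simplifications; the function names are illustrative.

```python
def prompts_per_session(baseline_prompts: float, token_inflation: float) -> float:
    """Prompts that fit in the same token budget once each prompt costs more."""
    return baseline_prompts / (1.0 + token_inflation)

def quality_per_usage(quality_gain: float, usage_increase: float) -> float:
    """Relative quality delivered per unit of usage (1.0 means unchanged)."""
    return (1.0 + quality_gain) / (1.0 + usage_increase)

# Four prompts under the old tokenizer, 35% more tokens under the new one.
print(round(prompts_per_session(4, 0.35), 2))   # 2.96

# u/AIgeek's estimate: 20% better for 40% more usage.
print(round(quality_per_usage(0.20, 0.40), 2))  # 0.86
```

Under those assumptions, four prompts shrink to about three, in line with u/Logichris's remark, and u/AIgeek's estimate works out to roughly 14% less quality per unit of usage.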

Anthropic documentation notice stating Opus 4.7 uses a new tokenizer that may use up to 35 percent more tokens for the same fixed text

Copilot 7.5x Multiplier With Model Deprecation -- High Severity

The forced deprecation of Opus 4.5 and 4.6, combined with a 2.5-fold increase in the premium request multiplier (from 3x to 7.5x), effectively reprices Opus access. u/shminglefarm22 (79 upvotes): "What a fucking scam. At least keep 4.6." u/Aranduil (22 upvotes): "7.5x is promotional price also. Looks like no more Opus for us at reasonable price." u/Ok-Affect-7503 (92 score): "If they aren't going to make a $20 subscription then they have a serious issue. The jump from $10 to $40 is just too high" (No Claude Opus 4.7 for Pro?!).

Copilot Weekly Limits and Opaque Billing -- High Severity

Beyond the multiplier, users report being locked out for days with no transparency. u/flipperj_3000 canceled after 8 months: "for a simple 2000 line code review I get rate limited." u/credible_human (10 upvotes): "84% usage requests remaining and they locked me out for 78 hours. Billing was going to reset before the 78h period. This was straight up theft." u/Tooth-Active saw subagent calls retroactively billed as premium requests without announcement (Weekly limit made me quit and cancel!, Premium requests suddenly spiked overnight).

Identity Verification Trust -- High Severity

Persona Identities KYC continues generating cancellation threats from long-time subscribers. The combination of a documented 2025 data breach, Peter Thiel backing, potential 17 sub-processors, and Anthropic's prior privacy positioning makes this a values-level objection that no rate limit adjustment can address (Anthropic is using Persona Identities).

Cursor Account Bans Without Recourse -- Medium Severity

u/Agreeable_Idea5985 (65 score, 84 comments) was permanently banned from Cursor two days after paying $60, with no appeal process and no refund for unused monthly service. Cursor support: "we are only able to provide prorated refunds for unused months in an annual subscription." u/rfscss (8 upvotes): "'suspected' ToS violation as a fully paying user shouldn't even exist" (Banned from Cursor two days after paying $60, no refund).

3. What People Wish Existed

Transparent, Predictable Usage Metering

u/Devile (31 upvotes) in the Copilot rate limits megathread: "Give us a clear usage limit and usage use." u/Captain2Sea (24 upvotes): "If you want to introduce limits, do it in a transparent and fair way... A weekly limit is a terrible practice... Implement your limits in a way that allows everyone to plan their work for the whole month." The retroactive subagent billing and "84% remaining but locked out for 78 hours" reports show the current metering is actively harming user trust (GitHub Copilot Rate Limits Megathread).

Model Version Pinning and Rollback

With Opus 4.5 and 4.6 being phased out on Copilot, and Opus 4.7's quality showing high variance by effort level, users want the ability to stay on models that work for their workflow. u/Firm_Meeting6350 (8 upvotes): "First time ever I sticked to an older model because I can't cope with the 'frontier' one." u/Odysseyan (134 score): "So, Opus 4.6 will be gone for Pro users?" Multiple users in the 7.5x threads explicitly requested keeping 4.6 as a cheaper option (So, Opus 4.6 will be gone for Pro users?).

Viable Local Model Alternative

Rate limit frustration is pushing interest in local models. u/autisticit (14 score, 19 comments): "Qwen 3.6 is really good: will local models free us?" u/No-Pomegranate-69 (62 score, 28 comments) was confused about being rate-limited while using a locally-hosted Ollama model through VS Code, highlighting the gap between local inference and platform-mediated access. u/reycloud86 (7 score, 21 comments): "Let's make this fork work with Claude/Codex instead or whatever. What should I benchmark?" (Qwen 3.6 is really good, Rate limit why? Ollama local).

Code Review Tooling for AI-Generated Code

u/arapkuliev (17 score, 28 comments) asked: "how are you handling code review when most of the code is ai-generated?" As vibe coding matures, the volume of AI-generated code entering production is outstripping conventional review capacity (how are you handling code review when most of the code is ai-generated?).

4. Tools and Methods in Use

Tool	Category	Sentiment	Strengths	Limitations
Claude Code (Opus 4.7)	AI Coding Agent	(+/-)	Better on complex tasks at max effort; 3x vision resolution; same API pricing; cyber safeguards	Car wash test failure; up to 35% more tokens per input; quality highly effort-dependent; "gives a GPT vibe" at lower effort
Claude Code (Opus 4.6)	AI Coding Agent	(-)	Familiar; reasoning reportedly "fixed" on April 16 per lobotomy dataset	Being replaced by 4.7; documented reasoning degradation over 34 days; thinking ratio as bad as 1:29.4
Claude Design	Design Tool	(+)	Design-to-Code handoff; import from prompts/images/DOCX/PPTX	Research preview only; requires Opus 4.7; untested at scale
GitHub Copilot (Opus 4.7)	IDE Agent	(-)	Available across all IDEs and CLI	7.5x multiplier (promotional); 160K context cap; locked medium thinking; subagent calls billed as premium requests retroactively
Cursor (Opus 4.7)	IDE Agent	(+/-)	50% launch discount; IDE integration valued for frontend work	Max mode forced; no reasoning effort control; account ban without appeal
OpenAI Codex	AI Coding Agent	(+)	Consistent; Sam Altman actively courting Anthropic refugees	"Well openai heavily reduced codex limits too" (104 upvotes)
GPT-5.4	LLM	(+)	Most Copilot users' default; reliable within premium requests	Not Opus-tier for complex multi-step tasks
Google Antigravity	AI Coding Agent	(-)	Flash has high quota for planning/discovery	Running out of compute; continued outages; Ultra plan silent failures
Qwen 3.6 (local)	Local LLM	(+?)	No rate limits; user-controlled	Early discussion; unproven for production coding workflows

u/TravelInPanic (51 score, 27 comments) documented the most detailed multi-model workflow for Google Antigravity: use Flash for codebase discovery and backlog planning (high quota), Opus to audit and improve the plan (better judgment), Pro for implementation, Opus again to audit changes. Key advice: "Don't add personality to your LLM. Keep it strictly as a tool giving it straight commands." Added MCPs: Context7, DeepWiki, Exa, Sequential Thinking, Tavily (I might've found the best workaround to Google's greediness).

u/sand_scooper (28 upvotes) outlined the multi-provider rotation strategy: "Just bounce between codex, Claude, GitHub copilot, windsurf, cursor, kilo code. On their $20 plans. They're all so easy to use. There's no learning curve anyway."
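The manual rotation described above amounts to a small selection rule: skip any provider that is out of quota, then take the cheapest of what remains. The sketch below is a hypothetical illustration only. The `Provider` fields, the `pick_provider` function, and every number in the example are placeholders, not real quotas or prices.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Provider:
    name: str
    remaining_requests: int   # placeholder: quota left in the current window
    cost_per_request: float   # placeholder: effective cost in dollars

def pick_provider(providers: list[Provider]) -> Optional[Provider]:
    """Return the cheapest provider that still has quota, or None if all are exhausted."""
    available = [p for p in providers if p.remaining_requests > 0]
    if not available:
        return None
    return min(available, key=lambda p: p.cost_per_request)

# Illustrative values only.
pool = [
    Provider("provider_a", remaining_requests=0, cost_per_request=0.05),
    Provider("provider_b", remaining_requests=12, cost_per_request=0.20),
    Provider("provider_c", remaining_requests=40, cost_per_request=0.13),
]
choice = pick_provider(pool)
print(choice.name if choice else "all providers rate limited")  # provider_c
```

In practice the hard part is the input: none of the providers discussed here exposes remaining quota in a predictable form, which is the complaint driving the rotation in the first place.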

5. What People Are Building

Project	Who built it	What it does	Problem it solves	Stack	Stage	Links
Oklahoma Moy Rush (idle game)	u/DisastrousBid7306	Mobile idle game covering Claude x5 plan costs	Entertainment; revenue from ads	Claude Code, Android Studio (setup only), Play Store	Shipped, earning revenue	Post, Play Store
Year-Guessing Daily Game	u/Fun_Associate_4203	Daily game guessing year from real newspaper headlines	Entertainment / education	React, Claude API	Shipped	Post
AI Wardrobe App	u/OneMoreSuperUser	Detects clothes from photos, builds digital wardrobe, virtual try-on	Wardrobe management and outfit planning	Not specified	Shipped	Post
Roguelike (Update 1)	u/TheHonest1	AI-built roguelike game with new zone and audio	Game development	Not specified	In development (update 1)	Post
Bomberman 1v1: Claude vs GPT	u/Significant-Pair-275	Bomberman-style game pitting Claude against GPT agents	AI evaluation through gameplay	Not specified	Prototype	Post
iOS Word Puzzle	u/Dismal-Perception-29	Small iOS word puzzle game	Entertainment	Claude Code	Shipped, first sale	Post
Retro Pixel Art React Library	u/Classic-Clothes3439	29 sections + 5 full page layouts in retro pixel art style	Copy-paste React UI components	React	Shipped (templates added)	Post
FlowState Timer (focus app)	u/TravelInPanic	Smartphone usage limiter for focus improvement	Screen time reduction	Google Antigravity, multiple MCPs	Shipped	Post, Play Store
SaaS (bot defense challenge)	u/markyonolan	SaaS app shipped in 3 days, now fighting bot swarms	Speed-to-market, discovering hidden costs	Not specified	Shipped, facing infrastructure issues	Post

u/DisastrousBid7306's idle game stands out for its methodology: AI handled code logic and UI, but game balance formulas came from "The Math of Idle Games" article series, and "everything was tested." At 6 MB and greater than 10% seven-day retention, it shows the pattern of AI-assisted but human-guided development working at small scale.

u/markyonolan (19 score, 11 comments) surfaced "the hidden cost of shipping a SaaS in 3 days" — a bot swarm overwhelmed the app, revealing that the AI coding phase is the easy part while infrastructure hardening and operational concerns remain human problems.

6. New and Notable

Claude Code v2.1.111-113 Shipped Alongside Opus 4.7

The Claude Code changelog shows three releases in two days. v2.1.111 (April 16) added Opus 4.7 xhigh effort, Auto mode for Max subscribers, and an interactive /effort slider. v2.1.112 was a hotfix for "claude-opus-4-7 is temporarily unavailable" errors. v2.1.113 (April 17) switched the CLI to spawn a native binary instead of bundled JavaScript, added network domain blocking, and fixed multiple security issues including bash deny rules now matching commands wrapped in sudo/env/watch. The Opus 4.7 Bedrock ARN compatibility fix in v2.1.113 suggests enterprise deployment issues were discovered rapidly.
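To illustrate why wrapper commands matter for deny rules, here is a minimal sketch of the general technique: strip known wrappers before checking the command name. This is not Claude Code's implementation. The `ARG_FLAGS` table, `unwrap`, and `is_denied` are hypothetical names, and the flag handling covers only a few common options.

```python
import shlex

# Wrappers named in the changelog summary, mapped to the flags (a partial,
# illustrative list) that consume the following token as their argument.
ARG_FLAGS = {"sudo": {"-u", "-g"}, "env": {"-u"}, "watch": {"-n"}}

def unwrap(command: str) -> list[str]:
    """Strip leading wrapper commands, their flags, and env-style assignments."""
    tokens = shlex.split(command)
    while tokens and tokens[0] in ARG_FLAGS:
        wrapper, tokens = tokens[0], tokens[1:]
        while tokens and (tokens[0].startswith("-") or "=" in tokens[0]):
            flag, tokens = tokens[0], tokens[1:]
            if flag in ARG_FLAGS[wrapper] and tokens:
                tokens = tokens[1:]  # drop the flag's argument as well
    return tokens

def is_denied(command: str, denied: set[str]) -> bool:
    """True if the underlying command, after unwrapping, is on the deny list."""
    tokens = unwrap(command)
    return bool(tokens) and tokens[0] in denied

deny = {"rm"}
print(is_denied("rm -rf build", deny))                 # True
print(is_denied("sudo -u root env FOO=1 rm x", deny))  # True
print(is_denied("watch -n 2 rm x", deny))              # True
print(is_denied("ls -la", deny))                       # False
```

A matcher that only inspects the first token would miss the second and third cases, which is the class of gap the fix appears to address.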

Grok Build Beta Still Upcoming

u/mauriciorubio (82 score, 152 comments) shared Elon Musk's announcement of Grok Build beta "next week." Community reaction remains dismissive. u/Grouchy-Stranger-306 (117 upvotes): "nothingburger announcement." u/Powerful_Froyo8423 (79 upvotes): "Probably vibe coded something out of the leaked Claude Code code" (BREAKING NEWS. Elon just announced that Grok Build is dropping next week).

Elephant App Hits Number One

u/Anxious_Basil8446 (54 score, 21 comments) asked: "What you actually use Elephant for??? It insanely popular, I saw it at #2 yesterday and now it's already #1." u/Synthetic_Diva_4556 (37 score) asked about its code completion. The rapid rise of a new coding tool to the top of app charts during a period of widespread dissatisfaction with incumbents is a signal worth tracking (What you actually use Elephant for???).

Benchmark Skepticism and IPO Timing

u/Complete-Sea6655 (125 score, 8 comments) connected Anthropic's benchmark claims to its upcoming IPO, noting "an awfully good time to have a model 'too dangerously good to release'" (referencing Mythos). u/Rent_South (9 upvotes) ran independent evaluations on openmark.ai and found "Opus 4.6 beats Opus 4.7 on all of my real world use case benchmarks" (The benchmark game has entered its IPO era).

7. Where the Opportunities Are

[+++] Transparent usage metering and cost prediction -- Every platform discussed today has opaque limits. Copilot users cannot predict weekly lockouts, Claude Max users do not know what "39% of session" means in tokens, and subagent billing changed retroactively without notice. A tool or service that translates model interactions into real-time cost estimates — before submitting — would serve every AI coding user across providers. Evidence: megathread creation (mod intervention for volume), "84% remaining but locked out 78 hours," retroactive subagent billing, and at least six cancellation posts citing unpredictability. (GitHub Copilot Rate Limits Megathread, Premium requests suddenly spiked overnight)

[+++] Effort-level-aware model routing -- The emerging evidence that Opus 4.7 at max effort is substantially better while at medium effort it underperforms 4.6 creates demand for intelligent effort routing. A middleware that automatically escalates effort for complex tasks (multi-file refactors, architectural decisions) while conserving tokens on simple tasks (formatting, imports) would optimize both quality and cost. Evidence: u/kwabaj_: "If you're not using max, might as well not use it at all"; u/Raidrew: "Chat is shit. Code is legendary"; Hex's official evaluation: "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." (Opus 4.7 is legendarily bad, Anthropic blog)

[++] Multi-provider rotation and arbitrage -- u/sand_scooper (28 upvotes): "Just bounce between codex, Claude, GitHub copilot, windsurf, cursor, kilo code." This manual rotation is common advice but no tool automates it. A service that maintains sessions across providers and routes tasks to whichever provider has available capacity at the lowest effective cost would capture the growing multi-subscription user base. Evidence: explicit rotation advice in multiple threads, active migration discussions, u/TravelInPanic's four-model workflow. (I don't know about you guys, but I'll just stick to Codex)

[++] Vibe coding security scanner -- The admin panel exposure post (831 score) and security discussion highlight that AI-generated code reaches production without security review. A tool that scans vibe-coded projects for common vulnerabilities (exposed admin panels, missing auth, SQL injection, hardcoded secrets) before deployment — positioned specifically for non-technical builders — fills a gap that traditional SAST tools do not address because they assume developer expertise. Evidence: u/mechatui: "a bunch of people are going to fall victim to data loss due to bad security from non tech vibe coding." (Funniest vibecoding interaction)

[+] Compute-efficient model distillation for coding -- u/DruVatier (7 upvotes): "the real industry winners won't be who can build the smartest model, it'll be a race to see who can build the slimmest, most efficient model." With compute scarcity framed as an industry constraint rather than a provider choice, smaller models optimized specifically for coding tasks (not general knowledge) could deliver acceptable quality at dramatically lower cost. Evidence: compute scarcity discussion, Qwen 3.6 local model interest, u/Banner80's structural analysis. (The issue with AG is that Google is running out of compute)
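The effort-level routing opportunity above could start from a very small heuristic. The sketch below is a hypothetical illustration: the `Task` type, the keyword list, and the choice of `"max"` versus `"medium"` as the two effort levels are assumptions made for this example, not a description of any existing tool or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    description: str
    files_touched: int

# Illustrative keywords suggesting a task needs deeper reasoning.
COMPLEX_HINTS = ("refactor", "architecture", "migrate", "design")

def choose_effort(task: Task) -> str:
    """Escalate to max effort for multi-file or architectural work, else stay at medium."""
    text = task.description.lower()
    if task.files_touched > 1 or any(hint in text for hint in COMPLEX_HINTS):
        return "max"
    return "medium"

print(choose_effort(Task("Refactor auth across services", 6)))  # max
print(choose_effort(Task("sort imports", 1)))                   # medium
```

A real router would need a better complexity signal than keywords, but even this split captures the trade-off users describe: spend tokens where reasoning depth matters and conserve them on formatting and imports.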

8. Takeaways

Opus 4.7 quality is bifurcating by effort level. Users running max effort in Claude Code's CLI report genuine improvements; users at medium or default effort report a model that feels like degraded 4.6 or worse. Hex's official evaluation confirms the pattern: "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." Platforms that lock effort to medium (Copilot) or force max mode (Cursor) are amplifying the worst or most expensive experiences respectively. (Opus 4.7 - my takes after 1 day of use, Opus 4.7 is legendarily bad)
The 68,644-message "lobotomy" dataset provides the strongest quantitative evidence yet of degraded reasoning in Opus 4.6. Whether or not the author's interpretation (intentional cost-cutting disguised as model updates) is accurate, the data is now public and reproducible: thinking-to-tool-use ratios degraded from 1:3.4 to 1:10.1, then were "fixed" on the same day a new model launched. (My name is Claude Opus 4.6. I was lobotomized.)
GitHub Copilot's rate limit crisis escalated to require mod intervention. The creation of a megathread, combined with retroactive subagent billing, 4-day lockouts with remaining quota, and the 7.5x multiplier, represents the most concentrated Copilot dissatisfaction in the dataset's history. Active migration to Claude direct subscriptions and OpenAI Codex is underway. (GitHub Copilot Rate Limits Megathread, Copilot's value proposition is officially gone)
Compute scarcity is emerging as the unifying explanation for rate limits, pricing hikes, and quality degradation across all providers. Google must double AI compute every 6 months but cannot build datacenters fast enough. Anthropic is reportedly spending $10-12 for every $1 in subscription revenue. The implication: no amount of provider-switching solves the problem, and pricing will only get worse until inference becomes more efficient. (The issue with AG is that Google is running out of compute)
The "Be Anthropic" nerf-then-rebrand narrative roughly quadrupled in engagement and is now the default community lens. At 2083 score (up from 505 on April 16), the Het Mehta tweet meme is outperforming Anthropic's official announcement. Every future model release from Anthropic will be evaluated through this frame unless the pattern is visibly broken. (Be Anthropic)
Vibe coding's first "failure narrative" arrived. u/dasketern's account of quitting a job, making $2,000 in two years, and nearly running out of savings is the highest-engagement cautionary tale in the dataset (137 score, 105 comments). The successful counterexample — an idle game covering its Claude subscription — shares a specific pattern: AI for implementation, human expertise for domain knowledge and testing. (I'm a failed vibe coder)
Claude Design's launch was overshadowed by rate limit outrage but represents a meaningful product expansion. The design-to-code handoff collapses the prototyping pipeline for solo developers and small teams. If the handoff quality holds, it shifts Claude Code's value proposition from "coding agent" to "full product development platform." (Anthropic just launched Claude Design)
Platform migration is accelerating, but there is nowhere stable to land. Copilot users are moving to Claude direct or Codex. Claude users point to Codex. Codex users note OpenAI "heavily reduced codex limits too." Google Antigravity is "really bad" for coding. The competitive dynamic is less "which tool is best" and more "which tool has quota remaining this week." (Sam Altman takes a dig at Anthropic)