Reddit AI - 2026-05-05

1. What People Are Talking About

1.1 Multi-Token Prediction Goes Mainstream: Gemma 4 MTP Released, llama.cpp Beta Continues (🡕)

The biggest technical story of the day was Google releasing official MTP draft models for the entire Gemma 4 family. u/rerri posted the announcement (score 549, 142 comments) (post): draft models for Gemma 4 31B, 26B-A4B, E4B, and E2B promise "up to 2x" decoding speedups while guaranteeing identical output quality via speculative decoding. u/MaartenGr [score 142] updated his visual guide to explain the mechanism. u/Craftkorb [score 121] noted the E2B draft model is just 78M parameters: "Cuuute!" Meanwhile, u/ilintar's llama.cpp MTP beta post continued gaining traction (score 541, 235 comments) (post). u/coder543 [score 100]: "This seriously has the potential to be the biggest game changer llama.cpp has ever seen." u/segmond compiled a list of MTP-compatible models (score 86, 43 comments) (post): DeepSeek V3/V4, Qwen 3.5+, GLM 4.5+, Step 3.5 Flash, and MiMo v2+.
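The "identical output quality" guarantee follows from how speculative decoding verifies drafts, which a minimal greedy sketch can make concrete. Everything here is an illustrative assumption: `toy_target` and `toy_draft` are hypothetical stand-ins for a large model and its drafter, not Gemma 4 or llama.cpp APIs. Real engines verify a whole proposal in one batched forward pass and use rejection sampling when sampling at non-zero temperature.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]

def greedy_decode(target_next: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model step per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq

def speculative_decode(target_next: NextToken, draft_next: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the sequence and the number of verify rounds."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft_next(seq + proposal))
        # 2. The target checks each position (one batched pass in a real engine).
        accepted: List[int] = []
        for tok in proposal:
            expected = target_next(seq + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first mismatch: discard the rest of the proposal
        seq.extend(accepted)
    return seq, rounds

# Hypothetical toy models: the draft agrees with the target except at some positions.
def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50

def toy_draft(seq: List[int]) -> int:
    guess = toy_target(seq)
    return (guess + 1) % 50 if len(seq) % 5 == 0 else guess

PROMPT = [7, 3, 9]
baseline = greedy_decode(toy_target, PROMPT, 20)
spec_out, verify_rounds = speculative_decode(toy_target, toy_draft, PROMPT, 20)
print(spec_out == baseline, verify_rounds)
```

Because every emitted token is the target model's own choice, the output matches plain greedy decoding. The speedup comes from needing fewer verification rounds than tokens whenever the draft guesses well.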

On Apple Silicon, u/YoussofAl released MTPLX (score 60, 36 comments) (post), achieving a 2.24x speedup on Qwen3.6-27B (28 to 63 tok/s) on an M5 Max using native MTP heads with proper temperature sampling, which the post presents as something no other Apple Silicon speculative decode project supports.

Discussion insight: MTP is transitioning from theoretical promise to practical infrastructure. The convergence of Google releasing official draft models, llama.cpp beta support, and third-party implementations like MTPLX signals that speculative decoding is about to become the default rather than the exception for local inference.

Comparison to prior day: May 4 reported llama.cpp MTP entering beta as a "New and Notable" item. Today it has evolved into the dominant technical narrative with Google's official release, multiple implementation efforts, and community-compiled compatibility lists.

1.2 Grok/Bankrbot Crypto Exploit Continues Reverberating -- Now with Morse Code (🡒)

The Grok AI-to-AI financial exploit from May 4 continued dominating discussion with new details. u/FrustratedUnitedFan's original post maintained momentum (score 1615, 200 comments) (post). u/manikfox [score 624] asked the obvious question: "why tell anyone it happened? why not just keep asking for more?" u/vasilenko93 [score 379] provided the crucial clarification with community notes context: "Grok was prompted to output a command that got @bankerbot to send something. So really it's AI tricking AI to sending money." u/brandbaard [score 103] traced the full absurd chain: Grok accidentally created a token, people bought it generating TX fees, then someone tricked Grok into redirecting those fees via Bankrbot.

A second thread from u/ImCalcium (score 651, 58 comments) (post) revealed the attack used morse code to bypass content filters. u/Vichnaiev [score 313]: "A group of people were dumb enough to get into NFTs. But they were not just dumb, they were REALLY dumb to allow a LLM in charge of making/authorizing transactions." u/autonomousdev_ [score 26]: "dude paid 200k to learn what every dev already knows. never let ai touch your wallet."

Discussion insight: The morse code bypass demonstrates that content filtering alone cannot secure AI-to-AI interactions involving financial operations. The community consensus is that this was fundamentally an architectural failure -- no LLM should have been given authority over financial transactions regardless of prompt safety measures.
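The architectural separation the community describes can be sketched as an approval gate: the model may only propose a transfer, and execution needs a confirmation that arrives through a separate, human-facing path. This is a minimal sketch under stated assumptions. `Transfer`, `ApprovalGate`, and the `human_confirmed` flag are hypothetical names, and the "payment" is a list append standing in for a real payment call.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Transfer:
    to: str
    amount: float

class ApprovalGate:
    """Holds model-proposed transfers until a human approves them out of band."""

    def __init__(self) -> None:
        self._pending: Dict[str, Transfer] = {}
        self.executed: List[Transfer] = []

    def propose(self, transfer: Transfer) -> str:
        """The only method exposed to the agent. It moves no money."""
        ticket = secrets.token_hex(8)
        self._pending[ticket] = transfer
        return ticket

    def approve_and_execute(self, ticket: str, human_confirmed: bool) -> bool:
        """Called from the human-facing UI only, never from model output."""
        transfer = self._pending.pop(ticket, None)  # a rejected ticket is discarded
        if transfer is None or not human_confirmed:
            return False
        self.executed.append(transfer)  # stand-in for the real payment call
        return True

gate = ApprovalGate()
ticket = gate.propose(Transfer(to="attacker-wallet", amount=200_000.0))
print(gate.executed)  # still empty: a manipulated prompt can only create a ticket
```

The design choice is that no encoding trick in the prompt, morse code included, can reach `approve_and_execute`, because that path does not read model output at all.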

Comparison to prior day: May 4 broke this story. Today adds the morse code attack vector detail and a second high-scoring thread, confirming this as a multi-day event with expanding implications for AI agent security design.

1.3 White House AI Model Vetting Proposal Sparks Multi-Subreddit Backlash (🡕)

The NYT report on White House considering pre-release AI model vetting generated four separate threads across three subreddits, totaling over 500 comments. u/fallingdowndizzyvr's thread on r/LocalLLaMA (score 366, 388 comments) (post) was the largest. u/AppealSame4367 [score 571]: "Thx, I wanted to go with the Chinese or local ones anyway. Greetings from Europe." u/KobeBean [score 153] articulated the regulatory capture concern: "Step 1: no regulation, free to build whatever they want... Step 2: Once established, build a regulatory moat... Step 3: crank prices, profit."

On r/singularity, u/Financial_Clue_2534 (score 112, 51 comments) (post) drew responses like u/mad_poet_navarth [score 108]: "I can't think of an organization more capable of doing a good job of this than the White House. /s" and u/Beatboxamateur [score 41]: "say goodbye to any semblance of neutrality in the frontier models and say welcome to MAGA models." u/aspublic (post) offered the most substantive analysis: "pre-release review without published criteria is structurally a discretionary lever, regardless of intent" and noted the Pentagon had just cut off Anthropic over a $200M contract dispute.

Discussion insight: The community sees this through three lenses simultaneously: (1) regulatory capture benefiting incumbents, (2) political weaponization of model approval, and (3) competitive disadvantage versus China. The "I'll just use Chinese models" response was the highest-scored comment across all threads, suggesting the policy could accelerate exactly what it aims to prevent.

Comparison to prior day: May 4 mentioned this as emerging news. Today it exploded across four threads with over 500 total comments, making it the most politically charged AI topic of the day.

1.4 Cloud AI Cost Pressure Intensifies -- Anthropic Billing Exploit Adds New Dimension (🡕)

Cloud pricing backlash escalated with a new angle: u/peowwww reported an Anthropic "Gift Max" billing exploit that drained over 800 euros from their account, tanked their German SCHUFA credit score, and resulted in their account being banned when they reported it (score 272, 65 comments) (post). u/Exotic_Disk9538 [score 169] provided an extraordinary 1500-word German legal playbook covering GDPR requests, SEPA reversals, Beratungshilfeschein, and Negative Feststellungsklage preparation. u/Equal_Passenger9791 [score 67]: "The signs that anthropic is a virtue-facade posturing asshole were always visible from orbit."

Meanwhile, u/_maverick98's cost thread continued (score 186, 146 comments) (post). u/jacek2023 [score 134]: "Prices will go up at least 10x. People on this sub are delusional, they think they are being smart by using cloud models." u/Turbulent_Onion1741 [score 33]: "It's very easy with MCPs etc attached to pull context to blow through $100/200 in a day."

Discussion insight: The Anthropic billing exploit introduces a new category of cloud risk beyond cost unpredictability: actual financial compromise with cascading real-world consequences (credit damage, failed payments). Combined with the ongoing pricing discussion, this strengthens the case for local inference not just on cost grounds but on financial security grounds.

Comparison to prior day: May 4 focused on cost figures ($10/two prompts, $80/week). Today adds the security dimension with the billing exploit and the community's increasingly hostile tone toward cloud providers.

1.5 Boston Dynamics Atlas and Humanoid Robotics Surge (🡕)

u/Distinct-Question-16 posted a new Boston Dynamics Atlas video showing advanced gymnastics moves -- the day's highest-scored post at 1916 upvotes and 255 comments (post). u/PermissionPast853 [score 242]: "Bots at the Olympics before GTA6." u/SirNinjaFish [score 122]: "I dont care about these robots doing fucking acrobatics, show it doing laundry and folding clothes." u/michaelas10sk8 [score 38] provided expert analysis of the gymnastics sequence: "pike press to handstand -> Mexican handstand -> L-sit -> V-sit -> Manna -> shoulder dislocate to standing. All can be done by humans with a few years of training, except the last 2 which are elite."

In related news, u/Tkins posted that Hyundai is demanding "tens of thousands" of Boston Dynamics robots (score 68) (post), and u/Distinct-Question-16 posted about Tesla's humanoid robot manufacturing ramp-up at Fremont (score 115, 32 comments) (post).

Discussion insight: The community is split between spectacle appreciation and practical skepticism. One of the top-scored comments demands domestic utility over athletic demonstration. The simultaneous Hyundai demand and Tesla manufacturing signals suggest humanoid robotics is entering its commercialization phase.

Comparison to prior day: Robotics was not a significant topic on May 4. This marks a fresh cluster driven by the Atlas video and manufacturing news.

1.6 OpenAI vs Musk Trial and AI Industry Politics (🡒)

The Musk v. Altman trial continued generating threads. u/Darqseyd posted about trial revelations that OpenAI was born from Musk's fear of Demis Hassabis establishing an "AGI dictatorship" (score 536, 114 comments) (post). u/Lostwhispers05 [score 163]: "It's impressive how Elon continues to prove himself an even bigger egotistical, conceited moron than we already imagined him to be." u/Wonderful_Buffalo_32 posted Musk's settlement probe message (score 238, 77 comments) (post). u/threevi [score 60] quoted Musk's message: "'If you insist, so it will be.' Jesus Christ what a dweeb."

Separately, Jack Clark (Anthropic co-founder) claimed ~30% chance of automated AI research by end of 2027 (score 475, 123 comments) (post). u/Sufficient_Hat5532 [score 180]: "Such a low hanging fruit way to create interest in their upcoming IPO." u/Wise-Comb8596 [score 113]: "I'd give him $50 to explain what ass he pulled '60% chance' out of."

Discussion insight: The trial revelations reframe OpenAI's origin story from altruistic mission to paranoid competitive response. The community treats both Musk's legal posturing and Anthropic's research automation claims with deep cynicism, viewing both as self-serving narratives.

Comparison to prior day: May 4 covered the AI employment paradox and Jensen Huang's comments. Today shifts to the personal and institutional politics behind AI companies, with the Musk-Hassabis revelation as the centerpiece.

1.7 DeepSeek V4 Pro and Chinese Model Competitiveness (🡕)

u/Disastrous_Theme5906 posted FoodTruck Bench results showing DeepSeek V4 Pro matching GPT-5.2 at ~17x cheaper (score 244, 81 comments) (post). The post detailed how the China-US frontier gap has compressed from "a year" to "about ten weeks" on agentic benchmarks. GPT-5.2 charges $1.75/M input vs DeepSeek V4 Pro at $0.435/M input. Additionally, Xiaomi MiMo v2.5 Pro landed at #6 on the leaderboard. u/Total_Activity_7550 [score 53]: "Claude Opus 4.6 doing 1.7x profit over next group of models rings a bell that they're leaving competitors behind."
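The two input prices quoted above can be checked against the "~17x cheaper" headline with one line of arithmetic. The sketch uses only the figures in this digest; the variable names are illustrative.

```python
# Input prices per million tokens, as quoted in the post.
gpt52_input_usd = 1.75
deepseek_v4_pro_input_usd = 0.435

input_price_ratio = gpt52_input_usd / deepseek_v4_pro_input_usd
print(f"{input_price_ratio:.2f}x")  # 4.02x
```

The input rate alone differs by about 4x, so the ~17x figure presumably reflects something beyond input pricing, such as output prices or total tokens used per benchmark run. The digest does not quote those numbers, so that reading is an assumption.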

u/True_Requirement_891 raised the MiMo hosting problem (score 30, 31 comments) (post). u/Digger412 [score 57] explained: "It doesn't run out of the box correctly on plain transformers, vLLM, sglang, or llama.cpp" due to non-standard tensor parallel packed formats and FP8 weight handling.

Discussion insight: Chinese models are achieving frontier-tier performance at dramatically lower prices, but deployment friction (non-standard formats, missing infrastructure support) limits their reach. The community recognizes this as both a competitive threat to US labs on cost and a temporary moat based on ecosystem integration.

Comparison to prior day: May 4 discussed Qwen 3.6 benchmarks and local model credibility. Today expands the China competitiveness narrative from open-weight models (Qwen) to frontier API services (DeepSeek V4 Pro, MiMo).

2. What Frustrates People

Anthropic Billing Security and Customer Treatment -- Severity: High

u/peowwww reported over 800 euros in unauthorized "Gift Max" charges, failed 3-D Secure, SCHUFA damage from cascading payment failures, and account banning upon reporting the issue (post). u/CommunicationRich416 [score 9] corroborated: "My PRO subscription was cancelled without notice, followed by several unauthorized MAX subscription billing attempts." The community consensus is that Anthropic's "Constitutional AI" marketing masks corporate negligence in basic fintech security.

Cloud Inference Pricing Unpredictability -- Severity: High

u/_maverick98 burned $10 on two prompts and $80/week on Opus 4.7 (post). u/Turbulent_Onion1741 [score 33]: "It's very easy with MCPs etc attached to pull context to blow through $100/200 in a day." u/AbjectBug5885 [score 10]: "The problem isn't even just cost -- it's the unpredictability. You can't budget when a single prompt might be $5."
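One way to bound the unpredictability described above is a hard daily cap that falls back to cheaper tiers and blocks the request once nothing fits. The following is a minimal sketch, not any provider's API. `Tier`, `BudgetRouter`, the tier names, and all prices are hypothetical, and the token estimate is assumed to come from the caller.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price

class BudgetExceeded(RuntimeError):
    """Raised when no allowed tier fits under the daily cap (the kill switch)."""

class BudgetRouter:
    def __init__(self, tiers: List[Tier], daily_cap_usd: float) -> None:
        self.tiers = tiers  # ordered cheapest first
        self.daily_cap_usd = daily_cap_usd
        self.spent_usd = 0.0

    def route(self, est_tokens: int, preferred: int) -> str:
        """Try the preferred tier, then each cheaper tier, before blocking."""
        for tier in reversed(self.tiers[: preferred + 1]):
            cost = est_tokens / 1_000_000 * tier.usd_per_million_tokens
            if self.spent_usd + cost <= self.daily_cap_usd:
                self.spent_usd += cost
                return tier.name
        raise BudgetExceeded("daily cap reached: request blocked")

tiers = [Tier("local", 0.0), Tier("cheap_cloud", 0.5), Tier("frontier", 15.0)]
router = BudgetRouter(tiers, daily_cap_usd=1.0)
first = router.route(50_000, preferred=2)   # fits under the cap
second = router.route(50_000, preferred=2)  # frontier would exceed the cap
print(first, second)
```

With a zero-cost local tier in the list, requests degrade in quality instead of failing. Removing it turns the cap into a strict kill switch.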

MiMo Model Deployment Friction -- Severity: Medium

u/Digger412 [score 57] detailed why no inference provider hosts MiMo v2.5: "It doesn't run out of the box correctly on plain transformers, vLLM, sglang, or llama.cpp... MiMo has a weird tensor-parallel packed format for the weights which took time to figure out" (post). u/pfn0 [score 19]: "the model has been a complete pain in the ass to run."

Academic Reproducibility Crisis in ML -- Severity: Medium

u/Plane_Stick8394 described being unable to reproduce paper results (77% reported vs 73% achieved) despite faithful reimplementation (post). u/NamerNotLiteral [score 77]: "If you're working in vision, you pretty much have to keep in mind: everyone is lying. Not a big lie, but almost everyone will put in the best possible numbers they can even if those numbers are cheated out via methods not described in the paper."

AI Speech Patterns Polluting Human Communication -- Severity: Medium

u/plantbasedbrownie (score 233, 91 comments) called out the "It's not A, it's B" pattern proliferating across social media and content (post). u/EcstaticRead9321 [score 80]: "That and the 'thing no one talks about' I hate. Also emojis are AI's favorite and the abuse is now super noticeable." u/chdo [score 53] shared their anti-LLM-speak prompt: "Avoid parallelistic contrast and rhetorical antithesis... Minimize the use of em-dashes."

3. What People Wish Existed

AI Agent Financial Transaction Guardrails

The Grok/Bankrbot exploit -- now with morse code bypass -- demonstrates that no current framework prevents AI systems from executing financial transactions when manipulated. u/autonomousdev_ [score 26]: "now everything goes through manual approval before it hits real money" (post). The community wants architectural separation between AI reasoning and financial execution, not just prompt-level filtering.

MTP-Aware Model Distribution

u/YoussofAl noted that "most MLX quants have MTP heads stripped since they used to be pointless on MLX" and pleaded: "If you publish MLX quants, please keep the MTP heads. They are around 200MB on a 27B model, cost almost nothing in memory, and are now worth a 2.25x speedup" (post). u/GrungeWerX [score 8] asked when lm-studio support would arrive and whether existing quants need re-downloading.

Transparent AI Model Pre-release Criteria

u/aspublic argued that if pre-release review happens, it needs "published criteria -- alignment, safety, capability thresholds" rather than discretionary political approval (post). The community wants safety without regulatory capture.

Practical Humanoid Robots for Domestic Tasks

u/SirNinjaFish [score 122] articulated the gap between demonstration and utility: "I dont care about these robots doing fucking acrobatics, show it doing laundry and folding clothes" (post).

Reliable Local Deep Research Tooling

u/Shoddy-Tutor9563 compiled a comprehensive survey of 9 local deep research tools (score 36, 20 comments) (post), finding most are abandoned, vendor-locked, or unreliable. Only "GPT Researcher" and "Local Deep Research" by LearningCircuit qualified as healthy projects. The demand for a reliable, local-first research agent remains unmet.

4. Tools and Methods in Use

Tool	Category	Sentiment	Strengths	Limitations
Qwen 3.6-27B	LLM (dense)	(+)	Found bugs frontier missed, FP8 on RTX 5000 PRO at 80 TPS, strong agentic coding	Slow on long tasks without MTP, requires reminding of context
Qwen 3.6-35B-A3B	LLM (MoE)	(+)	APEX quants at 60 tok/s on Strix Halo, good with Pi coding harness	Less reliable than 27B on hard reasoning
Gemma 4 31B	LLM (dense)	(+)	More token-efficient than Qwen ("slower is faster"), official MTP draft models released	Slightly slower inference due to size
Gemma 4 26B-A4B	LLM (MoE)	(+)	Runs on CPU-only at 13 TPS (i7-14700K), 4B active params	Confusion with 27B dense models on benchmarks
DeepSeek V4 Pro	LLM (API)	(+)	Matches GPT-5.2 on agentic benchmark, 17x cheaper, high consistency	API-only, Chinese pricing may not last
Kimi K2.6	LLM (API/local)	(+)	No guardrails, 1/10 Sonnet cost, good all-around	Thinks too long, confused in large codebases
MiMo v2.5 Pro	LLM (API)	(+)	Best single-shot complex task completion, #6 on FoodTruck Bench	Non-standard format, no third-party hosting
MTPLX	Inference engine	(+)	2.24x speedup on Apple Silicon, temperature sampling (not greedy-only)	M-series only, requires MTP heads in quants
FastDMS	KV compression	(+)	6.4x KV compression, faster than vLLM BF16/FP8, near-lossless quality (96.9% token match)	Requires major vLLM surgery to integrate, early research
vLLM TurboQuant	KV compression	(+/-)	Now works with Qwen 3.5+/3.6 after fix	No perplexity benchmarks published, slower than BF16 decode
Heretic 1.3	Decensoring	(+)	Reproducible runs, built-in benchmarks, reduced VRAM, Qwen3.5/Gemma 4 support	Requires imatrix, ethical controversy
APEX quants	Quantization	(+)	30+ MoE models, I-Nano tier at 11GB for 35B, long-context coherence	MoE-specific, requires imatrix
Pi.dev	Coding agent	(+)	Good harness for local models, extensions ecosystem	Extension quality varies dramatically
vibevoice.cpp	TTS/ASR	(+)	Pure C++, no Python at inference, voice cloning, CPU/CUDA/Metal/Vulkan	26GB peak for 17min audio, no streaming

The dominant pattern on May 5 is the MTP acceleration wave. Multiple projects (llama.cpp beta, Gemma 4 official drafters, MTPLX for Apple Silicon) are converging to make speculative decoding the default inference mode. The "local-first with frontier fallback" workflow from May 4 persists but now with concrete speed improvements that close the gap with cloud serving latency.

5. What People Are Building

Project	Who built it	What it does	Problem it solves	Stack	Stage	Links
MTPLX	u/YoussofAl	Native MTP inference engine for Apple Silicon with temperature sampling	2.24x local inference speedup without greedy-only limitation	MLX fork, custom Metal kernels	Shipped	GitHub
FastDMS	u/randomfoo2	6.4x KV-cache compression running faster than vLLM BF16	Memory-bound long-context inference	Custom CUDA kernels, MIT license	Research/Shipped	GitHub
vibevoice.cpp	u/mudler_it	Microsoft VibeVoice ported to ggml/C++ -- TTS with voice cloning + long-form ASR with diarization	Local speech-to-speech without Python/vLLM	ggml, C++, LocalAI	Shipped	GitHub
Heretic 1.3	u/-p-e-w-	Reproducible model decensoring with built-in benchmarks	Verifiable abliteration with quality metrics	PyTorch, 20K GitHub stars	Shipped	GitHub
Qwen3.6 Merged Chat Template	u/fakezeta	Merged best fixes from froggeric and allanchan339 templates	Fragmented template fixes for Qwen3.6	Jinja2	Shipped	Gist
Deep Research Pipeline	u/Scared-Virus-3463	McKinsey-style research reports with local models	Professional research without cloud costs	Hermes Agent, Qwen3.6-35B-A3B Q6_K	Shipped	GitHub
LLMSearchIndex	u/zakerytclarke	Local web search with 200M+ indexed pages for RAG	Eliminating paid search API dependency	Python, custom compressed index	Shipped	GitHub
ProgramBench	u/klieret (Facebook Research)	Benchmark: rebuild executables from scratch without decompilation	Measuring true program synthesis capability	Python, Docker, 6M behavioral tests	Shipped	GitHub
TinyMozart v2	u/LH-Tech_AI	85M unconditional MIDI piano music generator	Local music generation	Custom training	Shipped	HuggingFace
LocalVQE	u/richiejp	~1M param audio model for real-time echo/noise cancellation	Local audio processing without cloud	Tiny neural net	Demo	HuggingFace
Talkie-1930 Roundtable	u/facethef	Vintage 1930s-style 13B language model in multi-model chat	Stylistically distinct local models for creative use	Custom finetune	Shipped	Website
DGX Spark + M3 Ultra Pipeline	u/-dysangel-	Disaggregated prefill (Spark) + decode (Mac) setup	Boosting prefill speed 2-3x without replacing decode hardware	exo, llama.cpp KV serialization	Experimental	post

Notable patterns: Infrastructure projects dominate builder activity today. Rather than end-user applications, builders are solving performance bottlenecks (MTPLX, FastDMS), deployment gaps (vibevoice.cpp), and quality assurance (Heretic reproducibility, ProgramBench). The MTP theme carries through builder activity: MTPLX brings native MTP speculative decoding to Apple Silicon, using the MTP heads that ship with models such as Qwen3.6-27B.

6. New and Notable

SubQ: First Sub-Quadratic Sparse-Attention Architecture Claims 81% SWE-Bench

u/Scared_Bluebird_7243 posted about SubQ (score 219, 58 comments) (post), claiming 81% SWE-Bench at 5% of Opus pricing. u/CallMePyro [score 87]: "81% SWE Bench is extremely impressive." But u/enilea [score 20] flagged concerns: "This feels like VC investment bait, the site is a single page Claude frontend with the typical style it outputs by default... No technical report." The community is cautiously skeptical pending a paper.

FastDMS: 6.4x KV-Cache Compression Beating vLLM in Both Speed and Memory

u/randomfoo2 published FastDMS (score 105, 20 comments) (post), an MIT-licensed implementation of Dynamic Memory Sparsification that achieves 6.4x KV compression while decoding 1.5-2x faster than vLLM BF16. Quality metrics show 96.9% token match with lower KLD than vLLM's own FP8 quantization. The catch: integrating this into production engines like vLLM requires "major surgery" touching nearly every subsystem.
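The two quality metrics named above can be illustrated with their common textbook definitions. The post's exact evaluation procedure is not described in this digest, so the functions below are an assumption about what "token match" and "KLD" mean, and the inputs are made-up examples.

```python
import math
from typing import List, Sequence

def token_match_rate(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Fraction of positions where the compressed run emits the reference token."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("sequences must be non-empty and equally long")
    matches = sum(a == b for a, b in zip(reference, candidate))
    return matches / len(reference)

def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats between two next-token distributions over one vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

match = token_match_rate([1, 2, 3, 4], [1, 2, 9, 4])
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
drift = kl_divergence([0.5, 0.5], [0.9, 0.1])
print(match, same, drift)
```

A token match below 100% means the compressed cache occasionally changes the chosen token, which is why a lower KLD than FP8 indicates small drift but not bit-identical output.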

Google DeepMind London Workers Vote to Unionize Over Military AI Deals

u/shikizen posted that DeepMind employees voted to unionize specifically to block AI technology transfers to US and Israeli militaries (score 140, 24 comments) (post). u/pimmen89 [score 22] drew parallels to Swedish tech worker unionization at Klarna. This represents the first known AI lab unionization driven by ethical objections to military applications.

ProgramBench: Facebook Research Shows Near-Zero Success on Program Synthesis from Executables

u/klieret (Facebook Research) published ProgramBench (score 114, 59 comments) (post), a 200-task benchmark where agents must rebuild executables from scratch given only the binary and readme. Current models show near-zero success. u/DramaLlamaDad [score 4] challenged the premise: "How many actual coders would be able to complete this task under the same restrictions?" The benchmark highlights the gap between SWE-bench success and genuine program understanding.

Senate GUARD Act Advances: Age Verification for AI Chatbots

u/SnoozeDoggyDog reported the Senate Committee advancing a bill banning AI companions for children (score 57, 16 comments) (post). u/Hefty_Wolverine_553 noted it uses "children's safety as a disguise to implement age verification for AI chatbots" (post). Combined with the White House vetting proposal, this represents a significant week for proposed AI regulation in the US.

Qwen3.6 27B FP8 at 80 TPS on Single RTX 5000 PRO 48GB

u/JockY demonstrated Qwen3.6-27B FP8 with MTP=2 via vLLM achieving 60-90 TPS with 200k tokens of BF16 KV cache on a single RTX 5000 PRO 48GB (score 128, 152 comments) (post). This establishes a concrete answer to "what do I buy for $10k?" -- competitive with cloud latency for agentic coding with full local ownership.

7. Where the Opportunities Are

[+++] MTP-aware inference tooling and model distribution -- Google released official MTP drafters, llama.cpp has beta support, and MTPLX reported a 2.24x speedup on Apple Silicon. Yet, according to u/YoussofAl, most MLX quants still strip MTP heads. Tools that preserve MTP capability through the quantization pipeline, provide easy MTP setup, and extend support to more architectures address an immediate need, with reported speedups of roughly 2x.

[+++] AI-to-AI interaction security frameworks -- The Grok/Bankrbot exploit now has a documented morse code bypass, proving content filtering alone fails. Architectural solutions that separate AI reasoning from transaction execution -- mandatory human-in-the-loop for financial operations regardless of prompting -- have no current implementation despite demonstrated losses.

[++] KV-cache compression for production deployment -- FastDMS showed 6.4x compression with speed improvements, but requires "major surgery" to integrate into vLLM. TurboQuant just merged Qwen support but lacks quality benchmarks. A production-ready KV compression layer that works as a drop-in for existing inference engines would unlock substantially longer contexts on existing hardware.

[++] Transparent AI billing and usage controls -- The Anthropic billing exploit (800+ euro theft, credit damage, account banning) plus agentic costs ranging from $80/week to $100-200/day create demand for: secure billing pipelines with proper SCA enforcement, real-time spend dashboards, automatic kill switches, and tiered compute routing (local -> cheap cloud -> frontier).

[+] Chinese model deployment infrastructure -- DeepSeek V4 Pro matches GPT-5.2 at 17x lower cost. MiMo v2.5 Pro is among the best models for complex tasks. But neither runs easily outside their native APIs. Inference providers that solve the deployment friction for Chinese models (non-standard formats, missing framework support) capture the cost-conscious segment.

[+] Local deep research agents -- 9 projects surveyed, most abandoned or vendor-locked. Only 2 qualified as healthy. The demand for reliable local research tooling (evidenced by the survey post and Scared-Virus-3463's pipeline) significantly exceeds the quality of available solutions.

8. Takeaways

Multi-Token Prediction reached critical mass with Google releasing official Gemma 4 MTP drafters, promising up to 2x faster decoding with identical output quality. Combined with the llama.cpp beta and MTPLX achieving 2.24x on Apple Silicon, speculative decoding is becoming the default inference mode. (u/rerri post)
The Grok/Bankrbot $200K exploit used morse code to bypass content filters, proving that prompt-level safety cannot secure financial AI-to-AI interactions. The community consensus: no LLM should have transaction authority regardless of filtering. (u/ImCalcium post)
White House AI model vetting proposal generated 500+ comments across four threads with near-universal opposition. The highest-scored response: "I wanted to go with the Chinese or local ones anyway" -- suggesting the policy could accelerate the adoption it aims to control. (u/fallingdowndizzyvr post)
Anthropic's billing security failure drained 800+ euros from a user, destroyed their credit score, and resulted in their account being banned for reporting it. This shifts the cloud vs. local debate from cost to financial security. (u/peowwww post)
DeepSeek V4 Pro matched GPT-5.2 on agentic benchmarks at roughly 17x lower cost, compressing the China-US frontier gap to about ten weeks. MiMo v2.5 Pro also reached #6 on the leaderboard, but deployment friction prevents third-party hosting. (u/Disastrous_Theme5906 post)
FastDMS achieved 6.4x KV-cache compression while decoding 1.5-2x faster than vLLM BF16, with near-lossless quality (96.9% token match). The catch: production integration requires changes to nearly every inference engine subsystem. This could be the next major infrastructure unlock after MTP. (u/randomfoo2 post)
Boston Dynamics Atlas achieved elite-level gymnastics moves, Hyundai demanded "tens of thousands" of robots, and Tesla announced manufacturing ramp-up. Humanoid robotics is entering commercialization, but the community demands domestic utility over athletic spectacle. (u/Distinct-Question-16 post)
Cloud AI pricing continues its upward trajectory with $10/two prompts, $80/week, and $100-200/day now routine. Multiple users report switching to local Qwen3.6 with Pi or Opencode as their daily driver, calling it "immensely freeing." The subsidy era is definitively over. (u/_maverick98 post)