Reddit AI - 2026-05-26
1. What People Are Talking About
1.1 AI video manipulation is moving faster than trust defenses (🡕)
The highest-engagement Reddit discussion on May 26 was not about a new model benchmark or price cut. It was about how convincing AI-assisted video editing and view reconstruction are becoming, and whether that breaks old assumptions about visual evidence. Two highly upvoted singularity threads drove this theme, and both discussions turned quickly from "wow" to surveillance, misuse, and evidentiary collapse.
u/Able-Line2683 posted The Strength of Gemini Omni is in video manipulation (3000 points, 316 comments). The post itself is brief, but the comments explain why it spread: u/A_Novelty-Account (score 502) said recorded events may soon stop functioning as evidence that those events happened, while u/MrKvic_ (score 380) argued that Omni's real strength is editing existing media rather than generating from scratch. That combination matters because it shifts the conversation from novelty to integrity: people are reacting not only to quality, but to how cheaply believable edits can now be made.
u/keemalexis reinforced the same anxiety in reconstructing different angles from live footage (1474 points, 145 comments). The post describes 4D Gaussian splatting that converts flat footage into spatial data, and the replies immediately framed it as surveillance or sexualized misuse: u/Happy_Brilliant7827 (score 244) compared it to CSI-style angle recovery, while u/CrowdGoesWildWoooo (score 489) said it would obviously be used for "research" purposes. The technical novelty is real, but the community reaction shows that public trust is not keeping pace with the capability curve.
Discussion insight: Commenters were less interested in which lab shipped the capability than in what happens when edited or reconstructed footage becomes routine. The dominant concern was not model quality but the weakening of media literacy and proof.
Comparison to prior day: May 25 already had strong multimodal "reality shock" energy, but May 26 pushed the focus more directly onto evidence tampering and synthetic video manipulation as the lead public concern.
1.2 AI ROI skepticism is hardening into budget cuts and institutional rules (🡕)
The prior day's cost debate did not fade. It broadened into a more punitive mood: internal budgets are being cut, failed automation examples are accumulating, and institutions are writing formal rules against unrestricted AI use. The through-line is that "use more AI" is no longer persuasive on its own; people now want workflow-level proof that it saves money or improves judgment.
u/mpuchala shared Microsoft reports are exposing AI's real cost problem: Using the tech is more expensive than paying human employees (421 points, 93 comments). Fortune says Microsoft is removing most direct Claude Code licenses while pushing teams toward GitHub Copilot CLI, and Uber had already burned through its 2026 AI coding-tools budget in four months (Fortune; The Verge). The most cited Reddit synthesis came from u/Zestyclose-Treat-616 (score 40), who said retries, hallucination review, workflow integration, human oversight, security, and reliability engineering make AI-assisted employee versus non-assisted employee the more honest comparison.
The same skepticism showed up in physical operations. u/andrewaltair posted Starbucks just scrapped their automated inventory AI after only 9 months (58 points, 7 comments). Futurism, citing Reuters, says the tool frequently miscounted or mislabeled ingredients such as milk and syrup bottles before Starbucks retired it and returned stores to manual counting (Futurism). Even though the Reddit score was modest, it was one of the clearest examples in the dataset of a real business rolling back AI after a simple operational task failed.
The governance side became more explicit too. u/andrewaltair posted UC Berkeley Law is completely banning AI use starting summer 2026 (291 points, 60 comments), and The Decoder says the school is banning AI for brainstorming, drafting, editing, translating, proofreading, and exams, allowing it only for legal research (The Decoder). In a parallel but broader moral register, u/andrewaltair posted Pope Leo XIV just dropped a massive 42,300-word encyclical on AI (356 points, 67 comments); The Guardian says the encyclical calls for AI to be "disarmed," warns against concentrated power over data and infrastructure, and argues that warfare uses must face the "most rigorous ethical constraints" (The Guardian).
Discussion insight: Reddit is no longer treating cost complaints, school bans, and moral governance language as separate stories. They are being read as evidence that uncontrolled AI adoption is now meeting resistance from finance, operations, education, and public ethics at the same time.
Comparison to prior day: May 25 centered on whether AI is actually cheaper in production. May 26 kept that cost argument, then added stronger proof of retrenchment: internal license cuts, a failed inventory rollout, a law-school ban, and a papal encyclical.
1.3 Open local stacks are specializing around control, documents, and edge deployment (🡕)
Local-AI discussion stayed strong, but the emphasis shifted from generic hardware talk to specialized deployments: uncensored model tooling, document extraction, vertical legal workflows, and small-board multimodal inference. The strongest posts were not abstract open-source manifestos. They were concrete builds with model cards, repo links, measured throughput, or deployment constraints.
u/-p-e-w- drew mainstream attention to uncensored model tooling in The Financial Times has published an article about Heretic (786 points, 204 comments). The post quotes the FT saying Heretic removed Llama 3.3 guardrails in under 10 minutes and that creator Philipp Emanuel Weidmann said the tool had already been used to create more than 3,500 decensored models whose downloads totaled 13 million. That discussion then fed into a concrete artifact: u/LLMFan46 released Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs. NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats (329 points, 63 comments). The linked model card claims 85% fewer refusals than the original with 0.0487 KL divergence and only a small MMLU drop from 84.12% to 83.72% (Hugging Face).
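For readers unfamiliar with the metrics quoted from the model card, the sketch below shows how a KL divergence between two next-token distributions is commonly computed, along with the arithmetic on the MMLU figures. The toy probability lists are invented for illustration, and the direction of the divergence is a guess: the model card as summarized here does not say how its 0.0487 figure was measured, so treat this as an assumption about the general technique, not the release's evaluation code.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token probabilities from an original and an edited model.
original = [0.70, 0.20, 0.10]
edited = [0.65, 0.23, 0.12]
print(f"KL(original || edited) = {kl_divergence(original, edited):.4f} nats")

# MMLU scores as quoted from the model card.
mmlu_before, mmlu_after = 84.12, 83.72
drop_points = mmlu_before - mmlu_after
relative_drop = drop_points / mmlu_before
print(f"MMLU drop: {drop_points:.2f} points ({relative_drop:.2%} relative)")
```

A value near zero means the edited model's output distribution stays close to the original's on the measured prompts, which is why it is often reported next to refusal counts.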
u/Gailenstorm posted NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction (self-hostable) (224 points, 51 comments). The post says the model is meant for PDFs, receipts, forms, tables, and multi-page documents; the model card adds a structured benchmark where NuExtract3.4_4B-RL scored 0.651 versus 0.538 for gemma-4-E4B-it, plus Apache-2.0 licensing and quantizations down to a 4GB-VRAM floor (Hugging Face). The most practically useful reply came from u/Bubulela (score 7), who said they wanted to replace Gemini Flash 3 because "the cost adds up fairly quickly," tying the builder story back to the day's cost theme.
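As a back-of-envelope check on why quantization matters for a 4GB-VRAM floor, the sketch below estimates memory for model weights alone at different bit widths. The 4e9 parameter count is a nominal reading of "4B", and the estimate ignores the KV cache, activations, and the vision encoder's working memory; which quantization format actually reaches the 4GB floor is not stated in the post.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 4e9  # nominal "4B" size; the exact parameter count is an assumption

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions, 16-bit weights alone would already exceed a 4 GB budget, while 4-bit weights leave room for runtime overhead.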
u/TumbleweedNew6515 published the day's densest deployment writeup in Update on 12x32gb sxm v100 cluster / local AI for legal drafting (296 points, 96 comments). The post says the stack moved away from vLLM and onto llama.cpp because MoE GGUFs on Volta were the only route to usable speed, then reports rough decode rates of about 113 tok/s for Gemma-4-26B-A4B, 82 tok/s for Qwen3.6-35B-A3B, and 50 tok/s for Qwen3.5-122B-A10B on real drafting prompts. A smaller but equally concrete edge variant came from u/Known_Ice9380 in Wrote a custom C++ engine for MiniCPM-V 4.6 on Orange Pi AIPro (Ascend 310B) to bypass framework overhead (39 points, 5 comments): the repo says a from-scratch C++/AscendC engine doubled decode speed from 2.88 to 5.90 tokens/s while keeping both text and vision on the NPU hot path (GitHub).

That same optimization mindset also showed up in u/Simple_Library_2700's 1000 tps generation on Qwen3.6 27B with V100s (224 points, 74 comments), where the attached benchmark table reports 1322.72 total tokens/s on prompt processing for qwen3.6-27b-awq and a 1562-token peak. In the smaller-model lane, u/kevinlch surfaced MiniCPM5-1B (112 points, 26 comments); the model card says it has 1.08B parameters, 679.6M non-embedding parameters, and 131,072 context length, while u/jake_that_dude (score 15) called that combination more interesting for a cheap local tool router than as a chat bot (Hugging Face).
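The decode rates reported above translate into wall-clock time with simple arithmetic. A rough sketch follows; the 2,000-token draft length is a hypothetical example, not a number from either post, and real runs also spend time on prompt processing.

```python
def speedup(before_tps, after_tps):
    """Ratio of new to old throughput."""
    return after_tps / before_tps


def seconds_to_decode(tokens, tokens_per_second):
    """Time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


# Approximate decode rates as reported in the legal-drafting post (tokens/s).
decode_rates = {
    "Gemma-4-26B-A4B": 113,
    "Qwen3.6-35B-A3B": 82,
    "Qwen3.5-122B-A10B": 50,
}

DRAFT_TOKENS = 2000  # hypothetical draft length, for illustration only

for model, tps in decode_rates.items():
    secs = seconds_to_decode(DRAFT_TOKENS, tps)
    print(f"{model}: {secs:.1f} s for {DRAFT_TOKENS} tokens")

# Orange Pi engine figures from the repo: 2.88 -> 5.90 tokens/s.
print(f"Orange Pi engine speedup: {speedup(2.88, 5.90):.2f}x")
```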

Discussion insight: The common demand across these posts was control: control over refusals, over deployment cost, over document workflows, and over the hardware/runtime path itself. Local AI is increasingly being used not as a hobby category but as a way to escape pricing, policy, or infrastructure constraints.
Comparison to prior day: May 25 focused on hardware preference, AMD runtimes, and local refusal-policy workarounds. May 26 kept the control theme but moved deeper into specialized artifacts: decensoring tools, document VLMs, legal-drafting stacks, and sub-$200 edge deployment.
2. What Frustrates People
AI cost claims keep failing the full-workflow test
Severity: High. The strongest frustration on May 26 was not that models are expensive in isolation, but that organizations still cannot predict what an AI workflow really costs once retries, review, integration, and failed deployments are counted. In the Microsoft/Uber thread, u/Zestyclose-Treat-616 (score 40) said the honest comparison is AI-assisted employee versus non-assisted employee because retries, hallucination review, security, and reliability work all add up in production (post) (421 points, 93 comments). Fortune adds that Uber exhausted its 2026 AI coding-tools budget in four months and that Microsoft is removing most Claude Code licenses while standardizing around Copilot CLI (Fortune). Starbucks supplies the concrete field failure: Reuters-backed reporting says its inventory AI kept miscounting bottles and was retired after nine months (post) (58 points, 7 comments); (Futurism). People cope either by limiting AI to assistive roles or by moving tasks to local stacks they can meter more tightly. This looks worth building for because the pain spans developer tools, agents, and physical-world automation.
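The "AI-assisted employee versus non-assisted employee" comparison can be sketched as a cost-per-finished-task calculation. This is a minimal illustration under stated assumptions: every number below is a hypothetical placeholder (none comes from the thread or from Fortune), and retries are modeled as independent failures, which real workflows may not follow.

```python
def assisted_hours_per_task(draft_hours, review_hours, retry_rate):
    """Expected hours per finished task when some attempts must be redone.

    Assumes each attempt fails independently with probability retry_rate,
    so the expected number of attempts is 1 / (1 - retry_rate).
    """
    if not 0 <= retry_rate < 1:
        raise ValueError("retry_rate must be in [0, 1)")
    expected_attempts = 1 / (1 - retry_rate)
    return (draft_hours + review_hours) * expected_attempts


def cost_per_task(monthly_salary, hours_per_month, hours_per_task,
                  tool_spend=0.0, overhead=0.0):
    """Monthly cost divided by the number of tasks finished in that month."""
    tasks_per_month = hours_per_month / hours_per_task
    return (monthly_salary + tool_spend + overhead) / tasks_per_month


# Hypothetical inputs for one employee over one month.
unassisted = cost_per_task(10_000, 160, hours_per_task=4.0)
assisted = cost_per_task(
    10_000, 160,
    hours_per_task=assisted_hours_per_task(1.0, 1.0, retry_rate=0.2),
    tool_spend=1_500,   # licenses plus usage-based bills
    overhead=500,       # share of integration, security, reliability work
)
print(f"unassisted: {unassisted:.2f} per task, assisted: {assisted:.2f} per task")
```

With these placeholder inputs the assisted path comes out cheaper per task, but raising `retry_rate` to 0.5 reverses the result, which is the sensitivity commenters were pointing at.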
Trust breaks when media and agents stop being inspectable
Severity: High. The most emotional discussion of the day came from people who felt they were losing the ability to verify what they were seeing or what an agent had done. In the Gemini Omni thread, u/A_Novelty-Account (score 502) said recorded events may no longer work as evidence, and u/Enrico_Tortellini (score 189) argued that poor media literacy will make persuasive synthetic content especially dangerous (post) (3000 points, 316 comments). The same trust problem showed up in agent design: u/RonnySaya argued that users need to know what an agent clicked, submitted, retried, or skipped, and u/Born-Exercise-2932 (score 2) said audit trails are what make failure reconstruction and handoff possible (post) (37 points, 35 comments). People are asking for verification layers, not just better outputs. This is worth building for because the failure mode is hidden error, not only bad UX.
Open local stacks still impose a heavy setup and runtime tax
Severity: Medium. Builders are clearly making progress, but the operator burden remains visible. u/weilding said a "five minute setup" for an open-source agent turned into an evening of YAML, environment variables, and skill markdown in is anyone else frustrated with how much config open source AI agents need? (10 points, 23 comments). Even in positive builder threads, the caveats are operational: NuExtract3 users discussed vLLM weight-key issues and config mismatches, while the legal-drafting cluster post explains that vLLM on Volta was a dead end for the author's MoE GGUF workflow, forcing a migration to llama.cpp (NuExtract3 thread) (224 points, 51 comments); (legal drafting thread) (296 points, 96 comments). People cope by specializing hard and documenting their exact hardware path. This is still worth building for because the audience is motivated, but too much of the work remains infrastructure babysitting.
3. What People Wish Existed
Agent systems that show every action, retry, and handoff
This was the clearest explicit need in the dataset. u/RonnySaya asked for agents that expose every click, submission, retry, and stop condition in AI agents need audit trails more than they need more autonomy (37 points, 35 comments), and u/Born-Exercise-2932 (score 2) added that audit trails are what preserve context during human or agent handoff. This is a direct opportunity, not an aspirational one: the pain is operational, the requested feature is specific, and the trust gap is already blocking adoption.
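One possible shape for the requested feature is an append-only log of agent actions. The sketch below is illustrative only: the class names, field names, and action vocabulary are hypothetical and do not come from the thread or from any existing agent framework.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class AuditEvent:
    step: int
    action: str    # hypothetical vocabulary: "click", "submit", "retry", "skip", "stop"
    target: str    # what the agent acted on
    outcome: str   # what happened as a result
    timestamp: float = field(default_factory=time.time)


class AuditTrail:
    """Append-only record of agent actions, exportable as JSON Lines."""

    def __init__(self):
        self._events = []

    def record(self, action, target, outcome):
        event = AuditEvent(len(self._events) + 1, action, target, outcome)
        self._events.append(event)
        return event

    def retries(self):
        return [e for e in self._events if e.action == "retry"]

    def to_jsonl(self):
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


trail = AuditTrail()
trail.record("click", "checkout button", "ok")
trail.record("submit", "payment form", "timeout")
trail.record("retry", "payment form", "ok")
trail.record("stop", "order confirmed", "handed off to human")
print(trail.to_jsonl())
```

Keeping each event as a flat, serializable record is what would let a person or a second agent replay the run after a failure or pick it up mid-task.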
Low-cost private document extraction that can replace paid cloud OCR
The NuExtract3 thread was both a launch post and a demand signal. u/Bubulela (score 7) said they were trying to replace Gemini Flash 3 because "the cost adds up fairly quickly," while other replies asked about book scanning, multi-column layouts, and dense tables in real workflows (post) (224 points, 51 comments). Because some alternatives already exist, this is a competitive opportunity rather than an empty white space, but the repeated emphasis on self-hosting, low VRAM, and structured extraction shows demand for cheaper private pipelines.
Open-source agent stacks that do not start with a config marathon
The need here is mundane but practical. u/weilding described spending hours on YAML, env vars, and skill markdown before getting a basic agent running in is anyone else frustrated with how much config open source AI agents need? (10 points, 23 comments). This is a direct opportunity for wrappers, installers, and opinionated defaults, though it is likely to be competitive because many teams can see the same friction.
4. Tools and Methods in Use
| Tool | Category | Sentiment | Strengths | Limitations |
|---|---|---|---|---|
| Claude Code | Coding CLI | (+) | Popular inside Microsoft; useful enough that developers preferred it during the internal comparison period (The Verge) | License cost became a budgeting problem; Microsoft is removing most direct licenses |
| GitHub Copilot CLI | Coding CLI | (+/-) | Microsoft says it can be shaped to its repos, workflows, and security expectations (The Verge) | Still seen as behind Claude Code by internal users, which is why the switch is contentious |
| llama.cpp | Inference runtime | (+) | Let the legal-drafting builder run MoE GGUFs on V100 hardware and fix a Gemma chat-parser issue (legal drafting thread) (296 points, 96 comments) | Dense models on the same hardware were still too slow for the author's target workflow |
| vLLM | Inference runtime | (+/-) | Familiar serving path and used for NuExtract3 benchmarks | On Volta it was described as a dead end for the desired MoE GGUF setup; NuExtract3 users also reported weight-key and config friction |
| NuExtract3 | Document VLM / OCR | (+) | Apache-2.0, Markdown plus JSON extraction, benchmark lead over several small models, runs with 4GB VRAM floor (Hugging Face) | Users still asked about tricky layouts, academic papers, and runtime quirks |
| MiniCPM-V 4.6 | Multimodal VLM | (+) | Works on Orange Pi AIPro through a custom NPU-first C++ engine with no torch_npu hot path (GitHub) | Requires custom kernels, Ascend tooling, and single-batch greedy decode assumptions |
| Heretic | Model editing tooling | (+/-) | Removes guardrails quickly and has obvious demand in the local-model community (Heretic thread) (786 points, 204 comments) | Mainstream scrutiny, takedown risk, and legal pressure are now part of the operating environment |
| Qwen3.5/Qwen3.6 MoE variants | Local LLMs | (+) | Fast enough on older V100 hardware for drafting and batch serving; multiple formats are available | Naming, quantization, and runtime choices are confusing, and the best path depends heavily on the use case |
| Gemini Flash 3 | Cloud model / OCR alternative | (+/-) | Users said it works well on document workflows | Repeatedly cited as costly enough to motivate migration to local models |
Overall satisfaction was polarized by control. Cloud tools were praised when they worked, but the loudest migration signals pointed away from opaque bills and toward stacks users could tune or host themselves. The clearest runtime migration was from vLLM to llama.cpp for Volta-era MoE work; the clearest organizational migration was from Claude Code to Copilot CLI under budget pressure; and the clearest product substitution story was from Gemini Flash 3 toward self-hosted document extraction.
5. What People Are Building
| Project | Who built it | What it does | Problem it solves | Stack | Stage | Links |
|---|---|---|---|---|---|---|
| NuExtract3 | u/Gailenstorm | Open-weight 4B document VLM for Markdown conversion and structured extraction | Reduces dependence on paid OCR/document APIs for forms, receipts, tables, and scans | Qwen3.5-4B base, Apache-2.0, GGUF, MLX, GPTQ, W8A8, FP8, vLLM, SGLang, llama.cpp | Shipped | post, model |
| MiniCPM-V 4.6 Orange Pi engine | u/Known_Ice9380 | From-scratch C++ inference engine for multimodal chat on Orange Pi AIPro 20T | Avoids framework overhead on Ascend edge hardware | C++, AscendC, Gradio, MiniCPM-V 4.6, Orange Pi AIPro 20T | Alpha | post, repo |
| Local legal-drafting cluster | u/TumbleweedNew6515 | Multi-model orchestrated drafting and review workflow for legal work | Keeps long-context legal drafting local while sustaining usable throughput | llama.cpp, Claude Code orchestration, Gemma-4-26B-A4B, Qwen3.6-35B-A3B, Qwen3.5-122B-A10B, V100 and 3090 servers | Beta | post |
| Heretic | u/-p-e-w- | Tool for removing model guardrails | Gives local users more control over refusal behavior | Heretic, GitHub distribution, MPOA-style model editing | Shipped | post, repo |
| Qwen3.5 35B A3B uncensored Heretic release | u/LLMFan46 | Benchmarked uncensored release in multiple formats | Packages a general-purpose uncensored local assistant without forcing one runtime | Heretic, Safetensors, GGUF, NVFP4, GPTQ-Int4 | Shipped | post, model |
The strongest repeated build pattern was not "general AI app" but tightly scoped infrastructure. Builders were either cutting recurring cloud cost out of a workflow, reclaiming control over refusals, or forcing hardware they already owned to do useful local work. The legal-drafting stack and the Orange Pi engine are especially notable because both are highly specific responses to infrastructure friction: one solves Volta-era throughput limits with MoE plus routing, and the other bypasses heavyweight frameworks entirely to make cheap edge multimodal inference practical.
6. New and Notable
Uncensored-model tooling crossed into mainstream coverage
Heretic was already a live topic in LocalLLaMA, but May 26 made it notably more public. u/-p-e-w- said the Financial Times tested Heretic on Meta's Llama 3.3 and quoted adoption numbers of 3,500+ decensored models and 13 million downloads in The Financial Times has published an article about Heretic (786 points, 204 comments). The importance is not just the tool itself; it is that a previously insider debate about refusal removal is now being framed as a mainstream policy and platform issue.
AI governance arguments are now being written by institutions outside tech
May 26 also stood out for the breadth of who was speaking. UC Berkeley Law adopted a near-total ban on AI use in graded work except research contexts (post) (291 points, 60 comments), while Pope Leo XIV's encyclical called for AI to be "disarmed" and subjected to strict ethical limits (post) (356 points, 67 comments). That combination matters because it shows governance pressure no longer coming only from regulators, labs, or journalists.
7. Where the Opportunities Are
[+++] AI cost accounting for real workflows — Multiple sections point to the same gap: Microsoft and Uber are struggling to justify usage-based coding bills, Starbucks rolled back an inventory system after real-world errors, and commenters repeatedly distinguished cheap tokens from cheap workflows. The opportunity is strong because the pain is already expensive and broadly shared.
[++] Agent auditability and replay — Users want to know what agents clicked, submitted, retried, and handed off. This appears in the trust crisis around synthetic media and in the explicit audit-trail thread, making it a moderate opportunity with a very concrete product shape.
[+] Private document extraction on small hardware — NuExtract3, MiniCPM-V edge deployment, and the MiniCPM5-1B interest all show demand for cheap local document and tool-routing workflows. This is an emerging opportunity because the market already has active builders, but the recurring requests around cost, OCR quality, and simpler deployment show room for better products.
8. Takeaways
- The biggest public AI anxiety today was not job loss or AGI; it was whether edited and reconstructed video can still be trusted. That concern drove the highest-engagement post of the day and shaped the discussion around Gemini Omni and 4D reconstruction. (source)
- The ROI story has moved from abstract pricing to concrete rollback and retrenchment. Microsoft and Uber are being discussed through budget overruns, while Starbucks supplies a clear example of AI automation being retired after failing a routine task. (source)
- Local AI is increasingly about control, not ideology. The retained builder posts focused on refusing cloud bills, avoiding guardrails, fitting models onto owned hardware, or making domain-specific workflows such as legal drafting and document OCR actually run. (source)
- Governance pressure is broadening beyond regulators and labs. A law school ban and a papal encyclical landed on the same day, which gave Reddit unusually concrete evidence that AI boundaries are now being argued in education and public ethics, not only in product teams. (source)