Inefficient AI models are killing us

Last week I wrote about whether AI data centers are stealing all your water. The short answer was no, not yet, but the trajectory is a growing problem. Data centers and AI overuse have become a hot-button issue, and the question I kept getting back, in different shapes from different readers, was, “Fine, but what are the labs actually doing about this?”

So I went looking.

Every frontier lab has a public posture on compute efficiency. They all have a small model. They all have a blog post about quantization or distillation or “responsible scaling.” Every single one of them is also pouring tens of billions of dollars into the largest data center buildout in industrial history.

The question I want to answer is whether the efficiency work is moving the needle, or whether it’s all for show. So I built a timeline. It’s the cleanest way I’ve found to see which labs led, which labs followed, and whether the total-energy curve is anywhere close to bending.

Jake’s compute-efficient LLM timeline

2015 - 2022

2023

2024

2025

2026 (So far)

April 2026

Anthropic. Claude Opus 4.7 releases alongside the Claude Code Skills framework and the sub-agent delegation pattern. This routes orchestration to Opus and grunt work to Haiku by default, which cuts both the bill and the compute for the median multi-step task. Marks the first time efficiency-by-design has shown up as a default product behavior.

Moonshot AI. Kimi K2.6 releases. 1T-parameter MoE with 32B active per token, priced ~88% below Opus 4.7. The second non-American lab in 18 months to undercut US frontier pricing by an order of magnitude.

DeepSeek. DeepSeek V4 releases, pushing the MoE curve further. API costs roughly an order of magnitude below US frontier prices, and the gap on benchmark capability is closer to zero than expected.

OpenAI. GPT-5.5 takes the #1 spot on the Artificial Analysis Intelligence Index with a reported 86% hallucination rate on certain factual benchmark categories. The clearest signal yet that OpenAI, the company most willing to spend an order of magnitude more compute for a single-digit benchmark bump, is optimizing for raw capability rather than efficiency or reliability.
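To make the delegation pattern in the Anthropic entry concrete, here is a minimal sketch of tier routing: orchestration steps go to a large model, routine steps go to a small one. Everything in it is an assumption for illustration. The step kinds, the `route` function, and the 15-to-1 relative price are made up, and none of it is Anthropic’s actual implementation or pricing.

```python
from dataclasses import dataclass

# Hypothetical relative cost per step, in arbitrary units (not real list prices).
RELATIVE_COST = {"large": 15.0, "small": 1.0}


@dataclass
class Step:
    name: str
    kind: str  # "orchestrate" (planning, review) or "execute" (routine work)


def route(step: Step) -> str:
    """Send orchestration to the large tier and everything else to the small tier."""
    return "large" if step.kind == "orchestrate" else "small"


def task_cost(steps: list[Step], delegate: bool = True) -> float:
    """Total relative cost of a task, with or without delegation."""
    return sum(RELATIVE_COST[route(s) if delegate else "large"] for s in steps)


# A made-up multi-step task: one planning step, four routine steps, one review.
task = [
    Step("plan", "orchestrate"),
    Step("search files", "execute"),
    Step("read files", "execute"),
    Step("apply edits", "execute"),
    Step("run tests", "execute"),
    Step("review", "orchestrate"),
]

all_large = task_cost(task, delegate=False)
delegated = task_cost(task)
savings = 1 - delegated / all_large
print(f"all-large: {all_large}, delegated: {delegated}, savings: {savings:.0%}")
```

The saving depends entirely on the price ratio and on how much of the task is routine, which is why the default matters more than the option: most users never change it.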

Eleven years of papers, releases, and counter-moves. You can see which labs led on architecture (Google on the Transformer, distillation, and MoE; DeepMind on Chinchilla; DeepSeek on production MoE plus reasoning), which led on product packaging (OpenAI’s mini ladder, Anthropic’s prompt caching and delegation, Apple’s on-device bet), which led on infrastructure (UC Berkeley’s vLLM), and which led on capex (Microsoft on nuclear, OpenAI on Stargate). The efficiency work is present and it’s accelerating, but the capex work is also real, and it’s accelerating faster.

So is any of this enough?

No.

The unit economics are getting better, but the environmental impact is getting worse. Every lab can show you a chart of cost-per-million-tokens going down and to the right and none of them can show you a chart of their total annual energy consumption going down (because the line is straight up).

The IEA projects global data center electricity demand to roughly double by 2030 to around 945 TWh, driven primarily by AI. Lawrence Berkeley Lab’s 2024 United States Data Center Energy Usage Report puts US data centers at 6.7% to 12% of national electricity consumption by 2028, up from 4.4% in 2023. Google’s emissions are up 48% since 2019, and the company’s own executives explicitly blame AI. Microsoft’s overall emissions are up nearly 30% since 2020, with Scope 3 (the data-center construction and supply chain piece) accounting for almost all of the increase. AWS doesn’t break theirs out clearly, which is its own kind of answer.
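Those projections are easier to compare as annual growth rates. The sketch below converts each one using the figures quoted above. The one assumption is the IEA baseline year, which I’ve set to 2024; the paragraph only says “by 2030.” Note that the Berkeley Lab numbers are shares of national consumption, so their rate is growth in share, not in absolute terawatt-hours.

```python
def implied_annual_growth(multiple: float, years: int) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years`."""
    return multiple ** (1 / years) - 1


# IEA: demand roughly doubles by 2030. The 2024 baseline year is an assumption.
iea_rate = implied_annual_growth(2.0, 2030 - 2024)

# Lawrence Berkeley Lab: 4.4% of US electricity in 2023, 6.7% to 12% by 2028.
lbnl_low = implied_annual_growth(6.7 / 4.4, 2028 - 2023)
lbnl_high = implied_annual_growth(12.0 / 4.4, 2028 - 2023)

print(f"IEA global demand: {iea_rate:.1%} per year")
print(f"LBNL US share: {lbnl_low:.1%} to {lbnl_high:.1%} per year")
```

Under those assumptions the IEA figure works out to about 12% a year, and the Berkeley Lab range to roughly 9% to 22% a year growth in share.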

The labs will tell you they’re working on efficiency, and they are, no lie there. They’ll also tell you efficiency frees them to do more, and it does. But Jevons paradox is doing what Jevons paradox always does: cheaper inference means more inference, not less total power draw. The frontier models are getting bigger, the thinking budgets per query are getting longer, and the number of agents per user is climbing every quarter.
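The arithmetic behind that is just multiplication. Here is a rough sketch with invented multipliers: none of these numbers are measurements, and the category names are mine, loosely following the growth drivers listed above. Total energy is the per-token efficiency gain multiplied by every way usage grows.

```python
from math import prod


def total_energy_multiple(energy_per_token_multiple: float,
                          usage_multiples: dict[str, float]) -> float:
    """Change in total energy: per-token efficiency times all usage growth factors."""
    return energy_per_token_multiple * prod(usage_multiples.values())


def break_even_usage_growth(energy_per_token_multiple: float) -> float:
    """Usage growth above this multiple means total energy rises despite the gain."""
    return 1 / energy_per_token_multiple


# Illustrative only: energy per token falls 10x while usage grows on three axes.
usage = {
    "thinking_tokens_per_query": 4.0,
    "agents_per_user": 3.0,
    "users": 2.0,
}

total = total_energy_multiple(0.1, usage)
limit = break_even_usage_growth(0.1)
print(f"usage grows {prod(usage.values()):.0f}x, break-even is {limit:.0f}x, "
      f"total energy changes by {total:.1f}x")
```

In this made-up scenario a tenfold efficiency gain is swamped by a 24-fold rise in usage, so total energy still more than doubles. The labs’ cost-per-token charts show only the first factor.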

A second pattern worth naming: every architectural efficiency win of the last three years has either originated outside the US frontier labs or arrived in their products only after a non-US competitor forced the move. Meta moved to MoE for Llama 4 three months after DeepSeek’s R1. Anthropic’s prompt caching is the most generous American exception, and even it landed years after the technique was in the literature. The American hyperscalers aren’t leading on efficiency. They’re reacting to it on a delay while spending faster.

The labs have decided on your behalf that productivity gains and scientific upside justify the energy costs.

Why this should scare the industry

There’s a tactical case for caring about this that has nothing to do with the environment (though that should be enough).

Public opinion on AI is in a worse place than the industry pretends. The most recent Pew Research survey has 51% of US adults more concerned than excited about increased use of AI in daily life, against just 11% who are more excited than concerned. The concerned share has climbed from 37% in 2021 to a steady ~50% across 2023, 2024, and 2025. Environmental impact is one of the top concerns named, alongside job displacement and disinformation. Those water and energy stories I was talking about are running on MSNBC and Fox at the same time, which is weird.

People rejected genetically modified food. People rejected fur. People rejected single-use plastic at retail scale. People rejected fast fashion brands, slowly and partially but visibly enough to reshape that industry. The pattern in every case is the same: a 5 to 15% consumer defection layered on top of a regulatory wave that the companies didn’t see coming because their internal polling told them they were fine.

If a meaningful slice of ChatGPT subscribers cancel because of energy guilt, that is a material revenue hit on a consumer side that now accounts for roughly two-thirds of OpenAI’s top line, per recent reporting on the company’s 2025 revenue mix. If a state attorney general launches an “AI emissions disclosure” suit and wins (and I’d bet these are coming), the discovery phase alone forces public methodology on energy reporting. If an EU regulator decides the AI Act’s energy disclosure requirements have actual teeth, then every frontier lab has to publish numbers they’ve spent years hiding.
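To put a rough size on the subscriber scenario, the sketch below combines the two numbers already in this piece: the 5 to 15% defection range from the earlier consumer backlashes and the roughly two-thirds consumer share of revenue. It assumes cancelling subscribers spend about the average amount and that nothing spills over to the enterprise side, both of which are simplifications.

```python
CONSUMER_SHARE = 2 / 3  # "roughly two-thirds" of top-line revenue, per the text


def revenue_hit(defection_rate: float, consumer_share: float = CONSUMER_SHARE) -> float:
    """Fraction of total revenue lost if `defection_rate` of consumer revenue walks.

    Assumes defectors spend the consumer average (a simplification).
    """
    return defection_rate * consumer_share


low = revenue_hit(0.05)
high = revenue_hit(0.15)
print(f"top-line hit: {low:.1%} to {high:.1%}")
```

That is a hit of roughly 3% to 10% of total revenue before any regulatory cost shows up.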

The labs are betting that capability dazzles people enough to outrun the backlash and they’ve been right on this for three years. Hell, they might be right for three more. But the cultural ground is shifting underneath them, and the efficiency posture they’ve adopted (small model variants buried inside the API page, vague PR numbers, nuclear power deals announced for five years out) is calibrated to an old 2023 conversation. The 2026 conversation is louder, more specific, and more informed.

If you run an AI lab and you’re reading this:

Publish real, audited energy numbers per model, per query class.
Stop the marketing-range game on watt-hours.
Cap the headline model’s compute growth at the rate of demonstrated capability improvement, not at the rate of capex availability.
Ship the small model first, the big model second.
Make efficiency an actual product constraint instead of a PR line.

This is what needs to be done, but the current incentive structures inside a $90B/year AI company won’t reward it. I’d hate for the most important technology of my career to get killed by a backlash that the industry could have priced in years earlier.

Originally published on the Handy AI newsletter