Not many people think about the under-the-hood models powering ChatGPT. To the average user, the incremental updates to the world’s fastest growing product have just meant that sometimes it feels like its getting smarter (or that now a brief Thinking… dialogue appears when a hard question is asked). More technical users pay closer attention, tracking each and every little change that each of OpenAI’s nine model releases have brought to the platform.

Today’s update isn’t a little change; this is the big one. GPT-5, worthy of the full number upgrade, promises to wow ChatGPT users in a way that hasn’t happened since GPT-4’s release in 2023.

TL;DR

The fine details

GPT-5 keeps OpenAI’s promise of combining all its model variations into a single system.

In ChatGPT, you’ll be talking to a multi-model router that evaluates what type of request you’re making and dynamically delegates between “thinking” (high-reasoning) and “main” (fast-response) models. That router learns in real time for each user, incorporating model-switch patterns, preference signals, and correctness outcomes. (For example, you can say “think hard about this” in your prompt and watch it change behavior.)

GPT-5 will now powering ChatGPT by default, even for free-tier users. The fast-response model (gpt-5-main) handles most queries, and the smarter model (gpt-5-thinking) gets routed in for hard ones.

This means:

Under the hood

All model versions are also addressable directly in the API if folks want fine-grained control. The top-end model supports an expanded 400k context window (including 128k max output tokens).

New API features:

Developers can dial in tradeoffs between speed, cost, and accuracy using reasoning_effort, pick between full/nano variants, steer output verbosity, and now format tool calls in plaintext.

This enables new architectures:

Enterprise users (e.g. BNY Mellon, Intercom, Notion) are already integrating GPT-5 into production workflows, claiming improvements in runtime stability, tool robustness, and versioning guarantees via snapshots.

If you want more details (and specific bench metrics), check out OpenAI’s post on GPT-5 for developers.

OpenAI vs... OpenAI

GPT-5 decisively replaces every ChatGPT model before it. Here's the lineage:

GPT-5 claims to beat all predecessors across every relevant benchmark: coding (SWE-bench, Aider polyglot), tool use (τ2-bench), long-context retrieval (MRCR, BrowseComp), instruction following (COLLIE, Scale MultiChallenge), factuality (FactScore, LongFact), and hallucination minimization.

It also supports multi-tool reasoning: parallel calls, tool error recovery, preamble messages between steps, and the ability to report progress live.

On τ2-bench telecom (a complex, mutable-environment tool use benchmark), no other model broke 50%. GPT-5 scores 96.7%. It dominates multimodal reasoning tasks (CharXiv, MMMU-Pro), passes long-context tests (up to 256k tokens) with >90% match rates, and ranks highest on instruction-following and factuality.

Claude’s most recent models is comparable on raw performance, but lags GPT-5 on agentic tasks, tool integration, and frontend code generation. Gemini 2.5 Pro is still strong on vision and long-context, but slower and more prone to hallucinations under pressure. Mistral’s open weights can’t compete on safety or task orchestration (though, reminder, OpenAI’s open source models just released this week as well).

OpenAI’s future legacy

GPT-5 is OpenAI’s first model designed to scale as infrastructure. Everything in this release (the router, the API modularity, the control parameters, the agent-first mindset) is pushing us further toward a world where intelligence is embedded in everything.

The product strategy is clear:

The future is obvious: longer context, tighter agent integration, more real-world interfaces, and smaller deployment footprints for on-device execution. All of that will now built on GPT-5’s spine and streamlined product experience.

Originally published on the Handy AI newsletter →