Model Release Synthesized from 4 sources

Altman Oversold Images 2.0, But Text Rendering Actually Improved

Key Points

• Text rendering is the commercial threshold that matters, not general image quality
• Simon Willison's Where's Waldo test: gpt-image-1 failed, gpt-image-2 succeeded
• Thinking capability allows web search during generation to verify text
• English text now production-ready; non-English languages still unreliable
• Altman's GPT-3-to-GPT-5 comparison implies a broad leap across all capabilities; the real gain is a narrow one in text rendering

References (4)

[1] OpenAI Launches ChatGPT Images 2.0 with Web Search (The Verge AI)
[2] ChatGPT Images 2.0 Excels at Text Rendering (TechCrunch AI)
[3] OpenAI Upgrades ChatGPT Image Generation Model (Wired AI)
[4] OpenAI launches ChatGPT Images 2.0 with major quality leap (Simon Willison's Weblog)

Sam Altman's comparison between AI image models and language model generations is, charitably, overreaching. The jump from gpt-image-1 to gpt-image-2 is not the equivalent of GPT-3 to GPT-5. But one specific capability that Altman de-emphasized is worth your attention: text rendering in generated images actually works now, and that is the feature that determines whether this tool has real commercial value.

Text inside images has been the persistent failure mode for AI image generators. A bakery cannot use a model that renders "Happy Birthday" as gibberish. A marketing team cannot deploy a tool that produces illegible labels. For the past two years, AI-generated images remained novelties precisely because they could not be trusted with words. That constraint is now breaking.

Simon Willison tested this with a practical methodology: he prompted both gpt-image-1 and gpt-image-2 to create a Where's Waldo-style scene containing a raccoon operating a ham radio. The older model produced an image in which no human reviewer could locate the raccoon, even though the prompt had asked for one. When Willison asked Claude Opus 4.7 with high-resolution vision to analyze the gpt-image-1 output, the model hallucinated a raccoon that did not exist, pointing to an instruction card and insisting the animal must be hiding somewhere. The generation had failed silently.

The same prompt with gpt-image-2 produced an image in which the amateur radio club booth was clearly labeled "AMATEUR RADIO CLUB - W6HAM" and the raccoon could be found operating the equipment. The text rendered correctly. The scene followed the instruction. The hidden object was actually in the picture.

This is the benchmark that matters. The question is not whether the model can produce beautiful landscapes or surreal art; current models handle those tasks adequately. Text is where the commercial use cases live: e-commerce product images with pricing labels, social media graphics with brand names, infographics with data labels, presentations with headlines. None of these worked reliably before, because any attempt to render words produced garbled characters or missing letters.

The improvement is attributed to a new "thinking" capability in gpt-image-2: the model can access web search during the generation process to verify spelling, check logo designs, and confirm brand typography. This mirrors the reasoning approach of o1 and o3, which spend additional computation working through a language task before answering.

Multiple outlets confirmed the leap in text rendering. TechCrunch documented legible signage in generated scenes. Wired reported "significantly better" text output. The model still struggles with non-English characters, a meaningful limitation for global commerce, but English text now works reliably enough for production use.

Altman's framing missed the point. The GPT-3-to-GPT-5 comparison suggests a horizontal leap across all capabilities. The actual improvement is vertical: text rendering crossed a threshold that makes business applications viable, while other capabilities improved only incrementally. That is still significant. For text, the technology went from unusable to useful; for everything else, it became marginally better at things people already use it for.

The model is rolling out to ChatGPT Plus, Pro, Business, and Enterprise subscribers today.