
Why Open Models Won't Close the Capability Gap

Key Points

  • Gap persists due to structural incentives, not talent or compute shortages
  • Closed models hold unmeasurable advantages benchmarks cannot capture
  • Distillation enables fast-following but not frontier exploration
  • Open model demand exists but supply remains economically dictated
  • Benchmark parity does not equal capability parity in premium use cases
References
  [1] Analysis: Open AI models struggle to close gap with closed labs, Interconnects.

The belief that open AI models are steadily closing the capability gap with closed labs is a comfortable narrative that the data refuses to confirm. The gap persists not because open research lacks talent or compute, but because the incentive structures governing open development fundamentally do not reward the same moat-building that closed labs pursue relentlessly.

This is the uncomfortable conclusion emerging from a careful examination of capability scaling curves through mid-2026. According to analysis from Interconnects, top closed models have "surprisingly" not shown a growing capability margin over open alternatives, despite the massive compute advantages closed labs enjoy. If raw resources were the only variable, the gap should be widening. Instead, it has stabilized, but it has not inverted.

The reason is structural, not technical. Open model labs have become genuinely skilled at matching closed competitors on established benchmarks. This is not a small achievement. It reflects both abundant talent and sufficient computing power deployed strategically. Chinese open-weight labs have pushed this dynamic furthest, investing heavily in benchmark performance as a fundraising and adoption mechanism. Keeping the "catching up" narrative alive is commercially rational when your ability to attract investment depends on demonstrating parity or superiority on measurable tasks.

But benchmarks measure what can be measured. Closed models maintain hard-to-quantify qualities that remain invisible to these evaluations — the robustness that matters when knowledge workers present novel challenges, the general usefulness that converts casual users into daily dependents. These are precisely the qualities that build durable competitive moats. You cannot distill what you cannot perceive, and you cannot replicate what you cannot observe.

Distillation, the practice of extracting capabilities from closed models to improve open ones, has genuine value, but it is not a substitute for building original capabilities. It enables fast-following in established domains; it does not enable frontier exploration. Chinese LLM companies benefit from distillation dynamics, but Interconnects notes that changes in these dynamics, including potential regulation, "will not be a determining factor on the balance of capabilities." The moat is built elsewhere. A sketch of the underlying mechanism follows below.
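For readers unfamiliar with the mechanism, here is a minimal sketch of the classic logit-matching formulation of knowledge distillation (Hinton et al., 2015), in which a student model learns to match a teacher's temperature-softened output distribution. This illustrates the general technique only; nothing here comes from the Interconnects source. In practice, closed models expose only sampled text through their APIs, so labs typically approximate distillation by fine-tuning on teacher-generated completions rather than on logits. The function name and temperature value are illustrative choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Classic knowledge-distillation loss: KL divergence between the
    teacher's and the student's temperature-softened distributions."""
    # Soften the teacher's distribution; detach so no gradient reaches it.
    soft_targets = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    # Student log-probabilities at the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 positions over a 32-token vocabulary.
student_logits = torch.randn(4, 32, requires_grad=True)
teacher_logits = torch.randn(4, 32)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```

Note what this objective can and cannot do: the student only ever sees the teacher's existing behavior, which is why distillation supports fast-following on established tasks but contributes nothing toward capabilities the teacher does not already exhibit.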

The demand-side story compounds this dynamic. Organizations, individuals, and sovereigns genuinely want open models. This demand is real and growing. But demand and supply have decoupled. Supply remains dictated by economics — specifically, by the calculus of whether releasing a model strengthens or weakens a company's competitive position. Closed labs release models when doing so serves strategic purposes: capturing ecosystem value, establishing standards, or pre-empting regulatory threats. They do not release models that would eliminate their advantages.

The result is a stable equilibrium that should not surprise us in retrospect. Open models will continue to narrow the gap on measurable, established tasks. They will continue to trail on the unmeasurable qualities that matter most for premium use cases. The "catching up" framing implies a finish line that does not exist, because the finish line keeps moving. Closed labs are not standing still while open models advance — they are building new moats at the same pace, just in territories that benchmarks cannot see.

This does not make open models unimportant. They serve critical needs for sovereignty, auditability, and cost optimization. But the expectation that open development will eventually achieve parity in the ways that matter most for AI's highest-value applications misreads the incentive architecture of both ecosystems. The gap is structural. Until the economics of open research change in ways that reward frontier capability building rather than benchmark optimization, the curve will remain flat.
