NVIDIA's eight years of unchallenged dominance in AI inference just found their first serious challenger.
Gimlet Labs closed an $80 million Series A round on Monday — a figure that would barely register as notable in today's AI funding landscape if it weren't for what the company is actually building. The San Francisco startup has developed middleware that allows large language models to run inference across NVIDIA, AMD, Intel, ARM, Cerebras, and d-Matrix chips simultaneously, distributing computational load in ways that no single-vendor solution can match.
The funding signals something more consequential than another AI infrastructure bet. Investors are wagering that the era of GPU monoculture — where NVIDIA's Hopper and Blackwell architectures command near-total control over model deployment — is structurally vulnerable. "The inference market is becoming commoditized faster than people realize," said one investor familiar with the round, who asked not to be named. "No single chip vendor can satisfy the demand curve we're looking at over the next three years."
Gimlet's technology addresses a real bottleneck. As enterprises deploy larger models for production applications, the cost and availability of inference compute have become a binding constraint. NVIDIA's H100 and B200 GPUs remain scarce, expensive, and heavily allocated to cloud hyperscalers. Smaller players and enterprise buyers face long lead times and unfavorable pricing. Gimlet positions itself as the plumbing that makes heterogeneous hardware viable — letting customers mix AMD's MI300X, Intel's Gaudi accelerators, or specialized inference chips from Cerebras without rewiring their model infrastructure.
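To make the idea of distributing load across mixed hardware concrete, here is a minimal sketch of throughput-weighted batch splitting. Everything in it — the `Accelerator` class, the throughput numbers, the `split_batch` function — is invented for illustration; Gimlet has not published its scheduling logic, and a production scheduler would also weigh memory capacity, latency targets, and cost.

```python
# Hypothetical sketch: divide a batch of inference requests across
# heterogeneous accelerators in proportion to each device's throughput.
from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str            # e.g. "H100", "MI300X", "Gaudi3" (illustrative)
    tokens_per_sec: int  # rough sustained decode throughput (made up)

def split_batch(requests: list[str], pool: list[Accelerator]) -> dict[str, list[str]]:
    """Assign requests to devices proportionally to throughput."""
    total = sum(a.tokens_per_sec for a in pool)
    # Each device's ideal (possibly fractional) share of the batch.
    shares = [(a, len(requests) * a.tokens_per_sec / total) for a in pool]
    counts = {a.name: int(s) for a, s in shares}
    leftover = len(requests) - sum(counts.values())
    # Largest-remainder apportionment: leftover requests go to the
    # devices whose fractional share was largest.
    for a, s in sorted(shares, key=lambda x: x[1] - int(x[1]), reverse=True)[:leftover]:
        counts[a.name] += 1
    it = iter(requests)
    return {a.name: [next(it) for _ in range(counts[a.name])] for a in pool}
```

Under these toy numbers, a 12-request batch across devices rated 3000, 2000, and 1000 tokens/sec lands as a 6/4/2 split — the "distributing computational load" the company describes, reduced to its simplest form.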
The deal size relative to sector benchmarks underscores investor conviction. Series A rounds for AI infrastructure startups averaged $28 million in 2025, according to PitchBook data. Gimlet's $80M represents nearly three times that average — a premium that reflects both the technical ambition and the strategic bet on post-NVIDIA inference infrastructure.
The company will use the funding to expand its engineering team and scale its compiler technology, which translates model architectures into optimized execution paths across disparate silicon. The current version supports six distinct chip ecosystems without requiring developers to modify their model code. That's the key differentiator: abstraction without performance sacrifice.
Whether Gimlet becomes the connective tissue of a heterogeneous AI future or simply a transitional layer before chip vendors build their own interoperability standards remains an open question. NVIDIA's CUDA ecosystem remains deeply entrenched, and the company's own inference optimization tools continue to improve. But the $80M signals that the market believes alternatives have a genuine shot — and that the assumptions underpinning NVIDIA's valuation premium in inference may not hold indefinitely.