Industry Synthesized from 5 sources

10,000 NVIDIA Staff Run Codex at 35x Lower Cost Than Last Year

Key Points

  • 10,000+ NVIDIA employees now run Codex in production on GB200 NVL72
  • GB200 NVL72 delivers 35x lower cost per million tokens vs prior generation
  • Debugging cycles compressed from days to hours; features ship overnight
  • Enterprise security model: cloud VMs, zero data retention, read-only access
  • NVIDIA IT provisioned dedicated VMs for every employee before deployment
  • API access still pending; production deployment leads API by weeks
References (5)
  1. [1] GPT-5.5 powers Codex on NVIDIA GB200, delivering 35x cost reduction — NVIDIA AI Blog
  2. [2] OpenAI Explains What Codex Is and Its Capabilities — OpenAI Blog
  3. [3] Codex Now Supports Scheduled Automations and Triggers — OpenAI Blog
  4. [4] Codex Plugins and Skills Enable Workflow Automation — OpenAI Blog
  5. [5] GPT-5.5通过非官方API可用,官方接口尚待开放 — Simon Willison's Weblog

The real story buried inside OpenAI's latest model release isn't GPT-5.5. It's the infrastructure underneath.

NVIDIA just deployed Codex—powered by GPT-5.5—across more than 10,000 employees on GB200 NVL72 rack-scale systems. The result: a 35x reduction in cost per million tokens compared to prior-generation infrastructure, plus 50x higher token output per second per megawatt. Those aren't projections. They're the numbers on a live deployment inside a 30,000-person company.

Think about what that means for software development inside NVIDIA. Debugging cycles that once consumed days now close in hours. Feature work that required weeks of iteration is shipping overnight. Natural-language prompts generate end-to-end functionality with enough reliability that teams are treating these tools as production infrastructure, not research experiments.

Jensen Huang sent a company-wide email urging every NVIDIA employee to adopt Codex. "Let's jump to lightspeed," he wrote. "Welcome to the age of AI." This isn't CEO marketing—it's a founder betting his own workforce on the technology.

The GB200 NVL72 architecture matters here. These rack-scale systems were designed explicitly for inference workloads at enterprise density. NVIDIA built them; NVIDIA is now proving them out at the only scale that matters: real engineers doing real work on real codebases.

The security model is equally instructive. NVIDIA IT provisioned cloud virtual machines for every employee, allowing Codex agents to operate in sandboxed environments with SSH access to approved cloud infrastructure. Zero data retention governs the deployment. Agents access production systems with read-only permissions through command-line interfaces. This is what enterprise-grade agentic AI looks like—not bolted-on security, but architecture designed from the ground up.

The 35x cost figure is the thesis in a number. At prior economics, deploying frontier-model inference to 10,000 knowledge workers would have been a budget conversation, not a deployment decision. At 35x better, it's infrastructure.

This points to a coming bifurcation in enterprise AI. Companies with access to next-generation inference infrastructure will operate at a fundamentally different cost structure than those locked into older GPU fleets. The delta isn't marginal—it's categorical. Organizations spending $10 million annually on AI inference today could achieve equivalent output at under $300,000 with current-generation rack-scale systems.

The OpenAI-NVIDIA partnership traces back to 2016, when Huang personally delivered the first DGX-1 to OpenAI's San Francisco office. Ten years of co-evolution have produced something neither company could build alone: a model provider with frontier capability and an infrastructure vendor with the economics to deploy it at scale.

The API remains under wraps—OpenAI says additional safeguards are needed before broader access—but that's almost beside the point. The milestone isn't whether developers can access GPT-5.5 via API next month. It's that 10,000 engineers are already running it in production, with the numbers to prove it works.

0:00