
OpenAI Cracks Disassembler for $1.73, Matching Anthropic's Guarded Model

Key Points

• GPT-5.5 passed 71.4% of Expert CTF challenges vs Mythos Preview's 68.6%
• GPT-5.5 solved a disassembler challenge for $1.73 in 10 minutes
• Both models succeeded at a 32-step network extraction test for the first time
• Access restrictions changed nothing about underlying capability parity
• Both still fail at power plant control software simulation

References (1)

[1] UK AI Safety Institute: GPT-5.5 matches Mythos Preview in cyber tests — Ars Technica AI

The UK's AI Security Institute (AISI) has confirmed what the AI safety community has quietly suspected but rarely stated aloud: the benchmarks meant to measure whether frontier AI systems are dangerous are actually measuring how good those systems are at hacking. In research published this week, AISI found that OpenAI's publicly released GPT-5.5 performs comparably to Anthropic's carefully restricted Mythos Preview on cybersecurity evaluations, passing 71.4 percent of expert-level challenges compared with Mythos Preview's 68.6 percent, a difference well within the statistical margin of error.

The number that should concern policymakers is $1.73. That is what it cost GPT-5.5 to solve a disassembler challenge that required reverse-engineering a Rust binary—a task that would take a skilled human analyst hours or days. GPT-5.5 completed it in just over ten minutes, with no human assistance. This is not a safety metric. This is an offense metric. And it is now the primary yardstick by which the world's leading AI safety institutes measure risk.

Anthropic made headlines last month by restricting Mythos Preview's release to "critical industry partners" after its own evaluations showed strong cyberoffense capabilities. The company framed this as responsible stewardship of potentially dangerous technology. But AISI's parallel testing reveals a troubling dynamic: Anthropic's caution and OpenAI's openness led to functionally identical products on the dimensions that actually matter for security. Both models succeeded at "The Last Ones," a 32-step corporate network extraction simulation that no previous AI had ever completed. GPT-5.5 managed 3 successes in 10 attempts. Mythos Preview managed 2. The access restrictions that Anthropic described as a safety measure changed nothing about the underlying capability landscape.
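
How little ten attempts can say about a gap between 3 successes and 2 can be sketched with a quick calculation. The snippet below is an illustration only, not AISI's method: it computes an approximate 95 percent Wilson score interval for each success rate, and it assumes that Mythos Preview's 2 successes also came from 10 attempts, which the figures above do not state explicitly. The function name `wilson_interval` is made up for this example.

```python
import math

def wilson_interval(successes, attempts, z=1.96):
    """Approximate 95% Wilson score interval for a binomial success rate."""
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / attempts + z**2 / (4 * attempts**2)
    ) / denom
    return center - half_width, center + half_width

# GPT-5.5: 3 successes in 10 attempts (from the article).
gpt_low, gpt_high = wilson_interval(3, 10)

# Mythos Preview: 2 successes; the 10-attempt denominator is an assumption.
mythos_low, mythos_high = wilson_interval(2, 10)

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = gpt_low <= mythos_high and mythos_low <= gpt_high

print(f"GPT-5.5:        {gpt_low:.1%} to {gpt_high:.1%}")
print(f"Mythos Preview: {mythos_low:.1%} to {mythos_high:.1%}")
print(f"Intervals overlap: {intervals_overlap}")
```

Under those assumptions the two intervals run from roughly 11 to 60 percent and from roughly 6 to 51 percent, so they overlap almost entirely. On a sample this small, a one-success difference cannot separate the two models.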

This creates a perverse incentive structure that neither company has found a way to escape. Model weights released openly can be fine-tuned by malicious actors. Model weights restricted to partners can leak. The AISI research suggests that at current capability thresholds, access controls function as theater—useful for public relations, meaningless for preventing capability proliferation. What matters, apparently, is what the model can do, not who can touch it.

The one bright spot in AISI's findings is that both models still fail at "Cooling Tower," a simulation of attempts to disrupt power plant control software. This represents the current ceiling for AI-enabled critical infrastructure attacks. But given the trajectory on "The Last Ones" (zero successes last year, multiple successes this year), the question is not whether AI will eventually pass this test, but how quickly, and what that means for the defenders who currently assume these attacks require human expertise.

The uncomfortable truth is that AI safety research has become, in practice, a cyberoffense arms race with itself. The models that score highest on "safety" benchmarks are the ones most capable of breaching networks, reverse-engineering malware, and executing multi-step data extraction. Regulators citing AISI data to justify restrictions are citing penetration-test results. There is no separate alignment metric being measured at this scale. AISI tests offensive capability because that is what the models can do, and because no one has yet figured out what else to measure.