The Pentagon says Anthropic poses an unacceptable risk to national security. Anthropic says the Pentagon doesn't understand how AI works. The two claims cannot both be fully true, and the gap between them reveals why Washington's attempt to control frontier AI is unraveling.
Anthropic fired back Friday, submitting two sworn declarations to a California federal court that dispute the Defense Department's characterization of the company as a security threat. The declarations, signed by company executives, argue that the government's case rests on technical misunderstandings never raised during months of negotiations. Internal Pentagon communications surfaced in those filings show that a week earlier, officials were telling Anthropic the two sides were "nearly aligned." That message arrived days after President Trump declared the relationship "kaput."
The immediate dispute centers on whether Anthropic could theoretically manipulate its Claude models during a conflict—deploying degraded capabilities, inserting hidden flaws, or denying access to American allies. Anthropic's position is blunt: the company cannot do these things because it doesn't control the infrastructure where its models run. Customers deploy Claude through their own cloud providers, their own data centers. "We're not a cybersecurity firm," one executive noted privately. "We don't have a kill switch buried in inference servers."
But this defense—"we physically can't"—sidesteps a harder question. The Pentagon isn't really asking whether Anthropic could sabotage its own product. It's asking whether any AI company can guarantee predictable behavior under conditions of conflict, coercion, or compromise. The answer to that question is almost certainly no, and everyone in the room knows it.
AI models are not weapons systems with defined failure modes. They are probabilistic systems whose behavior depends on context, prompting, and fine-tuning that occurs after deployment. A government that has spent decades requiring deterministic guarantees—from hardware roots of trust to cryptographic integrity checks—faces an industry that cannot provide them. No contract clause, no audit, no technical specification resolves this fundamental mismatch.
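To make the mismatch concrete, consider a minimal sketch of the kind of deterministic check those control frameworks rest on: verifying that a deployed artifact is bit-for-bit identical to an approved one. The file path and digest below are illustrative placeholders, not details from the case or from any real contract.

```python
# Illustrative sketch only: a deterministic integrity check of the sort
# government systems have long relied on. Path and digest are placeholders.
import hashlib
from pathlib import Path

APPROVED_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"  # placeholder value

def artifact_matches(path: str, expected_digest: str = APPROVED_SHA256) -> bool:
    """Return True if the file on disk hashes to the approved digest."""
    actual = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return actual == expected_digest

# This answers "is this the exact artifact we audited?" with certainty.
# There is no analogous check for "will the model behave as it did during the audit?"
# The same weights can produce different outputs depending on prompts, sampling
# settings, and any fine-tuning applied after deployment.
```

A check like this can certify what a model *is*; it cannot certify what a model will *do*, which is the guarantee the Pentagon is effectively asking for.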
The stakes extend far beyond Anthropic. Every major AI lab pursuing defense contracts now watches how this plays out. If the Pentagon succeeds in declaring frontier AI companies "unacceptable risks" based on theoretical capabilities rather than demonstrated harms, the entire defense AI partnership model collapses. Industry sources expect rival firms to quietly recalibrate their government outreach, waiting to see whether Anthropic weathers this storm.
The court filings signal that Anthropic intends to fight, not settle. The company is arguing not just that the Pentagon misunderstands its technology, but that the government's claims emerged too late in negotiations to constitute legitimate grounds for termination. Whether a judge accepts that procedural argument may determine whether this remains a contract dispute or escalates into a broader constitutional fight over executive authority to designate tech firms as national security risks.
What remains unresolved is whether anyone can say what a real guarantee would look like. The Pentagon has articulated a concern. Anthropic has articulated a technical refutation. Neither has articulated a standard that could actually satisfy both. Until that standard exists, every defense AI contract will carry the same flaw, wrapped in legal language, buried in court filings, and fundamentally unresolvable within the current framework.
The relationship may be salvageable. The court may force mediation. The parties may find a commercial arrangement that papers over the philosophical divide. But the underlying tension—that America's most powerful AI systems operate beyond the control frameworks Washington has relied upon for decades—will persist long after this specific contract dispute ends.