If AI models improve themselves, do they get smarter—or simply smaller?
The recursive self-improvement (RSI) narrative has seized the AI imagination. According to this view, today's large language models are good enough to start improving their own architectures and training procedures, leading to a closed loop of amplification that culminates in superintelligence. The logic sounds clean. The reality is lossy.
A compelling analysis from Interconnects identifies a fundamental flaw in this reasoning: each generation of AI trained on AI-generated data loses information, just like saving a JPEG file repeatedly. Every compression round introduces artifacts. Stack enough generations, and you no longer have an image—you have a smear.
Lossy compression describes how JPEG images discard fine details to reduce file size. Open, save, repeat. The degradation accumulates. Bright reds fade to orange. Sharp edges blur into gradients. The file shrinks; the picture suffers. You cannot recover what was lost by saving again.
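The resave loop is easy to simulate. The sketch below is a toy stand-in for JPEG, not the real codec: each "save" smooths a one-dimensional signal and snaps it to a coarse grid of levels, the quantization being the irreversible step. A sharp edge degrades into a gentle ramp, and no number of further saves brings it back.

```python
# Toy model of repeated lossy saves. Real JPEG quantizes frequency
# coefficients; here each "save" smooths the signal and quantizes the
# samples to a coarse grid. Neither step can be undone.

def lossy_save(signal, levels=8):
    # Smooth: average each sample with its neighbors (fine detail is lost).
    smoothed = [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, len(signal) - 1)]) / 3
        for i in range(len(signal))
    ]
    # Quantize: snap each value to one of `levels` evenly spaced steps in [0, 1].
    step = 1.0 / (levels - 1)
    return [round(v / step) * step for v in smoothed]

def sharpness(signal):
    # The largest jump between adjacent samples: 1.0 for a hard edge.
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))

# A sharp edge: half zeros, half ones.
signal = [0.0] * 8 + [1.0] * 8
print("round 0 sharpness:", sharpness(signal))

for _ in range(10):
    signal = lossy_save(signal)
print("round 10 sharpness:", round(sharpness(signal), 3))  # edge has blurred into a ramp
```

The interesting detail is that the loss happens early and then plateaus: after a few rounds the signal settles into a blurred fixed point, but that fixed point has permanently discarded the original edge.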
Language models face an analogous problem. When a model generates training data for the next generation, it privileges high-probability outputs—the statistically typical responses. What gets squeezed out is the long tail: the unusual formulation, the edge case, the creative exception that made the original dataset rich. After enough iterations, the model converges on a compressed average of human expression, stripped of the nuance that made the original valuable.
This isn't merely a hypothetical concern. Research on model collapse demonstrates that training on synthetic data degrades performance measurably. The degradation isn't random noise—it's systematic narrowing toward a mode that looks plausible but lacks the diversity of the source material.
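The narrowing dynamic is simple enough to demonstrate in a few lines. This is a hypothetical toy, not a reproduction of any published experiment: "real data" is a wide Gaussian, and each generation refits a Gaussian to its training set, then emits only typical samples (within one standard deviation) as the next generation's data, mimicking a model that privileges high-probability outputs.

```python
import random
import statistics

random.seed(0)

# Toy model-collapse loop: each generation fits a Gaussian to its
# training data, then produces the next generation's data by sampling
# and keeping only "typical" outputs (within one standard deviation).
# The tails never come back, so spread shrinks generation over generation.

def next_generation(data, n=5000):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    samples = []
    while len(samples) < n:
        x = random.gauss(mu, sigma)
        if abs(x - mu) <= sigma:  # discard the long tail
            samples.append(x)
    return samples

data = [random.gauss(0, 1) for _ in range(5000)]  # generation 0: "real" data
print("gen 0 stdev:", round(statistics.stdev(data), 3))

for gen in range(1, 6):
    data = next_generation(data)
print("gen 5 stdev:", round(statistics.stdev(data), 3))  # far narrower than gen 0
```

The collapse compounds geometrically: truncating a Gaussian at one standard deviation cuts its spread roughly in half, so five generations leave only a few percent of the original diversity. Each generation's output still looks plausible in isolation; only the comparison across generations reveals the narrowing.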
The popular discourse around RSI treats self-improvement as if it compounds capability. It doesn't. It compresses it.
A counterargument holds that better evaluation methods could guide self-improvement more carefully, selecting only high-quality outputs for training. This helps. It does not solve the fundamental compression problem. Even the best filter cannot recover information that was never generated. Selecting among model outputs means selecting among compressed representations. The information loss occurs at generation, not curation.
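The curation argument can be made concrete with a deliberately extreme sketch. Assume, hypothetically, that a model's outputs are confined to a "typical" band, while real data has heavier tails. Even an oracle filter that always keeps the rarest, most tail-like candidate out of every hundred never surfaces a value outside the band, because no such value is ever generated in the first place.

```python
import random

random.seed(0)

# Why curation cannot undo compression: the generator only produces
# values inside [-1, 1] (the "typical" band). An oracle quality filter
# keeps the most extreme candidate from each batch of 100, yet the
# curated set still never escapes the band.

def model_output():
    return random.uniform(-1.0, 1.0)  # generation happens inside the band

def oracle_filter(candidates):
    return max(candidates, key=abs)   # "best" = rarest, most tail-like

curated = [oracle_filter([model_output() for _ in range(100)])
           for _ in range(1000)]

print("max |value| after aggressive curation:",
      round(max(abs(v) for v in curated), 3))
# Real data with heavier tails would routinely exceed 1; curated output cannot.
```

Selection changes which compressed outputs survive, not what the compressor was able to produce.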
Some propose explicit memory and retrieval systems to preserve diversity across iterations. These are interesting architectures. They address storage, not compression. The model still generates from a lossy internal representation. External memory just changes where the artifacts accumulate.
The Seed AI concept, dating to Yudkowsky's 2007 writings, imagined an AI designed for self-understanding and recursive modification from the start. The idea was that even a primitive intelligence, built on that foundation, could bootstrap into something greater. Today's models are far more capable, but they were not architected for self-modification. They were trained to predict the next token. That is not the same substrate.
The oligopoly narrative—that two or three labs will dominate AI through capital and talent consolidation—diverts attention from a more fundamental problem. The bottleneck isn't resource concentration. It's information loss.
Self-improvement loops may still yield useful capabilities. Incremental gains from better tooling, better evals, and better architectures are real and valuable. But assuming these loops will compound into exponential intelligence growth requires ignoring a mathematical reality: every iteration through a lossy compressor produces a smaller, flatter, less capable output.
Until someone solves the compression problem, recursive self-improvement will remain more aspiration than mechanism.