The Self-Improving Kludge: Can AI Learn to Learn?
How AI models are closing the self-improvement loop awkwardly—and why that might be enough
“Why doesn’t it learn?”
The question comes up every time I explain how large language models work. Someone will say: “I corrected ChatGPT on something yesterday. It apologized, said it understood, even thanked me. But today it made the exact same mistake. What’s wrong with it?”
Nothing is wrong with it. This is simply how large language models work.
Large language models operate in two distinct phases. Phase one is training—where the model learns from billions of text examples, adjusting billions of internal parameters (called “weights”) until it can predict patterns in language. This phase takes weeks of compute time and costs millions of dollars. Then training stops. The weights freeze.
Phase two is inference—using the trained model to answer questions. This is what you interact with when you use ChatGPT. But here’s the critical limitation: during inference, the model cannot learn. The weights stay frozen. Every conversation starts from the exact same snapshot taken when training ended.
When you talk to ChatGPT, you’re talking to a photograph, not a person.
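If you like seeing that photograph in code, here’s a toy sketch. None of this is a real LLM API; the names are made up, and the point is purely structural: training writes to the weights, chatting only reads them.

```python
# Toy illustration only (made-up class, not a real LLM API).
# train() mutates the weights; chat() never does.

class ToyModel:
    def __init__(self):
        self.weights = {"bias": 0.0}   # stands in for billions of parameters

    def train(self, corpus):
        # Phase one: every example nudges the weights. Weeks of compute in real life.
        for example in corpus:
            self.weights["bias"] += 0.01 * len(example)

    def chat(self, prompt):
        # Phase two: inference. The weights are read, never written.
        return f"(reply shaped by weights={self.weights}) to: {prompt}"

model = ToyModel()
model.train(["example text"] * 1000)   # happens once, in the lab; then the snapshot freezes

print(model.chat("You were wrong yesterday. Here's the correction."))
print(model.chat("Same question as yesterday."))
# Both replies come from the exact same weights. The correction changed nothing.
```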
You might be thinking: “But ChatGPT has memory features. It remembers things about me across conversations.” True. But that’s not learning—it’s a context engineering workaround. The system stores notes about you separately and includes them in future prompts. It’s like taping a sticky note to that photograph. The photograph itself hasn’t changed. The weights are still frozen. All the “memory” features, retrieval systems, and context tricks we use today are workarounds that give the illusion of learning.
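And here’s roughly what that sticky note looks like under the hood. This is a minimal sketch of the general pattern, not any vendor’s actual memory API: notes about you live outside the model and get pasted into every new prompt.

```python
# Minimal sketch of the "memory" pattern (illustrative names, not a real API).
# The stored notes are just text prepended to future prompts; the model's
# weights never change.

memory_store = []   # the sticky notes taped to the photograph

def remember(note: str) -> None:
    memory_store.append(note)

def build_prompt(user_message: str) -> str:
    notes = "\n".join(f"- {n}" for n in memory_store)
    return (
        "Things you know about this user:\n"
        f"{notes}\n\n"
        f"User: {user_message}\nAssistant:"
    )

remember("Prefers metric units.")
remember("Corrected me yesterday: the capital of Australia is Canberra, not Sydney.")

print(build_prompt("Quick check: what's the capital of Australia?"))
# The correction survives only as prompt text that the frozen model rereads each time.
```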
To improve the model—to actually make it learn from all those corrections users give it—you have to go back to phase one. Human researchers must collect the feedback, decide what it means, create new training data, and put the model through another round of training. It’s slow, expensive, and limited by human speed and judgment. The model isn’t improving itself. Humans are improving it, one training cycle at a time.
This is why many researchers argue that LLMs hit a fundamental ceiling. No matter how impressive they get, there’s a natural limit to how useful they can become if they can’t learn continuously.
Popular AI podcaster Dwarkesh Patel argues that “the lack of continual learning is a huge huge problem” and that while LLMs might have higher baseline performance, “there’s no way to give a model high-level feedback, and you’re stuck with the abilities you get out of the box.” Richard Sutton, widely considered the father of reinforcement learning, goes further: LLMs aren’t capable of learning on-the-job, so “no matter how much we scale, we’ll need some new architecture to enable continual learning.”
The skeptics point to a stark contrast. Consider AlphaGo, DeepMind’s system that learned to defeat world champion Go players. It didn’t need humans to improve it. AlphaGo played against itself millions of times, getting immediate, objective feedback with every game: win or loss. Each game taught it which strategies worked. The self-improvement loop was closed. No human bottleneck.
Why can’t we use the same approach for language models? Because Go is a closed game with clear rules and an objective win condition. There’s no ambiguity about whether a move was good or bad—the game tells you. But there’s no objective function for “good conversation” or “useful help.” Language is open-ended, context-dependent, and fundamentally subjective.
So when ChatGPT generates a response, what feedback does it get? You might use it, ignore it, love it, hate it. From the model’s perspective: nothing. No score. No signal. Just silence. To get better, humans must step in—collecting feedback, judging quality, creating new training data, retraining the entire system. Slow. Expensive. Limited by human availability.
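To make the contrast concrete, here’s a schematic toy, emphatically not AlphaGo’s actual algorithm: in self-play, the game hands back a score after every round, so even a dumb keep-what-wins loop improves on its own. A chat turn hands back nothing.

```python
# Schematic contrast only. The "game" below is a coin flip biased toward the
# stronger policy; the loop keeps whichever variant wins. The chat turn has no
# comparable signal to learn from.

import random

def play_game(strength_a: float, strength_b: float) -> str:
    # The game itself declares the winner: objective, immediate feedback.
    return "a" if random.random() < strength_a / (strength_a + strength_b) else "b"

def self_play_improve(strength: float, rounds: int = 2000) -> float:
    for _ in range(rounds):
        challenger = max(0.01, strength + random.uniform(-0.05, 0.05))  # try a variation
        if play_game(challenger, strength) == "a":
            strength = challenger          # keep what wins, discard what loses
    return strength

def chat_turn(prompt: str):
    reply = f"Here is a response to: {prompt}"
    feedback = None                        # no score, no signal, just silence
    return reply, feedback

print(self_play_improve(0.5))              # tends to drift upward, game after game
print(chat_turn("Help me plan a trip"))    # nothing here for the model to learn from
```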
If the humans are doing all the improving work, is the system really intelligent? Or are we just building very expensive tools that appear to be intelligent?
The human bottleneck can’t be removed. That’s the argument. And it’s a convincing one.
But recently, a paper from researchers at the University of Tübingen caught my eye—because it suggests a kludgy workaround.
AI Training AI: The Awkward Workaround
The individual LLM can’t learn. That architecture is fundamentally frozen. But what if you zoom out? What if you put that frozen LLM inside a larger system—and use another AI to play the role of the human researcher?
The LLM itself stays frozen. But the macro system could become self-improving. One AI trains the other AI, which trains the next version of the first AI. The loop closes, even if it closes awkwardly.
This is what PostTrainBench tests: whether AI models can do the work that human AI researchers currently do—collecting data, selecting training parameters, running experiments, evaluating results.
The setup was straightforward: give a model (let’s call it the “researcher AI”) a base model to improve, access to an H100 GPU, ten hours, and the standard tools human researchers use. The task? Do what AI researchers do every day: curate training datasets, select hyperparameters, run training loops, evaluate results, iterate. All the human bottleneck work.
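The outer loop is easy to picture. Here’s a self-contained toy of that setup as I read it, with training and evaluation replaced by made-up numbers; names like ask_researcher_ai are mine, not the benchmark’s actual harness.

```python
# Toy version of the outer loop: training and evaluation are simulated with
# random scores, and all names here are illustrative, not PostTrainBench's API.

import random

TIME_BUDGET_STEPS = 20   # stands in for the ten hours of GPU time

def ask_researcher_ai(history):
    """The frozen 'researcher AI' proposes the next experiment.
    Here it picks at random; in the benchmark, an LLM chooses data and hyperparameters."""
    return {"dataset": random.choice(["math", "code", "dialogue"]),
            "learning_rate": random.choice([1e-5, 3e-5, 1e-4])}

def finetune_and_evaluate(current_score, plan):
    """Simulate fine-tuning on the chosen data and scoring the resulting checkpoint."""
    quality_bonus = {"math": 0.02, "code": 0.03, "dialogue": 0.01}[plan["dataset"]]
    return current_score + quality_bonus + random.uniform(-0.03, 0.02)

best_score, history = 0.20, []
for step in range(TIME_BUDGET_STEPS):
    plan = ask_researcher_ai(history)            # curate data, pick hyperparameters
    score = finetune_and_evaluate(best_score, plan)
    history.append((plan, score))                # the researcher AI sees its own results
    if score > best_score:                       # keep the checkpoint that evaluated best
        best_score = score

print(f"score when the budget runs out: {best_score:.3f}")
```

The interesting part isn’t the loop itself. It’s that the “decide what to try next” step, which used to be a human researcher’s job, is now handled by another model.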
The models reached 20-30% effectiveness on these benchmarks, somewhere between a third and half of the roughly 60% that human experts manage. Not impressive.
But here’s what made me pause: they did it at all.
Think about what’s happening here. The researcher AI is mimicking what human researchers do—because that’s what it learned from watching humans during its own training. It’s not truly intelligent research. It’s pattern-matching on “what would a human researcher try next?” The models even discovered the same heuristic human researchers use: dataset quality matters more than training duration. Not because they understand why, but because they saw that pattern in their training data.
And the researcher AI has to judge whether the newly trained model is “better.” But where did it learn what “better” means? From humans, through RLHF during its own training. It’s a copy judging based on what it learned from the original. Like asking someone to grade their own test based on what they think the teacher wants.
This isn’t real self-improvement. It’s mimicking self-improvement. The macro system looks like it’s learning, but it’s really just running on patterns absorbed from humans. After all, someone had to train that researcher AI in the first place.
And yet: it’s working. Sort of. Twenty to thirty percent effectiveness is nothing to celebrate, but it’s enough to wonder: what if awkward mimicry is good enough?
When the Loop Closes
Here’s the thing about mimicry: if it’s good enough to close the loop, it doesn’t matter that it’s not “true” self-improvement.
Right now, every AI improvement cycle requires human researchers.
But even at 20-30% effectiveness, if AI can do that work, the dynamics shift. The arithmetic is straightforward: run a thousand AI researchers in parallel. They work 24/7 without fatigue. They test variations at scales humans can’t match. The mimicry doesn’t need to be good—it needs to be good enough to remove the bottleneck, cheap enough to run at scale, and fast enough to iterate quickly.
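A rough back-of-envelope makes the point. The effectiveness figures come from the benchmark numbers above; the team sizes and the 40-hour human week are assumptions I’m inventing purely for illustration.

```python
# Back-of-envelope only. Effectiveness figures come from the benchmark discussion
# above; team sizes and the 40-hour work week are illustrative assumptions.

ai_count, ai_effectiveness, ai_hours_per_week = 1000, 0.25, 24 * 7
human_count, human_effectiveness, human_hours_per_week = 100, 0.60, 40

ai_weekly = ai_count * ai_effectiveness * ai_hours_per_week              # 42,000
human_weekly = human_count * human_effectiveness * human_hours_per_week  # 2,400

print(f"effectiveness-weighted researcher-hours per week: AI {ai_weekly:,.0f} vs humans {human_weekly:,.0f}")
print(f"ratio: {ai_weekly / human_weekly:.1f}x")   # about 17.5x in this toy comparison
```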
This is what people mean when they talk about AI takeoff scenarios. Once AI can improve AI—even awkwardly, even through mimicry—the pace accelerates. The human bottleneck disappears. Each generation of models can help train the next generation faster than humans could alone.
And this isn’t theoretical speculation. Google’s Gemini Pro reasoning model helped speed up training of Gemini Pro itself. OpenAI and Anthropic are almost certainly experimenting with similar approaches. The self-improvement loop is starting to close, even if it’s closing through kludgy mimicry rather than elegant architecture.
First-mover advantage starts to matter more than theoretical purity.
But can a kludge really be the path to dominance? History suggests: absolutely.
The Kludge That Wins
In the race between kludgy-but-exists and well-designed-but-theoretical, bet on the kludge every time.
Imagine you’re designing a bipedal organism from scratch. You wouldn’t design human knees. They’re structurally backward—joints inherited from quadruped ancestors that spent most of their time on all fours, now forced to bear full body weight in an upright position. Arthritis has plagued hominids since we became bipedal. Our knees hurt. Our backs ache. It’s a kludge.
But nature already had quadrupedal apes to work with. Evolution kludged those bodies into bipedal humans—freeing up our hands for tools, allowing us to build and use technology, enabling us to dominate the ecosystem. Not optimal. But good enough to win.
Or consider the giraffe’s recurrent laryngeal nerve—fifteen feet long when it could be six inches, running from the brain down into the chest, looping around the aorta, back up to the larynx. Made perfect sense in fish ancestors. Makes no sense in giraffes. But evolution extends what’s already there rather than redesigning from scratch. Terrible design. Works fine. Result: giraffes exist.
The pattern is clear: optimization happens within constraints. “Good enough” beats “perfect” if it arrives first.
Technology follows the same pattern. Your keyboard is almost certainly QWERTY—a layout designed in the 1870s to keep commonly paired letters apart so mechanical typewriter arms wouldn’t jam. That mechanical problem vanished decades ago. More efficient layouts exist. We know QWERTY is suboptimal. But when computers came along, everyone already knew QWERTY. The switching costs were too high. Result: we type on a 150-year-old hack.
Once a kludge achieves dominance, an ecosystem builds around it. Tools, expertise, infrastructure. Switching costs become prohibitive. Incremental improvements patch the worst problems. We never find out if the “better” solution would have been better. The kludge becomes the standard.
Applied to AI: even if a “proper learning architecture” exists somewhere, even if it would be theoretically superior, LLMs plus kludgy self-improvement might get there first. Massive investment already exists—trained researchers, optimized hardware, entire ecosystems. Once you have a working system, even a kludgy one, you don’t rebuild from scratch.
Good Enough to Win
The skeptics are right. LLMs fundamentally can’t learn. PostTrainBench’s workaround is circular—AI mimicking humans who trained the AI in the first place. Twenty to thirty percent effectiveness. Awkward, indirect, and might not scale.
But here’s what matters: LLMs don’t have to truly learn. The macro system just has to be good enough to close the loop. And history suggests that “good enough and here now” beats “theoretically perfect but not here yet.”
After all, we conquered the planet on knees that swell and ache. We type on keyboards designed for mechanical problems that vanished decades ago. First-mover advantage, not elegant design.
The question isn’t whether the path is elegant—it’s whether it’s good enough to get there first.
And right now, the kludge is in the lead.