Scientists Watch an AI Evolve a Learning Rule No Human Ever Imagined

For most of AI’s history, machines have not truly “figured things out” on their own. They have learned — but only after humans first engineered the rules that allow them to learn. Even in reinforcement learning, where an AI agent trains itself by trial and error using rewards, the backbone of the process — the learning algorithm — is always human-designed. But a new study published in Nature hints that this division of labor may not last much longer. Researchers have created an AI system that discovered a completely new way to learn, without being told how to do it — and then outperformed the best human-designed algorithms on some of the hardest tests in the field.

The achievement is more than another benchmark victory. It suggests that the next great leaps in machine intelligence may not come from human insight at all, but from machines evolving learning strategies on their own — the way nature evolved intelligence in biological organisms over millions of years.

Evolving Intelligence Instead of Building It

The researchers took their inspiration not from computer science textbooks but from biological evolution. In nature, intelligence did not arise because someone “designed” a learning rule. It emerged from endless rounds of trial, error, mutation, and selection. The team reproduced that logic in silico. They created a vast population of AI agents, dropped them into numerous complex environments, and gave them an initial learning rule. Each agent tried — and mostly failed — to solve its tasks. Their successes and failures were then evaluated by a separate AI system, a kind of meta-intelligence, that acted like evolution itself.

This meta-network did not learn the tasks. It learned how to change the learning rule itself. After each generation of agents, it adjusted the rule so the next generation could adapt more successfully. Over many cycles, this artificial evolutionary pressure caused a new learning rule to emerge. The researchers named the discovered rule DiscoRL; the flagship variant, produced by meta-training across the 57-game Atari suite, was nicknamed Disco57.
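To make that two-level loop concrete, here is a drastically simplified sketch in Python. The published system used a neural meta-network operating at scale; this toy keeps only the structure the article describes: an inner loop where agents learn with a candidate rule, and an outer loop that mutates and selects rules by how well those agents perform. Every detail here (the bandit task, the two-parameter rule, the mutation scheme) is an illustrative assumption, not the paper's actual method.

```python
import random

def make_bandit(n_arms=5):
    """A random multi-armed bandit: each arm pays out with a fixed probability."""
    return [random.random() for _ in range(n_arms)]

def run_agent(rule, bandit, steps=200):
    """Inner loop: one agent learns arm values using the candidate rule.
    Here the 'learning rule' is just two numbers: (learning_rate, exploration_rate)."""
    lr, eps = rule
    values = [0.0] * len(bandit)
    total = 0.0
    for _ in range(steps):
        if random.random() < eps:                       # explore a random arm
            arm = random.randrange(len(bandit))
        else:                                           # exploit current estimates
            arm = max(range(len(bandit)), key=values.__getitem__)
        reward = 1.0 if random.random() < bandit[arm] else 0.0
        values[arm] += lr * (reward - values[arm])      # update driven by the rule
        total += reward
    return total / steps

def evaluate_rule(rule, n_agents=16):
    """Score a rule by the average return of a population of agents,
    each dropped into a freshly generated environment."""
    return sum(run_agent(rule, make_bandit()) for _ in range(n_agents)) / n_agents

def evolve(generations=20, pop_size=20, sigma=0.05):
    """Outer loop ('evolution'): mutate the best rule each generation,
    evaluate every candidate through agent populations, keep the winner."""
    best = (0.1, 0.1)                                   # initial hand-set rule
    for g in range(generations):
        candidates = [best] + [
            (max(1e-3, best[0] + random.gauss(0, sigma)),
             min(0.5, max(0.0, best[1] + random.gauss(0, sigma))))
            for _ in range(pop_size - 1)
        ]
        scored = [(evaluate_rule(r), r) for r in candidates]
        score, best = max(scored)
        print(f"generation {g:2d}: score={score:.3f} rule={best}")
    return best

if __name__ == "__main__":
    evolve()
```

The structural point is the nesting: the inner loop never changes its own behavior, while the outer loop treats the rule's parameters as the thing being learned. DiscoRL's discovery happened in that outer loop, just with a far richer space of rules than two scalars.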

When Machines Outperform Their Makers

To see whether Disco57 amounted to more than a curiosity, the team used it to train fresh agents and evaluated their performance against some of the strongest human-engineered algorithms ever developed, including PPO and MuZero.

First came the classic Atari benchmark, a widely used stress test for general competence in digital agents. Agents trained with Disco57 outperformed every human-designed reinforcement learning algorithm the team compared against. Then came the harder test: unseen environments. These included procedurally generated worlds (ProcGen), a survival-style simulation (Crafter), and the notoriously unforgiving dungeon crawler NetHack, where strategies must be invented on the fly. Again, DiscoRL delivered state-of-the-art performance, despite never being explicitly tuned by humans for these domains.
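A note on how results like these are usually read: raw Atari scores differ wildly from game to game, so the field typically reports human-normalized scores, which put all 57 games on one scale. The convention below is the standard one in the Atari literature; whether the paper aggregates exactly this way is an assumption here.

```python
def human_normalized(score, random_score, human_score):
    """Standard Atari convention: 0.0 = random play, 1.0 = human baseline.
    Normalizing each game this way lets very different score scales
    be averaged or compared on a single axis."""
    return (score - random_score) / (human_score - random_score)

# Example: an agent scoring 8000 on a game where random play scores 200
# and the human baseline is 7000 lands above the human level.
print(human_normalized(8000, 200, 7000))  # ~1.147
```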

This matters because doing well on unseen tasks is the closest thing AI has to a definition of “true intelligence.” It demonstrates not memorization but transferable learning. DiscoRL wasn’t just running a clever rule invented by researchers — it was using a learning strategy invented by itself through an evolutionary search process.

The End of Hand-Crafted Learning?

The elegant and unsettling implication of the study is that humans may no longer be needed to design the architectures of learning itself. Instead of writing the rules, humans might increasingly build the systems that evolve the rules. The researchers themselves acknowledged this in the closing lines of their paper: the algorithms required for stronger AI may soon “be automatically discovered from the experiences of agents, rather than manually designed.”

This marks a philosophical shift. For decades, human intuition has been the ceiling on machine learning progress. We taught machines how to think the way we imagined thinking works. But evolution does not care about our intuitions. It explores regions of possibility human minds would never consider, because we are limited by imagination, convention, and time. Machines running evolutionary meta-learning have none of those constraints.

An Unscripted Future for Artificial Minds

What was once a speculative fear — that AI would begin improving itself — may soon become a simple engineering reality. Not in the science-fiction sense of runaway superintelligence, but in the concrete, incremental sense that learning rules will no longer be authored by people. AI will begin to write the logic of its own intelligence.

That does not render humans obsolete. We still decide the goals, environments, safeguards, and evaluation criteria. But the intellectual center of gravity is shifting. We are moving from a world where AI is built, to a world where AI is bred; from instruction to evolution; from algorithms authored by humans to algorithms authored by machines.

The story of DiscoRL is not just another technical milestone. It is the first clear demonstration that the knowledge of how to learn — once considered the exclusive privilege of biological evolution and human insight — can be discovered by synthetic minds without us showing them the path. That transition, quiet and mathematical, may turn out to be one of the most important turning points in the history of intelligence itself.

More information: Junhyuk Oh et al., Discovering state-of-the-art reinforcement learning algorithms, Nature (2025). DOI: 10.1038/s41586-025-09761-x
