Scientists Discover a Second Brain Learning System—and It Explains Why Habits Are So Hard to Break

In the dimly lit corridors of the Sainsbury Wellcome Centre (SWC) at University College London, a quiet revolution in neuroscience has been brewing. It doesn’t come from flashy brain scans or miracle cures, but from the meticulous study of mice, movement, and molecules. And now, it’s shaking one of neuroscience’s core assumptions: that the brain relies on a single system to learn from experience.

For decades, scientists believed that our brains learned through a feedback loop of reward—try something, get a result, learn from the outcome. But what if that wasn’t the whole story? What if, deep within the folds of our brain, a second system was working in parallel, quietly engraving our repetitive actions into habits?

A new study from SWC researchers, published in Nature, has revealed precisely that: a second, value-free learning system driven by a different type of dopamine signal. This discovery offers not only a fundamental shift in how we understand learning and behavior, but also a new framework for treating addiction, compulsive disorders, and even Parkinson’s disease.

The Dopamine Myth—and the Breakthrough That Challenged It

Dopamine has long held the spotlight as the brain’s “pleasure chemical.” It floods the system when we receive unexpected rewards—a sweet bite of chocolate, a social media “like,” a hug. These events generate what’s called a reward prediction error (RPE): the brain’s way of updating expectations based on how reality compares to what we anticipated.

This RPE-based learning system helps us weigh value and make flexible decisions. It’s what tells a child that reaching for a cookie before dinner leads to scolding but waiting leads to dessert. It’s rational, adaptive—and until now, it was thought to be the singular mechanism by which we learn from experience.
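The idea of a reward prediction error can be made concrete with a few lines of Python. What follows is a minimal sketch of a textbook-style update rule, not the model used in the study; the function name `rpe_update` and the learning rate of 0.1 are assumptions chosen for illustration.

```python
def rpe_update(expected, reward, learning_rate=0.1):
    """Return (prediction_error, new_expectation) for a single trial."""
    # The error is positive when reality beats the expectation, negative when it falls short.
    rpe = reward - expected
    # The expectation moves a small step in the direction of the error.
    return rpe, expected + learning_rate * rpe

expected = 0.0   # hypothetical starting expectation: no reward anticipated
history = []     # prediction error recorded on each trial
for _ in range(50):
    rpe, expected = rpe_update(expected, reward=1.0)
    history.append(rpe)

print(f"first error: {history[0]:.3f}, last error: {history[-1]:.3f}")
```

As the same reward keeps arriving, the surprise shrinks toward zero: the expectation has caught up with reality.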

But a team led by Dr. Marcus Stephenson-Jones at SWC suspected there was more beneath the surface. For years, other dopamine neurons—particularly those linked to movement—had defied explanation. They didn’t seem to care about reward at all. Instead, they fired in sync with motion, hinting at a parallel purpose no one had yet pinned down.

The new study offers an explanation for what those neurons are doing.

Introducing the Action Prediction Error (APE)

What Stephenson-Jones and his colleagues found is that the brain carries a second kind of dopamine teaching signal: the action prediction error (APE). Unlike RPE, which tracks whether outcomes are better or worse than expected, APE has nothing to do with outcome at all. It doesn’t measure value; it measures frequency.

“Essentially, we have found a mechanism that we think is responsible for habits,” said Dr. Stephenson-Jones. “Once you have developed a preference for a certain action, you can bypass your value-based system and just rely on your default policy of what you’ve done in the past.”

In other words, APE tracks how often you do something—not whether it’s good or bad. And the more often you do it, the more your brain expects you to keep doing it. This system, it turns out, is deeply embedded in the part of the brain known as the tail of the striatum, and is powered by movement-related dopamine neurons.
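A value-free update of this kind can be sketched in the same spirit. The code below is an illustrative assumption, not the published model: the function `ape_update`, the action names, and the learning rate are all hypothetical. The point is simply that reward never appears anywhere in the rule.

```python
def ape_update(habit, chosen, learning_rate=0.1):
    """Nudge habit strengths toward the action just taken; reward plays no role."""
    errors = {}
    new_habit = {}
    for action, strength in habit.items():
        taken = 1.0 if action == chosen else 0.0
        # Action prediction error: what was done minus how strongly it was expected.
        errors[action] = taken - strength
        new_habit[action] = strength + learning_rate * errors[action]
    return errors, new_habit

habit = {"left": 0.5, "right": 0.5}   # hypothetical starting point: no preference
for _ in range(30):
    errors, habit = ape_update(habit, chosen="left")

print(habit)
```

After thirty repetitions of the same action, the strength for "left" sits close to 1 and the error is small, whether or not the action was ever rewarded.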

The Sandwich Shop Model of the Mind

To grasp how these two systems operate in everyday life, Stephenson-Jones offers a relatable metaphor: the sandwich shop.

Imagine walking into a deli for the first time. You pore over the menu, analyze your options, maybe take a risk on the tuna melt. Depending on how it tastes, you adjust your future choice—a textbook case of RPE in action.

Now imagine going back again and again. Over time, you stop thinking. You walk in, order the tuna melt, eat, leave. You don’t weigh the pros and cons anymore. That’s APE at work. The action becomes a default—not because it’s best, but because it’s familiar.

This dual system allows for efficiency. Your brain stores some decisions as habitual, freeing up cognitive space for other tasks. That’s why you can drive a car, signal a turn, and simultaneously plan dinner in your head. One system handles the repetitive mechanics; the other handles conscious choice.

Into the Striatum: Where Habits Take Root

To prove this dual-learning model, the SWC team turned to mice and a carefully designed auditory discrimination task. Using an approach developed at Cold Spring Harbor Laboratory, mice were trained to move left or right in response to high- or low-pitched tones. As they learned, the researchers monitored dopamine activity using a genetically encoded sensor.

Crucially, they focused on the tail of the striatum—an area that, unlike its neighbor the nucleus accumbens, had remained something of a mystery. Where reward-focused dopamine neurons flood the nucleus accumbens, movement-related dopamine neurons signal in the tail of the striatum.

Figure: Fluorescent images showing the locations in the brain that the scientists recorded from: the tail of the striatum (TS) and the ventral striatum (VS). Credit: Francesca Greenstreet.

The breakthrough came when researchers lesioned the tail of the striatum in some mice and compared their learning trajectories to unaltered mice. Initially, both groups performed similarly. But as learning progressed and the task became familiar, a stark divergence appeared.

Normal mice reached expert performance, solidifying their action preferences. Lesioned mice lagged behind. They learned slowly and never fully developed the habitual shortcut. The implication was profound: without access to the APE system in the tail of the striatum, these mice could only rely on the slower, more deliberate RPE system.
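The logic of the lesion comparison can be illustrated with a toy simulation. This is a deliberately simplified sketch under stated assumptions, not the computational model from the paper: the two-variable setup, the function `simulate`, and every parameter value are invented for illustration. A value advantage `q` learns from reward, a habit advantage `h` learns only from what is chosen, and setting `habit_weight=0` stands in for the lesion.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(trials=500, habit_weight=3.0, value_weight=3.0,
             value_rate=0.05, habit_rate=0.05):
    """Return the expected probability of a correct choice on each trial."""
    q, h = 0.0, 0.0          # value advantage and habit advantage of the correct action
    p_correct = []
    for _ in range(trials):
        p = sigmoid(value_weight * q + habit_weight * h)
        p_correct.append(p)
        # Value system (RPE-like): learns from reward, which follows correct choices only.
        q += value_rate * p * (1.0 - q)
        # Habit system (APE-like): drifts toward whatever is chosen more often, ignoring reward.
        h += habit_rate * ((2.0 * p - 1.0) - h)
    return p_correct

intact = simulate()                      # both systems contribute to choice
lesioned = simulate(habit_weight=0.0)    # habit system removed from choice

print(f"final accuracy, intact:   {intact[-1]:.3f}")
print(f"final accuracy, lesioned: {lesioned[-1]:.3f}")
```

In this toy version both groups start at chance and track each other early on, but only the intact learner gets the extra push from habit that carries it to near-perfect performance. The numbers themselves mean nothing beyond the sketch; the qualitative pattern is what matters.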

Further experiments that silenced the tail of the striatum in expert mice severely impaired their performance. Once the habit system was disabled, even experienced mice struggled, which suggests that in later stages of learning, behavior is driven largely by APE.

Implications for Addiction and Compulsive Behavior

This discovery doesn’t just reshape theoretical neuroscience—it has urgent clinical implications.

Addictions and compulsive behaviors are notoriously difficult to treat, in part because they’re not purely value-driven. Smokers know cigarettes are harmful. People with OCD understand that compulsions are irrational. But their actions persist.

Now, researchers have a clearer explanation of why. These behaviors are likely encoded in the APE system, stored as high-frequency actions detached from outcomes. Repeated often enough, they become default policies.

The solution? Not brute-force suppression, but replacement. As Stephenson-Jones puts it, “If you replace an action consistently enough, such as chewing on nicotine gum instead of smoking, the APE system may be able to take over and form a new habit on top of the other one.”

In essence, you don’t erase the bad habit—you overwrite it.
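How much consistency might that take? The sketch below counts repetitions in a toy frequency-based rule. It is an assumption-laden illustration, not a clinical estimate: the function `trials_to_overwrite`, the starting strengths, and the learning rates are all hypothetical.

```python
def trials_to_overwrite(old_strength=0.95, new_strength=0.05, rate=0.1):
    """Count consistent substitutions until the new habit is the stronger one."""
    trials = 0
    while new_strength <= old_strength:
        # The substitute action was taken, so its strength moves toward 1.
        new_strength += rate * (1.0 - new_strength)
        # The old action was not taken, so its strength decays toward 0.
        old_strength += rate * (0.0 - old_strength)
        trials += 1
    return trials

fast = trials_to_overwrite()            # default toy settings
slow = trials_to_overwrite(rate=0.02)   # a slower-learning habit system

print(f"repetitions needed: {fast} (rate 0.1), {slow} (rate 0.02)")
```

The slower the habit system learns, the more repetitions the substitute needs before it becomes the default, which fits the emphasis on replacing an action consistently.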

Rewriting the Parkinson’s Playbook

Even more intriguingly, the findings shed light on one of neurology’s oldest puzzles: paradoxical movement in Parkinson’s disease.

Parkinson’s is caused by the death of dopamine neurons in the midbrain, specifically in the substantia nigra pars compacta. But what’s odd is that patients often struggle with basic habitual movements—like walking—yet can perform complex, goal-directed motions—like dancing or ice skating—with relative ease.

This paradox has baffled neurologists for decades. But the new study offers an elegant explanation.

The dying neurons in Parkinson’s are movement-related dopamine cells. These are precisely the neurons driving the APE system. So while the RPE-driven system remains functional—allowing flexible, value-based behavior—the habitual system collapses. Walking, once automatic, becomes effortful. Ice skating, which requires conscious coordination, remains intact.

“Suddenly, we now have a theory for paradoxical movement in Parkinson’s,” said Stephenson-Jones. “This gives us a new place to look in the brain and a new way of thinking about Parkinson’s.”

The Architecture of Two Minds in One Brain

The implications ripple outward. From AI and behavioral economics to education and neuropsychology, this dual-system framework could reorient how we model decision-making.

Dr. Claudia Clopath, who led the computational modeling for the study, helped map out how RPE and APE interact. The result is a hybrid learning architecture where flexibility and efficiency coexist. Early in learning, we rely on the flexible RPE system. But with repetition, APE takes over, allowing us to automate responses and focus our conscious mind elsewhere.

This model echoes ancient philosophical debates—between reason and habit, will and instinct—but grounds them in empirical neuroscience. It suggests that our brains, like well-oiled machines, offload the repetitive to make room for the novel.

The Road Ahead: Can We Rewire Our Defaults?

While the study is a landmark, it raises new questions as fast as it answers old ones. Can APE truly be manipulated therapeutically? How do these systems interact when they conflict—when our habits oppose our goals? And can we develop drugs or stimulation therapies that selectively target the APE system to treat addiction, OCD, or Parkinson’s?

The SWC team is already at work. Future experiments will test whether APE is necessary and sufficient for habits, what exactly is encoded within each system, and how switching between them is governed.

In the meantime, this discovery invites us to think differently about ourselves. We are not solely creatures of reason, nor prisoners of habit. We are both—and the tug-of-war between our twin learning systems defines much of who we are.

So the next time you reach for your favorite sandwich without thinking, remember: it may not be your taste buds calling the shots. It might just be your brain’s quiet, second teacher—the action prediction error—guiding your hand.

Reference: Dopaminergic action prediction errors serve as a value-free teaching signal, Nature (2025). DOI: 10.1038/s41586-025-09008-9, www.nature.com/articles/s41586-025-09008-9
