In the 17th century, the German astronomer Johannes Kepler revolutionized our understanding of the heavens. With painstaking observations and meticulous calculations, he discerned the precise laws governing planetary motion, enabling humanity to predict with astonishing accuracy where each planet would appear in the night sky. Kepler’s work was not merely an exercise in astronomy; it was an audacious attempt to decode the rhythms of the universe itself.
Yet, Kepler’s laws, as elegant as they were, described what the planets did rather than why. It was Isaac Newton, decades later, who provided the underlying explanation. Newton’s law of universal gravitation revealed the invisible force orchestrating the celestial ballet, and for the first time, humans could apply the same principles to understand phenomena on Earth—from the trajectory of cannonballs to the ebb and flow of ocean tides, and eventually, the path of spacecraft traveling to the moon and beyond. Newton transformed predictive patterns into fundamental understanding.
This historical leap—from empirical observation to deep theoretical insight—is a story of human intellect at its finest. Today, we find ourselves confronting a similar question, not about planets or gravity, but about artificial intelligence: Can modern predictive systems move beyond accurate forecasts to achieve genuine understanding?
Predictive AI: Accuracy Without Comprehension
Artificial intelligence, particularly so-called “foundation models,” has achieved extraordinary success in predicting outcomes across domains—from language processing to protein folding, from weather forecasting to board game strategies. In many ways, these systems resemble Kepler’s early astronomical models: astonishingly precise in prediction, yet operating without a formal grasp of underlying principles.
Researchers at MIT’s Laboratory for Information and Decision Systems (LIDS) and Harvard University are now asking a fundamental question: Do these AI systems merely predict patterns, or do they possess an understanding akin to Newton’s laws—a conceptual model of how the world works? The difference is subtle but profound. Predictions can be made through brute-force pattern recognition, but understanding implies a capacity to generalize—to take insights from one context and apply them meaningfully in another.
“The question we were addressing was: Have foundation models—has AI—been able to make that leap from predictions to world models?” explains Keyon Vafa, the study’s lead author. “We’re not asking whether they can, or whether they will—we’re asking whether they have done it so far.”
From Patterns to Principles
The distinction between pattern recognition and understanding can be illustrated through analogy. Kepler’s laws predicted planetary positions beautifully, yet they did not explain the invisible forces at play. Newton’s insights generalized those predictions to a wide array of phenomena. Similarly, centuries of agricultural knowledge allowed humans to selectively breed crops and livestock effectively, but Gregor Mendel’s discovery of the laws of genetic inheritance provided the foundational principles that explained why those patterns occurred.
Applying this analogy to AI, predictive models can often succeed in specialized tasks. A system might accurately forecast the next move in a game of Othello or the folding of a protein chain. Yet when researchers test these models across slightly different conditions or more complex scenarios, their predictive power often falters. The AI may know what tends to happen but not why, limiting its ability to adapt and generalize.
Measuring AI’s Understanding
To explore the gap between prediction and comprehension, the MIT-Harvard team devised a novel approach for evaluating how deeply AI systems understand the phenomena they model. They introduced a metric based on inductive bias, which quantifies the extent to which a system’s predictions reflect the structure and dynamics of the real world, rather than simply reproducing patterns memorized from the training data.
At the simplest level, the team examined lattice models—one-dimensional systems where an entity moves along a series of discrete positions, analogous to a frog hopping between lily pads. In these basic scenarios, predictive AI models could reconstruct the “world” accurately, inferring the underlying structure without being explicitly told. Yet as the worlds being modeled grew more complex—additional dimensions, multiple interacting states, or more elaborate rules—the models’ success diminished.
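To make the lattice setup concrete, the following is a minimal, illustrative sketch (not the authors’ code, and not the paper’s actual probe): a frog hops along five lily pads with walls at both ends, a naive history-based predictor is fitted to the hop sequence, and its predictions are then compared with what the frog’s true position dictates. The constants N_PADS and CONTEXT, the simulate helper, and the counting-based predictor are all assumptions chosen purely for illustration.

```python
# Illustrative sketch only: a 1D lattice ("frog on lily pads") world and a
# naive pattern-based next-move predictor. This is NOT the researchers' code;
# N_PADS, CONTEXT, and the counting predictor are assumptions for illustration.
import random
from collections import defaultdict

N_PADS = 5    # lily pads numbered 0..4 (assumed world size)
CONTEXT = 3   # how many past hops the toy predictor conditions on

def simulate(n_steps, start=2):
    """Random walk with walls; returns the hop tokens and the true positions."""
    pos, moves, positions = start, [], []
    for _ in range(n_steps):
        if pos == 0:
            step = +1                      # left wall: only a rightward hop is possible
        elif pos == N_PADS - 1:
            step = -1                      # right wall: only a leftward hop
        else:
            step = random.choice([-1, +1])
        pos += step
        moves.append("R" if step > 0 else "L")
        positions.append(pos)
    return moves, positions

moves, positions = simulate(200_000)

# Fit a purely sequence-based predictor: count what follows each length-CONTEXT history.
counts = defaultdict(lambda: defaultdict(int))
for i in range(CONTEXT, len(moves)):
    ctx = tuple(moves[i - CONTEXT:i])
    counts[ctx][moves[i]] += 1

def predicted_p_right(ctx):
    c = counts[ctx]
    total = c["L"] + c["R"]
    return c["R"] / total if total else 0.5

# Crude "world recovery" check: group predictions by the frog's true position.
# A predictor that had recovered the lattice would output P(R) = 1.0 at pad 0,
# P(R) = 0.0 at the last pad, and 0.5 everywhere in between.
by_position = defaultdict(list)
for i in range(CONTEXT, len(moves)):
    ctx = tuple(moves[i - CONTEXT:i])
    by_position[positions[i - 1]].append(predicted_p_right(ctx))

for pad in sorted(by_position):
    preds = by_position[pad]
    print(f"pad {pad}: mean predicted P(next hop = R) = {sum(preds) / len(preds):.2f}")
```

Because a three-hop history only partially pins down where the frog is, the predictor’s outputs only roughly track the true position-dependent probabilities; richer worlds widen that gap, which is the pattern the researchers report as complexity grows.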
Even in games like Othello, where the rules are well-defined and predictable, AI models can anticipate individual moves but fail to grasp the broader configuration of the board, particularly positions that are temporarily inaccessible or indirectly affected by prior moves. The more complex the environment, the more the models’ ability to generalize falters.
The Limits of Foundation Models
The researchers tested five different categories of predictive AI systems, spanning a range of sophistication. Across the board, they found a consistent pattern: as task complexity increased, the models’ fidelity to true world dynamics declined. In short, current AI excels at narrow predictions but struggles to develop a robust, generalizable understanding of the systems it models.
This insight has practical significance. Predictive AI is already being applied to cutting-edge scientific problems, such as forecasting the properties of chemical compounds or the behavior of previously unobserved proteins. Yet, according to Vafa, “Even for something like basic mechanics, we found that there seems to be a long way to go.” Models trained on vast datasets may offer impressive forecasts, but they often lack the principled understanding required to extrapolate reliably to novel situations.
Toward Generalizable AI Understanding
While the current limitations are clear, the research also points toward a path forward. By employing quantitative metrics like the inductive bias measure, engineers and scientists can systematically evaluate how well AI models capture underlying realities. Once such benchmarks exist, they can guide the design of training methods aimed at fostering not only accurate predictions but also conceptual understanding.
Peter G. Chang, a graduate student involved in the study, emphasizes the engineering potential: “As an engineering field, once we have a metric for something, people are really, really good at optimizing that metric.” In other words, understanding how to measure AI comprehension is the first step toward developing models capable of genuine generalization.
Implications for Science and Society
The distinction between prediction and understanding is not merely academic; it carries profound implications for both science and society. If AI can only recognize patterns without grasping underlying principles, its usefulness for novel discoveries may be limited. By contrast, systems that develop a conceptual model of their domain could transform research, enabling scientists to explore entirely new hypotheses, design experiments, and uncover previously hidden relationships.
At the societal level, as AI assumes a greater role in decision-making—whether in healthcare, climate modeling, or economic planning—understanding its limitations becomes crucial. Accurate predictions alone are insufficient if the AI cannot anticipate conditions outside its training data or adapt to unforeseen circumstances. Developing a metric-driven approach to evaluate comprehension may ultimately enhance the reliability and safety of AI-driven systems.
Lessons from History
The parallels between AI today and the evolution of scientific understanding in human history are striking. Kepler’s laws offered practical utility but lacked explanatory depth; Newton’s principles provided a framework capable of generalization. Similarly, modern AI demonstrates remarkable predictive capability, yet understanding—the capacity to generalize, to infer unseen principles, to reason beyond specific data—is still largely out of reach.
The MIT-Harvard research highlights both the achievements and the limitations of contemporary AI, illustrating that prediction and understanding are not synonymous. True comprehension requires more than pattern recognition; it demands insight into underlying mechanisms, principles, and causal relationships. Just as Newton transformed Kepler’s observations into universal laws, the next generation of AI may need to transcend raw prediction to achieve genuine understanding.
The Road Ahead
The journey toward AI systems that build genuine world models is just beginning. By identifying the gap between prediction and comprehension, researchers can focus on designing models that do more than mimic patterns—models that reason about the systems they interact with. Inductive bias offers a promising tool, providing a concrete measure of a model’s alignment with reality and its potential to generalize across domains.
Ultimately, the story of Kepler, Newton, and AI reminds us of the enduring challenge of understanding. Curiosity and observation can reveal patterns, but insight requires conceptual synthesis. As predictive models continue to advance, the question remains: Can AI evolve from Kepler to Newton, from pattern recognition to genuine comprehension, and in doing so, expand the horizons of human knowledge? The pursuit is just beginning, and the stakes—for science, technology, and society—could not be higher.
More information: What Has a Foundation Model Found? Inductive Bias Reveals World Models, ICML 2025: icml.cc/virtual/2025/poster/44374