World Models are the future of AI and here’s why

September 17th, 2025 marked the inaugural MIT Generative AI Impact Consortium (MGAIC) conference, held at MIT’s Kresge Auditorium. The event brought together many of the world’s leading AI scientists, who shared their insights on the future of AI. Interestingly, large language models (LLMs), the architecture behind most of today’s major models like GPT, Gemini, and Claude, were no longer the centre of conversation. "We need AI that builds models of how the world works—not just mimics human text," said keynote speaker Yann LeCun, Meta’s Chief AI Scientist. A month earlier, Google had released Genie 3, an AI model distinct from all its predecessors. We are witnessing a new revolution in AI, built on a new architecture unlike any before: World Models.

World Models first emerged as an idea in 1943, when Scottish psychologist Kenneth Craik proposed that an organism carries a "small-scale model" of external reality within its head. Such an organism could "try out various alternatives, conclude which is the best of them... and in every way react in a much fuller, safer, and more competent manner."

For example, let both an LLM and a World Model learn to drive a car around a circuit. An LLM would convert the visual information into packets of numerical data, then process them much like words, a process known as visual-language integration. Then it starts experimenting, steering the car in every direction it could travel. Depending on whether or not the car crashes, the LLM slowly “learns” what each correct move is. Compare that to a World Model. It also converts the visual info into numerical data, but keeps it as a continuous “flow” instead of discrete packets. Nor does it try to move immediately. Instead, it learns the mechanics: how the car moves, where the turns are, how to make those turns. Before trying any moves on the actual circuit, it will have “dreamed” about multiple different strategies and compared them. With World Models, there’s much less trying and failing, and much more thinking and planning.
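To make that contrast concrete, here is a toy sketch of the “dreaming” idea in Python. Everything in it is illustrative rather than any real architecture: the hand-coded `dynamics` function stands in for what an actual World Model would learn from data, and the car is reduced to a position on a one-dimensional track. The point is only the shape of the loop: imagine many futures inside the model, score them, then commit to a single move.

```python
import random

# Toy sketch of model-based planning ("dreaming"), not a real World Model.
# The hand-coded dynamics function stands in for learned physics.

TRACK_LENGTH = 10          # positions 0..9; position 9 is the finish line
ACTIONS = [-1, 0, 1]       # toy moves: fall back, hold, advance

def dynamics(position, action):
    """Predicted next position; steering off the track ends the rollout."""
    nxt = position + action
    if nxt < 0:            # imagined crash off the track
        return None
    return min(nxt, TRACK_LENGTH - 1)

def dream_rollout(position, plan):
    """Imagine a whole action sequence without touching the real track."""
    for action in plan:
        position = dynamics(position, action)
        if position is None:
            return -1      # an imagined crash scores worst
    return position        # score: how far the dream got

def plan_ahead(position, horizon=5, n_dreams=200):
    """Compare many imagined futures, then commit to the best first move."""
    best_plan, best_score = None, float("-inf")
    for _ in range(n_dreams):
        plan = [random.choice(ACTIONS) for _ in range(horizon)]
        score = dream_rollout(position, plan)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan[0]

print(plan_ahead(position=0))  # the single action chosen after "dreaming"
```

A trial-and-error learner would instead take real actions and crash its way toward the same knowledge; here, all the crashing happens inside the imagined rollouts.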

Unlike LLMs, World Models are trained on real-world physical data, allowing them to learn the same rules we’re taught in physics class and “think” the way humans might. The result, for the first time in human history, is AI that comes close to having “common sense”. Compare that to LLMs, whose decision-making no one fully understands. Ultimately, LLMs are models of probabilities: giant lottery machines that try to predict which letter comes after another, rather than understanding the world as we see it. They are models that our human minds don’t intuitively comprehend.
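The “lottery machine” idea can be shown with a deliberately tiny sketch: a bigram counter that guesses the next letter purely from how often letters follow one another in some text. Real LLMs use neural networks over tokens rather than raw letter counts, but the underlying objective, predicting the next item in a sequence, is the same. The corpus string here is just an illustrative example.

```python
from collections import Counter, defaultdict

# Toy "lottery machine": count which character follows which,
# then predict the most frequent successor. Not a real LLM.

corpus = "the world model thinks before it moves"

counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1          # tally: character b followed character a

def predict_next(char):
    """Return the most frequently seen character after `char`."""
    following = counts[char]
    if not following:
        return None            # never saw this character in the corpus
    return following.most_common(1)[0][0]

print(predict_next("t"))  # prints 'h', since "th" dominates in this corpus
```

Notice there is no notion of a car, a track, or physics anywhere in this program: it only knows which symbols tend to follow which.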

UCSF’s Shailee Jain went further, describing World Models as literal "silicon brains", implying they would be the first artificial beings capable of true thinking and true innovation. Human-like thinking and innovation.

Furthermore, silicon as a material allows World Models to operate at far faster rates than human brains, endure conditions so extreme that humans wouldn’t survive a second in them, and never die from strokes, trauma, or dementia. This effectively allows them to take over many dangerous yet intellectually demanding jobs, like cleaning deep-sea spills, analysing data in space, or even building an entire colony on Mars. After all, they don’t need an atmosphere to survive.

These are just a few examples of what World Models might be able to do in the future. But that doesn’t mean World Models are speculative. In fact, Google’s newest Genie 3 is a living demonstration of their feasibility. With one text prompt, it can create ultra-realistic 720p videos that follow real-world dynamics and physics, and even support real-time interaction with objects in those videos. It feels like playing the newest AAA game released by your favourite game developer. The model has its limitations: it’s unable to maintain accuracy in complex situations and doesn’t have an explicit representation of spacetime or reference frames (meaning it still doesn’t have the “common sense” described above). Nevertheless, it shows us the endless possibilities World Models bring. The model is currently not available to the public, but you can see an example of what it does here.

World Models are humanity’s current best pathway towards building AGI: an entity that surpasses humans in overall intelligence or in some particular measure of intelligence. MIT’s Generative AI Impact Consortium marks the first time tech and academic giants like MIT, Meta and Amazon have come together to collaborate on World Models, and Google’s work has shown that World Models aren’t just some crazy tech fantasy. So it wouldn’t be surprising if, within our lifetime, we had AI that doesn’t just predict our words, but also predicts our physical world.
