World Models are the future of AI and here’s why

September 17th, 2025 marked the inaugural MIT Generative AI Impact Consortium (MGAIC) conference, held at MIT’s Kresge Auditorium. The event brought together many of the world’s leading AI scientists, who shared their insights on the future of AI. Interestingly, large language models (LLMs), the technology behind most of today’s major models like GPT, Gemini, and Claude, were no longer the centre of conversation. "We need AI that builds models of how the world works—not just mimics human text," said keynote speaker Yann LeCun, Meta’s Chief AI Scientist. A month earlier, Google had released Genie 3, a brand new AI model distinct from all its predecessors. We are witnessing a new revolution in AI, built on an architecture unlike any that came before: World Models.

The idea of World Models was first proposed by Scottish psychologist Kenneth Craik in 1943, who suggested that an organism carries a "small-scale model" of external reality within its head. Such an organism could "try out various alternatives, conclude which is the best of them... and in every way react in a much fuller, safer, and more competent manner."

For example, let both an LLM and a World Model learn to drive a car around a circuit. An LLM would first need the game converted into text prompts, then learn from the feedback fed back into its prompt (so-called in-context learning), processing everything as text. Then it starts experimenting, steering the car in every direction it can travel. Depending on whether or not the car crashes, the LLM slowly “learns” what each correct move is. Compare that to a World Model. It processes the game as visual data, without having to convert it into text. Nor does it try to move immediately. Instead, it learns the mechanics: how the car moves, where the turns are, how to make those turns. Before trying any moves in the actual game, it will have “dreamed” through multiple different strategies and compared them. With World Models, there’s much less trying and failing, and much more thinking and planning.
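To make that contrast concrete, here is a minimal Python sketch of the world-model recipe: fit a crude model of how the car responds to actions, “dream” through candidate plans inside that model, and only then act. The function names, the one-dimensional track, and the toy dynamics are all illustrative assumptions of mine, not code from Genie 3 or any published system.

```python
# A toy world-model loop for a car on a one-dimensional track.
# Everything here (names, dynamics, goal) is an illustrative assumption,
# not code from Genie 3 or any published world model.

def learn_dynamics(experience):
    """Fit a crude next-state predictor from (state, action, next_state) tuples."""
    deltas = {}
    for state, action, next_state in experience:
        deltas.setdefault(action, []).append(next_state - state)
    # The "model" is just the average observed effect of each action.
    return {action: sum(ds) / len(ds) for action, ds in deltas.items()}

def imagine_rollout(model, state, plan):
    """Roll a candidate plan forward inside the learned model ("dreaming")."""
    for action in plan:
        state += model.get(action, 0.0)
    return state

def plan_with_world_model(model, state, candidate_plans, target):
    """Compare imagined outcomes and pick the plan that ends closest to the target."""
    return min(candidate_plans,
               key=lambda plan: abs(imagine_rollout(model, state, plan) - target))

# Usage: positions on the track are numbers; actions nudge the car left or right.
experience = [(0.0, "left", -1.0), (0.0, "right", 1.0), (1.0, "right", 2.0)]
model = learn_dynamics(experience)
plans = [["left", "left"], ["right", "right"], ["right", "left"]]
print(plan_with_world_model(model, 0.0, plans, target=2.0))
# -> ['right', 'right'], chosen without ever crashing the real car
```

The point of the sketch is the order of operations: the real game is only touched to gather experience and to execute the one plan that already won in imagination.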


Unlike LLMs, World Models are trained on data from the physical world, which allows them to learn the same rules of physics we learn in school and to “think” much as a human might think. The result, for the first time in human history, is AI models that might have “common sense”. Compare that to LLMs, where no one can really explain how decisions get made. Ultimately, LLMs are models of probabilities: giant lottery machines that try to predict which word or token comes after another, instead of understanding the world as we homo sapiens see it. They are models that our human minds don’t intuitively comprehend.
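For intuition, here is a tiny, hypothetical sketch of that “lottery machine”: given a prompt, the model assigns a probability to every possible next token and draws one by weighted chance. The probability table below is invented purely for illustration; a real LLM computes these numbers with a neural network over a vocabulary of tens of thousands of tokens.

```python
import random

# An invented next-token distribution; a real LLM would compute these
# probabilities with a neural network, not look them up in a table.
next_token_probs = {
    ("the", "car", "turns"): {"left": 0.45, "right": 0.40, "blue": 0.15},
}

def sample_next_token(prompt):
    """Pick the next token by weighted chance, with no model of the physical world."""
    probs = next_token_probs[prompt]
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

print(sample_next_token(("the", "car", "turns")))  # usually "left" or "right", sometimes "blue"
```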

UCSF’s Shailee Jain went further, describing World Models as literal "silicon brains", implying that they would be the first artificial systems capable of true thinking and true innovation. Human-like thinking and innovation.

Furthermore, silicon as a substrate allows World Models to operate far faster than human brains, endure conditions so extreme that a human wouldn’t survive a second in them, and never suffer strokes, trauma, or dementia. This would effectively allow them to take over many dangerous yet intellectually demanding jobs, like cleaning up deep-sea spills, analysing data in space, or even building an entire colony on Mars; after all, they don’t need an atmosphere to survive.

These are just a few examples of what World Models might be able to do in the future, but that doesn’t mean World Models are speculative; in fact, Google’s newest Genie 3 is a living demonstration of their feasibility. With a single text prompt, it can create ultra-realistic 720p video worlds that follow real-world dynamics and physics, and it even supports real-time interaction with objects in those worlds. It feels like playing the newest AAA game released by your favourite game developer. The model has its limitations: it can’t maintain accuracy in complex situations and has no explicit representation of spacetime or reference frames (meaning it still lacks the “common sense” I described above). Nevertheless, it shows us the endless possibilities World Models bring. The model is currently not available to the public, but you can see an example of what it does here.

World Models are humanity’s current best pathway towards building AGI: an entity that surpasses humans in overall intelligence, or at least in some particular measure of it. MIT’s Generative AI Impact Consortium marks the first time tech and academic giants like MIT, Meta, and Amazon have come together to collaborate on World Models. Google’s work showed the world that this isn’t just some tech fantasy. It wouldn’t be surprising if, within our lifetime, we had AI that doesn’t just predict our words, but also predicts our physical world.
