Google DeepMind has revealed Genie 3, its latest foundation world model that can be used to train general-purpose AI agents, a capability that the AI lab says is a crucial stepping stone on the path to “artificial general intelligence,” or human-like intelligence.
“Genie 3 is the first real-time interactive general-purpose world model,” Shlomi Fruchter, a research director at DeepMind, said during a press briefing. “It goes beyond narrow world models that existed before. It’s not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between.”
Still in research preview and not publicly available, Genie 3 builds on both its predecessor Genie 2 (which can generate new environments for agents) and DeepMind’s latest video generation model Veo 3 (which is said to have a deep understanding of physics).
With a simple text prompt, Genie 3 can generate multiple minutes of interactive 3D environments at 720p resolution and 24 frames per second, a significant jump from the 10 to 20 seconds Genie 2 could produce. The model also features “promptable world events,” or the ability to use a prompt to change the generated world.
Perhaps most importantly, Genie 3’s simulations stay physically consistent over time because the model can remember what it previously generated, a capability that DeepMind says its researchers didn’t explicitly program into the model.
Fruchter said that while Genie 3 has implications for educational experiences, gaming, and prototyping creative concepts, its real unlock will come in training agents for general-purpose tasks, which he said is essential to reaching AGI.
“We think world models are key on the path to AGI, specifically for embodied agents, where simulating real world scenarios is particularly challenging,” Jack Parker-Holder, a research scientist on DeepMind’s open-endedness team, said during the briefing.
Genie 3 is supposedly designed to solve that bottleneck. Like Veo, it doesn’t rely on a hard-coded physics engine; instead, DeepMind says, the model teaches itself how the world works (how objects move, fall, and interact) by remembering what it has generated and reasoning over long time horizons.
“The model is auto-regressive, meaning it generates one frame at a time,” Fruchter told TechCrunch in an interview. “It has to look back at what was generated before to decide what’s going to happen next. That’s a key part of the architecture.”
That memory, the company says, contributes to consistency in Genie 3’s simulated worlds, which in turn allows the model to develop a grasp of physics, similar to how humans understand that a glass teetering on the edge of a table is about to fall, or that they should duck to avoid a falling object.
Notably, DeepMind says the model also has the potential to push AI agents to their limits, forcing them to learn from their own experience, similar to how humans learn in the real world.
As an example, DeepMind shared its test of Genie 3 with a recent version of its generalist Scalable Instructable Multiworld Agent (SIMA), instructing it to pursue a set of goals. In a warehouse setting, the researchers asked the agent to perform tasks like “approach the bright green trash compactor” or “walk to the packed red forklift.”
“In all three cases, the SIMA agent is able to achieve the goal,” Parker-Holder said. “It just receives the actions from the agent. So the agent takes the goal, sees the world simulated around it, and then takes the actions in the world. Genie 3 simulates forward, and the fact that it’s able to achieve it is because Genie 3 remains consistent.”
That said, Genie 3 has its limitations. For example, while the researchers claim it can understand physics, the demo showing a skier barreling down a mountain didn’t reflect how snow would move in relation to the skier.
Additionally, the range of actions an agent can take is limited. For example, the promptable world events allow for a wide range of environmental interventions, but they’re not necessarily performed by the agent itself. And it’s still difficult to accurately model complex interactions between multiple independent agents in a shared environment.
Genie 3 can also only support a few minutes of continuous interaction, when hours would be necessary for proper training.
Still, the model presents a compelling step forward in teaching agents to go beyond reacting to inputs, letting them potentially plan, explore, seek out uncertainty, and improve through trial and error: the kind of self-driven, embodied learning that many say is key to moving toward general intelligence.
“We haven’t really had a Move 37 moment for embodied agents yet, where they can actually take novel actions in the real world,” Parker-Holder said, referring to the legendary moment in the 2016 game of Go between DeepMind’s AI agent AlphaGo and world champion Lee Sedol, in which AlphaGo played an unconventional and brilliant move that became symbolic of AI’s ability to discover new strategies beyond human understanding.
“But now, we can potentially usher in a new era,” he said.