Genie 2 from Google DeepMind Can Create Interactive 3D Environments

Google DeepMind has introduced Genie 2, an AI model that can create 3D worlds and sustain them for longer than its predecessor. Genie 2, a diffusion model, can handle first-person, isometric, and third-person perspectives and can recall features of a scene even after they leave the player’s view. It is intended for training AI agents and prototyping concepts.

Google DeepMind on Wednesday unveiled Genie 2, a large-scale foundation world model. The new model can create 3D worlds and sustain them for much longer than its predecessor, which could only generate 2D worlds.

Genie 2 is a diffusion model rather than a game engine: it generates frames as the player, either a human or another AI agent, moves through the environment the model is simulating. By inferring concepts about the environment as it generates those frames, Genie 2 can represent water, smoke, and physical effects, though some of those interactions can look quite gamey.

The model can also render first-person and isometric views in addition to a third-person perspective. It needs only a single image prompt to begin, which may be a photograph of something in the real world or an image generated by Google’s proprietary Imagen 3 model.

Google DeepMind also posted about Genie 2 on X, formerly Twitter.

Notably, Genie 2 can recall specific features of a simulated scene even after they leave the player’s field of vision and can accurately recreate them when they come back into view. This contrasts with other world models, such as Oasis, which struggled to recall the layout of the Minecraft levels it was generating in real time, at least in the version that Decart presented to the public in October.

But even Genie 2’s capabilities are limited in this respect. The company claims that the model can create “consistent” worlds for up to 60 seconds, although most of the examples DeepMind published on Wednesday run for far shorter periods, roughly 10 to 20 seconds. Additionally, the longer Genie 2 has to preserve the appearance of a consistent world, the more artifacts appear and the softer the image quality becomes.

DeepMind said only that it relied “on a large-scale video dataset” when describing how it trained Genie 2. Don’t expect DeepMind to make Genie 2 available to the general public anytime soon, either. For now, the company mainly views the model as a tool for training and evaluating other AI agents, such as its own SIMA agent, and as a way for designers and artists to quickly prototype and test concepts. According to DeepMind, world models like Genie 2 will probably be crucial to the future development of artificial general intelligence.

The availability of sufficiently rich and diverse training environments has historically been a bottleneck in the training of more general embodied agents, according to DeepMind. “As we demonstrate, Genie 2 may allow future agents to be trained and assessed in an infinite number of new worlds.”

This post was last modified on December 7, 2024 12:01 am

Kumud Sahni Pruthi

A postgraduate in Science with an inclination towards education and technology. She always looks for ways to help people improve their lives by putting complex things into simple words through her writing.