Google DeepMind, an Alphabet division, introduced Veo 2 just seven months after releasing its original Veo AI video generator.
While the original Veo topped out at 1080p, the new model can produce video at resolutions up to 4K. Google also claims the updated Veo renders scenes with stronger physics and offers "camera control": users can prompt the model for particular camera angles and shots, such as close-ups, pans, and "establishing shots," even though no actual camera is involved.
DeepMind also revised its Imagen 3 text-to-image model, though the improvements, such as "more compositionally balanced" images and better adherence to artistic styles, weren't deemed significant enough to merit a new version number. Imagen 3 first launched in August.
Also Read: Google Unveiled Gemini 2.0, its Next AI Model for Almost Everything
Veo 2's 4K upgrade suggests DeepMind is pulling ahead of other AI labs on video resolution.
A week ago, OpenAI finally made its Sora video generator available to the public after first announcing it in February. But Sora's output (specifically the Sora Turbo version now accessible to ChatGPT Plus and Pro subscribers) still tops out at 1080p. Runway, perhaps the most widely used AI video generator available today, can export at only a blurry 720p.
In a Veo 2 presentation, Google made the case that low-resolution video may be fine for mobile, but "creators want to see their work shine on the big screen."
According to a Google representative, Veo 2's 4K videos default to just eight seconds in length but can be extended to as long as two minutes. Sora's 1080p videos are capped at 20 seconds.
According to DeepMind, 59% of human raters preferred Veo 2 over Sora Turbo, while 27% favored OpenAI's model. DeepMind reports similar wins over Minimax and Meta's Movie Gen, with Veo 2's preference rate falling just short of 50% only against Kling v1.5, a service from China's Kuaishou Technology.
Also Read: Google Unveiled the Willow Quantum Chip, a quantum computing innovation of the future
According to DeepMind, Veo 2 was preferred at comparable rates for "prompt adherence," that is, how faithfully it follows instructions.
The DeepMind team also asserts that it has made significant progress in demonstrating "a better understanding of real-world physics and the nuances of human movement and expression," as well as in suppressing "hallucinated" artifacts like extra fingers.
Physics remains a stubborn problem for video generators. Sora, for instance, struggles to produce believable videos of gymnasts performing intricate movements. How much better Veo 2 fares in this area remains to be seen.
Some, such as Stanford professor and World Labs co-founder Fei-Fei Li, contend that problems like physics and object permanence can only truly be solved by so-called world models with the "spatial intelligence" to understand and generate 3D environments. Google debuted its own Genie 2 world model earlier this month, with an emphasis on creating environments for training and evaluating AI "agents" that operate in virtual worlds.
The more plausible the output of image and video generators becomes, the greater the risk they will be used for malicious purposes. DeepMind's invisible SynthID watermarks on Veo 2 clips could make them harder to exploit for political disinformation, provided people actually check videos for signs of AI origin. Victims of more common scams, however, may be less inclined to inspect a file for invisible watermarks.
Also Read: Genie 2 from Google DeepMind can Create Interactive 3D Environments
OpenAI's Sora, by contrast, embeds a visible animation in the lower-right corner of its videos. Rather than SynthID, Sora uses the open C2PA watermarking standard (though Google also joined the C2PA initiative in February).
Veo 2 now powers Google Labs' VideoFX generation tool, which caps output at 720p, while the ImageFX tool now uses the updated Imagen 3 model. ImageFX is available in more than 100 countries, but VideoFX is rolling out in the United States only for now.
Google DeepMind has not disclosed what data was used to train Veo 2 or the updated Imagen 3, although it previously suggested that videos from YouTube, which, like DeepMind, is owned by Alphabet, may have made up a portion of the original Veo's training data.
Many photographers, filmmakers, artists, and other creators worry that their copyrighted works have been used to train these models without permission. OpenAI has declined to disclose the data used to train Sora, but the New York Times, citing people involved in the training process, has reported that the company trained the model on videos from Google's YouTube service. According to earlier reporting from 404 Media, Runway also appears to have trained Gen-3 Alpha on YouTube videos.
Also Read: Google.org Invests $20 Million in Scientists Utilizing AI to Advance Science