Google Unveils VideoPoet, a Breakthrough in AI Video Generation

Google Research has introduced VideoPoet, a large language model (LLM) designed for a range of video generation tasks. In a departure from the prevailing diffusion-based methods, the Google team built VideoPoet as an LLM, a type of model based on the transformer architecture and traditionally used for text and code generation.

Pre-training played a crucial role in VideoPoet's development. The LLM was trained on 270 million videos and over 1 billion text-and-image pairs from various sources on the internet. To condition the model, that data was converted into text embeddings, visual tokens, and audio tokens.

Compared with existing video generation tools such as Runway and Pika, VideoPoet is said to produce longer, higher-quality clips with more consistent motion. The LLM-based approach targets a known weakness of current video generation, allowing for coherent large motions without noticeable artifacts.

Two team members, Dan Kondratyuk and David Ross, emphasized in a blog post that VideoPoet overcomes a bottleneck in video generation by producing larger and more consistent motions across videos of 16 frames. It also enables a broader range of capabilities, including simulating various camera motions, applying visual styles, and generating new audio.

What sets VideoPoet apart is its integration of multiple video generation capabilities within a single LLM, eliminating the need for specialized components. Viewers surveyed by the Google Research team preferred VideoPoet, citing its superior performance in following prompts and delivering more interesting motion compared to competing models.

Tailored to the mobile video market, VideoPoet produces videos in portrait orientation by default. Google Research envisions expanding VideoPoet's capabilities to support "any-to-any" generation tasks, such as text-to-audio and audio-to-video.

Despite its impressive features, VideoPoet is not yet available for public use. Google has not provided a timeline for its release, so hands-on comparisons with other tools on the market will have to wait.