VideoPoet // new LLM for video generation by Google

sbagency
2 min read · Dec 26, 2023


Generative video is a very hot market for the coming years; don’t fuck it up.

https://arxiv.org/pdf/2312.14125.pdf

Here are a few key points about the research described in the paper:

- The researchers propose VideoPoet, a large language model for video generation. It employs a transformer architecture that can process multimodal inputs like images, videos, text, and audio.

- VideoPoet is trained in two stages: pretraining and task-specific adaptation. Pretraining uses a mixture of multimodal generative objectives such as text-to-video, video prediction, and video inpainting. The pretrained model then serves as a foundation that is adapted to various video generation tasks.

- Experiments demonstrate VideoPoet’s capabilities in zero-shot video generation, especially in producing realistic motions driven by text prompts. It also shows promise in coherent long video generation and converting images to videos.

- Compared to diffusion models commonly used in video generation, VideoPoet as a language model can more easily combine diverse training objectives within a single architecture. This provides flexibility in adapting it to new tasks without major architectural changes.

- Evaluations show VideoPoet achieves state-of-the-art results in text-to-video generation benchmarks. Human evaluations also indicate it generates more interesting and realistic motions compared to other recent models.

- Key advantages highlighted are the ability to leverage existing optimizations for language models, combine multiple tasks flexibly, and demonstrate zero-shot generalization capabilities. VideoPoet illustrates the potential of large language models for high-fidelity video generation.
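To make the "mixture of objectives in a single architecture" point above more concrete, here is a minimal sketch of how several generation tasks could be flattened into one token format for a single language model. Everything in it is an assumption for illustration: the special token names, the `Example` class, the helper functions, and the loss-mask convention are hypothetical and are not taken from the paper or from any released code.

```python
import random
from dataclasses import dataclass

# Hypothetical special tokens; the names are illustrative only.
BOS, EOS = "<bos>", "<eos>"
TASK_TOKENS = {
    "text_to_video": "<task:t2v>",
    "video_prediction": "<task:predict>",
    "video_inpainting": "<task:inpaint>",
}


@dataclass
class Example:
    task: str             # key into TASK_TOKENS
    condition: list[str]  # tokens the model conditions on (text, frames, ...)
    target: list[str]     # tokens the model should learn to generate


def to_sequence(ex: Example) -> tuple[list[str], list[int]]:
    """Flatten one example into a token sequence plus a loss mask.

    The mask is 1 only on the target part, so the next-token loss
    would be computed on generated tokens and not on the conditioning.
    """
    prefix = [BOS, TASK_TOKENS[ex.task], *ex.condition]
    suffix = [*ex.target, EOS]
    return prefix + suffix, [0] * len(prefix) + [1] * len(suffix)


def sample_task(weights: dict[str, float], rng: random.Random) -> str:
    """Pick the task for the next training example from a weighted mixture."""
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    ex = Example("text_to_video", ["txt_1", "txt_2"], ["vid_7", "vid_9", "vid_3"])
    tokens, mask = to_sequence(ex)
    print(tokens)
    print(mask)
```

Under this framing, adding a new task means adding a new task token and a new way to build `condition` and `target`; the model itself does not change, which is the flexibility the summary attributes to the language-model approach.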

https://sites.research.google/videopoet/
https://twitter.com/CodeByPoonam/status/1739556881511890958

Competition in generative video

https://twitter.com/How2uAI/status/1739609551656411186
