November 21, 2023
We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
SVD: This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder.
SVD-XT: Same architecture as SVD but finetuned for 25 frame generation.
We provide a streamlit demo scripts/demo/video_sampling.py and a standalone python script scripts/sampling/simple_video_sample.py for inference of both models.
Alongside the model, we release a technical report.
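
To make the frame counts and resolution above concrete, here is a minimal sketch of the tensor sizes involved for each variant. It is illustrative only and not part of the released scripts. It assumes that 576 is the height and 1024 the width, and that the SD 2.1 autoencoder downsamples each spatial dimension by 8 into 4 latent channels; neither detail is stated in this announcement. The names `VideoSpec` and `SPECS` are hypothetical.

```python
from dataclasses import dataclass

# Assumptions (not stated in the announcement): the SD 2.1 autoencoder
# reduces each spatial dimension by 8x and produces 4 latent channels.
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4


@dataclass(frozen=True)
class VideoSpec:
    """Hypothetical helper describing one model variant's output size."""
    name: str
    num_frames: int
    height: int = 576   # assumed to be the height in "576x1024"
    width: int = 1024   # assumed to be the width in "576x1024"

    def pixel_shape(self):
        # (frames, RGB channels, height, width) of the generated clip
        return (self.num_frames, 3, self.height, self.width)

    def latent_shape(self):
        # (frames, latent channels, height / 8, width / 8) under the
        # downsampling assumption above
        return (
            self.num_frames,
            LATENT_CHANNELS,
            self.height // VAE_DOWNSAMPLE,
            self.width // VAE_DOWNSAMPLE,
        )


# Frame counts are taken from the announcement: 14 for SVD, 25 for SVD-XT.
SPECS = {
    "svd": VideoSpec("SVD", num_frames=14),
    "svd_xt": VideoSpec("SVD-XT", num_frames=25),
}

if __name__ == "__main__":
    for key, spec in SPECS.items():
        print(key, spec.pixel_shape(), spec.latent_shape())
```

Under these assumptions, SVD-XT produces roughly 1.8 times as many frames per clip as SVD at the same spatial size, which is the main cost difference to keep in mind when choosing between the two.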