Stability AI Launches Video Generation Model Stable Video Diffusion
Stability AI recently launched a video generation model called Stable Video Diffusion, which builds on the company's existing Stable Diffusion text-to-image model and generates videos by animating existing images. Unlike the offerings of most other AI companies, Stable Video Diffusion is one of the few video generation models available as open source.
However, it's important to note that the model is currently in the "research preview" phase. Users must agree to specific terms of use that clearly define its intended applications, such as "educational or creative tools," while prohibiting its use for "representations of real events or people." Given the history of similar AI research previews, there is a possibility that the model could soon circulate on the dark web, raising concerns about misuse, especially since it appears to lack built-in content filters.
Stable Video Diffusion comes in two models: SVD and SVD-XT. SVD converts still images into 14-frame videos at 576x1024 resolution, while SVD-XT uses the same architecture and extends the frame count to 24. Both models can generate videos at frame rates ranging from 3 to 30 frames per second. According to the white paper, the models were first trained on a dataset of millions of videos and then fine-tuned on a smaller dataset of hundreds of thousands to about a million clips.
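The frame counts and frame rates above determine how long a generated clip plays. The following is a minimal sketch of that arithmetic, not part of any Stability AI tooling: the function names and the `MODEL_FRAMES` table are hypothetical, and the only inputs taken from the article are the 14 and 24 frame counts and the 3 to 30 frames-per-second range.

```python
# Illustrative arithmetic only: clip duration = frame count / playback frame rate.
# MODEL_FRAMES and both function names are hypothetical helpers for this sketch.

MODEL_FRAMES = {"SVD": 14, "SVD-XT": 24}  # frame counts as stated in the article
FPS_RANGE = (3, 30)                       # frame-rate range as stated in the article


def clip_duration_seconds(frame_count: int, fps: float) -> float:
    """Return the playback length in seconds of a clip with the given frames and rate."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def duration_range(frame_count: int) -> tuple[float, float]:
    """Return (shortest, longest) clip length across the stated frame-rate range."""
    low_fps, high_fps = FPS_RANGE
    # The highest frame rate gives the shortest clip, the lowest gives the longest.
    return (
        clip_duration_seconds(frame_count, high_fps),
        clip_duration_seconds(frame_count, low_fps),
    )


if __name__ == "__main__":
    for name, frames in MODEL_FRAMES.items():
        shortest, longest = duration_range(frames)
        print(f"{name}: {frames} frames last {shortest:.2f}s to {longest:.2f}s")
```

Under these figures, a 24-frame clip played at 6 frames per second (a rate chosen here purely for illustration) lasts four seconds.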
The four-second clips the models generate are of remarkably high quality and are considered comparable in some respects to the output of video generation models from Meta, Google, and other AI startups. However, Stable Video Diffusion has certain limitations: it cannot generate videos without motion or slow camera movements, cannot be controlled by text, fails to render legible text, and generates faces and human figures inconsistently.
Despite these limitations, Stability AI notes that these models are highly scalable and adaptable for use cases like generating 360-degree views of objects. The company plans to release "a series" of models built upon SVD and SVD-XT to expand functionality, along with a "text-to-video" tool incorporating text prompts. The ultimate goal is commercialization, with potential applications in "advertising, education, entertainment, and other fields."
However, Stability AI currently faces financial difficulties. Reports indicate the company recently raised $25 million through convertible notes, bringing its total funding to $125 million. It has not yet completed a new funding round at a higher valuation; its last valuation was $1 billion. Stability AI had aimed to quadruple that valuation in the coming months despite low revenue and a high burn rate.
During this period, Stability AI also experienced an executive departure. VP Ed Newton-Rex stated in a public letter that he left over disputes about using copyrighted data. This marks another setback, as Newton-Rex played a key role in launching Stable Audio, the company's AI music generation tool.
Official demo video: https://www.youtube.com/watch?v=G7mihAy691g