OpenAI Releases First Text-to-Video Model Sora: Generates 1-Minute HD Video from a Single Sentence

    baoshi.rao wrote:
    #1

    On February 16, OpenAI released its first text-to-video model, Sora, which inherits DALL·E 3's image quality and instruction-following capabilities. From a single text description, it can generate up to one minute of smooth, high-definition video.

    The model can simulate the real physical world in depth, marking a significant leap in artificial intelligence's ability to understand and interact with real-world scenarios.

    Officially released demo videos show that, with "Chinese Lunar New Year" as the prompt, the generated video features a bustling crowd performing a dragon dance; the movements are smooth and precise, and some onlookers even raise their phones to capture the scene, showcasing rich, meticulous detail. In another clip, a stylishly dressed woman walks through Tokyo's rain-soaked streets, where the reflections in water puddles and the neon lighting are realistic enough to rival actual footage. Without a label, many viewers would not realize the clip was AI-generated.

    OpenAI stated that their technical team is teaching AI to comprehend and simulate the physics of the moving world, aiming to train models that can help humans solve problems requiring real-world interactions.

    Generating videos from text prompts is just one step in the broader plan. Currently, Sora can generate complex scenes with multiple characters and specific movements, not only understanding the requirements in user prompts but also how these objects exist in the physical world.

    However, Sora currently has limitations. OpenAI states that it may struggle to accurately simulate the physics of complex scenes and may not fully understand causality.

    The model may also confuse spatial details in a prompt, such as mixing up left and right, and can have difficulty precisely rendering events that unfold over time, such as following a specific camera trajectory.

    Even so, after OpenAI released its first video model, many netizens exclaimed: "Many people will lose their jobs," "The entire stock-footage industry may decline because of this," and "After language models, OpenAI is once again accelerating AI evolution."

    Currently, some visual artists, designers, and filmmakers (as well as OpenAI employees) have been granted access to Sora. They have already begun posting new works, demonstrating the creative possibilities of AI-generated video.

    Below is the official website of OpenAI's video model Sora: https://openai.com/sora
