OpenAI's Text-to-Video Model Sora Goes Viral: Which A-Share Listed Companies May Benefit?

baoshi.rao wrote:
    Recently, OpenAI's text-to-video model Sora has gone viral.

    On February 16, OpenAI officially unveiled Sora on its website. According to reports, the model can generate videos up to one minute long while maintaining visual quality and adhering to user prompts.

    As OpenAI's first major text-to-video model, Sora heralds a new technological revolution, and its stunning, groundbreaking output is already shaking up the AI industry. What transformations will Sora bring, and which listed companies in the capital market stand to benefit? Sora can closely follow user-supplied text prompts or static images to generate videos up to one minute long while maintaining high visual quality, producing finely detailed scenes, lively character expressions, and complex camera movements. It can also extend existing videos or fill in missing frames.

    OpenAI states that Sora can generate complex scenes featuring multiple characters, specific types of motion, and accurate depictions of objects and backgrounds. Beyond that, Sora can animate static images; whether driven by text instructions or a static image, the resulting video accurately reflects the user's prompt.

    Guotai Junan Securities' research report highlights three standout features of Sora: first, it produces 60-second videos while maintaining high fluidity and stability between subject and background; second, it can incorporate multiple camera angles within a single video, with logical and seamless transitions between shots; and third, it understands the real world, excelling at details such as light reflections, motion patterns, and camera movement, which significantly enhances realism. Compared with current competitors in AI video generation, Sora's 60 seconds per prompt far exceeds Pika Labs' 3 seconds, Meta Emu Video's 4 seconds, and Runway Gen-2's 18 seconds.

    Additionally, judging from official demonstrations, Sora's performance is remarkably impressive in terms of both video fluidity and detail representation.

    However, Sora is still under development. OpenAI acknowledges that the model may confuse spatial details in prompts (such as left/right orientation) and may struggle with precise descriptions of events over time (like following specific camera trajectories). Meanwhile, OpenAI states that scalable video generation models represent a potential pathway toward building general-purpose simulators of the physical world. The launch of Sora marks a quantum leap in AI video generation capabilities. This model can deeply simulate the real physical world, representing a major advancement in artificial intelligence's ability to understand and interact with real-world scenarios.

    In recent years, OpenAI has been leading the AI race. In early 2021 and late 2022, OpenAI introduced the image generation system DALL·E and the chatbot ChatGPT respectively. These innovations have gradually turned AI into an assistive tool across various industries and are reshaping people's perspectives on future work.

    Now, the Sora model can generate one-minute high-fidelity videos in multiple formats, including landscape (1920×1080), portrait (1080×1920), and every aspect ratio in between. This compatibility with different playback devices, and the ability to generate content at a specific aspect ratio, will significantly impact creative fields such as film production, television content creation, and self-media. Guosheng Securities believes the text-to-video model Sora has emerged as a game-changer, capable of understanding and presenting physical laws, signaling a disruptive moment for the film and animation industry.
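
    As a small illustration of that format range: everything from portrait 9:16 (1080×1920) to landscape 16:9 (1920×1080) falls within the reported band. The helper below merely checks a requested size against that band; it is a sketch for intuition, not an OpenAI API.

    ```python
    # Sketch: does a requested resolution fall in the 9:16..16:9 band?
    def in_supported_band(width: int, height: int) -> bool:
        ratio = width / height
        return 9 / 16 <= ratio <= 16 / 9

    for w, h in [(1920, 1080), (1080, 1920), (1080, 1080), (3840, 1080)]:
        print(f"{w}x{h}: {'ok' if in_supported_band(w, h) else 'out of range'}")
    ```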

    It is reported that Sora is built upon previous research on DALL·E and GPT, utilizing DALL·E 3's recaptioning technique to generate highly descriptive annotations for visual model training data, enabling the model to better follow textual instructions.
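
    The recaptioning step is worth pausing on: per OpenAI, a captioning model first writes dense, highly descriptive captions for the raw training clips, and the video model is then trained on those (caption, clip) pairs so it learns to follow detailed instructions. Below is a hypothetical sketch of that data-preparation loop; OpenAI has not released Sora's training code, and `captioner.describe` is an assumed interface.

    ```python
    # Hypothetical sketch: replace sparse alt-text with dense,
    # model-written captions before training a text-to-video model.
    from dataclasses import dataclass
    from typing import Iterable, List

    @dataclass
    class TrainingExample:
        caption: str    # dense caption written by the captioning model
        clip_path: str  # path to the raw video clip

    def recaption_dataset(clip_paths: Iterable[str], captioner) -> List[TrainingExample]:
        """Build (caption, clip) pairs using a captioning model."""
        return [
            TrainingExample(caption=captioner.describe(path), clip_path=path)
            for path in clip_paths
        ]
    ```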

    In response, Zhou Hongyi, founder of 360 Group, stated that Sora's technical approach is entirely different. Previously, video and image generation relied on Diffusion, which combines multiple real images. This time, OpenAI leveraged its advantage in large language models, integrating LLM and Diffusion for training, allowing Sora to achieve two capabilities: understanding the real world and simulating it. This approach produces videos that are realistic and capable of simulating the physical world beyond the limitations of 2D. "This is all thanks to large models. OpenAI likely trained this model by analyzing massive amounts of video data. A picture is worth a thousand words, but a video conveys far more information than a single image. This brings us much closer to AGI (Artificial General Intelligence) – not in 10 or 20 years, but potentially within just one or two years," stated Zhou Hongyi.
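
    Zhou's description of "LLM plus Diffusion" matches what OpenAI's technical report discloses: Sora is a diffusion transformer operating on "spacetime patches" of video, cut into tokens much as an LLM tokenizes text, which a transformer then learns to denoise. Here is a minimal sketch of just the patching step; shapes and patch sizes are illustrative, not Sora's actual configuration.

    ```python
    # Cut a video tensor into spacetime patch tokens (illustrative sizes).
    import torch

    def to_spacetime_patches(video, pt=4, ph=16, pw=16):
        """video: (T, C, H, W) -> (num_patches, pt*ph*pw*C) token matrix."""
        T, C, H, W = video.shape
        assert T % pt == 0 and H % ph == 0 and W % pw == 0
        x = video.reshape(T // pt, pt, C, H // ph, ph, W // pw, pw)
        # Group the temporal and spatial patch axes so that each
        # (t, h, w) patch becomes one flat token.
        x = x.permute(0, 3, 5, 1, 4, 6, 2)
        return x.reshape(-1, pt * ph * pw * C)

    clip = torch.randn(16, 3, 256, 256)   # 16 frames of 256x256 RGB
    tokens = to_spacetime_patches(clip)
    print(tokens.shape)                    # torch.Size([1024, 3072])
    ```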

    Zhou emphasized that this represents the future direction: with robust large models as a foundation, combining human language comprehension, world-model knowledge, and other technologies can create super-tools in fields such as biomedicine (protein and gene research), physics, chemistry, and mathematics. Sora's simulation of the physical world will particularly influence embodied AI in robotics and autonomous driving.

    Regarding the advent of Sora, Liu Xingliang, a digital economy expert and member of the Expert Committee on the Information and Communication Economy at the Ministry of Industry and Information Technology, stated that it marks a new era for AI technology in content creation.

    "Sora can generate 1080P high-definition videos lasting about a minute, featuring multiple characters, various types of actions, and detailed backgrounds, achieving almost cinematic-level realism. This capability not only provides content creators with unprecedented tools, enabling them to bring their ideas to life at lower costs and faster speeds, but also offers audiences richer and more diverse visual experiences. This significant leap in technological innovation foreshadows the increasingly important role AI will play in all aspects of human life in the future," Liu Xingliang remarked.

    Market analysts believe that 2022 was the year of images, 2023 the year of audio, and 2024 the year of video. OpenAI has stated that Sora is a foundation for building world models and that it will continue advancing toward AGI.

    Sora's development also entails intense demand for computing power. Guotai Junan Securities pointed out that the Sora model drives leapfrog development in multimodal AI: related areas such as AI-assisted creation will undergo profound transformations, AI's scope of empowerment will expand further, and multimodal training and inference applications will increase demand for computing infrastructure.

    Guosheng Securities holds a similar view, believing that Sora still obeys AI scaling laws. OpenAI's technical documentation explains that sample quality improves significantly as training compute increases, further corroborating that computing power will become one of the most critical bottlenecks of the multimodal era. Multimodal large models are driving rapid growth in global demand for computing power, creating opportunities for domestic AI computing. According to data from the Southern Wealth Network's stock trend selection system, there are currently 52 A-share listed companies related to domestic AI computing power. The domestic AI computing industry chain spans segments such as AI server components, server manufacturing, computing power leasing, and data centers.
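
    As a toy illustration of that scaling-law claim: if a quality (error) proxy follows a power law in training compute, each tenfold increase in compute buys a fixed multiplicative improvement. The exponent below is invented for illustration; OpenAI has not published Sora's scaling constants.

    ```python
    # Toy power law: error ~ a * C^(-alpha); constants are made up.
    def error_proxy(compute_flops, a=1.0, alpha=0.1):
        return a * compute_flops ** -alpha

    for c in (1e21, 1e22, 1e23):
        print(f"compute={c:.0e}  error proxy={error_proxy(c):.4f}")
    ```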

    AI server component companies mainly include Hygon Information Technology, Cambricon Technologies, Loongson Technology, and Jingjia Micro; server manufacturing companies include Gaoxin Development, Digital China, Tonly Information, GRG Banking, FiberHome Telecommunication Technologies, and Tongfang Co., Ltd.; computing power leasing companies include Hengrun Holdings, Cloud Live Technology, and Hongbo Co., Ltd.; data center companies include OFILM Group, Sinnet Technology, Baosight Software, and Data Center Group.

    Additionally, several companies are expanding their AI business footprint. For example, Wanxing Technology, which specializes in video and graphics creative software, recently stated on an interactive platform that its video creative product Filmora can be used for a wide range of video creation and editing; the company's "Tianmu" large model is a multimedia model centered on video-creative AI technology, encompassing multimodal capabilities in audio, images, and video. Kunlun Tech's subsidiaries Star Group and Opera have the potential to develop short-video platforms, with Opera having already launched short-video features overseas, and Kunlun's Tiangong large model ranked first in a comprehensive evaluation of multimodal large language models conducted jointly by Tencent YouTu Lab and Xiamen University.

    Danghong Technology, a provider of professional intelligent-video solutions and video cloud services, stated on an interactive platform on January 5 that it has developed its own AIGC toolset and, in the first half of last year, released a solution for generating three-dimensional volumetric videos from static photos, achieving up to 800-fold visually lossless compression through point-cloud model conversion and compression algorithms and enabling switching between different modalities. InsightGPT, a subsidiary of Insig Group, can currently generate videos over 20 seconds long; it combines large image and video models with various algorithms, including matting techniques, and integrates audio models to render and synthesize complete videos.

    According to incomplete statistics, more than 10 A-share listed companies, including Wanxing Technology, Bohui Technology, Yidian Tianxia, Digital Video, Hanwang Technology, Danghong Technology, Oriental Nations, Shensi Electronics, Insig Group, TRS, Guomai Culture, and Jiadu Technology, have disclosed their business developments in the field of video generation models on interactive platforms over the past three months.
