The Real Changes Sora Brings to China's AI

baoshi.rao

OpenAI's latest technological achievement—the text-to-video model Sora—made a stunning debut during the Spring Festival holiday, keeping AI practitioners and investors worldwide awake at night.

If you haven't yet caught up with this news, here's a brief introduction: Sora is a general-purpose visual model trained by OpenAI using ultra-large-scale video data. It can understand and simulate the physical world in motion, generating videos of varying durations, aspect ratios, and resolutions. The largest version of Sora can produce high-fidelity videos up to one minute long.

Before Sora's release, there were many video generation models using various methods, but they all relied on limited visual data and could only generate short (4-second) or fixed-size videos. Sora's realistic visuals and leap in performance therefore not only stunned the entire tech community but also set off a contagious wave of 'Chinese AI anxiety.' Once again, netizens are expressing deep disappointment and directing their frustration at China's AI sector with pointed questions:

Why hasn't another groundbreaking AI innovation emerged from China? We seem to have taken a wrong turn in our technological development, and it's truly disheartening;

The gap between China and the US in AI is widening. With Sora's emergence, isn't China lagging behind by a decade? It feels like we're genuinely falling behind;

Replicating Sora's capabilities faces its biggest obstacle in computational power. Ever since the chip embargo began, we've already suffered a complete defeat; there's simply no chance.

Of course, there are also sarcastic comments like, 'When foreign Sora-like models become open-source, domestic AI companies can innovate again.'

Against the backdrop of US-China competition, such anxiety spreads every time there's a major technological breakthrough overseas. However, time has shown that China, as one of only two global AI superpowers, has years of AI development behind it: even if the US achieves a new breakthrough that other countries can't match, China would certainly not be among those left behind.

Take ChatGPT, not so long ago, as an example. After a year of rapid development, 'Does China have its own ChatGPT?' is no longer a question. In 2023, many domestic 'ChatGPT-like' large language models were opened to public use and entered industry scenarios, and hundreds of millions of users have tested China's actual AI capabilities. While gaps with OpenAI may remain, the situation is certainly not what some fear, that 'China can't do it' or that there is 'a generational technological gap.'

This anxiety resembles the sensational headlines we often see: a new drug emerges, and suddenly humanity is on the brink of immortality; an AI breakthrough occurs, and claims of AGI realization or human extinction abound. Readers oscillate between these extreme narratives, their perception of AI swinging wildly between "divine" and "fraudulent." Those truly knowledgeable about medicine would never believe in a panacea; they understand efficacy and side effects and match treatments to specific conditions.

Similarly, those who genuinely understand the AI industry can objectively recognize China's strengths in AI while acknowledging existing gaps, responding with neither undue humility nor arrogance but with proactive measures.

Especially after being "trained" by the ChatGPT experience, we should now be more confident in objectively assessing the real changes Sora brings to China's AI landscape, and prepare to welcome another "AI spring."

After "Why didn't ChatGPT emerge in China?", the Year of the Dragon version of the question has become "Why didn't Sora emerge in China?" Missing two consecutive "meta-innovations" has deeply disappointed impatient readers who had hoped China's AI would "overtake on the curve" or "catch up from behind."

Technological progress is never an overnight success. Reality has no "golden finger," the cheat-like power that drives the plot of wish-fulfillment web novels; there is only steady, step-by-step advancement. It cannot be denied that disruptive products like large language models and text-to-video models did not debut in China. However, it must also be recognized that China's AI has always been on the right path, and its pace is accelerating.

The release of Sora will actually bring China and the US closer in AI, for three reasons.

First, the direction is aligned.

Missing a technological revolution is not about being late but about choosing the wrong path, as Japan did with its historical bet on "fifth-generation computers," which caused it to miss an entire era. OpenAI's ChatGPT and Sora are built on the technical path of large-scale pre-trained models, combined with extensive engineering innovation. This shows that for any breakthrough, accumulating the right technology and choosing the right path are crucial. China's AI has consistently followed this "large model path" centered on the Transformer architecture, with visible progress in infrastructure and algorithmic robustness.

Second, the objectives are clear. OpenAI's relentless meta-innovations have solidified its global AI leadership, leaving Chinese AI firms playing catch-up. However, this is no reason to mock China's AI efforts. Not being the one to invent a technology from scratch doesn't imply inferiority; OpenAI didn't invent the Transformer either. Moreover, OpenAI is a unique AI company that pools top global talent, resources, and capital. Even Google has struggled to keep up, so it is unfair to judge China's resource-constrained AI research institutions by OpenAI's standards.

Sora has demonstrated that "video generation models are an effective path to building a universal simulator of the physical world," reaffirming the triumph of brute-force computation and the emergent effects of the "Scaling Law." This effectively "scouts the path" for China's AI sector. With a clear target, China's AI community can rapidly consolidate resources and ramp up R&D, narrowing the China-US gap in text-to-video technology. As with ChatGPT, China's development of a "Sora-like" model is inevitable; it won't miss this wave or fall irrecoverably behind.

Finally, the capability exists. Sooner or later, China will undoubtedly develop its own 'Sora-like' AI video generator; the only question is whether it will take three years, five years, or a decade. We believe 2024 will likely witness the emergence of a domestic Sora. China already possesses all the core infrastructure required: foundational large language models (LLMs) like Wenxin Yiyan, iFlytek Spark, and Baichuan; text-to-image models such as Wenxin Yige and Tencent Hunyuan; large-scale video datasets; AI computing systems; and development tool stacks for large models. With the rapid advances in computing and infrastructure over the past year, China has both the capability and the conditions to succeed in AI video generation, repeating in this field what it achieved with ChatGPT-like models.

While China must strive to catch up with Sora, there's no need for undue pessimism or panic. By staying on the right path and accelerating progress, the gap between Chinese and American AI can be narrowed.

Just like with LLMs, it's unlikely that Sora will dominate globally while China lacks usable video generation models. We sincerely hope that in the near future we won't shift from worrying about 'China not having Sora' to worrying about 'how to utilize so many Sora-like models in China,' as happened during the 'hundred-model battle' of LLMs. From this perspective, OpenAI's steady stream of releases, from ChatGPT to Sora, will bring a bit less hype and more rationality to China's AI large model market.

Less hype means recognizing that Sora has once again underscored the importance of foundational models, and avoiding low-level, redundant development among domestic large models.

In 2023, one large language model after another was trained and launched. Original foundational models made up the smallest share; most were industry-specific models, along with many privately deployed ones. These cannot compare with base models in data scale or parameter count, and their output quality is much worse. Such low-level, redundant development also wastes AI computing power and investment. Sora's stunning performance in video once again demonstrates the effectiveness of "brute-force aesthetics," directly surpassing the models of previously popular AI video startups. As OpenAI CEO Sam Altman said in his speech at the YC W24 launch event: 'The right approach is to imagine a "god-like" model in operation and then build the best products based on that assumption.'

For China's AI, it is essential to treat the few foundational models with original capabilities, such as Wenxin and Xinghuo (iFlytek Spark), as the infrastructure and pillars of large models. These models should support startups and various industries in fine-tuning and optimization, so that they avoid 'reinventing the wheel.'

Maintaining rationality means that while being amazed by Sora, we should also recognize that application and commercialization are gradual, and take a more measured approach to developing domestic Sora-like solutions. After a year of rapid advancement, ChatGPT-like large language models have revealed several challenges as they are integrated across industries, including limited practical application scenarios, modest commercial value, and a relatively low return on investment. How to use these large models effectively has become a critical test for China's AI development.

Compared with "user-friendly" large language models, video generation models have a higher barrier to entry and a smaller audience. OpenAI has so far opened Sora only to a limited group of creators, whereas ChatGPT is open to the general public. This suggests that video generation models will move more slowly from research and development to practical application, and their application potential and commercial viability remain to be explored.

This situation gives China's AI industry, academia, and research sectors a relatively long window to catch up. At the same time, since the extent of Sora's commercial value remains unclear, tech firms and startups, apart from companies like ByteDance and streaming platforms that need to commit fully, must weigh commercialization carefully. They need to refine tools for creative and commercial scenarios and improve prompt engineering for video generation models, making them accessible to a broader range of non-professional users across industries. The value of large models has to be proven through commercialization, and Sora is no exception. The long journey of video generation models into industry has only just begun. Across the broader industrial landscape, how to make Sora-like products deliver real value is a question OpenAI hasn't answered and US AI won't answer; it is up to Chinese AI to write this answer, and this is also where China can excel.

Not being anxious about Sora doesn't mean Chinese AI can just sit back and "watch the clouds roll by." It must be acknowledged that domestic large models still have many bottlenecks to overcome.

Sora's general ability to simulate the physical world can not only be applied to content creation industries like film and television production but can also serve as a technological pillar for building a world of virtual-real integration in various fields such as gaming, autonomous driving, industrial digital twins, e-commerce, and cultural tourism. So the question arises: a domestic version of Sora is bound to emerge, but are we prepared for its large-scale application across various industries? The answer today is likely still no.

As mentioned earlier, Sora's "brute-force aesthetics" once again proves the value of scale. To achieve emergent effects, foundational models still rely heavily on vast, high-quality datasets, ultra-large-scale computing power, a large pool of engineering and optimization talent, and the enormous development and operating costs that come with them.

Even OpenAI, backed by Microsoft's cloud, has not opened Sora to the public or provided API access to developers, nor has it given a timeline for official release. The existing shortage of specialized AI computing power in China has become even more pressing with the advent of Sora. It's not hard to predict that new restrictions on AI computing power will emerge to further hinder China's AI development. Improving AI infrastructure and building a self-sufficient industrial chain remain crucial tasks for China's computing industry. The goal is to ensure that cutting-edge AI technologies like large language models and video generation models contribute to China's modernization, while making computing power a sustainable driver of the country's digital economy.

Additionally, the scale and quality of data have become a formidable barrier in the China-US AI gap. In May 2023, The Economist noted that China lags the US by two to three years in foundational model development, primarily because data limitations hinder the effective training of AI models on internet content.

To address this situation, on December 15, 2023, China's National Data Bureau, along with 17 departments including the Cyberspace Administration of China and the Ministry of Science and Technology, jointly issued the "'Data Elements ×' Three-Year Action Plan (2024-2026)". The plan aims to significantly expand the breadth and depth of data element applications by the end of 2026. In 2024, we will undoubtedly see this initiative implemented and data elements become nourishment for China's domestic AI development. Clearly, closing the gaps and shortcomings in China's AI sector is not an overnight task, nor is it the responsibility of a single AI company or model developer. Given the proactive stance of industries across China, perhaps a bit more patience is warranted.

As Confucius said, "The wise are free from perplexity, the benevolent from worry, and the courageous from fear." Recognizing the changes and challenges Sora brings to China's AI, without succumbing to anxiety over what is temporarily missing, reflects our belief that China will eventually take the stage. And indeed, it will.