China Now Has 238 Large AI Models – Where Are They Heading?

baoshi.rao

November 30, 2022, was the first day ChatGPT entered the world.

A full year has passed since then, and 'the future of humans and AI' has shed its sci-fi cloak to step firmly into reality.

In the business world, new technologies have been advancing at breakneck speed, triggering a 'hundred-model battle.' Startups have mushroomed, while industry veterans have stepped out of the shadows to join the fray. The fierce competition is a mix of excitement, confusion, and setbacks.

Meanwhile, in everyday life, the rise of 'new professions' like AI models and AI illustrators has sparked fears of job losses, while some pioneers are making fortunes, enough to buy a house in a month, through AI training programs. Even prospective mothers-in-law in Xiaoshan now know to set their sights on large model engineers as sons-in-law.

However, amidst OpenAI's internal turmoil, what lies before us all is the emergence of AI as a new species. For thousands of years, humans have been the dominant species, but in the face of such monumental change, who are we?

American physicist Feynman once said, "Each of us holds a key that can open the gates to heaven, but unfortunately, the same key can also unlock the gates of hell."

Fear that AI is not powerful enough, yet fear that it is too powerful—this is a perpetual seesaw in the human psyche.

Image credit: AI-generated image, licensed by Midjourney

In the movie The Wandering Earth, people living in an extremely unstable reality rely on MOSS for certainty, only to later fear that it might slip out of control and initiate the 'Isolation Plan', sealing it away in the Arctic and on the space station.

How exactly is artificial intelligence changing human society? What are the specific and profound impacts? Will it reshape our world? What opportunities and risks exist amidst this great transformation? Where does China stand in the global technology race, and what are its advantages? These seemingly distant and grand questions suddenly become urgent.

With hype and opportunity soaring, and reason and idealism coexisting, the exam paper on large models is one that everyone has to answer.

2023: The Year Large Models Completed a Triple Jump

Within just eight months, 238 large models were born, which works out to roughly one new large model announced in China every day. This is the speed of China in 2023.

Looking back over the past year, within less than a month after ChatGPT went viral, major Chinese tech companies including Alibaba, Huawei, Tencent, JD.com, ByteDance, 360, SenseTime, and iFlytek all entered the arena, either officially announcing their participation or unveiling their large models.

The battle of hundreds of models was imminent. Data shows that at the peak of the competition, over 30 large models could emerge in China within a single month. By October 2023, the number of companies and academic institutions in China with large models exceeding 1 billion parameters had reached 254, escalating from "one hundred models" to "two hundred models."

However, this chaotic competition was only a temporary phase. Just three months later, the large model industry began to undergo filtration and stratification. After the initial frenzy, the industry landscape gradually became clearer.

"Universities and researchers focus on fundamental research and talent cultivation; major tech companies provide computing power support, infrastructure construction, and MaaS services; startups concentrate on large model application development," summarized Qiu Xipeng, a professor at Fudan University's School of Computer Science and head of the Moss system, at the 2023 Tencent ConTech Conference.

Universities and research institutions form the foundation, major tech companies make up the middle layer, and large model startups occupy the top layer. This well-defined division of labor forms the current landscape of China's large model ecosystem.

Institutions such as IDEA Research Institute, Beijing Academy of Artificial Intelligence, Institute of Automation of the Chinese Academy of Sciences, Shanghai AI Laboratory, Fudan University, and Tsinghua University were among the first to follow OpenAI's technological developments. Their early market insights, along with the published papers, open-source large models, datasets, and tools they released, laid the groundwork for the birth and iteration of China's large-scale models.

Moreover, these universities continuously supply talent to the market. Tsinghua-affiliated entrepreneurial teams account for a significant portion of China's large model startups. Internet veterans like Wang Huiwen and Wang Xiaochuan have made high-profile entries into the field, while companies like Zhipu AI, ModelBest, and Moonshot AI share common academic roots. According to incomplete statistics, among the recently active large model startups, at least 17 founders across 11 companies have Tsinghua backgrounds.

The middle layer of tech giants can be further divided into two categories. The first includes internet giants like Tencent, Baidu, Alibaba, Huawei, and ByteDance. Leveraging their cloud computing and technological expertise, they can independently develop both general and industry-specific large models, revamp existing products, and implement large-scale model solutions in industries they have already penetrated. Additionally, they provide computational resources for model training to enterprise clients and startups, offering comprehensive large-scale model services in the form of MaaS (Model as a Service).

The second category consists of tech companies from the AI 1.0 era, such as SenseTime, Unisound, and iFlytek, which have built large models on their strengths in speech and computer vision (CV). SenseTime, leveraging its CV expertise, released the "Rixin" large model, while Unisound extended its language capabilities with the "Shanhai" large model, showcasing its potential in medical consultations.

With the foundation of large models becoming more stable and core capabilities maturing, the second half of this year saw entrepreneurial ideas turning into action, giving rise to a batch of "dark horse" startups.

These startups, backed by prestigious academic backgrounds, technical expertise, and industry experience, quickly accelerated on the fast track of large models.

Baichuan Intelligence releases a new large model iteration every 28 days on average, is exploring medical consultation, and is expected to launch its first AI application next year with the aim of creating a super app. Zhipu AI has raised a total of 2.5 billion yuan this year, making it one of the best-funded large model startups in China. MiniMax's overseas AI role-playing app Talkie has at times outpaced the growth of the popular foreign product Character AI. Moonshot AI focuses on long-text technology and is targeting consumer applications, exploring scenarios like AI role-playing and conversations.

At various hackathons and roadshows, seats are packed, and the atmosphere is electric, with entrepreneurs, investors, and audiences scrambling to get in. Entrepreneurs are passionate, declaring ambitions like "surpassing ByteDance," "creating the next TikTok," and "becoming the next Zhang Yiming," their bold statements echoing through the halls.

The youngest were born in the 2000s, and most are in their late 20s or early 30s. A few tech and product people come together, form a small team, and start their entrepreneurial journey. Large models have clearly become the hot ticket. Data from a recent MiraclePlus roadshow shows that in the AI space alone there were 51 large model companies, more than half of them focused on the application layer.

At this point, China's large models have completed a three-stage leap from foundational capabilities to the application layer. Over the past year of exploration, entrepreneurs in the field have gradually realized a key fact: China's large models are 'innately weak,' and the entrepreneurial opportunities lie not in the foundational capability layer but in the application layer.

China's Large Models: Where Does the Core Competitiveness Lie?

The 'weakness' of China's large models can be traced back to their roots. There are four key elements that constitute a large model: data, model, computing power, and application scenarios. Without a solid foundation, large models naturally progress slowly.

Currently, the training data for large models mainly comes from publicly available documents, materials, and datasets. In terms of total volume, publicly available, coarsely processed data is still predominantly in English, with Chinese data being far less accessible. In terms of quality, because China's data industry has seen too little investment and refinement, Chinese language corpora are not only scarce but also largely unusable. Data quality is directly linked to how well a model trains. ChatGPT, for example, achieves relatively good results even though only 1.5% of its data is in Chinese, whereas domestic models show the opposite pattern.

In terms of model development, China faces an insurmountable gap in the short term. OpenAI, founded in 2015, has accumulated seven years of technical expertise and investment in large models, while China has only been at it for a year. Moreover, as China races to catch up, OpenAI continues to accelerate, pouring billions of dollars, top-tier talent, and supercomputing resources into the 'GPT' engine.

The situation regarding computing power is far from optimistic. On one hand, there's the common issue of computing power resource shortages; on the other, there are the layers of restrictions imposed by the United States.

At the 2023 Tencent ConTech Conference, Wang Xiaochuan, founder and CEO of Baichuan Intelligence, said he had learned during a research trip to Silicon Valley that OpenAI is attempting to connect 10 million GPUs to train a large-scale model.

Wang expressed great astonishment at this: "NVIDIA produces about 1 million GPUs annually. Training GPT-4 requires 25,000 GPUs, while developing a domestic equivalent to GPT-3.5 needs 4,000 GPUs. For China to allocate 10 million GPUs for large model training is currently far beyond our resource capacity."
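To make the scale in Wang's remark concrete, the short Python sketch below simply divides the figures he quotes by one another. The numbers are taken from the quote as given and are rough estimates; the helper name `gpu_gap_summary` is made up for this illustration.

```python
# Figures as quoted by Wang Xiaochuan in the article; all are rough estimates.
NVIDIA_GPUS_PER_YEAR = 1_000_000    # approximate annual NVIDIA output
OPENAI_TARGET_GPUS = 10_000_000     # GPUs OpenAI reportedly wants to connect
GPT4_TRAINING_GPUS = 25_000         # GPUs quoted for training GPT-4
GPT35_EQUIVALENT_GPUS = 4_000       # GPUs quoted for a domestic GPT-3.5 equivalent


def gpu_gap_summary() -> dict[str, float]:
    """Return simple ratios between the quoted GPU figures (hypothetical helper)."""
    return {
        "years_of_nvidia_output": OPENAI_TARGET_GPUS / NVIDIA_GPUS_PER_YEAR,
        "times_gpt4_cluster": OPENAI_TARGET_GPUS / GPT4_TRAINING_GPUS,
        "times_gpt35_cluster": OPENAI_TARGET_GPUS / GPT35_EQUIVALENT_GPUS,
        "gpt4_vs_gpt35": GPT4_TRAINING_GPUS / GPT35_EQUIVALENT_GPUS,
    }


if __name__ == "__main__":
    for name, value in gpu_gap_summary().items():
        print(f"{name}: {value:g}")
```

On these numbers, the 10 million GPU target equals about ten years of the quoted annual NVIDIA output and 400 times the cluster quoted for GPT-4.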

According to Qiu Xipeng's assessment: "The gap between our computing power and OpenAI's is too large. Merely keeping up is already challenging, let alone surpassing them. We must consider China's specific practical circumstances."

With experience and direction, it's even more crucial to acknowledge the gaps and shortcomings. As Wang Xiaochuan stated, the key question for current large model companies is how to create good AI-native applications using slightly weaker large models.

"Foreign countries may excel in large models, but that doesn't mean they excel in applications," many entrepreneurs have told Guangzhui Intelligence. In terms of application direction, domestic and foreign markets are currently on the same starting line.

Although foreign AI applications started earlier, their development is still at an early stage, concentrated mainly in areas such as productivity chatbots, companionship chatbots, image generation, photography, and gaming.

Take the U.S. Apple App Store as an example. Among the popular productivity tools, OpenAI's ChatGPT has 470,000 ratings and Microsoft's Bing Chat has 180,000; Character AI, the flagship of the hot consumer apps, has 140,000; Pi has 1,336; and the once-popular AI photo app Lensa AI has 390,000. For reference, in the same store TikTok has 16.34 million ratings, YouTube has 33.7 million, and Snapchat has 1.96 million.
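The gap between these rating counts is easier to see as ratios. The sketch below is only an illustration using the counts quoted above; the dictionary layout and the helper `share_of` are assumptions made for this example, not part of any app store API.

```python
# Rating counts as quoted in the article for the U.S. Apple App Store.
AI_APP_RATINGS = {
    "ChatGPT": 470_000,
    "Bing Chat": 180_000,
    "Character AI": 140_000,
    "Pi": 1_336,
    "Lensa AI": 390_000,
}
REFERENCE_RATINGS = {
    "TikTok": 16_340_000,
    "YouTube": 33_700_000,
    "Snapchat": 1_960_000,
}


def share_of(app: str, reference: str) -> float:
    """Rating count of an AI app as a percentage of a reference app's count."""
    return 100 * AI_APP_RATINGS[app] / REFERENCE_RATINGS[reference]


# Combined ratings of all five AI apps listed above.
TOTAL_AI_RATINGS = sum(AI_APP_RATINGS.values())

if __name__ == "__main__":
    for app in AI_APP_RATINGS:
        print(f"{app}: {share_of(app, 'TikTok'):.2f}% of TikTok's ratings")
    print(f"All five AI apps combined: {TOTAL_AI_RATINGS:,} ratings")
```

Taken together, the five AI apps have about 1.18 million ratings, fewer than Snapchat alone and roughly 7% of TikTok's count.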

No matter how strong the technical capabilities of large models are, they must be implemented in products to be practically applied, and this is precisely China's core advantage.

"Slow in theory, but three steps faster in implementation."

This is Wang Xiaochuan's interpretation of the current opportunities for large models in China. "Theory" corresponds to the foundational technical capabilities of large models, while "implementation" corresponds to scenario applications. "Our opportunity in China's large models lies not in technical research but in application. This is where we can move faster."

Looking back at the internet era, China gave birth to super apps like Taobao, WeChat, and Douyin, accumulating extensive experience in product design, market operations, and user demand insights. At the same time, the thriving internet product ecosystem has cultivated a generation of product managers. Now, the accumulated experience from the previous era will serve as fertile ground for the birth of large model applications.

Wang Xiaochuan believes there are two core challenges to address. If these two issues are resolved, China's applications may surpass those of the United States.

First, having a large model is essential, even if its performance isn't perfect. The shortcomings of the large model itself can be compensated through open-source technologies, end-to-end models, human ingenuity, and collaborative development with application companies.

What's important is to put the large model into use. An AI application entrepreneur told Guangzhui Intelligence: "In practice, the requirements for large model capabilities in AI applications are far lower than imagined. GPT-3.5 can already achieve decent results."

Second, traditional product managers need to transform and upgrade. Wang Xiaochuan pointed out that the biggest difference between large model entrepreneurship and internet entrepreneurship is that the former is technology-driven and therefore has extremely high entry barriers. Internet entrepreneurship, in contrast, is idea-driven, with almost no technological bottlenecks: any good idea can be realized.

The nature of large model entrepreneurship requires product managers to understand which AI products match which technologies, and even further, to guide technological development based on product needs. This may involve specific issues, such as the criteria product managers use to evaluate technology, how to assess the quality of technology, and how to ensure algorithm engineers keep up with product iteration. In short, Wang Xiaochuan believes product managers must have the ability to judge and evaluate their technology.

The long history of technological development tells us that this is not the first time China has faced the challenge of lagging behind in foundational technological capabilities. Temporary technological superiority does not mean much. China can develop large models first of all because its market is large and complex enough: a large model company that secures a foothold there may already have significant growth potential.

OpenAI's advancements may also face 'adaptation issues' in China's environment, similar to the database industry years ago. While Oracle was advanced, China's lagging digitalization made it impractical for domestic use. It was Chinese database companies that took on the tough tasks, entering the database field by tackling manual ledger management.

The same applies to large models. Domestic enterprises fear falling behind and are eager to quickly adopt large models. However, distant water cannot quench immediate thirst—practical issues such as private deployment, security, value alignment, and scenario implementation are more suitable for Chinese large model companies to address.

The Future of Humans and AI: Coexistence in Transformation

The past cannot be changed, but the future can be created.

Whenever new technologies emerge, there are always those who actively seize the opportunity to gain technological dividends. Any technological revolution begins this way.

Today, in the battlefield of the "Hundred Models War," everyone is trying to find the ultimate chosen one. Whether it's competition among existing players or miracles from newcomers, both tech giants and startups are unwilling to miss any opportunity.

"From the Information Age to the Intelligent Age, a major era—like the Industrial Age before it—will see new companies rise." As Wang Xiaochuan noted, although large corporations have accumulated decades of capital during the internet era, the prevailing view is that small innovations come from big companies, but major innovations still rely on small ones.

Amid fierce competition, will these new companies, like the giants of the internet era, create a new era, even replacing current tech leaders and shaping a new business ecosystem?

Technological change is always cyclical, and perhaps we can find answers in the long river of history.

Just as the internet era gave rise to online retail models, fostering giants like Alibaba and JD.com, countless online shops and factories seized opportunities to ride the fast track to wealth. In this process, no one replaced anyone; instead, e-commerce reshaped and reconstructed offline retail, while also creating integrated offline-online supply chains, digital cloud warehouses, and new retail models.

Rather than a simple replacement of the old by the new, it is more about the innovation and restructuring of business models.

However, the exact form that the new AI business models will take remains unclear. When ByteDance shook the BAT (Baidu, Alibaba, Tencent) giants, no one knew which of the many short video apps, such as Kuaishou, Douyin, Miaopai, Meipai, and Weishi, would emerge as the biggest dark horse. That is why companies large and small are all actively exploring how to integrate AI into their businesses, waiting for the tipping point and betting on tomorrow.

At the same time, unlike the abstract nature of past high-tech breakthroughs, the transformation brought by AI has already permeated every aspect of people's lives.

"As Masayoshi Son put it, the difference between those who use AI and those who don't is like the difference between humans and monkeys," said Wang Xiaochuan. "I keep ChatGPT at the bottom of my phone screen and use it every day. In the next two to three years, our work and lifestyles will undergo earth-shaking changes. With the arrival of intelligent agents, those you work with may not just be humans but also machines."

"Although large models grow faster than young people, young people still adapt faster than older generations. During times of transformation, the youth have more opportunities," he added.

Opportunities certainly exist, but challenges follow—while envisioning the social progress AI can bring, how do we mitigate the risks it poses?

In fact, throughout centuries of literary creation, humans have continuously explored the relationship between humans and non-human species. AI, as artificial intelligence, exists at the edge of ethics.

For AI and humans to coexist peacefully, a prerequisite is the alignment of values. As depicted in The Wandering Earth, MOSS was tasked with "protecting human civilization" but concluded that "the only way to protect human civilization is to destroy humanity." This illustrates how AI's formidable computational and judgment capabilities, coupled with its non-human cognition, become a source of fear.

This is precisely why, amid the rapid advancement of large models this year, safety has remained a fundamental principle. OpenAI, for instance, released Our Approach to AI Safety early in the year to address public concerns about the safety of its AI models.

Achieving alignment between AI and human values relies partly on technology, but more critically, it requires humans to first establish a coherent system of values themselves.

However, the challenge lies in the fact that biases have been ingrained in human thought and behavior since the dawn of society, and now they inevitably lurk within the data used to train AI. In other words, when we discuss AI ethics, we are also examining ourselves.

The development of AI propels the wheel of human civilization forward. As we stand at the threshold of a new era, the door has already begun to crack open.

2024 may well be a brand new world.