Are Large Models Too Competitive? Can AI Applications Break the Deadlock?
-
At the end of 2022, ChatGPT quickly became popular on social media, soon reaching 100 million monthly active users, making it the fastest-growing consumer application in history.
Shortly afterward, fierce competition over large models broke out in China, with more and more companies joining the race, all vowing to surpass ChatGPT.
A year later, large models have not brought the dawn of profitability to the participants, and the capital market is cooling down. The pioneers have realized that AI applications may have more opportunities than large models.
It is widely believed that large models are a foundational technology platform and do not directly generate commercial benefits. Currently, large models are more like the operating systems of the mobile internet era—two or three are enough. What truly dominates the commercial world and creates value are applications like WeChat, TikTok, and Didi.
Robin Li once said, 'Competing in large models is meaningless; competing in applications offers greater opportunities.'
According to Sensor Tower data, global AI application downloads increased by 114% year-over-year in the first half of 2023, surpassing 300 million, while revenue grew by 175%.
Recently, Douyin (TikTok's Chinese version) established a new AI department led by its large model team leader, primarily focusing on AI application-level products.
So how is the development of AI applications progressing? How are Chinese companies performing in AI applications? How can domestic enterprises seize AI application opportunities?
The hottest categories in the AI application market currently fall into three types: AI chatbot applications that generate text, AI selfie applications that generate images, and AI applications that generate video.
The difficulty increases from text to image to video generation, and technological maturity decreases accordingly.
First, the AI applications that gained popularity the fastest were chatbots. As of June 2023, AI chatbot applications accounted for 68.7% of the total traffic on the top-50 chart, and four of the top five applications were chatbots.
Unsurprisingly, ChatGPT ranked first, accounting for 60% of the traffic among the top 50 generative AI products. Google's Bard and Poe, an LLM chatbot from Quora (a company founded by former Facebook employees), ranked third and fourth, respectively.
The second-ranked chatbot, Character.AI, can impersonate hundreds of virtual characters, catering to both emotional and functional user needs.
For example, the AI can role-play as Napoleon, Elon Musk, Marie Curie, and others to chat with users, or serve as a psychologist, librarian, or English teacher, offering services like fitness guidance and knowledge sharing.
Chatbots have the lowest entry barrier, which also means the fiercest competition. In the first half of the year, there were over 200 chatbot applications in the mobile app market, with downloads exceeding 170 million.
Second comes the content generation category, which covers the more challenging automatic generation of text, images, and videos.
According to data up to June this year, content generation applications accounted for 9.7% of the total traffic of the top 50 applications. Among them, image generation applications had the largest share at 41%, followed by writing tools at 26%, and video generation applications at 8%.
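The percentages above are nested, so it can help to convert them into shares of overall top-50 traffic. The following is a minimal sketch, assuming that the 41%, 26%, and 8% figures are fractions of content-generation traffic rather than of all traffic; the variable and function names are illustrative only.

```python
# Shares quoted in the text (data up to June 2023).
CONTENT_GEN_SHARE_OF_TOP50 = 0.097  # content generation as a share of top-50 traffic

# Assumption: these sub-category shares are fractions of content-generation traffic.
SUBCATEGORY_SHARES = {
    "image generation": 0.41,
    "writing tools": 0.26,
    "video generation": 0.08,
}


def share_of_total(category_share: float, sub_share: float) -> float:
    """Convert a share within a category into a share of all top-50 traffic."""
    return category_share * sub_share


overall = {
    name: share_of_total(CONTENT_GEN_SHARE_OF_TOP50, sub_share)
    for name, sub_share in SUBCATEGORY_SHARES.items()
}

# Portion of the category not covered by the three named sub-categories.
unlisted = 1.0 - sum(SUBCATEGORY_SHARES.values())

for name, share in overall.items():
    print(f"{name}: {share:.2%} of top-50 traffic")
print(f"other content-generation tools: {unlisted:.0%} of the category")
```

Under that assumption, image generation works out to roughly 4% of top-50 traffic and video generation to under 1%, which underlines how small these categories still are next to chatbots.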
Among text-based applications, the fifth-ranked QuillBot is an AI writing tool that offers sentence rewriting, article rewriting, and AI-generated text. Compared with chatbots, such applications place higher demands on logic and language structure.
Image generation has seen rapid development since the open-sourcing of Stable Diffusion. Leading applications like Midjourney, Leonardo AI, and the mobile selfie app Lensa AI have lowered the user barrier compared to traditional tools like Photoshop, making it easier to turn users' ideas into reality.
Video generation is more challenging. On one hand, describing videos requires more complex language, and how well the model understands the text description affects the output. On the other hand, generating videos from text requires extensive data to learn caption relevance, frame realism, and temporal dynamics. Additionally, video data is more limited in style, quantity, and quality.
Recently, there have been breakthroughs in video generation. Pika gained widespread attention with a trailer showcasing its upcoming product, which promises advancements in extending video length, expanding the canvas, transforming videos, and modifying local details.
The collaboration with Guo Fan, director of 'The Wandering Earth,' has brought significant attention to Pika.
In the field of video generation, another notable application is Runway. Its technology has been widely used in movies, television, and advertising, enabling AI tools to remove backgrounds, slow down videos, create infinitely extendable images, and more.
In this wave of AI applications, Chinese companies are followers. The download volume and revenue of domestic AI applications are small compared with those of global AI applications.
Due to policy regulations and other reasons, China has yet to introduce a well-developed companion AI chatbot.
Xiaoice once launched the X Eva App, similar to Character.AI, where users could interact with clones of internet celebrities and stars, as well as use features like copywriting and coding. The app was once widely discussed but has now almost faded into obscurity.
For text-to-text and text-to-image generation, products from major companies, such as Baidu's Wenxin Yiyan (with Wenxin Yige for text-to-image) and iFlytek's Spark, are somewhat competitive. Others either have limited visibility, such as Wanxing AI Paint, or have not sustained their popularity, like Miaoya Camera.
Overall, China's AI applications in text-to-image and text-to-video generation lag significantly behind, and monetization lags even further.
A September analysis report from the renowned Silicon Valley venture capital firm a16z showed that as of June 2023, 90% of the developers of the top 50 most-visited AI applications (including websites and apps) had begun monetizing their products.
Sensor Tower data shows that in the first half of 2023, in the mobile sector (including only App Store and Google Play), the United States was the market with the highest in-app purchase revenue for AI applications, while the European market also demonstrated impressive revenue generation. The two markets accounted for 55% and 20% of the total revenue of the world's leading AI applications, respectively.
Revenue-generating AI applications rely primarily on consumer payments, which aligns with the experience of the mobile internet era.
Drawing from the development experience of the mobile internet era, sectors that occupy more user time, involve more challenging information matching, have higher learning barriers, and require more repetitive operations are more likely to see breakout applications emerge first and achieve monetization more easily.
First, let's look at two cases of overseas commercialization.
The first is Character.AI, which operates on a subscription model where users pay to chat with AI characters. In October, it also introduced a paid group chat feature.
Character.AI is an excellent consumer product that allows for interaction and customization. As mentioned earlier, it can meet both emotional and functional needs.
However, reports indicate that Character.AI is currently in a phase of burning money to scale, with its business model still under exploration. The team plans to expand into B2B services as it grows.
The second example is the image-generation application Midjourney.
This is an AI application designed for professional illustrators, also adopting a SaaS subscription model. Under monthly payment plans, the prices for basic, standard, and professional packages are $10, $30, and $60 respectively. Reports indicate that Midjourney achieved profitability in the second month after its public beta launch, proving its monetization model viable.
Taken together, the two cases show that at the current stage demand alone does not translate into payment; only essential needs do. In China, where free services are the norm and willingness to pay is lower, purely consumer-facing products will face even greater monetization challenges than Character.AI.
AI applications are technology-driven products that create demand through supply. So, what kind of supply can generate demand?
For new AI applications, successful product monetization involves five steps: attracting users, activating them through the initial experience, retaining them as repeat customers, converting them into revenue, and earning word-of-mouth promotion.
For example, during the mobile internet era, WeChat captured users' social needs by cultivating user habits, continuously optimizing the social experience, achieving viral growth, and forming a network effect, ultimately becoming one of Tencent's most important monetization products.
Currently, in the domestic AI application supply chain, there is a lack of high-quality foundational models on one hand, and on the other, products based on these models struggle to create sufficiently attractive user-retaining experiences through innovation and fine-tuning.
First, China still lags behind in foundational model development.
For instance, public perception of AI chatbots may still center on ChatGPT's explosive popularity, massive download numbers, and top revenue rankings. However, the top-earning product is actually 'Chat with Ask AI,' a conversational bot application built on the ChatGPT API. With its intelligent Q&A capabilities and smooth user experience, it achieved over 25 million downloads and $16 million in revenue in the first half of the year.
Building on its strong large language models, OpenAI launched ChatGPT on mobile devices in May this year, and its downloads and revenue have grown ever since. API-based products like Chat with Ask AI have also seen revenue rise. According to Appfigures data, Chat with Ask AI generated $5.51 million in revenue in September this year, while ChatGPT earned $4.58 million.
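To put the figures quoted above side by side, here is a small worked calculation. It is only a sketch based on the numbers in the text; the constant and function names are illustrative, and because the download figure is "over 25 million," the revenue per download it produces is an upper bound rather than an exact value.

```python
# Figures quoted in the text, in US dollars.
ASK_AI_H1_DOWNLOADS = 25_000_000   # "over 25 million", so treat as a lower bound
ASK_AI_H1_REVENUE = 16_000_000     # first half of the year
ASK_AI_SEPT_REVENUE = 5_510_000    # September, per Appfigures
CHATGPT_SEPT_REVENUE = 4_580_000   # September, per Appfigures


def revenue_per_download(revenue: float, downloads: int) -> float:
    """Average revenue per download over the same period."""
    return revenue / downloads


def relative_lead(leader: float, follower: float) -> float:
    """How much larger `leader` is than `follower`, as a fraction of `follower`."""
    return leader / follower - 1.0


rpd = revenue_per_download(ASK_AI_H1_REVENUE, ASK_AI_H1_DOWNLOADS)
lead = relative_lead(ASK_AI_SEPT_REVENUE, CHATGPT_SEPT_REVENUE)

print(f"Chat with Ask AI, H1 revenue per download: at most ${rpd:.2f}")
print(f"Chat with Ask AI vs. ChatGPT, September revenue lead: {lead:.1%}")
```

On these numbers, Chat with Ask AI earned at most about $0.64 per download in the first half of the year, and its September revenue was roughly 20% higher than that of the ChatGPT app.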
In contrast, domestic products like Wenxin Yiyan have only just begun monetization, and API-based AI applications have yet to produce a breakout hit or sustainable revenue-generating product.
Second, domestic companies lack sufficient AI technical expertise for fine-tuning large models.
For example, Lensa AI introduced the Magic Avatar feature, which allows users to upload 10-20 selfies to generate avatars in 10 different styles, such as fantasy, anime, and fashion. Of course, this feature requires payment to use.
Lensa AI, Waifu Diffusion, Stablediffusion Infinity, and some features of Midjourney mentioned earlier are all developed based on the open-source Stable Diffusion. Therefore, at the technical level, Lensa AI does not have significant barriers compared to similar products.
However, Lensa AI has fine-tuned Stable Diffusion and leveraged its research in the Neural Algorithm of Artistic Style (NAAS) to optimize computational requirements and improve efficiency, enabling it to run quickly on mobile devices and generate images.
As a result, after the launch of the Magic Avatar feature, its accumulated technical expertise differentiated it from similar products, garnering widespread user attention.
In China, imitators of Lensa AI such as the Miaoya Camera WeChat mini-program likewise ask users to upload 20 photos and pay 9.9 yuan to generate high-definition photos in different styles. Tech enthusiasts generally speculate that Miaoya Camera is also built on the open-source Stable Diffusion model, fine-tuned with the LoRA (Low-Rank Adaptation) technique.
Some tech enthusiasts can quickly generate similar photos themselves using free tutorials and open-source models. Without proprietary AI technology, Miaoya Camera's popularity faded soon after its initial hype.
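For readers unfamiliar with LoRA, the sketch below illustrates the general idea in plain Python: the pretrained weight matrix stays frozen, and only two small low-rank matrices are trained and added on top. It is a toy illustration of the technique, not the implementation of any product named here; the dimensions, rank, scaling value, and helper names are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scaling):
    """Return w + scaling * delta, element by element."""
    return [[x + scaling * d for x, d in zip(rw, rd)] for rw, rd in zip(w, delta)]


# Hypothetical sizes: one 64x64 layer adapted with rank 4.
D_OUT, D_IN, RANK, ALPHA = 64, 64, 4, 8.0
rng = random.Random(0)

# Frozen pretrained weights (random stand-ins for this sketch).
W = [[rng.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(D_OUT)]

# Trainable low-rank factors: A starts small and random, B starts at zero,
# so the adapted layer initially behaves exactly like the original one.
A = [[rng.gauss(0.0, 0.01) for _ in range(D_IN)] for _ in range(RANK)]
B = [[0.0] * RANK for _ in range(D_OUT)]


def adapted_weight(w, a, b, alpha=ALPHA, rank=RANK):
    """Effective weight W + (alpha / rank) * B @ A used at inference time."""
    return add_scaled(w, matmul(b, a), alpha / rank)


full_params = D_OUT * D_IN            # parameters updated by full fine-tuning
lora_params = RANK * (D_IN + D_OUT)   # parameters updated by LoRA

print(f"full fine-tuning: {full_params} trainable parameters")
print(f"LoRA (rank {RANK}): {lora_params} trainable parameters")
print("unchanged before training:", adapted_weight(W, A, B) == W)
```

Because only the two small factors are trained, fine-tuning of this kind is cheap enough for small teams and hobbyists, which is consistent with the low technical barrier described here.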