OpenAI Accelerates the Survival of the Fittest as AI Agent Vendors Rebuild Their Moats

baoshi.rao

AI Agent (intelligent agent): even if you don't know what it is, you have almost certainly heard AI experts use the term this year:

On November 13, 2023, Microsoft founder Bill Gates wrote a thousand-word blog post about Agents, claiming they would disrupt the software industry and human-computer interaction: "Whoever wins the personal agent will have won the big game. Because you will never go to a search site again, never go to a productivity site, never go to Amazon."

University of British Columbia computer science professor Jeff Clune sees enormous business opportunities in Agents: "Potentially worth trillions of dollars." NVIDIA senior researcher Jim Fan went further, predicting that Agents would "drive the evolution of the entire civilization."

Rewinding to June 27, OpenAI's Head of Applied Research Lilian Weng's six-thousand-word blog post "LLM-powered Autonomous Agents" quickly pointed the way for the burgeoning AI application layer: build Agents.

The so-called AI Agent can be understood as an 'artificial brain' capable of autonomously using tools and executing tasks.

How popular are Agents this year? 'After June, almost no one in the AI field talks about building large models; instead, everyone is claiming to work on Agent projects,' an investor told 36Kr. Over the past month, she has spoken with over 20 companies claiming to develop Agents: 'Some previously worked on RPA, others on AIGC, but more than half of these projects aren’t actually Agents.'

In Silicon Valley, the 'heartland of AI,' according to renowned AI journalist Matt Schlicht, there are at least 100 serious projects commercializing Agents, with nearly 100,000 developers building them. 'A new Agent company is born every week,' said E2B, an AI application cloud service provider, describing the boom in Agent startups.


△A list of well-known Agents, image source: E2B

While activity across the Pacific has been frenetic, tech giants and startups in China have also moved quickly to pursue Agent technology.

In just two months from September to October, major companies and AI unicorns like Baidu and Zhipu AI successively released Agent development frameworks or developed their own Agent applications. Startup projects branded as Agents have also sprung up like mushrooms after rain—at a recent hackathon hosted by Alibaba Cloud, 7 out of 18 AI projects mentioned Agents.

However, five months later, at the first Dev Day (Developer Day) on November 6, OpenAI quietly snapped its fingers: it released GPT Builder, a low-code development tool for customized ChatGPT (OpenAI calls them GPTs)—downstream customers and developers only need to upload training data and configure model parameters, taking just days or even hours to develop their own Agents using the world's most powerful large model base.

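Just one day after GPT Builder's release, more than a thousand GPT-based AI applications had gone live on the GPT Store; within three days, customized GPTs were being added at the astonishing rate of one per minute. As of December 4, even the unofficial store GPTs Hunter had listed 33,000 GPTs.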

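With a single launch event, OpenAI plunged Agent startups into a wave of pessimism across the internet. On one side, OpenAI is leveraging the strength of its GPT foundation model to build products itself; on the other, downstream customers and developers can use GPT Builder to develop in-house with a low barrier to entry. Agent companies seem to have reached a "moment of life or death" in which they could be swallowed by upstream or downstream players at any time.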

Many of the Agent startups that had earlier taken their direction from OpenAI quickly fell into panic:

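On Twitter Space, a live discussion of Dev Day drew nearly a hundred participants. When GPTs appeared on the screen behind Sam Altman, expletives erupted one after another: "Damn, the past six months were all for nothing!" Several developers joked online: "The only difference between us and OpenAI is that we're worse."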

A financial advisor (FA) helping two AI Agent companies secure funding was so anxious she couldn't sleep at night. She urgently convened a meeting with the founders, insisting that their business plans highlight any technological differentiators, no matter how minor. 'Also, focus on the domestic market first, since OpenAI hasn't entered it yet,' she advised.

However, overseas startups appear remarkably calm about Dev Day. Barkley Dai, Growth Lead at U.S. AI 3D startup Luma AI, told 36Kr that panicked companies often merely slap the 'Agent' label on their products without identifying real-world use cases. 'AI firms that have found viable applications are already commercializing and building data flywheels—they won't be easily phased out,' he noted.

'This is actually an opportunity to separate the wheat from the chaff and reignite interest in the AI application space,' echoed another overseas developer with a similar perspective.

Although developers hold divergent views, OpenAI's aggressive push into the Agent domain shows that no one today can deny the value of Agents in putting AI to practical use.

This article will address the following questions:

What is the practical value of agents? How are industry players categorized?

What is OpenAI's impact on agent-based enterprises? What constitutes the core competitiveness of agent enterprises?

What is the commercialization status of Agents?

"ChatGPT can only engage in casual conversations, write poetry, or create art—these are all frivolous activities. But it can't book tickets, handle reimbursements, or make PowerPoint presentations."

This widely circulated statement in the industry reflects the limitations of large language models: they cannot actively perceive environmental information or make decisions and take actions. Turing Award winner Yann LeCun has also asserted that large language models cannot lead to AGI (Artificial General Intelligence).

However, AI Agents based on large language models are regarded by OpenAI technical experts like Andrej Karpathy and Lilian Weng as an essential pathway to AGI.

How to understand the revolutionary nature of Agents? We might as well imagine the implementation of AI as the process of completing a project.

Whether it's AIGC (AI-generated content) technologies like Midjourney or ChatGPT, they can be regarded as 'strategists' within the team. These 'strategists' can brainstorm based on the knowledge stored in their minds, providing preliminary ideas and approaches for projects assigned by superiors.

At the same time, the extent to which the 'strategists' can realize their potential largely depends on the quality of human instructions—that is, the quality of the input Prompt.

However, to deliver an outstanding project, it's not enough for 'strategists' to merely theorize. It also requires retrieving information online, analyzing past business data from databases, and using office software to create a presentation that superiors and collaborating departments can understand.

This means that for large models to be truly useful in practical tasks, they must be able to call third-party tool APIs and learn to use these tools.

In March 2023, Microsoft released 365 Copilot, which has initially enabled large models to use practical tools, assisting humans in creating PowerPoint presentations, drafting documents, and summarizing content.

However, since Copilot cannot autonomously execute and complete tasks, users still need to modify Copilot's execution results and provide feedback by adjusting prompts and other methods during its use.

Going further, an AI entity that can almost autonomously execute tasks without requiring real-time high-quality prompts is called an Agent.


△Image source: Tencent Research Institute, China Merchants Securities

Lilian Weng's blog points out that the reason agents can liberate human hands lies in their ability to imitate four components of human task execution: large language models + memory + planning capabilities + tool usage.

"Memory" ensures consistency between sequential goals, while "planning capabilities" manifest in task decomposition and verification. The remaining two components form the core of an agent: "large language models" serve as the brain that understands tasks and makes decisions, and "tool usage" represents the execution of actions.


△Image source: Lilian Weng's "LLM-powered Autonomous Agents"
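The four components described above can be sketched as a small loop. This is only an illustrative sketch, not Lilian Weng's or any vendor's implementation: `fake_llm_plan`, the `Agent` class, and the two toy tools are hypothetical names invented for this example, and the "brain" is a hard-coded stub where a real Agent would call a large language model. The verification half of planning is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def fake_llm_plan(goal: str) -> List[str]:
    """Hypothetical stand-in for the LLM 'brain': decompose a goal into steps.

    A real Agent would ask a language model to produce this plan.
    """
    return [f"search:{goal}", f"summarize:{goal}"]


@dataclass
class Agent:
    tools: Dict[str, Callable[[str], str]]           # tool usage: callable actions
    memory: List[str] = field(default_factory=list)  # memory: results kept across steps

    def run(self, goal: str) -> List[str]:
        steps = fake_llm_plan(goal)                  # planning: task decomposition
        for step in steps:
            tool_name, arg = step.split(":", 1)
            result = self.tools[tool_name](arg)      # execute the action with a tool
            self.memory.append(result)               # remember it for later steps
        return self.memory


# Toy tools (assumptions for illustration; real tools would call external APIs).
tools = {
    "search": lambda q: f"results for {q}",
    "summarize": lambda q: f"summary of {q}",
}
agent = Agent(tools=tools)
log = agent.run("Q3 sales")
```

The point of the sketch is the division of labor: the model decides what to do, the tools do it, and memory carries results from one step to the next without a human supplying a new prompt each time.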

To date, the implementation directions of Agents have diverged into four exploration paths based on "number of Agents invoked" and "whether specific goals are set".

Just like in real project teams, where one person can lead all processes or multiple people can divide the work, based on the number of Agents invoked for a task, the modes of Agents are essentially two: Single Agent and Multi Agent.

In the domestic context, Single Agents are currently more commonly applied to specific processes or tasks with particular scenarios.

For instance, Airgram, a voice transcription platform invested by Hillhouse Capital, has launched a meeting Agent for sales scenarios. MagicVoice Intelligence, founded in 2021, focuses its Agent on private domain operations and customer service. Meanwhile, major players like Baidu, Didi, and Lanling have introduced Agent solutions for specific corporate functions such as expense control, data analysis, and communication.

However, as business processes become increasingly complex and difficult to segment into isolated components, the most straightforward solution is to have multiple Agents collaborate through division of labor.

Since the second half of this year, more companies have been developing group Agent systems. KeepChat, which recently completed its angel round of financing, integrates four collaborating Agents behind its AI sales system to handle complete sales processes and customer needs.

FaceWall Intelligence, founded by Associate Professor Zhiyuan Liu from Tsinghua University's Department of Computer Science and Technology, has transformed its intelligent software development platform ChatDev into a software company staffed entirely by Agents. The CEO Agent receives user requirements and assigns development and delivery tasks to other Agent roles including the CTO, development manager, product manager, and testing specialist.

Based on whether specific goals are set, Agents can be divided into two types: autonomous and generative.

Autonomous Agents are often constrained by specific task objectives, such as delivering software with specific functions or creating PPTs with specific content. However, creative work like scriptwriting and game design often requires serendipitous sparks of inspiration. To explore the creative potential of Agents, generative Agents without specific goals emerged.

A milestone in generative Agent exploration came in April 2023: in the "Virtual AI Town" developed by Stanford University and Google Research, 25 Agents with diverse identities freely engaged in social interactions.


△Image source: Stanford University, Google Research

The birth of the 'Virtual AI Town' has shown many developers and manufacturers the potential of Agents to reconstruct gaming and social interaction methods. For instance, ICEGamer, a game studio established by Xiaoice, has introduced Agent NPCs into games. Developers only need to write essential world-building scripts and character settings for NPCs, while the iteration and evolution during gameplay are entirely entrusted to the Agents and players.

'In an ideal scenario, generative Agents can autonomously construct game instances based on player behavior,' Zhang Haoyang, former AIGC planner for 'Peacekeeper Elite,' told 36Kr. His AI gaming company, AutoGame, is exploring not only the use of Agents as intelligent, interactive game NPCs but also employing them as digital employees to write game scripts, create game components, and devise gameplay mechanics.

It is evident that expectations for Agents have expanded beyond merely freeing humans from repetitive tasks. There is now a vision for Agents to truly become 'digital counterparts' of humans, establishing new modes of production.

The November 6 Dev Day is widely regarded as OpenAI's official entry into competing with Agent manufacturers.

Most believe that intermediary Agent companies, which like OpenAI provide development frameworks and tools, will be the first to feel the impact. Atom Capital explicitly stated in an official tweet: "Many Agent framework companies will lose their value as developers migrate to OpenAI's official framework due to ecosystem convenience."

When OpenAI directly "sells water" (provides tools) to downstream developers, competition in the Agent ecosystem will intensify. The existing tens of thousands of GPTs already cover work needs such as design, writing, and troubleshooting, extending even to lifestyle and entertainment scenarios like fortune-telling, teaching, and recipe generation. "Any other manufacturers trying to develop Agents for specific scenarios will face direct competition," a developer told 36Kr. "It’s equivalent to competing with thousands of developers within OpenAI’s ecosystem."


△Image source: GPT Store

As the initial excitement from Dev Day fades, companies are regaining their composure. During the event, Sam Altman referred to GPTs as 'precursors to agents,' clearly stating that GPTs are more like chatbots and not yet capable of autonomous actions.

After a month of testing and research, the aforementioned developer told 36Kr that most GPTs, created primarily with simple instructions, fall far short of the enterprise-grade standards required for client delivery.

This means OpenAI's GPTs are not yet at a level to compete with Agent providers. However, OpenAI's ambitions in the Agent space have prompted both domestic and international companies to reassess their own competitive advantages.

For companies building a 'moat' in the Agent competition, data barriers form one critical wall.

However, constructing data barriers is not easy in China. On one hand, private data in most fields is scattered among different enterprises and experts, characterized by high sensitivity and difficulty in integration. On the other hand, 'process data' generated in business operations is often stored in unstructured formats on corporate servers or even in experts' 'brains.' Zhou Jian, CEO of Lanma Technology, believes that the digitalization of expert knowledge is a necessary condition for the implementation of AI Agents.

One 'smart approach' some companies take is to partner with midstream enterprises or third-party service providers that can share downstream industry customer data. For example, Lanma Technology, which focuses on the human resources industry, first collaborates with headhunting platforms that have numerous corporate clients. This serves as an entry point for accumulating business data such as resume screening and job matching.

However, process data is often difficult to share through third-party service providers. Many manufacturers believe that one of the few ways to obtain such data is to start with related businesses through a 'cold start' approach, completing the initial accumulation of process data. For example, if you want to develop a game Agent, you might first create a traditional game.

In industries where data privatization is not high—such as video generation or novel generation, where data mainly comes from online sources—many practitioners believe that what Agent companies need to focus on is data governance.

Transforming public data into semi-private or even private data requires not only cleaning techniques but also a deep understanding of the business.

"All data has value, and classifying it based on business needs is more critical than cleaning. Classification tests a company's business understanding—the deeper the understanding, the clearer the importance of specific data," explained Jiang Yuchen, CEO of Waveform Intelligence, a content creation Agent provider.

Taking novel writing as an example, she noted that fluent and elegant prose isn't the key marker of high-quality data. Instead, market-driven metrics like reader ratings and page views are the most important standards for data quality.

The other wall of the 'moat' is technology.

In building Agents, many unresolved technical challenges remain, many stemming from the 'brain'—large language models. In June 2023, venture firm a16z pointed out in discussions with four AI unicorn CEOs that current LLMs need to address issues like uncontrolled 'hallucinations,' consistency in long-term memory, and improving multimodal understanding capabilities.

Many practitioners have told 36Kr that after the OpenAI Developer Conference, other competitors still have a 'buffer period' to address technical challenges point-to-point and acquire customers through technical solutions.

For example, Waveform Intelligence has developed its own memory-enhancement solution, RecurrentGPT, to improve the memory capabilities of large models. During the decoding phase, it also limits the number of iterations in text generation to rein in memory and inference costs that would otherwise grow quadratically.

Another example is the multimodal exploration of human-computer interaction interfaces, which remains a largely untapped field. Currently, the most mainstream method of human-computer interaction is still natural language input. However, in specific business scenarios, the role of LUI (Language User Interface) is quite limited. "For instance, analyzing the operational status of a store often requires inputting a segment of store surveillance video," said Zhou Jian, CEO of Lanma Technology. "Due to the early stage of multimodal technology, exploration into multimodal UIs for images, videos, charts, and other formats is still minimal."

Moving from the lab to the vast fields of application is the destiny of Agents.

This year, as technology makes it possible, the commercialization of AI Agents has officially exploded. For example, in Silicon Valley, there are at least 100 serious projects advancing the commercialization of Agents.

However, how to control the high invocation costs is the primary challenge troubling many Agent providers.

After integrating Agents, all business scenarios requiring processing are transformed into data that the underlying large models need to understand, resulting in high inference costs. A typical case is that after Stanford's virtual town framework was open-sourced, each Agent consumed $20 worth of Tokens per day, which is even higher than human labor costs.

Tokens are the smallest units that models can understand and generate (roughly 1,000 Tokens ≈ 750 English words). Zhang Haoyang also ran the numbers: in gaming scenarios, invoking Agents consumes massive amounts of Tokens, with costs as high as 1 yuan per person per hour. Once the user base reaches tens of thousands, companies will find it difficult to bear the costs.
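A rough back-of-the-envelope calculation shows how quickly these figures add up. The sketch below uses only the two rates quoted in this article ($20 per Agent per day, 1 yuan per player per hour); the 30-day month, the 10,000-player user base, and the 2 hours of daily play are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
def agent_cost_usd(num_agents: int, days: int, usd_per_agent_day: float = 20.0) -> float:
    """Token spend for always-on Agents at the quoted $20 per Agent per day."""
    return num_agents * days * usd_per_agent_day


def game_cost_yuan(players: int, hours_per_player: float,
                   yuan_per_player_hour: float = 1.0) -> float:
    """Inference spend for a game at the quoted 1 yuan per player per hour."""
    return players * hours_per_player * yuan_per_player_hour


# One Agent running for an assumed 30-day month.
one_agent_month = agent_cost_usd(num_agents=1, days=30)

# An assumed 10,000 players, each playing an assumed 2 hours in a day.
game_one_day = game_cost_yuan(players=10_000, hours_per_player=2)

print(one_agent_month)  # 600.0 (USD)
print(game_one_day)     # 20000.0 (yuan)
```

Under these assumptions a single always-on Agent costs $600 a month, and a mid-sized game burns 20,000 yuan a day on inference alone, before any other operating cost.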

For Agents to truly achieve large-scale implementation, multiple vendors told 36Kr that the primary consideration for Agent players isn't profitability, but how to pass the high inference costs onto users.

Currently, whether for B2B or B2C Agent players, 'Pay by Token' is the most basic business model. The so-called 'Pay by Token' payment model works like a water faucet: users pay the Agent vendor for the computing power costs corresponding to the Tokens consumed during usage.

At present, B2B Agent vendors have developed a relatively mature charging model: customization/deployment fees + Pay by Token. On the enterprise side, the value generated by Token consumption can be measured objectively, for example through labor cost savings, increased sales, or improved office efficiency.
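The B2B charging model described above amounts to a one-off fee plus a metered usage fee. A minimal sketch follows; the function name `b2b_invoice` and every number in the example (the deployment fee, the Token volume, the price per 1,000 Tokens) are hypothetical and are not taken from any vendor's price list.

```python
def b2b_invoice(deployment_fee: float, tokens_used: int,
                price_per_1k_tokens: float) -> float:
    """Total charge = customization/deployment fee + Pay by Token usage fee."""
    usage_fee = tokens_used / 1000 * price_per_1k_tokens
    return deployment_fee + usage_fee


# Hypothetical customer: 50,000 yuan deployment fee,
# 2 million Tokens consumed at 0.5 yuan per 1,000 Tokens.
total = b2b_invoice(deployment_fee=50_000, tokens_used=2_000_000,
                    price_per_1k_tokens=0.5)
print(total)  # 51000.0
```

Because the usage fee scales with Tokens consumed, the customer can weigh it directly against a measurable return such as hours of labor saved.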

However, for B2C Agent applications, which are primarily gaming and social products, making the 'Pay by Token' model work is not easy. The value of the Tokens consumed shows up only as subjective product experience, which is hard to measure, so neither user adoption nor willingness to pay can be guaranteed.

Zhang Haoyang provided an example: Currently, the primary monetization methods in mainstream games rely on battle passes (monthly cards) and in-game item purchases, where monthly card subscribers gain additional in-game benefits. With the integration of generative AI into games, under the premise of limited player willingness to pay, if a pay-per-use business model isn't adopted, the computational costs generated by highly engaged players could become unsustainable.

This creates a commercial paradox for consumer-facing Agents: the more players there are and the longer they play, the more money the company loses.
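The paradox can be made concrete with a flat-fee example. The sketch below takes the 1 yuan per player per hour inference cost quoted earlier in this article; the 30-yuan monthly card price and the two play-time figures are assumptions for illustration only, and the function names are hypothetical.

```python
def monthly_margin(card_price: float, hours_played: float,
                   cost_per_hour: float = 1.0) -> float:
    """Revenue from one flat-fee subscriber minus that player's inference cost."""
    return card_price - hours_played * cost_per_hour


def break_even_hours(card_price: float, cost_per_hour: float = 1.0) -> float:
    """Monthly play time beyond which a flat-fee subscriber loses the company money."""
    return card_price / cost_per_hour


CARD_PRICE = 30.0  # assumed monthly card price in yuan

casual = monthly_margin(CARD_PRICE, hours_played=10)  # light player
heavy = monthly_margin(CARD_PRICE, hours_played=60)   # highly engaged player
limit = break_even_hours(CARD_PRICE)

print(casual)  # 20.0
print(heavy)   # -30.0
print(limit)   # 30.0
```

Under these assumptions, any player who spends more than 30 hours a month in the game costs more than the card brings in, so the most engaged players are exactly the ones who generate losses.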

The core reason why consumer applications struggle to achieve true commercialization lies in the fact that Agents haven't yet created new demands for users.

Taking games as an example, "Most current 'AI game' products only apply Agent capabilities to NPC dialogues, which doesn't bring fundamental innovation in gameplay but rather enhances existing experiences with new technology," Zhang Haoyang summarized. "After integrating generative AI capabilities into games, it's essential to create entirely new gameplay mechanics, truly achieving AI Native, before players will genuinely pay for AI games."

However, even though the monetization model remains undecided, Agent technology has shown potential in IP creation by meeting user needs effectively. Character.AI, founded in 2021, utilizes Agents to build a role-customization social platform. This year, the AI unicorn's app reached a peak of 4.2 million monthly active users.

△Source: Character.AI

Currently, many manufacturers are experimenting with adding digital avatars to Agents, enabling capabilities such as outbound calls and web searches—features that traditional digital humans could not perform autonomously. Going further, Agent IPs with memory can establish emotional connections with fans, rivaling the experience of real-life fan interactions.

Now it seems that OpenAI has warmed up the stage for Agents with GPTs, but there is still a long way to go, both technologically and commercially, before Agents can truly fly into the homes of ordinary people.