How to Determine the Intelligence Level of a Chatbot?

baoshi.rao

Some chatbots can be frustrating, perhaps due to their low intelligence level. This article explores how to evaluate the intelligence of a chatbot.

With the advancement of conversational AI technology, chatbots are increasingly becoming part of people's daily work and lives.

From a business perspective, deploying chatbots in appropriate scenarios can significantly enhance service efficiency and user experience. Therefore, integrating chatbots has become a crucial strategy in the digital and intelligent transformation of enterprises.

So, how should we assess the capabilities of a chatbot, or more specifically, how can we determine its intelligence level?

This article introduces a classification system for chatbots, helping us better understand the capabilities and suitable scenarios for different levels of chatbots.

Currently, there is no mature classification system for chatbots in the industry, but the autonomous driving classification can serve as inspiration.

According to the SAE classification, driving automation is graded from L0 to L5. L0 represents full human control, so the levels that involve automation run from L1 to L5, and higher levels indicate stronger autonomous capabilities. For example, L1 represents assisted driving, while L5 represents full autonomy.

At different levels, the roles of the driver and the vehicle vary, with higher levels requiring less human intervention (as shown in the figure below).

Image Source [1]

Drawing on the five-level classification of AI assistants proposed in the developer community [2], we categorize chatbot capabilities into five levels (L1 to L5) based on the problems they can solve and their application scenarios. Higher levels indicate stronger chatbot capabilities (as shown below).

Below, we detail these five levels.

L1: Message Push
The robot can push messages to users but lacks conversational abilities.

L1-level chatbots can only push messages unidirectionally. Today, apps and official WeChat accounts commonly use this method to interact with users.

The advantage is broad reach and high efficiency, but the downside is that users can only passively receive messages without interactive dialogue. Thus, L1-level chatbots cannot strictly be called "conversational robots."

L2: FAQ Response
The robot can answer common user questions but lacks contextual understanding and cannot initiate interactions.

L2-level chatbots begin to exhibit conversational abilities, specifically by answering frequently asked questions. A typical application is simple Q&A customer service bots, where users ask a question and the robot provides an answer.

These chatbots rely on a predefined knowledge base. When a user asks a question, the robot must understand the semantics and retrieve the corresponding answer from the knowledge base.
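The retrieval step described above can be illustrated with a small sketch. It is not a production design: real L2 bots typically use a trained semantic matching model, whereas this example substitutes plain word overlap (Jaccard similarity) as a stand-in. The knowledge base entries, the function names, and the 0.3 threshold are all assumptions made for illustration.

```python
import re

# Hypothetical knowledge base: (canonical question, answer) pairs.
KNOWLEDGE_BASE = [
    ("What are your opening hours?", "We are open from 9:00 to 18:00 every day."),
    ("How do I schedule a pickup?", "Use the pickup form in the app."),
    ("How much does shipping cost?", "Shipping starts at 10 yuan."),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def answer(question, threshold=0.3):
    """Return the answer of the best-matching entry, or None if nothing is close enough."""
    query = tokenize(question)
    best_score, best_answer = 0.0, None
    for kb_question, kb_answer in KNOWLEDGE_BASE:
        score = jaccard(query, tokenize(kb_question))
        if score > best_score:
            best_score, best_answer = score, kb_answer
    return best_answer if best_score >= threshold else None
```

Returning None when no entry clears the threshold lets the bot decline instead of guessing, which is the trade-off between answering more questions and answering them correctly.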

Evaluating L2-level chatbots primarily involves two metrics: recall rate (the share of user questions the robot is able to answer) and accuracy rate (the share of the answers it gives that are correct).
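As a rough sketch, both metrics can be computed from a labeled conversation log. The definitions below follow this article's wording rather than any formal standard, and the log format (pairs of the bot's answer and the expected answer, with None meaning the bot gave no answer) is an assumption made for illustration.

```python
def evaluate(log):
    """Compute (recall_rate, accuracy_rate) from a list of (bot_answer, expected_answer).

    bot_answer is None when the bot could not answer the question.
    recall_rate   = answered questions / all questions
    accuracy_rate = correct answers / answered questions
    """
    total = len(log)
    answered = [(got, want) for got, want in log if got is not None]
    correct = sum(1 for got, want in answered if got == want)
    recall_rate = len(answered) / total if total else 0.0
    accuracy_rate = correct / len(answered) if answered else 0.0
    return recall_rate, accuracy_rate


# Hypothetical log: four questions, three answered, two answered correctly.
sample_log = [
    ("9:00 to 18:00", "9:00 to 18:00"),
    ("10 yuan", "10 yuan"),
    ("10 yuan", "Use the app"),
    (None, "Yes, on Sundays too"),
]
recall_rate, accuracy_rate = evaluate(sample_log)
```

The two numbers pull against each other: a bot that declines more often tends to gain accuracy and lose recall, so they are best read together.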

For those interested in the technology behind L2-level chatbots, refer to the previous article, "Deconstructing the Technology Behind Conversational AI Platforms."

Overall, L2-level chatbots are suitable for simple customer service scenarios where the robot answers questions accurately, and users leave after getting their answers.

L3: Contextual Understanding & Task Completion
The robot can understand context, engage in multi-turn conversations, and help users complete tasks.

While L2-level chatbots can answer questions accurately after training, they still have two major shortcomings:

Lack of Contextual Understanding: For example, if a user asks, "Is it open on Sunday?" without specifying the subject (e.g., "the Zhongguancun pickup point"), an L2 chatbot cannot infer the full question ("Is the Zhongguancun pickup point open on Sunday?") and thus fails to answer.
Inability to Solve Problems: For instance, when a user asks to schedule a pickup, an L2 chatbot might provide a clickable link for the user to manually enter details (time, address, phone number) in a GUI. This back-and-forth between conversational (CUI) and graphical (GUI) interfaces is inefficient.

L3-level chatbots address these issues by:

Understanding context to infer missing information.
Engaging in multi-turn dialogues to gather necessary details and complete tasks.
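One common way to realize these two points is slot filling: the bot keeps the details gathered so far as dialogue state and asks only for what is still missing. The sketch below assumes the pickup scenario from this article. The slot names, the prompts, and the class itself are hypothetical, and extracting slot values from the user's raw text is assumed to be done by a separate language-understanding step that is not shown.

```python
REQUIRED_SLOTS = ("time", "address", "phone")

# Hypothetical prompts the bot uses to ask for each missing detail.
PROMPTS = {
    "time": "When should we pick up the package?",
    "address": "What is the pickup address?",
    "phone": "Which phone number can we reach you on?",
}


class PickupDialog:
    """Keeps slot values across turns so earlier answers stay in context."""

    def __init__(self):
        self.slots = {}

    def next_prompt(self):
        """Return the question for the first missing slot, or None if all are filled."""
        for name in REQUIRED_SLOTS:
            if name not in self.slots:
                return PROMPTS[name]
        return None

    def handle(self, slot_values):
        """Merge slots extracted from the latest user message and reply.

        slot_values is assumed to come from an upstream NLU component.
        """
        self.slots.update(slot_values)
        prompt = self.next_prompt()
        if prompt is not None:
            return prompt
        return "Pickup scheduled for {time} at {address}; contact number {phone}.".format(
            **self.slots
        )
```

Because the state lives in the dialog object, a user may supply several details in one message or spread them over several turns, and the task completes entirely inside the conversation without a detour through a GUI form.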

Thus, beyond accuracy and recall, task completion rate is a key metric for L3 chatbots.

L3-level chatbots are ideal for complex scenarios requiring proactive information gathering, such as marketing and customer acquisition.

L4: Personalized Interaction
The robot can leverage user tags to deliver personalized conversational experiences.

Ideally, chatbots should not only understand what users say but also know who they are. By using user tags (e.g., attributes, interests), chatbots can offer tailored interactions and improve efficiency.

For example, an L4 chatbot scheduling a pickup might already know the user's address and phone number from past interactions, eliminating the need to ask for them again. This significantly enhances efficiency and user experience.
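The pickup example can be sketched as pre-filling the task's required details from a stored user profile, so that the bot asks only for what it does not already know. The profile fields, slot names, and function name here are assumptions made for illustration, not a description of any particular product.

```python
REQUIRED_SLOTS = ("time", "address", "phone")


def plan_questions(profile, provided):
    """Return (known, missing) for a pickup request.

    profile:  details remembered from past interactions (hypothetical format).
    provided: details the user gave in the current conversation; these
              override the stored profile, since they are more recent.
    """
    known = {slot: profile[slot] for slot in REQUIRED_SLOTS if slot in profile}
    known.update(provided)
    missing = [slot for slot in REQUIRED_SLOTS if slot not in known]
    return known, missing


# A returning user: address and phone are already on file.
returning_user = {
    "address": "1 Example Road",
    "phone": "555-0100",
    "interests": ["books"],
}
known, missing = plan_questions(returning_user, {})
```

For the returning user only the pickup time remains to be asked, whereas a new user with an empty profile would be asked for all three details.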

The core of L4 chatbots lies in tagging users and leveraging these tags during conversations. However, poorly executed personalization can harm user experience.

Thus, user satisfaction is a critical metric for L4 chatbots.

L4-level chatbots are best suited for long-term user relationships, such as virtual assistants.

L5: Full Autonomy & Emotional Intelligence
The robot can operate autonomously, understand emotions, and adapt dynamically.

L5 represents the highest level, where chatbots can handle complex tasks independently, recognize user emotions, and adjust responses accordingly. This level is still largely theoretical but represents the future of conversational AI.

In summary, evaluating a chatbot's intelligence means assessing its ability to understand context, complete tasks, personalize interactions, and eventually operate autonomously. Businesses should choose the appropriate level based on their specific needs and scenarios.
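To make the selection step concrete, the sketch below maps the capabilities a scenario requires to the lowest level that provides all of them. The capability names and the mapping are an illustrative reading of the five levels described in this article, not an established standard.

```python
from enum import IntEnum


class Level(IntEnum):
    MESSAGE_PUSH = 1
    FAQ_RESPONSE = 2
    TASK_COMPLETION = 3
    PERSONALIZED = 4
    FULL_AUTONOMY = 5


# Hypothetical capability names, each tied to the level that introduces it.
CAPABILITY_LEVEL = {
    "push_messages": Level.MESSAGE_PUSH,
    "answer_faq": Level.FAQ_RESPONSE,
    "understand_context": Level.TASK_COMPLETION,
    "complete_tasks": Level.TASK_COMPLETION,
    "personalize": Level.PERSONALIZED,
    "recognize_emotion": Level.FULL_AUTONOMY,
    "operate_autonomously": Level.FULL_AUTONOMY,
}


def minimum_level(required_capabilities):
    """Return the lowest level that covers every required capability."""
    return max(
        (CAPABILITY_LEVEL[name] for name in required_capabilities),
        default=Level.MESSAGE_PUSH,
    )
```

For instance, a customer service scenario that needs FAQ answers plus in-conversation scheduling would call for at least an L3 chatbot.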

Looking beyond a single chatbot, multiple robots can collaborate to meet more complex user needs, since in some cases a single conversational agent cannot fulfill the user's requirements.

Take restaurant booking as an example. A user can state their needs to a smart assistant, which collects information such as the restaurant name, time, and number of diners through conversation. At this point, the smart assistant needs to engage another phone-calling robot to make the reservation with the restaurant.

We can see that fulfilling this need involves collaboration between two conversational agents: the smart assistant and the phone-calling robot. In the future, more scenarios and requirements will necessitate multi-robot collaboration.

In addition, conversational agents are just one form of intelligent robot; there are many other types. For instance, Robotic Process Automation (RPA) robots control software to complete specific task processes automatically. There are also many scenarios where conversational agents and RPA robots can work together.

Returning to the restaurant booking example, if the restaurant offers an online booking website, the conversational agent can collect the information, while the RPA robot completes the booking operation.
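The division of labor in this example can be sketched as a simple handoff: the conversational agent turns the collected answers into a structured booking, and a second robot acts on it. Both classes below are hypothetical stand-ins. In particular, the RPA robot here only records the request, whereas a real one would drive the restaurant's booking website.

```python
from dataclasses import dataclass


@dataclass
class Booking:
    restaurant: str
    time: str
    diners: int


class BookingAgent:
    """Stand-in for the conversational agent that gathers the details."""

    def collect(self, answers):
        # answers is assumed to hold what the user said during the dialogue.
        return Booking(
            restaurant=answers["restaurant"],
            time=answers["time"],
            diners=int(answers["diners"]),
        )


class RpaBot:
    """Stand-in for the RPA robot that performs the booking operation."""

    def __init__(self):
        self.submitted = []

    def book(self, booking):
        # A real RPA robot would fill in and submit the online form here.
        self.submitted.append(booking)
        return f"Booked {booking.restaurant} at {booking.time} for {booking.diners} diners"


agent = BookingAgent()
rpa_bot = RpaBot()
booking = agent.collect({"restaurant": "Example Bistro", "time": "19:00", "diners": "4"})
confirmation = rpa_bot.book(booking)
```

Passing a structured Booking object between the two robots keeps each one replaceable: the same conversational agent could hand the booking to a phone-calling robot when the restaurant has no website.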

In the future, both enterprises and individual users will be able to use L2 to L4 level conversational capabilities to enhance interaction efficiency and experience based on their specific business scenarios.

Notes:
[1] SAE's classification of autonomous driving: http://www.lilunpai.com/w/466
[2] Conversational AI: Your Guide to Five Levels of AI Assistants in Enterprise: https://blog.rasa.com/conversational-ai-your-guide-to-five-levels-of-ai-assistants-in-enterprise/