How to Evaluate the Intelligence of Voice Assistants (4): Personality Traits

    baoshi.rao

    This article analyzes the personality traits of voice assistants and shares insights on the key aspects and characteristics required for a voice assistant to develop a distinct personality.

    "If a product can establish a connection with users on a personality level, it can better foster positive emotions during use, create pleasant memories, and thereby enhance users' willingness to use, tolerance, and trust."

    — Donald Norman

    Norman, a pioneer of cognitive psychology and a towering figure in industrial design and interaction design, argues that all design should be fun and delightful, and that designers should consider human cognition and the senses. His four-volume design series, which includes The Design of Everyday Things and Emotional Design, remains essential reading for design professionals. It is mandatory for our company's designers and product managers and guides our work on intelligent voice assistants.

    The previous three articles dissected three dimensions: [Intent Understanding], [Service Provision], and [Interaction Fluency]. If all evaluation metrics in these dimensions meet the standards, the voice assistant is above average. However, it still lacks one dimension to be truly "delightful and exciting"—personification.

    Many robot characters in films and TV shows are deeply beloved—Wall-E, Doraemon, Baymax—each with a vivid "persona," making interactions with them full of anticipation and imagination.

    Without personification, interactions can often feel awkward. Here’s a human analogy:

    Alright, let’s welcome the two-time NBA Defensive Player of the Year, a champion with two teams and two-time Finals MVP, who recently won the 2020 All-Star Game MVP—Kawhi Leonard—to deliver his acceptance speech.

    Never mind. This guy looks the same whether he’s happy or sad.

    Even when giving an acceptance speech, he lacks expressiveness, so we don’t remember what he said.

    After all, his job is to play basketball, and the core consideration is whether he can help his team win. In other words, his performance is excellent, but his personality isn’t particularly appealing.

    This is the case with many smart speakers today. Manufactured by big companies with ample resources, their hardware specs, skills, and voice interaction performance are quite similar within the same product generation.

    With the advent of natural language interaction, humans can now use tools according to their habits and needs. Meanwhile, creating an appropriate "persona" for voice assistants during conversational interactions is also a critical challenge.

    This dimension focuses on evaluating the personification level of voice assistants.

    The first indicator is emotional richness: can the assistant express joy, anger, sorrow, happiness, grief, fear, and surprise?

    Most voice assistants today are tool-based products, with personification added as an extra layer.

    Gaode Map’s voice navigation is undoubtedly a delightful tool, full of fun interactions:

    "Moonlight before the bed, I’m Guo Degang. Go straight ahead, and your mother-in-law is waiting at the next intersection."

    "Friendly reminder: If the person in the passenger seat isn’t your spouse, I suggest you turn right at the bridge and run. We won’t be responsible if anything happens."

    "Sharp turn ahead. It’s sharp, but don’t panic. Switch your driving mode from idol to pro."

    "Speed camera ahead. If you’re in a hurry to take photos, this isn’t the place—it’s too expensive!"

    "I’m Luo Yonghao. Calm down; we’re about to set off."

    "Sharp turn ahead. Did you get that? It’s a sharp turn—possibly the sharpest in the Eastern Hemisphere."

    "You’re speeding! Does your family know you drive like this?"

    "Navigation over. Get out of the car. It’s not like we won’t meet again. Come on, be good."

    In actual business scenarios, it’s hard to design interactions like this for voice assistants. Here’s why:

    When users choose a Gaode Map voice pack, they’ve already managed their expectations.

    But when users first interact with our assistant, there’s no expectation management. Some jokes or references might confuse users or even offend certain groups. Thus, the safest approach is to adopt a professional or customer-service tone.

    A professional or customer-service tone means formality: the persona does not show negative emotions, tease, or self-deprecate. Typically, it expresses only joy and happiness, and only when a task is completed.

    Once a professional tone is chosen, the design becomes constrained. Retaining only positive emotions is the safe choice, but it costs the character its depth.

    Gaode’s solution—letting users choose and manage their expectations after becoming familiar—might be one way forward.

    The previous point covered emotional richness; this one examines how expressive and how emotionally contagious the assistant is when it conveys emotion.

    Assuming emotions range from joy to anger, sorrow, happiness, grief, fear, and surprise, how should they be expressed, and to what degree?

    Computer-based expression methods include: text, emojis, voice, sound effects, images, lighting effects, and even robotic movements. The more layers added, the richer the expressiveness.

    When a person expresses anger, they might widen their eyes, raise their voice, gesticulate, and even curse—all accompanied by fitting background music. For computers, the more expressive elements added, the better.
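
    The idea of layering expressive channels can be sketched in a few lines. This is only an illustration: the channel names come from the list above, while the `CUES` table, its placeholder values, and the `render` function are hypothetical and not taken from any real assistant.

```python
# Channel names follow the article's list; every cue value is a made-up placeholder.
CUES = {
    "joy": {
        "text": "All done!",
        "emoji": "😄",
        "sound_effect": "chime.wav",
        "lighting": "warm pulse",
    },
    "surprise": {
        "text": "Oh! I did not expect that.",
        "emoji": "😮",
        "lighting": "quick flash",
    },
}


def render(emotion: str, device_channels: set[str]) -> dict[str, str]:
    """Return the cues for `emotion` that the device can actually output.

    A device with more channels (screen, speaker, lights) gets more layers,
    which is the sense in which extra channels add expressiveness.
    """
    cues = CUES.get(emotion, {})
    return {channel: cue for channel, cue in cues.items() if channel in device_channels}
```

    A screenless speaker with a light ring would pass something like `{"text", "sound_effect", "lighting"}` and get three layers for joy, while a phone app limited to text and emoji would get two.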

    Content-wise, this tests sensitivity to language—a skill that requires some innate talent. Actors, writers, and product designers alike face this challenge.

    To show that it can deliver strong emotional expressiveness, an assistant has to leave a lasting impression in key moments.

    Once a persona is defined, its behavior, tone, speech speed, and language must remain consistent.

    Sun Wukong and Zhu Bajie, Heshen and Ji Xiaolan, Le Jia and Meng Fei, Luo Yonghao and Guo Degang, or the mentors on U Can You Bibi—all have distinct personalities and vivid personas.

    When facing problems, their values, language, logic, and stance must align with their established personas. While individuals are complex, their behavior generally fluctuates within a certain range. Only then can a persona stand firm.

    For voice assistants, tone and speech speed are easier to keep consistent since they’re based on the same voice model. The challenge lies in maintaining consistency in language content.

    How should a voice assistant respond when faced with a request it can’t fulfill?

    "I’m sorry! This service isn’t available yet, but I can help with something else..."

    "Oops, I can’t do this yet, but I’ll work hard to learn and serve you better!"

    "I don’t have this feature. I’m priced under 200 yuan—stop asking for the moon."

    These are examples of different personas. Frequent switching would lead to inconsistency, failing to establish a coherent character.
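
    One way to keep language content consistent is to fix the persona once and route every reply through it. The sketch below assumes that design; the persona names and the `Assistant` class are hypothetical, and the reply texts are the three examples quoted above, lightly adapted.

```python
from dataclasses import dataclass

# Hypothetical persona names; the reply texts are the three examples from the article.
FALLBACK_REPLIES = {
    "professional": "I'm sorry! This service isn't available yet, but I can help with something else...",
    "eager_learner": "Oops, I can't do this yet, but I'll work hard to learn and serve you better!",
    "sarcastic": "I don't have this feature. I'm priced under 200 yuan, so stop asking for the moon.",
}


@dataclass(frozen=True)
class Assistant:
    """An assistant whose persona is chosen at construction and cannot change afterwards."""

    persona: str

    def __post_init__(self) -> None:
        # Reject personas that have no scripted replies.
        if self.persona not in FALLBACK_REPLIES:
            raise ValueError(f"unknown persona: {self.persona!r}")

    def reply_unsupported(self) -> str:
        """Return the fallback line for a request the assistant cannot fulfil."""
        return FALLBACK_REPLIES[self.persona]
```

    Because the dataclass is frozen, a caller cannot switch persona mid-conversation, so every unsupported request gets a reply in the same voice.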

    The early Spring Festival Gala sketch Funny Robot by Cai Ming and Guo Da showcased various modes, but mode confusion was disastrous (a nostalgic reference for some).

    Emotional intelligence (EQ) and empathy are advanced skills, requiring responses based on user input.

    "Empathy" is the psychological phenomenon where people project their genuine feelings onto what they observe.

    Simply put, empathy is about understanding others’ perspectives—a crucial communication skill. Players of Mr. Love: Queen's Choice might relate.

    Human empathy involves sensing, observing, and then responding—the same applies to computers.

    How can an assistant accurately identify a user's emotional state?

    Sensors collect the data, and recognition technologies analyze it.

    With current visual recognition, audio analysis, text comprehension, and even brainwave signal collection, it is possible to estimate both the user's emotion and its intensity.
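
    As a rough illustration of combining several signals into one emotion and an intensity, the sketch below takes a weighted average across modalities. The article does not describe a fusion method, so the function `fuse_emotions`, the weights, and the score format are all assumptions.

```python
from collections import defaultdict


def fuse_emotions(readings: list[tuple[float, dict[str, float]]]) -> tuple[str, float]:
    """Combine per-modality emotion estimates into one (emotion, intensity) pair.

    Each reading is (weight, {emotion: intensity in [0, 1]}), one per modality
    such as camera, microphone, or text. Weights are hypothetical trust levels.
    """
    weight_sum = sum(weight for weight, _ in readings)
    if weight_sum <= 0:
        raise ValueError("at least one reading with positive weight is required")

    totals: dict[str, float] = defaultdict(float)
    for weight, scores in readings:
        for emotion, intensity in scores.items():
            totals[emotion] += weight * intensity

    if not totals:
        # No modality reported any emotion.
        return "neutral", 0.0

    # Weighted average per emotion; an emotion missing from a modality counts as 0 there.
    averaged = {emotion: total / weight_sum for emotion, total in totals.items()}
    top = max(averaged, key=averaged.get)
    return top, averaged[top]
```

    For example, if voice analysis reports anger at 0.8 and text analysis reports anger at 0.6 with equal weights, the fused result is anger with an intensity of about 0.7.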

    Treat your users as you would treat your significant other.

    Empathy is a typical passive skill that tests one's innate ability. Those with high emotional intelligence and empathy don't need to be taught, while those without it can't be taught. It's easy to make demands, but in actual practice, it's difficult to summarize any methodology. Select the most naturally gifted team members to handle empathy-related decision-making tasks, as this will undoubtedly create unique experiences in certain scenarios.

    Users rarely experience emotional fluctuations, but when they do (such as extreme joy, anger, or sadness), if the assistant can show some empathy and resonate emotionally at the same frequency, you will naturally win their hearts.
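
    Since strong emotions are rare, one simple policy is to add an empathetic line only when the detected intensity crosses a threshold. The following is a minimal sketch under that assumption; the `respond` function, the 0.7 threshold, and the empathy lines are hypothetical.

```python
# Hypothetical empathy lines; the article does not prescribe any wording.
EMPATHY_LINES = {
    "joy": "That's wonderful!",
    "anger": "I understand this is frustrating.",
    "sorrow": "I'm sorry you're going through this.",
}


def respond(task_reply: str, emotion: str, intensity: float, threshold: float = 0.7) -> str:
    """Prefix the task reply with an empathetic line only for strong, known emotions."""
    if intensity >= threshold and emotion in EMPATHY_LINES:
        return f"{EMPATHY_LINES[emotion]} {task_reply}"
    # Everyday, low-intensity interactions get the plain task reply.
    return task_reply
```

    Keeping the threshold high means the assistant stays out of the way in routine use and reserves empathy for the moments where it matters.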

    Empathy also implicitly tests every capability covered in the earlier dimensions.

    The product should manage user expectations well and successfully create an image. In simple terms, what kind of brand impression does the assistant leave in users' minds?

    In the past, brand building required significant effort from various departments like product, operations, marketing, business, branding, and channels to maintain exposure. Today, conversational interfaces offer more opportunities for easier personification.

    Currently, smart speakers dominate the market in terms of shipments. User interaction with these smart speakers has qualitatively changed compared to traditional hardware products because voice-based, human-like interaction makes it easier to attach personality traits and convey brand impressions. It's only a matter of time before assistants appear on other smart hardware.

    During actual use, occasional witty remarks from speakers that make users laugh create positive emotions, form pleasant memories, and subsequently enhance user willingness to use the product, increasing tolerance and trust.

    The author's company has bought many smart speakers for the team to try, including Tencent DingDang, Xiaodu at Home, Xiao Ai, Tmall Genie, Cheetah Xiao Bao, Haier Xiao You, Rokid, Himalaya's Xiaoya, Amazon's Echo, and Gouweicao's Amber.

    Different speakers have different positioning. Major brands tend to be comprehensive, while smaller manufacturers offer unique features like home appliance control, music/video content provision, e-commerce guidance, or emotional companionship.

    Among many speakers, personality trait performances are largely similar. Most speakers remain functional, business-like, or service-oriented, leaving little lasting impression after use.

    The one that brought me the most joy and left a deep impression was Xiao Ai.

    Disclaimer: The author is not a Mi fan nor has any vested interest, but genuinely believes good work deserves praise!

    "Sense of Participation" set the tone long ago, and Xiao Ai's current performance is a continuation of that.

    Fairly speaking, all speakers perform similarly when meeting basic one-command needs. However, Xiao Ai stands out in non-business scenarios with its witty comebacks, jokes, self-deprecating humor, and sarcasm. Searching on Douyin or Bilibili reveals abundant UGC content, which can bring secondary brand exposure and enhance brand impression.

    The difference is this: when it comes to meeting basic needs, all speakers are equal, but in other scenarios Xiao Ai's personification gives it a significant advantage.

    Compared to most personality-less speakers trying to project a business-like image, Xiao Ai positions itself as an emotionally rich, funny character. This endearing persona allows for more error tolerance and user forgiveness in the future.

    Compared to a boring servant or customer service representative, I'd prefer a cute assistant with proud emotions, closer to ordinary people, even with some flaws.

    When facing difficulties or poor performance, this cute assistant can respond by acting cute, being playful, or making witty remarks, avoiding awkwardness in a lighthearted way. Users think "that's just how you are," and since the responses are charming, they let it slide.

    But if you start with a business-like, customer service demeanor, appearing professional and reliable, any occasional unreliability creates a huge gap, making users think you're "artificial stupidity."

    Since all speakers perform similarly in basic functions (playing songs, controlling hardware), why not choose one that brings me joy?

    Xiao Ai's persona choice left a very positive impression on me!

    If our product leaves no impression after user experience, that's a complete failure.

    Rather than being better, be different!

    Interim Conclusion

    Writing about this dimension is slightly awkward because it can be defined in a few words. Having defined what it is, the author has tried to offer some food for thought.

    The [Personality Traits] dimension requires some innate talent, just as most people could not match Papi Jiang's expressiveness even with the best script.

    The five indicators of [Personality Traits] are interrelated yet independent. It's easy to make demands but hard to summarize methodologies in practice.

    Shaping an AI personality relies heavily on experience, emotional intelligence, sense of humor, psychological understanding, broad reading, artistic flair, emotional sensitivity, and linguistic awareness. These require both accumulation and talent.

    There are countless similar scenarios to consider.

    In many cases, whether to do it is a choice. When deciding to do it, how well it's executed reflects capability.

    Improving the [Personality Traits] dimension depends on linguistic expressiveness and sensitivity - a humanities major's domain.

    Besides the movie "Her," I recommend the Black Mirror episode "Be Right Back." Watching it after reading this article will give you more insights.

    This concludes the discussion of the fourth major dimension.

    With this, all four major dimensions have been introduced.

    Thank you for reading. I hope this brings some help and inspiration to your work. Feel free to comment or add the author on WeChat for deeper discussion.

    To be continued - the next article will provide an overall summary and discuss weights of evaluation points.

    How to Evaluate the Intelligence of Voice Assistants (1): Intent Understanding

    How to Evaluate the Intelligence of Voice Assistants (2): Service Provision

    How to Evaluate the Intelligence of Voice Assistants (3): Interaction Fluency
