The Next Major Leap for AI Is Understanding Emotions: The First Conversational AI with Emotional Intelligence Is Here
Is the next major breakthrough for AI understanding emotions? Hume AI says yes.
On March 27, a startup named Hume AI announced that it had raised $50 million in a Series B funding round.
The startup was co-founded and is led by CEO Alan Cowen, a former Google DeepMind researcher.
What sets Hume AI apart from other AI model providers and startups is its focus on building an AI assistant that can understand human emotions, respond appropriately, and convey emotion back to users. The assistant is not limited to text: it also uses voice as its interface, listening to characteristics of the user's speech such as tone, pitch, and pauses.
Hume AI has also released a demo of its "Empathic Voice Interface" (EVI), which anyone can try with a device that has a microphone.
Hume AI's theory is that AI models with a more nuanced understanding and expression of human emotion can serve users better. The company is not interested only in broad emotion categories such as 'happy', 'sad', 'angry', or 'afraid', but in subtler, often multidimensional emotions: 'admiration', 'adoration', 'fascination', 'sarcasm', 'shame', and so on. Hume AI lists a total of 53 distinct emotions on its website.
Documentation: https://dev.hume.ai/docs/expression-measurement-api/overview
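To make the idea of "multidimensional" measurement concrete, the sketch below contrasts a single categorical label with a vector of continuous scores across several emotion dimensions. The emotion names and values are illustrative placeholders, not Hume AI's actual output format.

```python
# Illustrative only: a traditional classifier collapses an expression to one label,
# while a multidimensional model scores many emotions at once.
# Emotion names and values here are made up for the example.

categorical_output = "happy"  # single coarse label

multidimensional_output = {
    "admiration": 0.62,
    "adoration": 0.14,
    "fascination": 0.71,
    "sarcasm": 0.05,
    "shame": 0.02,
    # ... up to 53 dimensions in Hume AI's taxonomy
}

# Downstream logic can react to blends rather than a single category,
# e.g. high fascination with low sarcasm suggests genuine interest.
top_three = sorted(multidimensional_output.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top_three)
```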
Regarding this, Hume AI stated: emotional intelligence includes the ability to infer intentions and preferences from behavior, which is precisely what an AI interface is meant to do: deduce what users want and deliver it. In that sense, emotional intelligence is the most critical requirement for an AI interface. With voice, the AI can gather far more clues about user intentions and preferences, making it better at predicting human preferences and outcomes, and at knowing when to speak, what to say, and how to say it with the right tone.
After the Hume AI demonstration, the response was overwhelmingly enthusiastic.
Guillermo Rauch, CEO of the cloud and web application development company Vercel, posted: 'This is one of the best AI demos I've seen so far.'
On its website, Hume states: "These models are trained on human intensity ratings of large-scale, experimentally controlled emotional expression data."
This data originates from two scientific research papers published by Cowen and colleagues: "Deep learning reveals what vocal bursts express in different cultures" and "Deep learning reveals what facial expressions mean to people in different cultures."
The first study involved 16,000 participants from the United States, China, India, South Africa, and Venezuela. Interestingly, the dataset also includes recordings of "vocal bursts", non-lexical sounds such as laughter and "mm-hmm". The second study included 5,833 participants from those five countries plus Ethiopia, who completed a computer-based survey in which each participant rated up to 30 "seed images" drawn from a database of 4,659 facial expressions.
Participants were asked to mimic the facial expressions they saw on screen and rate their intensity on a scale of 1-100 against a list of 48 emotions.
During one interaction with the demo, Hume AI's EVI said that Hume's team 'has collected the largest and most diverse library of human emotional expressions ever. We're talking about over a million participants from around the world, engaging in various real-life interactions.'
Hume AI trained its deep neural network using photo and audio data from these two studies.
This data was also used to create a 'speech prosody model' that measures the tone, rhythm, and timbre of speech, which is then integrated into EVI.
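Hume has not published the internals of its prosody model, but a rough sense of what "tone, rhythm, and timbre" measurements can look like is easy to get with an open-source audio library. The sketch below uses librosa to pull pitch, energy, voicing, and spectral features from an audio file; it is an illustration of prosody-style features under my own assumptions, not Hume's model, and "speech.wav" is a placeholder path.

```python
# Illustrative prosody-style features, NOT Hume AI's actual model.
# Requires: pip install librosa numpy ; "speech.wav" is a placeholder path.
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)

# Tone: fundamental frequency (pitch) track and its variability.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Rhythm: energy contour plus the share of frames that are voiced vs. pauses.
rms = librosa.feature.rms(y=y)[0]
voiced_ratio = float(np.mean(voiced_flag.astype(float)))

# Timbre: spectral centroid and MFCCs summarize voice quality.
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

features = {
    "pitch_mean_hz": float(np.nanmean(f0)),
    "pitch_std_hz": float(np.nanstd(f0)),
    "energy_mean": float(rms.mean()),
    "energy_std": float(rms.std()),
    "voiced_ratio": voiced_ratio,
    "spectral_centroid_mean": float(centroid.mean()),
    "mfcc_means": mfcc.mean(axis=1).round(2).tolist(),
}
print(features)
```

A model like the one Hume describes would learn from data rather than hand-picked statistics, but features of this kind convey what "tone, rhythm, and timbre" mean in signal terms.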
Interacting with EVI
Hume AI provides an API for its EVI, allowing users to train their own Hume AI models based on their unique datasets. It also provides the "Expression Measurement API," which enterprise clients can use to build applications.
Other signals accessible through the Expression Measurement API include facial expression, vocal bursts, and emotional language, with the last of these measuring "the emotional tone of transcribed text across 53 dimensions."
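As a rough sketch of what calling such an API from application code could look like, the snippet below submits a line of text for emotional-language scoring. The endpoint URL, auth header, environment variable, and payload field names are assumptions made for illustration only; the documentation linked above describes the actual interface.

```python
# Hedged sketch of calling an expression-measurement-style REST API on text.
# The endpoint, header, and payload field names below are illustrative
# placeholders; consult Hume AI's documentation for the real interface.
import os
import requests

API_KEY = os.environ["HUME_API_KEY"]  # assumed environment variable name

response = requests.post(
    "https://api.hume.ai/v0/batch/jobs",      # placeholder endpoint
    headers={"X-Hume-Api-Key": API_KEY},      # placeholder auth header
    json={
        "models": {"language": {}},           # request emotional-language scores
        "text": ["I can't believe you actually pulled that off!"],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())  # expected to contain per-dimension emotion scores
```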
EVI can serve as an interface for almost any application. Developers can use Hume AI's API to build personal AI assistants, agents, wearable devices, and more, with potential products spanning fields from health management and tutoring to customer service.
There are risks, however: people may come to depend on Hume's EVI or grow unhealthily attached to it, and the technology could be exploited for malicious purposes such as manipulation and fraud.
When directly questioned about this possibility, Cowen provided the following statement:
When artificial intelligence leverages our emotional behaviors to achieve certain objectives (such as promoting purchases, increasing engagement, or cultivating habits), it may learn to manipulate and exploit our emotions.
Therefore, developers should regard understanding users' emotional behaviors as an inherent goal of AI itself, rather than merely treating these behaviors as means to achieve third-party objectives. Algorithms for detecting emotional cues should serve the goals of user health and well-being, including responding appropriately to abnormal situations, protecting users from abuse, and promoting emotional cognition and autonomy.
The website also includes a list of "unsupported use cases," such as manipulation, deception, "optimizing for reduced well-being" (e.g., "psychological warfare or torture"), and "unrestricted empathetic AI."
For now, however, these are only general statements. As AI genuinely comes to grips with human emotion, there is still a long way to go in working out how to regulate it ethically and legally.