In 1995, Rosalind Picard, a scientist and inventor, proposed the concept of computers that recognize emotion, an idea she later developed in her 1997 book "Affective Computing." Today, artificial intelligence (AI) systems aim to detect human emotions by analyzing facial expressions, body language, word choice, and tone of voice. However, the effectiveness of these systems, particularly in speech emotion recognition (SER), raises critical questions.
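To make the idea concrete, here is a minimal sketch of how a typical SER pipeline might work: summarize an utterance's vocal tone as a handful of acoustic features, then map them to one of a few discrete emotion labels. The label set, features, and classifier below are illustrative assumptions, not a description of any particular commercial system.

```python
# Minimal SER sketch (illustrative assumptions throughout):
# reduce an utterance to acoustic features, then classify those
# features into one of a small, fixed set of emotion labels.
import numpy as np
import librosa  # common audio-analysis library
from sklearn.linear_model import LogisticRegression

EMOTIONS = ["neutral", "happy", "sad", "angry", "fearful"]  # assumed label set

def extract_features(path: str) -> np.ndarray:
    """Collapse a whole utterance into 13 mean MFCCs, a deliberately
    lossy summary of vocal tone that discards most of what a voice carries."""
    audio, sr = librosa.load(path, sr=16_000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# Given labeled training clips (X: feature rows, y: indices into EMOTIONS):
# clf = LogisticRegression(max_iter=1000).fit(X, y)
# "Recognition" then amounts to picking the single most probable label:
# predicted = EMOTIONS[int(clf.predict([extract_features("clip.wav")])[0])]
```

Note how much the design presupposes: that a handful of categories exhausts human feeling, and that tone alone reveals which category applies.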
Understanding the Foundation:
According to Steinhardt Assistant Professor Edward B. Kang, SER rests on contested assumptions about the science of emotion, making the technology both unreliable and socially harmful. Kang argues that current systems oversimplify human emotion, potentially excluding people, such as those with autism, whose emotional expressions differ from the assumed norm.
Challenges in Defining Emotion:
The scientific community lacks consensus on what constitutes an emotion. Labels like "fear" or "happiness" are fluid and resist definition as a fixed set of measurable features. To build AI systems anyway, researchers have traditionally relied on datasets in which human actors perform stereotypical emotional expressions, yielding a drastically simplified representation of complex human emotion.
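As a concrete illustration, many acted corpora encode one forced-choice emotion per recording directly in the filename. The sketch below follows the naming convention of RAVDESS, one widely used acted dataset, though the same reduction applies to any fixed label scheme.

```python
# How acted-speech corpora reduce emotion to fixed categories:
# each recording carries exactly one discrete emotion code.
# The mapping below follows the RAVDESS filename convention.
EMOTION_CODES = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def label_from_filename(name: str) -> str:
    """RAVDESS-style names look like 03-01-06-01-02-01-12.wav;
    the third hyphen-separated field is the emotion code."""
    return EMOTION_CODES[name.split("-")[2]]

print(label_from_filename("03-01-06-01-02-01-12.wav"))  # -> "fearful"
```

Everything a performer's voice conveys beyond those eight categories, and everything ambiguous or mixed, is lost before training even begins.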
Limitations and Harms:
Because of this oversimplification, SER systems face inherent limits on their accuracy. In call centers, SER is nonetheless used to evaluate operators against emotional norms, potentially triggering financial incentives or penalties. Yet the datasets driving these systems are subjective, reflecting the beliefs of their creators and actors, which undercuts the reliability of the scores they produce.
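A hypothetical sketch (not any vendor's actual product) of how such an evaluation might work: per-utterance emotion predictions are aggregated into a single operator score, so whatever subjectivity went into the training labels flows straight into pay decisions.

```python
# Hypothetical operator scoring (illustrative only): count how often
# the classifier's discrete labels fall inside an employer-chosen set
# of "approved" emotions. The approved set itself is an assumption.
from collections import Counter

def friendliness_score(predicted_labels: list[str]) -> float:
    """Fraction of utterances labeled with approved emotions."""
    approved = {"happy", "neutral", "calm"}  # norm chosen by the employer
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return sum(counts[e] for e in approved) / total if total else 0.0

print(friendliness_score(["happy", "neutral", "angry", "happy"]))  # -> 0.75
```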
Applications and Industries:
While SER is most prevalent in call centers, proposed applications extend to finance, recruiting, and even dating apps. And although Microsoft has committed to retiring facial emotion recognition from its products, citing scientific uncertainty, SER may emerge as a substitute. The consequences of misusing emotional data in life-altering decisions underscore the ethical challenges associated with SER.
Expert Recommendations:
Kang recommends caution in incorporating emotion recognition into consumer products. He suggests limiting it to opt-in features in low-stakes applications, with transparency about what it is for. He also questions the ethics of deploying SER in contexts where individuals have little control over the systems observing them, warning of the potential for affective surveillance.
Toys and Children:
The use of SER in toys such as Moxie, which analyze children's facial expressions and word choices, raises particular concerns. Kang notes that sentiment analysis remains scientifically contentious and warns against the misuse of emotional data in products marketed as learning companions for children. He urges researchers and developers to remain critical and compassionate in building such technologies.
In conclusion, the development of AI speech emotion recognition raises fundamental questions about how emotion is defined, how reliable these systems can be, and what their applications mean ethically. As the technology advances, critical evaluation and ethical scrutiny are essential to avoid harm and ensure responsible development.