Imagine if the Little Mermaid had an AI Voice

Updated: 6 days ago

By Yuting Zhang, MA Graduate in Translation & Localization Management at Middlebury Institute of International Studies and GlobalSaké NextGen Roundtable Member


“Out of the sea, wish I could be part of that world,” sings Ariel, the little mermaid whose sweet, heavenly voice was taken away by the sea witch Ursula in exchange for legs to walk on land. But in the age of AI, Ariel’s problem might have an easy fix: an AI-generated voice. If AI voice technology existed under the sea, Ariel wouldn’t have to stay silent on land.


But imagine this “AI voice.” The sultry tone of Scarlett Johansson probably isn’t the first thing that comes to mind. On the contrary, the term is often associated with adjectives like “robotic,” “monotone,” “stone-cold” and even “dead.” However, AI voice technology has come a long way. Its quality is improving as models are trained on the basic human emotions (happiness, sadness, anger, fear, disgust and surprise), complemented by human-in-the-loop editing to refine prosody and pronunciation.


And the market is taking notice. In the era of content explosion, the demand for scalable voice production is surging. Entertainment is shifting towards live services, where more content means better engagement. Just look at Netflix, rolling out new shows weekly, or consider how many TikTok videos an average user scrolls through daily. Riot Games releases new game updates every two weeks, and an immense number of media assets, from e-learning courses and video games to movies and TV series, require dubbing. The sheer volume is staggering.


With a limited pool of voice actors, scheduling conflicts, long turnaround times, and high recording studio costs, traditional dubbing is struggling to keep up. Companies looking to scale need a more sustainable solution, which has given rise to AI voice startups like ElevenLabs, Voiseed, Deepdub and NeuralGarage (VisualDub.ai), to name a few. Even Amazon is piloting AI dubbing for some of its content, and EGA (Entertainment Globalization Association) has dedicated an entire panel to voice, with AI as a key topic. The industry is curious about the possibilities, but I believe two primary concerns remain:


1. Is It Legal⁉️🤔


Remember the controversial dispute between OpenAI and Scarlett Johansson, which is still unresolved as of this writing? Unlike image rights, vocal identity is harder to define because perception is inherently subjective. Have you ever mistaken one person’s voice for another? As AI-generated voices become more sophisticated, this will happen more frequently. Without clear legal frameworks, the unauthorized use of someone’s voice could become rampant. A person’s voice is their vocal brand and should be legally protected, just like their image.


Additionally, performers’ unions, such as SAG-AFTRA in the U.S., have long-established partnerships with recording studios and clients, and their members are deeply rooted in those clients’ work, creating stickiness and brand consistency. After all, no one wants their favorite character to suddenly sound completely different. A shift away from human dubbing risks alienating the audience and eroding a brand’s vocal identity.


2. Is It Good (Enough)?💯🧐


Quality is subjective: it measures how well something meets an expectation. While AI voice technology is improving, the real question is whether it can fully deliver the unique characteristics of a voice that clients seek. AI voices may be sophisticated, but are they good enough to replace human dubbing? And even if they are, will audiences embrace them?


Welcome or not, AI voice technology is here to stay and play. But will it be the modern-day equivalent of Ursula, taking away voices and displacing voice actors, or will it voice the unvoiced, empowering billions and making content more accessible?


What’s your prediction? 

 
 
 
