Losing his voice? Voice-over master Dean Compoginis sounds an alarm about AI

Quick Take

Aptos resident Dean Compoginis has made a handsome living as a voice-over narration artist. But, he says, with its increasingly realistic mimicking of the human voice, artificial intelligence now poses an existential threat to people in his field.

Dean Compoginis has a nice voice — rich, expressive, with a bit of musicality to it.

It’s so nice, in fact, that he’s been able to extract a good living from it, first as a radio on-air host, then as a busy voice-over professional.

At 65, the Aptos resident still feels that he’s on top of his game, providing voice-over narration for advertisements, promos, video games and other formats. His voice has been heard in ads for the California Lottery and Burger King, and promos for Comedy Central and “Family Guy.”

But he also feels that he might soon be losing control over his own voice, thanks to artificial intelligence.

Humans are mostly visuals-first creatures. Our eyes are the primary way we investigate the world around us, and crucially the primary way we recognize what is familiar, and what is trustworthy. That’s why much of the debate on AI as an agent of deception and trickery has revolved around video and still images.

But what about the auditory world? If AI has not quite succeeded in creating faces and images that can reliably fool the human eye, how much closer is it to mastering the voice that could fool the ear?

For more than a year now, stories have been surfacing about scammers using AI-generated voices on the phone to convince the unsuspecting that friends or family members are in trouble, in order to pry away money or sensitive information.

Earlier this year, OpenAI, the developer of ChatGPT and one of the AI industry’s leading players, announced the development of Voice Engine, which needs only a 15-second sample to essentially clone a distinct voice. So far, the technology has been released only as a preview, largely, according to the company, “to start a dialogue on the responsible deployment of synthetic voices.” (OpenAI also absorbed its share of crossfire when one of its AI-generated “personal assistants” may or may not have cloned the voice of movie star Scarlett Johansson.)

What has gotten less attention than AI voice scams is what the AI revolution is doing to the $4.4 billion voice-over industry. Ever since the beginnings of sound recordings, people like Dean Compoginis have been lending their real voices to scripted off-camera narrations, from plummy tones in old newsreels, to voice acting in cartoons, to perhaps the GOAT of movie-trailer voice-over, Mr. “In-a-world…” himself, Don LaFontaine.

Today, those opportunities are more in explainer videos, podcasts, e-learning videos, and audiobook narration.

Aptos-based voice-over artist Dean Compoginis — Credit: Natasha Loudermilk / Lookout Santa Cruz

For years, Compoginis has been a part of that tradition, recording from his own home studio in Aptos, an arrangement made all the more convenient during the pandemic. He is one of about 45 voice-over artists for hire from in and around Santa Cruz listed by Voices.com, one of the leading marketplace platforms in the industry.

Compoginis has done well for himself in the business, but now he senses a foreboding shift toward a future that threatens to turn his industry upside down.

“I think we’re only at the beginning stages of what AI will do, as far as replacing actual human creatives,” he said.

Of course, it’s one thing for AI to provide fully computer-generated voices that convert text to speech and sound human enough to push real human professionals out of jobs. There are already tools available to do exactly that. But it’s quite another when you find your own distinctive voice has been replicated and put to use in ways that you didn’t authorize.

A few years ago, Compoginis voiced a character in a video game that was popular enough to attract millions of downloads. Not too long ago, someone sent him a link to a site that promised to produce a narration in that character’s voice. Compoginis had never heard of the site, and was certainly not getting compensated for it. He felt his voice was being hijacked.

“I thought, ‘Boy, this is the exact thing that the SAG-AFTRA strike was all about,’” he said.

He is referring to the actors’ union strike that started exactly a year ago and lasted almost four months, the longest strike in the history of the Screen Actors Guild and the American Federation of Television and Radio Artists, of which Compoginis is a member. The strike came about, in part, because of concern over the growing viability of AI and its power to mimic real people. The strike’s settlement brought about some protections to prevent Hollywood studios and producers from using AI-generated performances. But, in the same way that building a sea wall does nothing to stop the rising tide, the momentum of AI will continue to exert pressure on creative industries.

When the possibilities of AI first began bubbling up in the creative fields, Compoginis was hearing from voice actors, directors, agents and others in the industry that “they’ll never replace the emotion and nuance of a real human voice.” But, Compoginis soon came to realize, “it’s just a matter of time before they do.”

Another element at play here is how AI-generated voices are already being normalized with the use of such voices in social-media environments. “There is a kind of lowering of standards of what people accept for quality of content,” he said. “Certainly as younger generations get phones in their hands, they’re already used to listening to something that’s not a human voice, and it’s perfectly acceptable. They don’t give it a second thought. It seems like anything I see on Instagram — I don’t use TikTok, but I’m sure it’s the same — it’s the same five AI voices.”

It’s difficult to get too outraged at an AI-generated voice, say, on hold, telling you that “your call is very important to us.” Most of us know when we’re hearing an obvious bot voice. But, as the phone scams and the ScarJo scandal demonstrate, AI is moving headlong into convincing replications of unique voices. And that can fool even a professional’s ear.

“There are times when I do get fooled,” said Compoginis. “Sometimes, I really have to listen closely to it before I finally think, ‘Oh, this is an AI voice because [no real person] would put the emphasis on that syllable or whatever.’ But I think that the person who doesn’t do this professionally won’t notice it, in many cases.”

AI voice-over companies might even be taking advantage of early-career people looking to get into voice-over work. Compoginis said that there is plenty of work online for voice talent in the field known as text-to-speech (TTS) voice modeling, which creates artificial voices from the raw material of real voices, to repurpose for everything from advertising to podcasts to audiobooks.

“You’ll see places where they’re paying what seems like a substantial sum of money to spend 20 hours or so reading endless lists of words,” he said. “And then you sign away your voice, and they can use that for anything they want.”

Today, many people — public figures, celebrities, podcasters, social-media users — have already created a boundless catalog of speech from their own recorded voices that can allow unscrupulous opportunists to bring about a world where, for example, a political candidate is recorded saying something outrageous that they didn’t actually say.

Within an apocalyptic world where you can’t trust your own ears, there is also the possibility that AI could profoundly shape the evolution of language itself. Language changes over time because each generation of human speakers and writers pushes it into new realms, through slang, jargon, memes and other innovations. Where is the tipping point between AI merely mimicking language and AI influencing and shaping it through the creation of new words and idioms?

For now, though, Dean Compoginis and other voice-over professionals are engaged in a race to stay ahead of the technology to preserve that thing that is uniquely, irreducibly human.

“One of the secrets [of good voice-over work] that’s hard to teach people who are learning,” he said, “is that the magic is in the pauses, in the spaces between the bars. And there’s something about just delaying a little bit or speeding up a little bit, whether it’s in music or in a great script that gives it humanity. I’m sure that there are AI scientists who are breaking that down right now. ‘How do we get the pauses in there so they sound perfect?’, you know? But for now, at least those pauses, spaces in between the words. They are what really makes it resonate with us as humans.”
