The Power of Synthetic Voice
AI-powered voice tech has gone mainstream. That's not a controversial statement. Case in point, the overwhelming majority of us have interacted with a voice assistant in one context or another, whether to order food, play a song, get driving directions, or turn on the lights. Voice tech has demonstrated its effectiveness as a task-based assistant, and the number of skills and actions it can accomplish grows by the day.
But the power of voice tech goes beyond voice skills and actions. While they were instrumental in raising awareness of the efficiency and usefulness of voice-first interfaces, there are clearly other use cases in which the tech can produce brilliant outcomes. Indeed, the worlds of advertising, film and television, sports, and gaming have recently embraced another facet of AI voice tech: synthetic voice.
One doesn't exclude the other, and task-based voice assistants will continue to develop and grow. In this post, though, we want to highlight how various industries have used synthetic voice to great effect outside the scope of user assistants.
Of course, every voice assistant out there has a "synthetic" voice. Whether it's Alexa or Siri, they each have their own unique voice, and that voice is synthetic (i.e., an AI-generated voice). And while recordings of actual human speech serve as the basis of that synthetic voice, AI can use those voice samples to produce utterances that the human behind the recordings never pronounced. If we want our voice assistants to sound natural, there's no other way. It's hard to imagine how long it would take for human talent to record every possible response a voice assistant might have.
The above implies that from a relatively small sample of actual human speech, AI can produce almost any utterance in that voice. That's likely to turn on a lightbulb in the heads of those working in advertising, film and television, sports, and gaming. All of these industries rely on human voice talent for their craft. For all of them, synthetic voice opens the door to a whole world of new possibilities.
Let's look at some of the ways these industries stand to benefit from synthetic voice.
Advertising
Success in global markets is tied to global messaging. To succeed, international brands need to advertise in dozens of languages. However, finding voice talent that speaks 36 languages is going to be challenging, to say the least. And even if you did find such talent, recording everything would be a costly and time-consuming process. And then, what about last-minute changes to the product, the service, or the messaging? Back to the drawing board again…
Advertisers perhaps understand the benefit of synthetic speech better than any other industry. They need to produce a lot of content to keep up with brands' current offerings (whatever they may be), and they have a short turnaround time.
With synthetic voice, the human talent can license their AI voice for all of the brand's adverts without a drop in the performance of the ads. Brands can then produce adverts in multiple languages and with different messaging to cater to local markets, using the talent's voice, all without having them come into the studio for more recording. This simplifies logistics and planning for brands, saving them time, money, and hassle. For voice talent, it saves time and provides a passive source of income as their voice gets repurposed.
Film and Television
The film and television industries also benefit from synthetic voice. Films and television programs, like adverts, are usually translated into various languages. Typically, this means hiring voice actors for each language to record overdubs of the entire film. That's not only costly and time-consuming, but it also means that audiences in each language hear a voice other than the original actor's.
With synthetic speech, we could translate films or TV shows into as many languages as we like using the original actors' voices without having to book hours upon hours of studio time. Other use cases would be modifying off-screen dialogue, changing the tone of an existing line, or synthesizing the speech of actors who have passed away, lost their voice, or whose voice has changed due to illness or aging.
On the latter point, synthetic voice has already been used on screen: "Roadrunner," a documentary on the life of the late Anthony Bourdain, used some of Bourdain's previously recorded speech to synthesize a few sentences he had never spoken; Top Gun: Maverick cloned the voice of Hollywood star Val Kilmer; and Star Wars: The Mandalorian synthesized the voice of a young Luke Skywalker, played by Mark Hamill.
Sports
The world of sports also stands to benefit from the advent of synthetic voice. An obvious application is, of course, advertising. Promoting sporting events using real players’, referees’, or commentators’ (synthetic) voices makes for very compelling advertising without the logistical hassle of bringing the talent into the studio, especially during their busy season.
Another example comes from the Pittsburgh Steelers. The American football team recently launched an application that uses actual players’ voices to communicate with fans. Users of the app can ask just about any question concerning a game or a player and they’ll get a response in the voice of one of the team’s star players. You can even direct a question to a specific player, and their synthetic voice will answer.
Gaming
Like film and television, video games also get translated into multiple languages, and the games industry benefits from synthetic speech in much the same way. As mentioned above, the ability to translate dialogue using the original voice actors' voices is a massive advantage, as is the ability to make changes to a character's dialogue "on the fly" (without the need to re-record anything).
But many games, typically AAA titles, also tend to add extra levels and new characters that impact the storyline, requiring new dialogue. Synthetic voice simplifies that process and saves the game developer enormous amounts of time, allowing video game companies to release quicker, more frequently, and with less overhead. If you've ever worked in software, you'll know that those three goals are something of a “holy trinity” of software development, and synthetic speech can help game studios get there.
So, while voice tech is extremely compelling as a task-based assistant, that's only part of the story. Its speech-generation capabilities represent a massive innovation in the way we produce adverts, films, TV shows, sports content, and video games. The above provides an overview of some of the main benefits AI-powered synthetic voice can bring to those industries, and they're pretty huge.
The future is vocal.
Modev was founded in 2008 on the simple belief that human connection is vital in the era of digital transformation. Modev believes markets are made. From mobile to voice, Modev has helped develop ecosystems for new waves of technology. Today, Modev produces market-leading events such as VOICE Global, presented by Google Assistant; VOICE Summit, the most important voice-tech conference globally; and the Webby award-winning VOICE Talks internet talk show. Modev staff, better known as "Modevators," include community building and transformation experts worldwide. To learn more about Modev and the breadth of events and ecosystem services offered live, virtually, locally, and nationally, visit modev.com.