Why Voice Tech Will Be the Post-Crisis Standard — and Not Just for Ordering Pizza
My kids, ages 8 and 5, are showing me the future. When I want to watch a movie or turn out the lights, I instinctively reach for the remote or flick a switch. My children find it far more natural to just ask Siri for Peppa Pig, or tell Alexa to darken the room. Tapping a keyboard or clicking a mouse? Lame and old-fashioned. Why not just talk to the machines around us like we talk to each other?
Of course, right now talking — rather than touching — also has serious safety upsides. Voice tech adoption has accelerated as the coronavirus pandemic makes everyone touchy about how sanitary it is to poke buttons and screens. But the reality is the 2020s were poised to be the decade of voice technology well before the crisis hit.
Indeed, thanks to a convergence of technology, necessity and demographic shifts, voice is uniquely positioned to become not just increasingly popular but the dominant user interface going forward. Before long, we’ll all be conversing with our devices pretty much non-stop, and using them for much more than setting timers and fetching weather reports.
And much like the desktop software industry back in the day and smartphone apps after that, a multibillion-dollar business ecosystem is about to surge around voice tech — creating enormous opportunities for entrepreneurs and businesses ready to ride the voice wave.
How voice tech went from talk to action
Getting to the point where we can casually ask our Apple Watches for nearby dinner recommendations is no small feat. It required the integration of decades of advancements in AI-driven natural language processing, speech recognition, computing horsepower and wireless networking, to name just a few building blocks.
And yet, we’re just starting to grasp the potential of these technologies. Voice is the ultimate user interface because it’s not really a UI at all, but part of what we are as humans and how we communicate. There’s almost no learning curve — no equivalent of typing classes. Voice-enabled machines learn to adapt to our natural behaviors rather than the other way around. My kids love joking with Siri — nobody clowns around with a keyboard.
The business model around voice tech is crystallizing, as well. Developing AI and related technologies is complex and costly, so mega-capitalized giants like Google, Apple and Amazon have built a formidable first-mover advantage and dug a moat around their platforms. But they’ve also created countless lucrative niches in their ecosystems for other companies.
Just as the iPhone gave birth to a $6.3 trillion mobile app economy, platforms like Alexa and Google Assistant have already created opportunities for developers to build more than 100,000 Alexa “skills” and 4,000 Google Assistant apps, or “actions.” In the years ahead, that ecosystem will likely grow to rival traditional apps in number and value.
The coronavirus pandemic is further boosting the adoption of voice-enabled technology, with 36% of U.S. smart-speaker owners reporting they’ve increased their use of their devices for news and information. And hygienic concerns are bringing contactless technologies like voice-controlled elevators out of the realm of fiction (and sketch comedies) and into offices and public spaces, so people don’t have to touch the same buttons and keypads as countless strangers.
How voice can take us “back to the future” in terms of human interaction
Yet for all the advances we’ve achieved, we’re still in the Voice 1.0 era. We’re mostly just commanding our devices to execute simple tasks, like setting alarms or telling us sports scores. In reality, this is just the beginning of what’s possible.
Machine learning underpins voice technology, and the AI gets smarter as we feed it more data. The number of voice-enabled devices in use is soaring — sales of smart speakers increased by 70% between 2018 and 2019 — flooding computers with more data to learn from. And that doesn’t count the billions of smartphone users talking to Siri and Google Assistant. Machines are growing much smarter, much faster.
Amazon and Google may soon take machines’ conversational skills to a deeper level. Both companies have filed patents for technology to read emotions in people’s voices. Marketers might salivate over the prospect of advertising products that suit how customers are feeling at the moment (“You sound hangry — how about a takeout pizza?”), but the applications for emotionally attuned bots don’t have to be so crassly commercial.
Spike Jonze’s movie Her, for example, tells the story of a lonely writer who develops a passionate relationship with his computer operating system, Samantha, as Samantha learns to become more conscious, self-aware and emotionally intelligent.
Robotic companionship seemed far-fetched when the film came out in 2013, but when this year’s pandemic locked millions down into isolation, hundreds of thousands downloaded Replika, a chatbot phone app that provides friendship and human-like conversation. People can develop genuine attachment to conversant machines, as seniors do with Zora, a human-controlled robot caregiver.
Why the booming voice market is just beginning
Coming months and years will see not only improved tech, but an expansion of voice into nearly every area of business and life. Ultimately, voice technology isn’t a single industry, after all. Rather, it’s a transformative technology that disrupts nearly every industry, as smartphones and the internet did before it. The voice and speech recognition market is expected to grow at a 17.2% compound annual rate to reach $26.8 billion by 2025. Meanwhile, AI — the technology that underpins voice and hints at its true potential — is estimated to add $5.8 trillion in value annually.
But unlike other technological advances that have radically changed how we live, voice technologies promise to make machines and people alike behave more like humans. In terms of adoption rates, applications and market, the possibilities are enough to leave one, well, speechless.
This article was originally written for Entrepreneur.com.