Three Game-Changers for Voice Technology in 2021

VOICE Talks 21-03-11 Jowie El Hakim, Product Marketing Intern at Rumble Studio

As part of the VOICE Talks series, the CEO of Rumble Studio presents his pick of the most important voice technologies in 2021 and explains how brands can leverage them for success.

The VOICE Talks series, presented by Modev in collaboration with Google, showcases the latest trends and technology in the voice industry. The episode ‘Starting A New Decade With VOICE And AI’ examines how artificial intelligence will make voice-enabled technologies, machines and devices behave more like humans.

Our very own Carl Robinson, CEO of Rumble Studio, was invited to join the panel of experts and discuss his pick of ‘game-changing’ technologies for voice in 2021.

The full discussion is now available to watch on YouTube, anytime, anywhere.

Brands successfully leveraging voice tech and AI

Plenty of brands are succeeding by making voice tech and AI part of their marketing strategy. For example, the marketing services company Meredith showcased innovative audio programming in 2019, leveraging trusted brands such as PEOPLE and InStyle.

Companies are also deploying digital assistants internally. GreenKey, for instance, uses speech AI to turn spoken police vocabulary into real-time, actionable data, making law enforcement patrol work easier.

Satisfi Labs creates custom conversational search engines for brick-and-mortar establishments and consumer experiences, and its solution is often deployed in sports venues. Because it supports only a constrained range of request intents and vocabulary, its speech-to-text (STT) is extremely accurate.
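Satisfi Labs' actual implementation is not described here, but a small sketch can show why a constrained vocabulary helps. The example below assumes a hypothetical set of stadium intents and phrases, and uses Python's standard `difflib` module to snap a possibly noisy transcript to the closest known phrase, rejecting anything that is not close enough.

```python
import difflib
from typing import Optional

# Hypothetical closed set of intents for a stadium assistant.
# These intent names and phrases are illustrative assumptions only.
INTENT_PHRASES = {
    "find_restroom": ["where is the restroom", "nearest bathroom"],
    "find_food": ["where can i get food", "nearest concession stand"],
    "parking_info": ["where do i park", "parking information"],
}

# Flatten into a phrase -> intent lookup table.
PHRASE_TO_INTENT = {
    phrase: intent
    for intent, phrases in INTENT_PHRASES.items()
    for phrase in phrases
}


def match_intent(transcript: str, cutoff: float = 0.6) -> Optional[str]:
    """Snap a (possibly noisy) STT transcript to the closest known phrase.

    Returns the intent for the best match, or None when no known phrase
    is similar enough (i.e. the request is outside the supported scope).
    """
    normalized = transcript.lower().strip(" ?!.")
    matches = difflib.get_close_matches(
        normalized, list(PHRASE_TO_INTENT), n=1, cutoff=cutoff
    )
    return PHRASE_TO_INTENT[matches[0]] if matches else None


print(match_intent("Where's the restroom?"))        # find_restroom
print(match_intent("parking information please"))   # parking_info
```

With only a handful of valid targets, a slightly misheard request still lands on the right intent, and out-of-scope input returns `None` instead of a wrong answer. A production system would more likely constrain the recognizer itself (for example with phrase hints or a custom language model), which this sketch does not attempt.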

All of the companies above have leveraged the latest voice technologies, which can be game-changers for brands in many different verticals.

Three game-changers in voice

NLP and NLG

Machines will soon be able to understand human speech on a previously unimagined level, thanks to Natural Language Processing (NLP).

NLP involves the automatic manipulation of natural language, like speech and text, by software. NLP makes computer programs capable of "understanding" the contents of documents, including the language’s contextual nuances.

Natural Language Generation (NLG), on the other hand, is an AI-powered technique that transforms structured data into natural language. It is used to generate new and original text.

Brands better understand their customers

Using NLP and NLG, retail and e-commerce companies can convert structured data, such as product specifications, into text descriptions that are easy for humans to read.
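The simplest form of this is template-based (rule-based) NLG, sketched below under the assumption of a hypothetical `Product` record with made-up fields. Modern NLG systems often use neural language models instead; this sketch only illustrates the core idea of turning structured fields into readable sentences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    """Structured product data; the field names are hypothetical."""
    name: str
    brand: str
    color: str
    weight_kg: float
    battery_hours: Optional[int] = None  # only some products have a battery


def describe(product: Product) -> str:
    """Render structured product data as a short human-readable description."""
    sentences = [
        f"The {product.name} by {product.brand} comes in {product.color} "
        f"and weighs {product.weight_kg:g} kg."
    ]
    # Optional fields only produce a sentence when the data is present.
    if product.battery_hours is not None:
        sentences.append(
            f"Its battery lasts up to {product.battery_hours} hours on a single charge."
        )
    return " ".join(sentences)


print(describe(Product("Pulse speaker", "Acme", "black", 0.6, battery_hours=12)))
# The Pulse speaker by Acme comes in black and weighs 0.6 kg. Its battery lasts up to 12 hours on a single charge.
```

Templates are predictable and easy to audit, which is why they remain common for catalog copy; the trade-off is that every sentence pattern has to be written by hand.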

Retailers capture large amounts of analytics data. NLG can turn this data into personalized product descriptions for individual customers, which can help increase sales.

How Google is leading the way

Google is a leader in AI research, developing models, frameworks and infrastructure solutions.

Google AI Language recently published a paper on BERT, which presents state-of-the-art results on a wide variety of NLP tasks.
TensorFlow is an open-source library developed by Google primarily for deep learning applications.
Google Cloud runs on Google's planet-scale infrastructure, which is designed to provide security throughout the entire information processing life cycle.

Google has also developed a conversational AI product called Google Duplex. It was demonstrated autonomously making restaurant reservations by phone and is now being extended to other use cases.

Here at Rumble Studio, we use Google's speech-to-text (STT) cloud service to generate transcriptions of all user-generated audio content.

Voice synthesis and voice cloning

Many services and software tools can turn your text into speech.

Voice synthesis is the conversion of language text into speech using a text-to-speech (TTS) system.
Voice cloning is the creation of an artificial simulation of a person's voice.

Brands all over the world can use both for marketing purposes.

Brand distinction and identity

Every brand wants a unique identity and voice that is clearly distinguished from its competitors.

Synthetic voices can be personalized to the user, their culture, or even a specific use case or context, promoting inclusivity and variety. For example, users of Google Assistant can personalize it by choosing its voice.

Text-to-speech also enables automated or semi-automated audio content creation, a process that is otherwise often time-consuming. This is the main idea behind our audio content marketing platform, Rumble Studio, which uses TTS to create podcasts asynchronously.

While voice synthesis and voice cloning can elevate a brand’s identity, if users feel like they’re talking to a “machine” the effect can be the opposite.

One step closer to reality

Google offers customers a range of premium voices generated using its WaveNet model, the same technology used to produce speech for Google Assistant, Google Search, and Google Translate.

WaveNet models improve on many standard TTS systems, generating very natural-sounding speech.

In experiments, Google's newer Parallel Tacotron model has generated even more natural-sounding voices that are difficult to distinguish from human speech.

Audio content marketing via Voice

Consumers prefer Voice Search

Voice search is one of the fastest-growing e-commerce trends. Reports estimate that 111.8 million people in the US are using voice search features.

Many people now prefer to ask their voice assistants rather than type their questions into Google the traditional way. To remain part of their consumers’ lives, brands will increasingly need to make their content audible.

Audio content for SEO

Brands are looking to get ahead of their competitors by including voice content marketing in their strategy.

Building an SEO strategy that takes voice and audio content into consideration is a smart move for B2B companies. This could simply mean using more conversational language in blog and website content, or actually producing branded audio content.

Podcast audio content

Google Podcasts automatically transcribes full podcast episodes and returns them in search results. Using these transcripts as metadata helps users find the podcasts they want to listen to even if they don’t know an episode’s title or when it was published.
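Google's actual indexing and ranking are not public. As a rough illustration of why transcripts make spoken content findable, here is a minimal sketch of a toy inverted index over hypothetical episode transcripts, returning the episodes that contain every word of a query.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of episode IDs whose transcript contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for episode_id, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(episode_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return sorted IDs of episodes whose transcript contains every query word."""
    tokens = tokenize(query)
    if not tokens:
        return []
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return sorted(results)


# Hypothetical transcripts standing in for automatically generated ones.
transcripts = {
    "ep1": "Today we discuss voice search and SEO for brands.",
    "ep2": "Our guest explains how text-to-speech voices are built.",
    "ep3": "Voice cloning raises questions about consent.",
}
index = build_index(transcripts)
print(search(index, "voice search"))  # ['ep1']
print(search(index, "voice"))         # ['ep1', 'ep3']
```

Neither the episode title nor its publication date is needed: the words spoken in the episode are enough to find it. Real search engines add stemming, ranking and timestamps on top of this basic structure.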

As the volume of audio content and the number of voice searches increase, the audio content returned will become more fine-grained, answering ever more specific queries. Indeed, companies have already started answering long-tail questions with bite-sized audio content created on Rumble Studio.

Join the discussion at VOICE Talks

These three game-changing technologies were picked by our CEO. To hear from the other thought leaders on the panel, tune in to the discussion on YouTube.

Don’t forget to follow VOICE Talks on Twitter, Instagram, and LinkedIn to stay up to date with the voice industry!

Jowie El Hakim, Product Marketing Intern at Rumble Studio

Jowie El Hakim is the Product Marketing Intern at Rumble Studio, where users can create audio as quickly as a blog post and record and publish guest interviews in minutes with its unique conversational AI, no skills required!