Modev Blog


3 Tips From Alexa Developers on Building Voice-First Apps

Siri, Cortana, Alexa, and Google Now were all once the stuff of science fiction. With non-stop developments in machine learning and voice control, voice-first interfaces are steadily changing how we prefer to interact with our devices.

Forget typing out a search for the nearest coffee shop. Now all you have to do is ask your phone. (It’ll even give you a nicer answer than most passersby.)

Voice technology is gradually working its way into everyday life, and it’s up to the developers of voice-first applications to build our next experiences. This is no small challenge. In most cases, voice apps have no visual content to fall back on, so there is no way to hide blips in their service. As Nico Acosta, Director of Product Management for Programmable Voice at Twilio, says, “Voice applications can’t fail.” He’s not wrong.

To avoid being ridiculed by the public (remember Alexa’s creepy laugh?) or left with a stunted voice app, here are some useful pointers from top voice-first application developers.

Set up your backend infrastructure

TribalScale, a developer shop building Alexa Skills, told IT World that they recommend putting in the work to ensure your backend supports the delivery of data requests in real time and on demand. For voice-first apps, everything lives in the cloud. The last thing you want is your user staring at a revolving circle or flashing light while your app is retrieving the necessary information.

The dev shop tackles this latency itself by gathering information asynchronously from multiple endpoints into a single endpoint portal. They use AWS Lambda to host the cloud-based Skills, and when a data request hits, the information is retrieved and cached for the next time the user asks for it.


Write a script

To know what your users may ask of your voice app, look no further than the FAQ section of your website. Listening in on customer support calls can also give you insight into the most common questions. Creating a VUI document that defines all the interactions with the app is key. It’s like a wireframe, but for voice.
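
One way to keep such a script honest is to write it down as data that both designers and developers can read. The sketch below is only an illustration: the intent names, sample phrases, and responses are invented, and the exact-match lookup is a deliberately naive stand-in for the language understanding that a platform like Alexa performs for you.

```python
import re

# Hypothetical VUI script: each intent lists the phrases users might say
# and the response the app should speak.
VUI_SCRIPT = {
    "StoreHoursIntent": {
        "utterances": ["what are your hours", "when are you open"],
        "response": "We are open from nine to five, Monday through Friday.",
    },
    "OrderStatusIntent": {
        "utterances": ["where is my order", "track my order"],
        "response": "Your most recent order is on its way.",
    },
}

FALLBACK = "Sorry, I didn't catch that. You can ask about store hours or an order."


def normalize(text):
    """Lowercase the text and drop punctuation so phrases compare cleanly."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def respond(utterance):
    """Return (intent name, response), or (None, FALLBACK) when nothing matches."""
    spoken = normalize(utterance)
    for intent, entry in VUI_SCRIPT.items():
        if spoken in (normalize(u) for u in entry["utterances"]):
            return intent, entry["response"]
    return None, FALLBACK
```

Every phrase that lands on the fallback is a candidate for a new line in the script, so it is worth logging those.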

On another note, John Kelvie from Bespoken explained to VoiceBot that using recorded audio can give your Skill or Action a personality. This helps to create a stronger connection with your user. Text-to-speech (TTS) APIs like Amazon Polly can result in pronunciation issues that put users off, so if you’re developing for Alexa, consider using Bespoken’s encoding tool to encode and upload your audio files.
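
For Alexa, a recorded clip is played by putting an SSML `<audio>` tag in the response's output speech. The sketch below builds such a response as a plain Python dictionary. The function name `build_audio_response` and the example URL are assumptions for illustration, and the clip itself still has to be encoded and hosted in a way Alexa accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def build_audio_response(audio_url, follow_up_text, end_session=True):
    """Build an Alexa-style response that plays a recorded clip, then speaks text.

    audio_url is assumed to point at a clip already encoded and hosted
    in a form Alexa accepts.
    """
    # quoteattr adds the surrounding quotes and escapes the URL;
    # escape protects characters such as & and < in the spoken text.
    ssml = (
        f"<speak><audio src={quoteattr(audio_url)}/> "
        f"{escape(follow_up_text)}</speak>"
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }
```

Mixing a short recorded greeting with synthesized follow-up text is one way to add personality without recording every possible response.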


Test, then test again

Voice apps need good UX just like any other app, especially since in some cases there’s no visual way for users to quickly backtrack if they make a mistake. Ahmed Bouzid from Witlingo told VoiceBot that devs should identify the right use cases, build an MVP, and validate any assumptions with actual users. Those users are bound to find a missing phrase prompt or get confused by a complex question. This feedback is vital to improving your voice app and helping users interact more naturally with it.

Clearly, developing voice-first apps is no small feat, and there’s much to learn about the future of conversational and voice interfaces. Where will the wave of AI focus our sights next? Are we slowly progressing towards the Singularity? Here’s a cool idea: you can ask one of the many tech leaders at our three-day VOICE summit on July 24th. Follow all the meaty details on Twitter right here.

