Building Voice Interfaces For Older Adults
Most would agree that AI-powered voice tech is convenient. Using your voice to turn on the lights or to play a song is pretty neat. But beyond its "cool" factor, AI voice tech can be life-changing for some, namely older adults (i.e., all of us, eventually). As the technological landscape grows and our bodies age, it becomes more and more challenging to keep up. At some point, many older adults simply get overwhelmed and give up trying to learn the tech. That translates directly into difficulty accessing the services they need and less participation in society in general.
Voice tech has the potential to empower older adults by enabling them to use their voice, rather than complex smartphones with apps, logins and passwords, pop-ups about cookies, and obnoxious ads. But despite voice tech's life-changing potential for older adults, ironically, that group is underrepresented among voice tech users. Why is that?
The most likely answer is that this demographic is generally underrepresented among internet users. That means we lack data about that group, and without the data, you can't train an algorithm to meet the group's specific needs. In this post, we look at what we can do to build AI voice assistants that are more inclusive - particularly for older adults.
Voice assistants can sometimes struggle to understand your requests in certain contexts. Perhaps the assistant misinterprets the request and serves up an answer that makes little sense. Or maybe it keeps asking you to confirm something that has nothing to do with what you actually asked. While you may have just laughed it off, it's a real problem for some. Data tells us that adults over 65 (a demographic expected to double between 2010 and 2050) tend to give up on these tools after just a few unsuccessful attempts to be understood.
The problem lies with the state of existing natural language processing (NLP) systems. NLP systems are artificial intelligence models that enable computers to understand human language, whether written or spoken. The main issue with current NLP systems is that they're optimized to understand and interpret short, formal questions. For those who grew up with computers, this isn't an issue. They know how to phrase their questions in a way that the device will understand. But older adults aren't used to speaking with machines; they're used to talking to people. And so they tend to struggle with condensing their conversational questions into a short request the AI voice assistant can interpret correctly.
VOLI stands for Voice Assistant for Quality of Life and Healthcare Improvement in Aging Populations. It's a research project that brings together geriatric physicians, language processing researchers, and human-computer interaction researchers at the University of California San Diego. Its main goal is to develop a personalized, context-aware, voice-based digital assistant for older adults that improves their access to healthcare and their quality of life. The researchers hope the assistant will enable older adults to remain independent and will facilitate their interactions with healthcare service providers.
To that end, the team has developed an end-to-end question-answering system that performs three crucial tasks after the speaker has made their request and before the assistant provides an answer. These are:
- Shorten the user's question down to its essential components;
- Match the shortened medical question to a frequently asked question from a database of 17,000 medical questions sourced from official national health services;
- Select the relevant portions of the corresponding longer answer to pass on to the user.
This approach is referred to as a question summarization AI model. It can be implemented either as a standalone feature in existing voice assistants like Alexa or as the first step in any existing question-answering system, shortening user questions so the system can provide better answers.
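To make the three tasks concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not VOLI's implementation: the real system uses trained NLP models, whereas this toy version swaps in stop-word filtering for summarization and simple word overlap for matching and answer selection. Every name here (`FAQS`, `STOPWORDS`, `summarize`, `match_faq`, `select_answer`, `answer_question`) is hypothetical, and the two FAQ entries are invented placeholders standing in for the 17,000-question database.

```python
import re

# Invented placeholder FAQ entries; the real database holds 17,000 questions.
FAQS = {
    "what are the side effects of ibuprofen": (
        "Ibuprofen is a widely used pain reliever. "
        "Its side effects are listed on the package leaflet. "
        "A pharmacist can advise on the right dose."
    ),
    "how is high blood pressure treated": (
        "High blood pressure is common in older adults. "
        "Treatment often combines lifestyle changes and medication."
    ),
}

# Small hand-picked stop-word list (an assumption, not a standard resource).
STOPWORDS = {
    "a", "about", "an", "and", "any", "are", "for", "how", "i", "is", "it",
    "know", "me", "my", "of", "should", "that", "the", "there", "to", "was",
    "what",
}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(question):
    """Step 1 stand-in: reduce a long question to its content words."""
    return [t for t in tokens(question) if t not in STOPWORDS]


def match_faq(keywords, faqs):
    """Step 2 stand-in: pick the FAQ question with the highest word overlap."""
    kw = set(keywords)

    def score(faq_question):
        faq_kw = set(summarize(faq_question))
        union = kw | faq_kw
        return len(kw & faq_kw) / len(union) if union else 0.0

    return max(faqs, key=score)


def select_answer(keywords, answer):
    """Step 3 stand-in: keep only answer sentences that share a keyword."""
    kw = set(keywords)
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    picked = [s for s in sentences if kw & set(tokens(s))]
    return " ".join(picked) if picked else answer


def answer_question(question, faqs=FAQS):
    """Run the three steps in order and return (matched FAQ, trimmed answer)."""
    keywords = summarize(question)
    faq = match_faq(keywords, faqs)
    return faq, select_answer(keywords, faqs[faq])


long_question = (
    "My doctor gave me ibuprofen last week for my knee and I was wondering, "
    "are there any side effects that I should know about?"
)
faq, answer = answer_question(long_question)
print(faq)
print(answer)
```

In this toy run, the long conversational question is matched to the ibuprofen FAQ, and the sentence about dosing is dropped because it shares no keywords with the question. A real system would replace each stand-in with a learned model, but the flow from long question to short question to FAQ to trimmed answer stays the same.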
So far, the VOLI team has completed step one: training the AI assistant to summarize long questions into shorter equivalents. This was achieved by training the NLP model simultaneously on two tasks: summarizing questions and a classification task called question entailment. Question entailment trains the AI to judge whether answering the short version of a question also answers its longer counterpart.
Khalil Mrini, one of the UC San Diego Ph.D. students working on the project, says, "We found that if you're training on both summarization — which is basically feeding longer user questions and the model learns to generate the short question — and at the same time training on question entailment, that classification task [point 2, above], then you're able to generate better results."
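One common way to train on two tasks at once is to add their losses into a single objective, so that every training step improves both. The sketch below illustrates that idea only; the source does not say how VOLI combines the two tasks, so the weighted sum, the `weight` parameter, and all function names are assumptions made for illustration.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-probability the model assigned to the correct class."""
    return -math.log(probs[target_index])


def summarization_loss(step_probs, target_tokens):
    """Average per-token loss over the words of the short target question."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, target_tokens)]
    return sum(losses) / len(losses)


def entailment_loss(probs, label):
    """Loss of the two-way classifier (0 = not entailed, 1 = entailed)."""
    return cross_entropy(probs, label)


def joint_loss(step_probs, target_tokens, entail_probs, entail_label, weight=0.5):
    """Combine both task losses; `weight` is a hypothetical tuning knob."""
    return (
        summarization_loss(step_probs, target_tokens)
        + weight * entailment_loss(entail_probs, entail_label)
    )


# Toy model outputs: two generation steps over a two-word vocabulary,
# plus one entailment prediction for the same question pair.
step_probs = [[0.5, 0.5], [0.25, 0.75]]
target_tokens = [0, 1]
entail_probs = [0.2, 0.8]
entail_label = 1

print(joint_loss(step_probs, target_tokens, entail_probs, entail_label))
```

Because both terms feed one number, an optimizer minimizing it is pushed to generate good short questions and to classify entailment correctly at the same time, which is the intuition behind the joint training described in the quote.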
At this stage, they're still working on getting the AI to match the (shortened) question with a question from their pool of FAQs and then select and provide the relevant pieces of the longer answer (from the FAQs) back to the user.
We're looking forward to seeing this model fully developed and implemented into our devices. This truly is an example of using technology for the common good.
So that was a bird's-eye view of VOLI's question summarization AI model. It really has the potential to improve the quality of life of millions of older adults. And it's refreshing to see some in the tech industry acknowledge that more can be done to bridge the gap between that demographic and cutting-edge technology.
We agree with Professor Ndapa Nakashole, one of the language processing researchers leading the project, who states that "the potential impact of this work is substantial because it aims to broaden the population of people who can benefit from conversational agents, by targeting an important segment of the population, older adults."
The future is vocal.