Meet the Founder of Openstream, the Company Behind Multimodal

We all know that AI-powered voice assistants are more than capable of telling us the weather and turning on the lights. And that’s great. But the folks at Openstream have bigger ideas. Raj Tumuluri, founder of Openstream, which Gartner ranked among the top 10 AI companies, joined us at VOICE Global 2021 to discuss the next evolutionary step in enterprise AI: multimodal. Twenty years in the making, multimodal conversational engines are the future of AI-powered voice tech.

What is multimodal?

Answering a question with another question can be confusing, but bear with me. How can we give business users, whether customers or employees, information on demand through a flat user interface such as conversational AI, instead of making them click check boxes or drop-down menus? How can we build a platform where users can ask whatever they want, including complex requests that touch on multiple data points, and quickly get the right information, much as Google did for the information published on the web? That is what multimodal is about.

As Raj Tumuluri says, “We are trying to see how we can understand what the user's true intent or goal is, as opposed to just literally following what the user is asking.”

Take the example of a health insurance company that offers a virtual assistant to its customers. Some customers may want to know whether there are any Covid test centers nearby. A typical voice assistant would give a binary answer: “Yes, there are” or “No, there aren’t.” A slightly better assistant might say that there are three testing centers nearby and show them on a map. But that’s where it ends.

A multimodal virtual assistant, on the other hand, could say “The nearest one is one mile away, but it doesn’t take appointments. The next one is 20 minutes away, and it accepts walk-ins. Shall I get you directions for that?”

To provide that kind of advanced assistance, the AI needs to determine the user's end goal. Openstream’s conversational engine is not based on “if-then-else” coding or decision trees. “We are on the next level, which is a goal-based dialogue engine. When a user asks a particular question, we kind of understand their plan,” Tumuluri tells us.

Next-Level AI

Multimodal achieves this level of understanding and anticipation because it’s designed to ingest as much data as possible from across an organization, so that it can search, sort, and connect multiple data points. That data could be anything from policy documents, database records, and schemas to contact center or customer support logs.

And unlike most AI-driven conversational engines, multimodal can handle negations and superlatives, which make spoken commands more complex and harder for voice assistants to interpret.

So with troves of pertinent data in one hand and the ability to interpret complex natural language commands in the other, multimodal could, in the context of property insurance, answer queries such as: “Which of these properties have these coverages but not these other coverages? Which exclusions apply to this property?” Or: “Are there any catastrophic events happening that could affect my properties?”

Multimodal is also suited to predictive analytics: it analyzes relevant past trends for the subject of the query and predicts an outcome. So it could even answer queries like: “How many units of this product will we sell in the first two quarters of the fiscal year?”

As Raj Tumuluri says, “That’s really what we mean when we talk about a full manifestation of enterprise AI. Whether you're an enterprise CEO that's trying to find some quarterly results, or the Systems Admin trying to find out how the network is doing, what the network threats are, or an enterprise customer that's trying to find out about a product or a claim, that's really how the enterprise AI manifests itself in terms of benefits for the three classes of users.”

That’s pretty powerful stuff. That’s next-level AI.

