It wasn’t so long ago that we were amazed that we could interact with an IVR to solve basic customer service issues, or that we could type on a piece of glass to communicate with our mobile device. The problem was that most IVR systems didn’t learn about our needs over time, and typing and gesturing are not the most efficient ways to interact with a computer or device for the things most people do most often. Even the most basic routines require far more physical and mental effort than should be necessary in today’s digital age.
What if each time you needed to reach out to your bank (or any other business) the organization remembered your entire history of interactions (needs, behaviors, preferences, transaction patterns and timing, etc.), allowing it to have an intelligent ‘conversation’ without going over often repeated steps? What if basic and more involved interactions could be done with the most effective and powerful communication tool ever invented? Your voice.
“Voice recognition accuracy has rapidly surpassed 90% and is approaching the 99% threshold for accuracy.”
— Mary Meeker, Partner at KPCB
Voice-based systems have grown far more powerful with the addition of always-on systems, combined with machine learning (Artificial Intelligence or AI), cloud-based computing power and highly optimized algorithms. Modern speech recognition systems, combined with text-to-speech voices that are almost indistinguishable from human speech, are ushering in a new era of voice-driven computing.
Voice First systems fundamentally change the process by decoding volition and intent using self-learning artificial intelligence. The first example of this technology was Siri. Prior to Siri, systems like Nuance were limited to listening to audio and creating text. Nuance’s technology had roots in optical character recognition and table arrays.
According to the founder and chief analyst of Technalysis Research, “As the technology improves and people become more accustomed to speaking to their devices, digital assistants are poised to change not only how we interact with and think about technology, but even the types of devices, applications and services that we purchase and use. The changes won’t happen overnight, but the rise of these voice-driven digital helpers portends some truly revolutionary developments in the tech world.” Apple’s Siri, Google Voice, Microsoft’s Cortana, Amazon’s Echo/Alexa, Facebook’s M and a few others are the best consumer examples of the combination of speech recognition and text-to-speech products today.
Players in the voice computing marketplace are spending a lot of time fine-tuning various parts of the interaction chain, from initial speech recognition to server-based analysis. And the efforts are paying off. According to Mary Meeker of KPCB, voice recognition accuracy has rapidly surpassed 90% and is approaching the 99% threshold for accuracy.
The use of voice has also risen dramatically. Voice search queries on Google, for example, are up 35 times since 2008, while sales of voice-based devices such as the Amazon Echo are increasing. Interestingly, mobile device sales appear to have peaked in 2015.
Moving Beyond Siri
One of the foremost authorities on the advancements in voice technology is Brian Roemmele, founder of PayFinders. He is also a publisher who continually covers the advancements in technologies that will impact the banking industry. I asked Brian whether the shine has worn off Siri now that new entrants have introduced technology that is obviously more advanced. He warned that while there is a narrative that Apple has not understood AI very well, most forget about a number of acquisitions that position Apple well for the future.
In addition to Siri, Apple’s acquisitions of Emotient, Perceptio, VocalIQ and perhaps a number of not-yet-disclosed AI companies have allowed Apple to develop new solutions behind the scenes. Some of the technology is quite unique, such as Emotient’s, which reads the movement of the 43 muscles in a person’s face to decode emotional intent, helping machines understand a consumer better. This technology is critically important to machine learning and to everything Apple will do with AI.
“VocalIQ introduced the world’s first self-learning dialogue API, putting real, natural conversation between people and their devices. Every time an application is used it gets a little bit smarter. Previous conversations are central to its learning process, allowing the system to better understand future requests and react more intelligently,” states Roemmele. “We will see some of the implementations of this AI technology during the 2016 Worldwide Developers Conference and the beginnings of ‘Siri 2.0’ for iOS, on Apple Watch, on Apple TV and on OS X computers.”
Amazon Echo: From Fun to Functional
Amazon CEO Jeff Bezos has always been a leader in the field of artificial intelligence. The original premise of Echo was to be a portable book reader built around 7 powerful omni-directional microphones and a surprisingly good WiFi/Bluetooth speaker (with separate woofer and tweeter). This humble mission soon morphed into a far more robust solution that is just now taking form for most people.
Beyond the power of the Echo hardware is the power of Amazon Web Services (AWS). AWS is one of the largest virtual computer platforms in the world. Echo simply would not work without this platform. The always-on, always-listening nature of Alexa encourages natural dialogs, with much more in-depth learning over time compared to the Siri interface.
According to Bezos, Amazon has invested four years of research into its key project in AI and has stacked that project with a sizable staff. Bezos said at the Code Conference that the team working on Alexa, Amazon’s smart voice-assistant software, and Echo, its flagship device, now numbers more than 1,000 employees.
Echo is a step forward from the current incarnation of Siri, not so much for the sophistication of the technology or the open APIs, but for its single-purpose, dedicated Voice First design. Evidence of this commitment arrives in the form of an email each week as Amazon introduces more unique ways Alexa can make a user’s life easier.
Far beyond a device to tell the weather or provide sports scores, recent updates discussed how Alexa can help schedule the watering of a lawn, set a thermostat or even automatically back a new Tesla out of the garage without driver assistance. It represents the integration of functions far beyond just verbal conversations.
“We have more than 1,000 people working on Alexa and Echo.”
— Jeff Bezos, President/CEO of Amazon
“We’ve been working behind the scenes for the last four years,” he said about Echo and Alexa. “It’s just the tip of the iceberg.” While he said voice interfaces won’t replace phones altogether, he stated that the field of AI is only getting bigger, and noted that one of Amazon’s advantages is its gathering and analysis of vast amounts of data. “There will be huge advances,” he said. “Bigger companies like Amazon have an advantage because you need a lot of data to do extraordinary things.”
Note: If you want to try Alexa, the powerful AI behind Amazon’s Echo home personal assistant device, Amazon has put the power of Alexa into a freely accessible website. You will need an Amazon account to log in, but after that, all you need to do is click and hold the webpage’s mic button and begin asking questions. The app uses your computer’s microphone to show how Alexa can answer questions about weather and news, offer up information from web queries and more. It can also control smart home gadgets, and now offers “skills,” which are third-party apps developers can build to expand the capabilities of the device.
Google Home: AI for the Home
Google Home shows a very significant influence from Amazon’s Echo product, especially in its design. Home is a device with the central purpose of interacting with Google Assistant, combining Google Search and the Google Now natural language processing platform for uses designed specifically for the home.
According to Roemmele, Voice First devices like Echo and Home will be driven by Voice Commerce and Voice Payments. Advertising simply will not exist in its current form with Voice First devices. Google built its business on search and advertising. It is abundantly clear that Google understands search, and that it also understands Voice First represents a paradigm shift in the advertising model it has built up to this point.
Home and Echo represent just the start of the Voice First revolution, according to Brian Roemmele. He believes this is the first wave of Voice First devices and that a number of companies are likely to enter with consumer-grade and enterprise-grade systems and devices.
Viv: From the Makers of Siri
The founders of Siri spent a few years thinking about the direction of Voice First software after they left Apple. The results of this thinking were just introduced. Viv is orders of magnitude more sophisticated in the way it will act on the dialog you create with it. Viv is more natural and intelligent. This is based upon a new paradigm the Viv team has created called “Exponential Programming”.
According to Roemmele, “As Viv is used by thousands to millions of users, asking perhaps thousands of questions per second, the learning will grow exponentially in short order. Siri and the current voice platforms currently can’t do anything that coders haven’t explicitly programmed them for. Viv solves this problem with the ‘Dynamically Evolving Systems’ architecture that operates directly on the nouns, pronouns, adjectives and verbs.”
Viv is built around three principles or “pillars”:
- It will be taught by the world.
- It will know more than it is taught.
- It will learn something every day.
The experience with Viv will be far more fluid and interactive than any system publicly available. The result will be a system that will ultimately predict your needs and allow you to communicate in something close to the shorthand dialogs common in close relationships.
In a Voice First World, Marketing and Payments Change
In the world of Voice Commerce, advertising as we know it will not exist, primarily because we would not tolerate commercial intrusions and interruptions in our dialogs. It would be equivalent to having a friend break into an advertisement about a new gasoline. Instead, how a solution or product is found will matter most, since all responses to requests will be based on search and advanced learning algorithms.
Payments will also change in profound ways, states Roemmele. “Many consumer dialogs will have implicit and explicit layered Voice Payments with multiple payment types. Voice First systems will mediate and manage these situations based on a host of factors.” The companies that prevail will have identified the best ways to gain prominence and positioning in the algorithm that connects merchants to customers.
This new advertising and payments paradigm will form a convergence. Voice Commerce will become the primary replacement for advertising and Voice Payments will be the foundation for Voice Commerce. The shift will impact what we today call online, in-app and retail purchases. Interestingly, without human-mediated searches on Google, there is no pay-per-click. Without a scan of the headlines at your favorite news site, there is no banner advertising.
The Future of Voice Banking
A major part of the Voice First paradigm is a modern Intelligent Agent (also known as Intelligent Assistant). Over time, all of us will have many, perhaps dozens, interacting with each other and acting on our behalf. These Intelligent Agents will be the “ghost in the machine” in Voice First devices. They will be dispatched independently of the fundamental software and form a secondary layer that can fluidly connect between a spectrum of services and systems.
Santander UK recently launched a voice assistant in its student-geared mobile banking app, SmartBank. It was the first bank in the U.K. to introduce a voice technology offering. The bank opted to test the voice technology (offered by Nuance) in its student app because millennials are known early adopters.
As the solution is introduced, developers are manually changing how the software is programmed, so that it can “learn” from each experience. Bank officials stated that as the technology develops and becomes more used, it will be able to learn from its interactions with customers and improve without help from human developers.
“In the next 10 years, 50% of all banking interactions will be via Voice First devices.”
— Brian Roemmele, Founder of PayFinders
The current use of voice technology at Santander is limited to specific banking activities, like analyzing card spending. Using voice biometrics for authentication purposes is also being considered. “We want to test one thing at a time,” a bank official said. In addition, security for voice interactions presents a unique challenge. For instance, the ability to conduct private transactions in a public location requires additional precautions.
Most financial organizations will move from basic dialogue and account inquiries to transactions driven by voice commands. This can include executing payments, making account transfers and setting up account alerts. Roemmele believes that in the next 10 years, 50% of all banking interactions will be via Voice First devices.
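To make the step from inquiries to transactions more concrete, here is a minimal sketch in Python of how a transcribed voice command might be mapped to a banking action. Everything in it is an assumption for illustration: the `Intent` class, the `parse_command` function and the hand-written patterns are hypothetical, and a production system would more likely rely on a trained language-understanding model than on regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    """Hypothetical result of interpreting one spoken command."""
    name: str                       # "transfer", "payment", "alert" or "balance"
    amount: Optional[float] = None  # dollar amount, when the command has one
    target: Optional[str] = None    # account or payee, when the command has one


# Illustrative patterns only, checked in order against the lowercased transcript.
_PATTERNS = [
    ("transfer", re.compile(
        r"\b(?:transfer|move) \$?(?P<amount>\d+(?:\.\d+)?)(?: dollars)?"
        r" to (?:my )?(?P<target>[\w ]+)")),
    ("payment", re.compile(
        r"\bpay (?P<target>[\w ]+?) \$?(?P<amount>\d+(?:\.\d+)?)")),
    ("alert", re.compile(
        r"\b(?:alert|notify) me (?:if|when) my balance"
        r" (?:drops|falls|is) below \$?(?P<amount>\d+(?:\.\d+)?)")),
    ("balance", re.compile(
        r"\bhow much (?:is|do i have) in my (?P<target>[\w ]+?)(?: account)?\??$")),
]


def parse_command(transcript: str) -> Optional[Intent]:
    """Return the first matching Intent, or None if nothing matches."""
    text = transcript.lower().strip()
    for name, pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            groups = match.groupdict()
            amount = groups.get("amount")
            target = groups.get("target")
            return Intent(
                name,
                float(amount) if amount else None,
                target.strip() if target else None,
            )
    return None


if __name__ == "__main__":
    for phrase in ("Transfer 50 dollars to my savings",
                   "How much is in my checking account?"):
        print(phrase, "->", parse_command(phrase))
```

In a sketch like this, an unrecognized phrase returns `None`, which is the point where a real assistant would ask a clarifying question instead of guessing at a transaction.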
The real value of Voice First Banking will come when the interactions include a little more complexity, like ‘can I afford to purchase a dinner at Maxwell’s for 2 this week?’ as opposed to ‘how much is in my account?’ With the vast majority of consumers having banking relationships spanning a decade or longer, the integration of voice, long-term transactional analysis, geolocation and current contextual learnings, combined with preferences and behaviors outside of banking over time, is where the power of AI and Voice Commerce becomes really exciting.
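The difference between the two questions can be sketched in a few lines of Python. A balance inquiry reads back one number, while the dinner question has to combine several. The `AccountSnapshot` fields, the `can_afford` and `answer` functions and all of the figures below are hypothetical assumptions, not any bank's actual logic; a real system would draw the estimates from transaction history and merchant data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AccountSnapshot:
    """Hypothetical view of a customer's finances for the current week."""
    balance: float         # available balance today
    upcoming_bills: float  # bills expected before the next deposit
    safety_buffer: float   # minimum the customer prefers to keep on hand


def can_afford(snapshot: AccountSnapshot, estimated_cost: float) -> Tuple[bool, float]:
    """Return (affordable?, disposable amount) after bills and buffer."""
    disposable = snapshot.balance - snapshot.upcoming_bills - snapshot.safety_buffer
    return disposable >= estimated_cost, disposable


def answer(snapshot: AccountSnapshot, merchant: str,
           per_person_cost: float, party_size: int) -> str:
    """Turn the calculation into a conversational reply."""
    cost = per_person_cost * party_size
    affordable, disposable = can_afford(snapshot, cost)
    if affordable:
        return (f"Yes. Dinner for {party_size} at {merchant} should cost about "
                f"${cost:.2f}, leaving ${disposable - cost:.2f} to spare this week.")
    return (f"Not comfortably. Dinner for {party_size} at {merchant} should cost "
            f"about ${cost:.2f}, but only ${max(disposable, 0.0):.2f} is free this week.")


if __name__ == "__main__":
    week = AccountSnapshot(balance=1200.0, upcoming_bills=900.0, safety_buffer=100.0)
    print(answer(week, "Maxwell's", per_person_cost=60.0, party_size=2))
```

The per-person cost is passed in by hand here; in the scenario the article describes, it would be an estimate learned from the customer's past spending and location.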
According to one founder, CEO and author of a new best-selling book, “Regardless, the forces driving us to a personal AI and voice interfaces are becoming increasingly clear. We have lots of devices, lots of screens and soon more data than we will logically be able to process personally or collectively as humans, so it will need to be curated by algorithms that are conversational in nature. Whatever curates all that data and allows us to interact will be the personal interface to these systems. Whoever cracks this problem will have a business bigger than Facebook by the middle of next decade.”
Brian Roemmele added, “In the next 5 years, customers will demand Voice-based interactions in banking. Some of the technology will be supplied by the VoiceOS. However, some of this technology will be supplied by the bank. Banks are in a unique position to advance a higher degree of customer service via a Voice First interface. Voice First experiences can drive a level of delight with customers because they can access information simply by asking.” He added a cautionary note: “Banks should also lead with unique security to assure that the person accessing the Voice First system is authenticated. The unique Voice interface experience must not be made too cumbersome with the security measures, however.” He believes that becoming a leader in Voice First technology may be best done by partnering with startup and legacy companies.