In a Research and Markets report released March 2017, the number of active virtual digital assistant (VDA) users worldwide was expected to increase from 390 million users in 2015 to 1.8 billion by the end of 2021. The same report said that VDA revenue should increase to $15.8 billion in 2021, up from $1.6 billion in 2015. According to an eMarketer report released in May 2017, 60.5 million people in the United States alone will use an assistant that recognizes voice commands once a month during 2017, which represents more than one quarter of U.S. smartphone users or nearly one in five Americans.
The Siri virtual digital assistant can be found in Apple products like the iPhone, and 98 percent of iPhone users have at least tried it. According to eMarketer, usage of VDA services such as Amazon’s Alexa, Apple’s Siri, Google Now and Microsoft Cortana was projected to grow by 23.1 percent in 2017. As early as January of that year, Alexa appeared to be the dominant assistant, according to Business Insider.
By July 2017, Apple’s Siri had 41.4 million users, the largest user base among all VDAs, yet that figure represented a decline of about 15 percent in Siri’s user base since 2016. In September, it was reported that Amazon’s Echo held 76 percent of the voice-enabled speaker market, marking a large consumer shift toward Alexa. In response to this disappointing user engagement, Apple has been forced to continue pursuing innovations to its own Siri digital assistant.
Apple sold only 30 million to 35 million iPhone X units during the 2017 holiday season, about 10 million fewer than earlier estimates. Through 2017, however, the iPhone still outsold the Samsung Galaxy S8 and Note 8 overall: the iPhone sold 223 million units while Samsung sold only 33 million units of its rival products. Further analysis of device activation data showed that Apple iPhone products represented 44 percent of all new device activations during the 2017 holiday season, beating second-place Samsung, which accounted for 23 percent of new activations during that time. In predictions for 2017 smartphone usage released in late 2016, forecasters said that over 90 million U.S. smartphone users would use an iPhone, representing just over 40 percent of the 223 million total smartphone users in the U.S., according to the online statistics portal Statista.
At the release of iOS 11 in September 2017, a few changes were noted in Siri’s speech. Apple had taken steps to improve Siri’s pronunciation of words and phonemes (the sounds that make up words), and had recorded speakers in different locales to more closely match regional dialects such as British English. A Popular Science article about the Siri upgrades reported that Siri is now able to translate spoken English into five other languages.
One digital assistant technology developed by Apple, which allows Siri to respond to whispered voice commands, is disclosed by U.S. Patent Application 20170358301, titled Digital Assistant Providing Whispered Speech.
The claimed invention describes a system in which one or more programs stored in memory include instructions for receiving speech input from a user and for determining, based on that input, whether a whispered speech response should be provided. In some settings, such as libraries or board meetings, the use of voice-activated digital assistants is discouraged because the sound is intrusive; this patent application covers a technology that recognizes a user’s command even when the user is whispering. The device then responds in a similarly whispered tone so as to be less distracting in quiet settings.
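The application does not spell out a detection algorithm, but the idea can be sketched with simple acoustic heuristics: whispered speech is unvoiced, so it tends to show low overall energy and a high zero-crossing rate. The following minimal Python sketch is purely illustrative; the function names, features, and thresholds are assumptions, not Apple’s actual method.

```python
import math

def rms(frame):
    """Root-mean-square amplitude of one audio frame (list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign.
    Whispered speech is unvoiced, so its ZCR tends to be high."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def is_whisper(frames, energy_threshold=0.02, zcr_threshold=0.25):
    """Classify an utterance as whispered when average frame energy is
    low AND unvoiced (high-ZCR) content dominates. Thresholds are
    illustrative guesses, not tuned values."""
    avg_energy = sum(rms(f) for f in frames) / len(frames)
    avg_zcr = sum(zero_crossing_rate(f) for f in frames) / len(frames)
    return avg_energy < energy_threshold and avg_zcr > zcr_threshold

def choose_response_mode(frames):
    """Mirror the user's speaking style, as the application describes."""
    return "whispered" if is_whisper(frames) else "normal"
```

A production system would presumably rely on spectral features and a trained classifier rather than fixed thresholds, but the decision logic is the same: detect the user’s speaking style and match the response to it.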
Of course, not all human speech is grammatically perfect, and sometimes the meaning of an incomplete sentence has to be determined. Apple’s U.S. Patent No. 7478037, titled Assigning Meanings to Utterances in a Speech Recognition System, covers a machine-implemented method that determines a set of possible operating contexts of a data processing system and then generates, for each of those possible operating contexts, a language model for recognizing a spoken set of words received through audio input.
In other words, the invention associates a meaning with each spoken word or phrase to determine the actions the device should take. To do this, a plurality of speech rules is generated to recognize certain sequences of words or expressions in the user’s speech. Each speech rule is associated with a language model and with an expression tied to that model. The invention reduces the wait time that mobile device users experience with conventional speech recognition algorithms while maintaining accuracy in determining the meaning of utterances.
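As a rough illustration of how context-scoped speech rules could shrink a recognizer’s search space, consider the hypothetical Python sketch below. The contexts, phrases, and data structures are invented for illustration and are not drawn from the patent itself.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpeechRule:
    """A rule pairs a mini language model (phrases it can recognize)
    with an expression: the action to perform on a match."""
    phrases: set
    action: Callable[[str], str]

# Speech rules grouped by operating context; only rules for the
# currently active contexts are consulted, so the recognizer never
# searches a single giant global vocabulary.
RULES = {
    "phone": SpeechRule({"call home", "redial"}, lambda p: f"dialing: {p}"),
    "music": SpeechRule({"play", "pause", "next track"}, lambda p: f"player: {p}"),
}

def recognize(utterance, active_contexts):
    """Match an utterance only against rules for the active contexts,
    returning the associated action's result, or None on no match."""
    for ctx in active_contexts:
        rule = RULES.get(ctx)
        if rule and utterance in rule.phrases:
            return rule.action(utterance)
    return None
```

For example, `recognize("pause", ["music"])` matches, while the same utterance in a phone-only context does not, which is the intuition behind why context-scoped models cut both latency and misrecognition.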
Speech recognition systems often have problems recognizing proper nouns, like a person’s name, which are not typically modeled in a phonetic dictionary. A technology addressing this problem is protected by Apple’s U.S. Patent No. 9721563, titled Name Recognition System.
This patent protects a process in which a machine-readable non-transitory storage medium stores executable instructions that are executed by a data processing system. Executing these instructions results in a method that stores a phonetic dictionary for speech recognition, obtains words from a set of one or more user databases, and receives speech input from the user. In response to the speech input, the system utilizes a plurality of pronunciation guessers to determine the words; the pronunciation guessers can determine, for example, whether a proper noun is the name of a location in a foreign language or a regional dialect. Additional processing produces further phonetic data derived from the speech input, which is used to form an extended phonetic dictionary unique to the user. The speech input is then processed by comparing the phonemes detected in it against the phonemes in the phonetic dictionary, including the user’s extended dictionary, to determine the best set of one or more possible matches.
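The flow just described can be sketched in a few lines of hypothetical Python: guess pronunciations for names pulled from a user database, merge them into an extended dictionary, and rank entries by phoneme-sequence similarity. The phoneme notation, the naive guesser, and the similarity scoring are all illustrative stand-ins, not the patented implementation.

```python
from difflib import SequenceMatcher

# Toy base dictionary: word -> phoneme sequence (ARPAbet-style labels,
# used here purely for illustration).
BASE_DICT = {
    "call": ["K", "AO", "L"],
    "mom": ["M", "AA", "M"],
}

def naive_pronunciation_guesser(name):
    """Stand-in for the patent's pronunciation guessers: derive a rough
    phoneme sequence from spelling. Real guessers would also weigh the
    name's locale and language of origin."""
    letter_to_phone = {"a": "AA", "e": "EH", "i": "IY", "o": "OW", "u": "UW"}
    return [letter_to_phone.get(ch, ch.upper()) for ch in name.lower()]

def extend_dictionary(base, user_names):
    """Build the user-specific extended dictionary from, e.g., contacts."""
    extended = dict(base)
    for name in user_names:
        extended[name] = naive_pronunciation_guesser(name)
    return extended

def best_match(input_phonemes, dictionary):
    """Score every entry by phoneme-sequence similarity to the detected
    phonemes and return the closest word."""
    def score(entry):
        return SequenceMatcher(None, input_phonemes, dictionary[entry]).ratio()
    return max(dictionary, key=score)
```

Because the extended entries live alongside the base dictionary, a contact name like "Anya" becomes matchable by the same phoneme comparison used for ordinary vocabulary, which is the core of the claimed method.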
It’s interesting to see that Apple is patenting these speech recognition inventions given the implications of the U.S. Supreme Court’s 2014 ruling in Alice Corp. v. CLS Bank International. In that ruling, the Court found that patent claims involving a method for exchanging financial obligations and computer-related elements of that system were unpatentable as they were directed to an abstract idea under 35 U.S.C. § 101. Given that a system for a third-party intermediated settlement to mitigate settlement risk was found unpatentable by the Court as such a system was “a fundamental economic practice long prevalent in our system of commerce,” some of Apple’s inventions here could be of questionable patentability. After all, humans have likely been whispering to each other since they first began using speech, so a computer program for providing whispered speech could conceivably be considered unpatentable as directed to an abstract idea — depending upon the judge and how far they want to carry the undefined abstract idea test.
The Alice decision has made the patentability of software questionable, and it will be interesting to see whether Apple faces any Section 101 rejections for its whispered speech patent application. The Supreme Court did recognize in Alice that on some very basic level every invention starts with an idea. Perhaps that saves Apple’s whisper identification technology. One thing we’ve learned, however, is what is and what is not an abstract idea is quite subjective.