On March 8, 2017, I published the results of our study on which was smarter: the Google Assistant on Google Home or Alexa on the Amazon Echo. As it happens, we tested Microsoft’s Cortana and Apple’s Siri at the same time. We also tested Google search for reference, since it has historically been the leading place to get direct answers to questions via its Knowledge Panel and Featured Snippets.
Structure of the Test
We collected a set of 5,000 questions about everyday factual knowledge to ask each personal assistant and Google search. This is the same set of queries we used in the Google Home vs. Amazon Echo test mentioned above. We asked each contestant the same 5,000 questions and recorded many different categories of answers, including:
- Whether the assistant answered verbally
- Whether the answer came from a database (like the Knowledge Graph)
- Whether the answer was sourced from a third party (“According to Wikipedia …”)
- Whether the assistant failed to understand the query
- Whether the assistant tried to respond to the query but simply got it wrong
All four personal assistants can also take actions on your behalf (such as booking a restaurant reservation, ordering flowers, or booking a flight), but we did not test those capabilities in this study. We focused solely on which of them is the smartest from a knowledge perspective.
Which Personal Assistant is the Smartest?
Here are the results of our research:
| Personal Assistant | % Questions Answered | 100% Complete & Correct |
|---|---|---|
| The Google Assistant on Google Home | 68.1% | 90.6% |
| Alexa on the Amazon Echo | 20.7% | 87.0% |

| Google search (for comparison purposes) | % Questions Answered | 100% Complete & Correct |
|---|---|---|
Note that the 100% Complete & Correct column counts a question only if it was answered fully and directly. Very few of the queries were answered by any of the personal assistants in an overtly wrong way; you can see more details on this below.
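Because the 100% Complete & Correct column appears to be measured only over the questions a contestant actually answered, multiplying the two columns gives an approximate share of all 5,000 questions that were answered completely and correctly. The minimal Python sketch below does that arithmetic for the two rows in the table; the helper name `fully_correct_share` and the rounding are illustrative assumptions, not part of the study’s methodology.

```python
# Hypothetical helper: combines the two table columns into one overall figure.
# Assumes "100% Complete & Correct" is measured only over answered questions.
TOTAL_QUESTIONS = 5000

# Figures copied from the table (percent values): (% answered, % complete & correct).
RESULTS = {
    "Google Assistant (Google Home)": (68.1, 90.6),
    "Alexa (Amazon Echo)": (20.7, 87.0),
}


def fully_correct_share(pct_answered: float, pct_correct: float) -> float:
    """Return the percent of ALL questions answered completely and correctly."""
    return pct_answered * pct_correct / 100.0


if __name__ == "__main__":
    for name, (answered, correct) in RESULTS.items():
        share = fully_correct_share(answered, correct)
        # Convert the percentage back into an approximate question count.
        count = round(TOTAL_QUESTIONS * share / 100.0)
        print(f"{name}: {share:.1f}% of all questions (~{count} of {TOTAL_QUESTIONS})")
```

Under that assumption, the Google Assistant on Google Home answers roughly 61.7% of all questions (about 3,085 of 5,000) completely and correctly, versus roughly 18.0% (about 900) for Alexa on the Amazon Echo.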
Detailed Personal Assistant Test Results
Let’s start by comparing how often each contestant responded verbally to the questions asked:
As you can see, the Google Assistant on Google Home gave verbal responses to the most questions. That does not mean it answered the most questions, however: the phone-based services (Google search, Cortana, and Siri) can respond on screen only, and each of them did so a number of times. It’s also interesting to see how rarely Cortana responded verbally in this test.
Here’s a look at the number of questions each attempted to answer with something other than a “Regular Search Result”:
When the personal assistant (or Google search) responded with something other than a regular search result, how often did they respond to the question 100% correctly and completely? The data follows:
As it turns out, there are many different ways to not be 100% correct or complete:
- The query might have multiple possible answers, such as “how fast does a jaguar go.”
- Instead of ignoring a query that it does not understand, the personal assistant may choose to map the query to something it thinks of as “close” to what the user asked for.
- The assistant may have provided a partially correct response.
- The assistant may have responded with a joke.
- Or, it may simply get the answer flat out wrong.
An example of the second item in the list above is how Siri handles a query such as “awards for Louis Armstrong”: it responds with a link to a movie about Louis Armstrong. Scenarios like this accounted for a large share of the “not 100% correct” responses on Siri.
One of the big differentiators is the degree to which each personal assistant supports featured snippets. Let’s take a look at the data:
A couple of major observations from this data:
- Cortana has become very aggressive about serving featured snippets. When we last ran this test with Cortana in 2014, it had essentially no featured snippets. Now, in 2017, it has almost as many as Google search or the Google Assistant.
- The personal assistants that can’t leverage a crawl of the web (Siri and Alexa) lag far behind in this type of answer.
Examples of Wrong Answers
With that in mind, let’s take a look at the percentage of answered questions that were simply wrong:
Note that in our test we were not using oddball queries designed to trick the personal assistants into giving grossly incorrect facts, so these errors may not be as spectacular as some of those shown in other articles (some of which I’ve contributed to). We also tested our one contestant that is not a personal assistant, Google search, and here is an example of an error from it:
The actual question was “are cocona a fruit or a vegetable.” I checked this query personally and confirmed that Google search recognized what I said (specifically, it correctly heard me say “cocona”), as you can see here:
It turns out that cocona are actually a rare Peruvian fruit. Google search nonetheless insisted on interpreting the word as “coconut,” so I did not get the answer I wanted.
Now to look at our personal assistants, here’s an example of an error from Cortana:
For this one, it’s interesting that the response is for the highest paid actors. That’s likely to be frustrating to anyone who enters that query! Now, let’s look at one for Siri:
Siri draws some of its answers from Wolfram Alpha, and this appears to be one of them. Here, though, it seems to have gone off the deep end and returned a largely nonsensical response. Next up, here’s one from the Google Assistant on Google Home:
As you can see, the answer actually provides the location of Dominion Resources instead. Last, but not least, here’s one from Alexa on the Amazon Echo:
The answer given is for penicillin, not for the pen.
Which Personal Assistant is the Funniest?
All of the personal assistants tell jokes in response to some questions. Here’s a summary of how many we encountered in our 5,000 query test:
Siri is definitely the leader here, but I find it interesting that the Google Assistant on Google Home is quite a bit funnier than Google search. For example, “do I look fat?” with Google search simply gives me a set of web search results, yet with the Google Assistant on Google Home the answer is, “I like you the way you are.”
There are a few jokes in Google search though. For example, if I search “make me a sandwich”, it does give me a set of regular search results, but its verbal response is “Ha, make it yourself.” With Siri, if you ask “what is love?”, you get a response that varies. One time you might get “I’m not going there,” but if you ask it again you may get a different answer. Interestingly, if you ask it a third time, it seems to give you a serious answer to the question, on the off chance that this is what you actually want.
If you ask Cortana “What’s the meaning of life?”, it may say “we all shine on, my friend.” With Alexa, a query like “Who is the best rapper?” will net you the answer: “Eminem. Wait! I forgot about Dre.”
Google still has the clear lead in terms of overall smarts with both Google search and the Google Assistant on Google Home. Cortana is pressing quite hard to close the gap, and has made great strides in the last three years. Alexa and Siri both face the limitation of not being able to leverage a full crawl of the web to supplement their knowledge bases. It will be interesting to see how they both address that challenge.
One major area not covered in this test is the overall connectivity of each personal assistant with other apps and services. This is an incredibly important part of rating a personal assistant as well. You can expect all four companies to be pressing hard to connect to as many quality apps and service providers as possible, as this will have a major bearing on how effective they all are.