Google’s Knowledge Graph has been the center of much attention lately, and we have been hearing a lot about another concept, the Knowledge Vault, as well. But just how extensive is Google’s capability? And how do Siri and Bing/Cortana stack up? To find out, we loaded the Google App onto an iPhone (Google Now is part of the Google App), tested Siri, got our hands on a Windows Phone so we could test Cortana, and took them all for an extended test drive.
UPDATE (20 September 2015): With the introduction of Siri for iOS 9, Re/code Magazine asked Stone Temple Consulting to rerun our question set to see how much Siri had improved (if at all). Read the results here.
These are the things we set out to measure in this study. To do that, we compared 3,086 different queries across all three platforms. These were not random queries; they were picked because we felt they were likely to trigger a knowledge panel.
In addition, this was a straight-up knowledge box comparison, not a personal assistant comparison. Please note that Cortana is in beta and promotes itself as a personal assistant. For purposes of this study, a “knowledge box” or “knowledge panel” is defined as content in the search results that attempts to directly answer a question asked in a search query. Others in the industry sometimes refer to these as “Answer Boxes.” Here is a simple example of one:
Knowledge boxes can show up in many forms, including:
- On the right rail of the search results
- As step by step instructions above the regular web search results
- As a structured snippet incorporated into the regular web search results
- In the form of a carousel above the search results
All queries in this test were done using voice commands via the respective apps, even when using Google and Bing. The reason for this is that many commands in Google and Bing behave differently when the search query is typed in, and we wanted a straight apples-to-apples comparison. The devices used were:
- Cortana running on a Nokia Lumia 635 Windows Phone
- Siri running on the iPhone 4s and iPhone 5
- The Google App (of which Google Now is a part) running on the iPhone 4s and iPhone 5
You can see Stone Temple staff members Caitlin O’Connell and Justin Markuson demonstrate some basic queries in this short three-minute video.
Types of Results
Google uses many sources of data for the Knowledge Graph. Here is what Google’s Amit Singhal told us about that back in May 2012:
Google’s Knowledge Graph isn’t just rooted in public sources such as Freebase, Wikipedia and the CIA World Factbook. It’s also augmented at a much larger scale because we’re focused on comprehensive breadth and depth. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web.
Note: When this study was first published, the screen shots shown here were from desktop search; we have now changed that. However, ALL searches performed in the test were performed via phone and voice search, as detailed above. This was simply an author error in taking the screen shots.
Not only does Google pull from many sources, but it also has many different ways of presenting results. Let’s look at a few interesting examples:
Not only do we get our answer, but Google offers us info on three other tall buildings. Google noticed that people who search on the height of the Eiffel Tower often want to know the height of other tall buildings. If you click on the “Burj Khalifa” link, it becomes even more interesting. Here is what you get:
The fascinating part about this result is that it has dramatically expanded the number of options with regard to other famous buildings, by presenting us a carousel (the common industry name for the strip of results up at the top) of results. I don’t get that result if I simply search on “burj khalifa height.” Instead, I get a much simpler variation as follows:
As you see here, the results are quite different. The version with the carousel reflects the fact that I clicked on the link for Burj Khalifa when viewing the Eiffel Tower result. As soon as Google saw I was interested in more than one building, they gave me an even larger set. They could potentially figure out which buildings to show by seeing which queries typically follow your current query.
I.e., of all the people who search “how tall is the Eiffel Tower,” how many of them then search on some other building? Chances are, the most popular follow on queries relate to Burj Khalifa, the Empire State Building, and the Statue of Liberty, which is why these are shown in the Eiffel Tower result above. However, it seems that not many people do follow on queries for other buildings after searching on “burj khalifa height.”
It’s important to note that this is speculation, and it could also be that Google is simply testing different variants to see what works best. But in the long run, you can anticipate that a combination of statistics, testing, and UI design will significantly increase the variety of search results you might see, based on the order in which you perform the searches.
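To make the follow-on query idea above concrete, here is a minimal sketch of how one could rank follow-on queries from session logs by counting which query most often comes next. This is purely illustrative; Google has not published how (or whether) it does this, and the function name and data shape are our own assumptions.

```python
from collections import Counter

def follow_on_counts(sessions, seed_query):
    """Count which queries most often immediately follow seed_query.

    sessions: a list of sessions, each a list of queries in the order
    the user issued them. Hypothetical sketch of the co-occurrence idea
    discussed above, not Google's actual method.
    """
    counts = Counter()
    for session in sessions:
        # Look at each query except the last, since it has no follow-on.
        for i, query in enumerate(session[:-1]):
            if query == seed_query:
                counts[session[i + 1]] += 1
    return counts.most_common()

# Toy session data echoing the Eiffel Tower example
sessions = [
    ["how tall is the eiffel tower", "burj khalifa height"],
    ["how tall is the eiffel tower", "empire state building height"],
    ["how tall is the eiffel tower", "burj khalifa height"],
]
print(follow_on_counts(sessions, "how tall is the eiffel tower"))
# The most frequent follow-on query comes first
```

In this toy data, "burj khalifa height" follows the seed query most often, so it would be the first related building shown.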
You can also see other types of results. Some of these are extracted from third party web sites, such as this one:
This one is drawn from Wikipedia, but Google may also draw information from other web sites. One example of this is shown here:
This result is an example of what we call step-by-step instructions, where you can actually receive a full procedure in the search results. These also divide into three different types:
- All the required steps are presented in the results, so visiting the web site is not needed to get the requested info.
- Only some of the steps are provided, and therefore getting the complete process requires going to the source site.
- All of the steps are provided, but some of the steps are not completely detailed, so this still requires going to the specified site to get all the info needed by the searcher.
Another type of result is what we refer to as a “structured snippet.” These are results that look like the following:
Notice how we see a regular search result, but some information has been extracted directly from that result and shown inline. For now, these are relatively rare. They may just be something that Google is testing for the moment, and which could expand significantly in the future if Google likes the results.
One last consideration we examined is accuracy: whether or not Google provides the answer to the question asked. Here is one fun example of a query where Google did not answer the question, shown from desktop search when we first tried it:
Note that when we originally did the study, we saw a similar result in the phone results, but it appears that Google has fixed it, as you can see here:
At least they left the snarky part in ;->
What About Siri?
We tested all the same queries on Siri as well. It is well known that Siri sources data from Wolfram Alpha (a knowledge-based search engine that was at one time touted as a Google killer), but our testing showed that it also pulls in results from Wikipedia, Yahoo, and Bing. Here is a sample result that pulls in data from Wolfram Alpha:
Here is a sample result using Wikipedia as a source:
In case you were wondering, I happen to like Black Russians ;->. Next up is a sample result from Yahoo:
Note that the question on this one was “when is the sunrise,” and the answer I get is when the sun rose this morning here in Southborough, Massachusetts. It also appears that Siri draws its image search results from Bing, as shown here:
Like Google, Siri does make mistakes too. For example, when I ask “what does a cardiologist do,” I get this answer:
As you can see, Siri provides information on two cardiologists located near me, which does not relate to the question asked at all. Last, but not least, Siri also provides some very entertaining results. Here is one of the more fun ones:
So now you know. ;->
What About Cortana?
As with Google, there are queries that respond differently when spoken (using Cortana) than when typed into the search box. For that reason, all queries tested were spoken. Here is an example of a simple direct answer query:
For a large number of the tested queries, Cortana returned YouTube videos that purport to answer the question. We did not count these as knowledge panel results. Cortana also drew upon the Oxford dictionaries for definition-type queries, such as the one you can see in this result:
Cortana also appears to draw data from Wikipedia, Freebase, the New York Times, and other web site sources. Here is an example of a query that appears to be drawn from the Facebook.com web site:
You can also find some fun stuff using Cortana. Here is the answer to the classic question “what is love?”:
Here is hoping that Cortana can speed up its investigation on this matter. ;->
Some Notes on Bing vs. Cortana
It was interesting to note that in many cases Cortana would not return knowledge panels when a text-based search in Bing would. Google actually tended to do the opposite: voice search would bring up results that a regular text search would not. We did a spot check of scenarios where Cortana returned some type of knowledge result that did not fully answer the question, to see in how many cases Bing returned a more complete result.
We checked a total of 234 of these, and 78 (33 percent) provided fully complete answers in Bing. So Bing’s text search is further along than what is integrated into Cortana at this point.
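For transparency, here is the simple arithmetic behind that 33 percent figure (the variable names are ours):

```python
# Spot check of Cortana results that did not fully answer the question
checked = 234        # queries spot-checked
full_in_bing = 78    # of those, text search in Bing gave a complete answer

pct = round(100 * full_in_bing / checked)
print(pct)  # -> 33
```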
Detailed Study Results
The study data shown below is as of October 4, 2014. Note that the engines all make changes in the results on an ongoing basis, and we do intend to monitor these results over time. With that said, let’s get to it:
Percent of Queries Showing Some Type of Enhanced Result
This includes knowledge boxes on the right, knowledge panels in the main column, and/or structured snippets. Here is what we found:
Google Now (this was the Google App running on the iPhone) returns twice as many results as Siri and nearly three times as many results as Cortana. This is clear evidence that Google is much further down the path with this type of work than either Apple or Microsoft. As noted above, Bing, using text-based search queries, returns knowledge boxes for more types of results than Cortana does at this time. [Tweet This!]
Do Enhanced Results Fully Answer the Question?
This section focused on whether or not the returned result fully addressed the question. The scoring here was harsh. If you asked “how old is the great wall of China” and the knowledge panel result showed that the Great Wall was completed in 206 BC, it got no credit. Even if the first regular web search result showed the answer in its description or title, it also got no credit. Keep in mind, this was a knowledge panel test.
Looking at the scores here, one might conclude that Cortana and Siri are genuinely bad. However, please bear in mind that this was a knowledge panel test. The enhanced results returned by both systems had a far higher rate of being at least somewhat helpful, and in Cortana’s case, had a high rate of improving the standard search results. But you still need to click through to find what you were really looking for. [Tweet This Result!]
Here is an example of a query for which Cortana returns a result, but which does not directly answer the question:
Here is one for Siri for the phrase “who has the most patents”:
Last, but not least, here is one for Google Now that, when we first tried it, did not really get the job done:
Note that we saw a similar result in our phone query, but I mistakenly took a desktop screen shot for this post. It appears that Google has now fixed this problem, as shown by this phone screen shot:
Each of these examples shows the struggles each vendor has in truly nailing down a definitive answer to the question. The information is potentially helpful, but the answer we requested was not included.
More Specifics on Google Now
Google presents many results without providing attribution. These are generally in the form of well established facts, such as “what is the capital of Maine?”. The split works out roughly to 75/25, as shown here: [Tweet This Result!]
Also of interest is a closer look at the step-by-step instructions. We found 276 examples of step-by-step instructions. One concern many have expressed is that these might steal traffic from the publisher’s web site from which the information was taken. However, we found only 59 scenarios where the complete instruction set was provided.
I am betting that for those other 217 web sites, being the identified authority on answering this type of query is absolutely awesome:
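The breakdown behind those numbers is straightforward; here it is spelled out (variable names are ours):

```python
# Step-by-step instruction results: how many still require a click-through
total = 276      # step-by-step results found in the study
complete = 59    # results showing every step in full on the results page

partial = total - complete
print(partial)                        # -> 217 sites still earn the click
print(round(100 * complete / total))  # -> 21 percent fully answered on the page
```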
So there you have it. As of October 4, Google Now has a clear lead in terms of the sheer volume of queries addressed, and it answers those queries more completely and accurately than either Siri or Cortana. All three parties will keep investing in this type of technology, but the cold hard facts are that Google is progressing the fastest on all fronts.
Share this study!
View and share our infographic of the results
View and share our slide deck on Slideshare
View and pin the results on Pinterest
Please check out some of our other studies:
Study Credits: Thanks to Caitlin O’Connell and Justin Markuson for their hard work on this study, and to Mark Traphagen for creating the opening image.
Here is the full set of queries used in the study