DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

much, if any, information because I am looking for specialized information (Mexico) within a big topic (NAFTA).

Compared to directories and metasearch services, individual search engines offer much greater flexibility and many more search options, not the least of which is the ability to search using Boolean expressions. Search engine companies have concluded (probably rightly) that Boolean searches are beyond the ken of most users, so you may find that the Boolean queries permitted by even the best search engines are inferior to those you have used before.

One of the hottest areas of contention surrounding search engines has always been, and continues to be, search engine index size. I recommend you take size claims with a grain of salt. Search engine index sizes are self-reported and not validated by any objective third party. This old contest came to a head in 2005. First, Yahoo claimed that its index contained over 20 billion "items." These items included "just over 19.2 billion web documents, 1.6 billion images, and over 50 million audio and video files."[1] Yahoo's claim at first appeared to mark the beginning of another competition for the "honor" of having the biggest search engine database, something Google had prided itself on for years. This time, however, instead of fighting back with a bigger number on its homepage, Google dropped the count entirely as part of its seventh birthday celebration in September 2005. At the same time, Google announced a "newly expanded web search index that is 1,000 times the size of our original index… which makes Google more than 3 times larger than any other search engine."[2] Google did not offer a specific number but insisted that it offered the most comprehensive collection of websites and documents on the Internet. Yahoo made a similar claim. The answer? There is no one "best" search engine or site; researchers need a good toolkit of many resources when looking for rare information.

Determining search engine database size is more akin to alchemy than arithmetic, so I suggest you take all such size estimates with a large dose of skepticism. Besides, numbers are one thing and good search results are quite another. What good are 20 billion web documents if not one of them provides the results you are seeking? Relevant results are the best measure of a search engine's value, but in my experience, having a larger pool in which to fish for answers really does make a search engine more likely to retrieve the results users seek when the information is obscure, which, after all, is often the kind of information we are looking for. Search engine size wars are almost always a good thing for researchers because they keep the big players on their toes and


  1. "Our Blog is Growing Up–And So Has [sic] Our Index," Yahoo! Search Blog, 8 August 2005, <http://www.ysearchblog.com/archives/000172.html> (15 November 2006).
  2. "We Wanted Something Special for Our Birthday," Google Blogspot, 26 September 2005, <http://googleblog.blogspot.com/2005/09/we-wanted-something-special-for-our.html> (15 November 2006).