Page:Untangling the Web.pdf/19

DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY


Introduction to Searching


Search Fundamentals

The September-October 1997 issue of IEEE Internet Computing estimated that the World Wide Web contained over 150 million pages of information. By the end of 1998, the web had grown to more than 500 million pages. By early 2000, the best estimates put the number over 1 billion, and by mid-2000 one study reported over 550 billion unique documents on the web.[1] Netcraft, which has been running Internet surveys since 1995, reported in its November 2006 survey that there are now more than 100 million websites. "The 100 million site milestone caps an extraordinary year in which the Internet has already added 27.4 million sites, easily topping the previous full-year growth record of 17 million from 2005. The Internet has doubled in size since May 2004, when the survey hit 50 million."[2] The major factors driving this boom are free blogging sites, small businesses, and the relatively low cost of setting up a website. Another recent survey found:

  • "The World Wide Web contains about 170 terabytes of information on its surface; in volume this is seventeen times the size of the Library of Congress print collections.
  • Instant messaging generates five billion messages a day (750GB), or 274 Terabytes a year.
  • Email generates about 400,000 terabytes of new information each year worldwide."[3]

The numbers hardly matter anymore. The enormous size of the Internet means we simply must use search tools of some sort to find information. Otherwise, we are voyagers lost on a vast uncharted ocean.


  1. Michael K. Bergman, "The Deep Web: Surfacing Hidden Value," BrightPlanet, August 2001, <http://www.brightplanet.com/technology/deepweb.asp> (14 November 2006).
  2. "November 2006 Web Server Survey," Netcraft.com, 1 November 2006, <http://news.netcraft.com/archives/2006/11/01/november_2006_web_server_survey.html> (15 November 2006).
  3. School of Information Management and Systems, University of California at Berkeley, "How Much Information? 2003," Executive Summary, 27 October 2003, <http://www.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm#summary> (14 November 2006).