Page:Untangling the Web.pdf/54

DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

motivates them to improve their services. This past year's competition was no exception.

Another important fact to remember is that most search engines do not index entire websites or documents. It is no longer clear exactly how much of a webpage the major search engines index. For example, Google used to index only approximately the first 100KB of HTML, and reportedly the first megabyte of PDF documents, but in October 2005, Google dramatically increased the size of its cache limit. Yahoo indexes at least the first 500KB of HTML and PDF documents. As for Microsoft file types, my experimentation indicates that, in most cases, Yahoo indexes virtually the entire file, even when the document is very large.
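
To make the effect of a size limit concrete, here is a minimal sketch in Python. It assumes an indexer that simply reads the first N bytes of a file and ignores the rest, which is a simplification and not a description of how any particular search engine works. The function names, the sample page, and the use of the 100KB figure as the limit are all hypothetical and chosen only for illustration.

```python
def indexed_portion(data: bytes, limit_bytes: int) -> bytes:
    """Return the part of a document a size-limited indexer would see (assumed model)."""
    return data[:limit_bytes]


def term_within_limit(data: bytes, term: str, limit_bytes: int) -> bool:
    """True if the term occurs entirely inside the first limit_bytes bytes."""
    needle = term.lower().encode("utf-8")
    return needle in indexed_portion(data, limit_bytes).lower()


# Illustrative limit based on the approximate 100KB figure quoted above.
HTML_LIMIT = 100 * 1024

# Hypothetical page: one term near the top, another past the 100KB mark.
page = b"<html><body>alpha" + b" " * HTML_LIMIT + b"omega</body></html>"

print(term_within_limit(page, "alpha", HTML_LIMIT))  # term near the top of the page
print(term_within_limit(page, "omega", HTML_LIMIT))  # term beyond the assumed limit
```

Under this assumed model, a term that appears only after the cutoff would never be found by a search, no matter how the query is phrased, which is why the size of the indexed portion matters to the searcher.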

The following is an overview of the major search engines in terms of their features, how to use them effectively, and what makes each one distinctive. It is important to remember there is no such thing as a perfect search engine. Each one has its advantages and drawbacks. The only way to fully exploit a search engine is to take the time to learn to use it, which means you must read the instructions.

Rule Five

Read the instructions.
