;login:
Finding Files Fast
James A. Woods
Informatics General Corporation
NASA Ames Research Center
Moffett Field, California 94035
January 15, 1983
Introduction
Locating files in a computer system, or network of systems, is a common activity. UNIX users have recourse to a variety of approaches, ranging from manipulation of the cd, ls, and grep commands, to specialized programs such as U. C. Berkeley's whereis and fleece, to the more general UNIX find. The Berkeley fleece is unfortunately restricted to home directories, and whereis is limited to eking out system code/documentation residing in standard places. The arbitrary
	find / -name "*<filename>*" -print
will certainly locate files when the associated directory structure cannot be recalled, but is inherently slow as it recursively descends the entire file system to mercilessly thrash about the disk. Impatience has prompted us to develop an alternative to the “seek and ye shall find” method of pathname search.
Precomputation
Why not simply build a static list of all files on the system to search with grep? Alas, a healthy system with 20000 files contains upwards of 1000 blocks of filenames, even with an abbreviated /u (vs. /usr) adopted for user home prefixes. Grep on our unloaded 30-40 block/second PDP-11/70 system demands half a minute for the scan. This is unacceptable for an oft-used command.
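The precomputation idea itself is a one-line pipeline: walk the tree once (say, nightly from cron) and answer each query with a sequential scan of the saved list. The sketch below illustrates this on a scratch directory; the directory layout and list filename are illustrative only, not the paper's actual configuration.

```shell
# Build a small sample tree to stand in for a real file system.
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/doc"
touch "$dir/src/main.c" "$dir/doc/manual.txt"

# Precomputation step: walk the tree once, saving every pathname.
find "$dir" -print > "$dir/filelist"

# Query step: a single sequential scan replaces a full tree walk;
# this prints the path ending in /src/main.c.
grep 'main' "$dir/filelist"
```

The query is now limited by how fast grep can stream the list off disk, which is exactly the 1000-block, half-minute cost the paragraph above measures — hence the interest in compressing the list.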
Incidentally, it is not much of a sacrifice to be unable to reference files which are less than a day old—either the installer is likely to be contactable, or the file is not quite ready for use! Well-aged files originated by other groups, usually with different filesystem naming conventions, are the probable candidates for search.
Compression
To speed access for the application, one might consider binary search or hashing, but these schemes do not work well for partial matching, where we are interested in portions of pathnames. Though fast, the methods do not save space, which is often at a premium. An easily implementable