Page:Crowdsourcing and Open Access.djvu/19

2010] CROWDSOURCING AND OPEN ACCESS 609

historical texts and make them freely available online would appear to be one obvious solution.

1. Distributed Proofreaders and Project Gutenberg

Project Gutenberg is one of the oldest digital library projects, having been launched in 1971 at the University of Illinois.[1] In the decades since, founder Michael Hart and many other Project Gutenberg contributors have made tens of thousands of books available for online browsing or download in a variety of formats.[2]

In 2000, a group of Project Gutenberg contributors launched a companion Web site, Distributed Proofreaders (“DP”), with the aim of using collaborative techniques to help expand the library of texts available through Project Gutenberg.[3] Their efforts have paid handsome dividends; indeed, as of the time of this writing, a majority of all texts available through Project Gutenberg were contributed via Distributed Proofreaders.[4]

Distributed Proofreaders hosts scanned images—that is to say, pictures—of the pages of new texts that are candidates to be added to Project Gutenberg. Registered users of the site may select a text of interest to them from the listing of currently active proofreading projects. The site then displays to the user a split-screen window showing both a scanned image of the selected page of the work and the corresponding text that appears on that page (generated initially via optical character recognition (OCR) software).[5] Because the uncorrected OCR output frequently contains errors, the text in the lower portion of the split-screen display may not exactly match the


  1. Project Gutenberg Main Page, http://www.gutenberg.org (last visited Feb. 10, 2010). A 1992 essay describing the history and goals of the project, written by the project’s founder, is available at Michael Hart, Gutenberg: The History and Philosophy of Project Gutenberg, http://www.gutenberg.org/wiki/Gutenberg:The_History_and_Philosophy_of_Project_Gutenberg_by_Michael_Hart (last visited Feb. 10, 2010). A lengthier history of the project is Marie Lebert, Project Gutenberg (1971–2008), http://www.gutenberg.org/etext/27045 (last visited Feb. 10, 2010).
  2. The total exceeds 31,000 titles at the time of this writing. http://www.gutenberg.org/dirs/GUTINDEX-2010.txt (last visited Feb. 10, 2010).
  3. The home page of Distributed Proofreaders is online at http://www.pgdp.net (last visited Apr. 17, 2010). A short history of the project is available via the site’s entry in Wikipedia. See Wikipedia, Distributed Proofreaders, at http://en.wikipedia.org/wiki/Distributed_Proofreaders (last visited Apr. 17, 2010).
  4. The listing of texts that have achieved “Completed” status on Distributed Proofreaders (indicating that they have passed through all stages of the site’s multi-step proofreading process) included over 17,000 titles at the time of this writing. See DP: Complete Gold E-Texts, http://www.pgdp.net/c/list_etexts.php?x=g&sort=5 (last visited Feb. 10, 2010).
  5. See Figure 1, infra, at 31.