Page:Crowdsourcing and Open Access.djvu/20

610
SANTA CLARA COMPUTER & HIGH TECH. L.J.
[Vol. 26

page image in the upper portion. Users of the site provide the human judgment that is necessary to match the text to the scanned page image and make any necessary corrections in the text box. The Distributed Proofreaders site provides special instructions concerning how users should mark punctuation and special characters that appear in the scanned page image.[1] Proofreading at Distributed Proofreaders proceeds in multiple stages, with each stage bringing the document closer to a completed text that multiple persons have verified against the scanned source images.[2] First, each document goes through at least two, and optionally three, proofreading rounds, designated “P1” through “P3” in the nomenclature of the site.[3] At the P1 round, users correct the raw OCR output to match the appearance of the corresponding scanned page image. The P2 and P3 rounds, in turn, take as their input the corrected text produced during the P1 and P2 rounds, respectively. After all the proofreading rounds have been completed, the document proceeds through two formatting rounds (“F1” and “F2”), where the goal is to check the proofread text to make sure that the visual appearance (not merely the text) mimics the scanned original. There is a final, optional, smooth reading (“SR”) round aimed at ensuring that the final digitized text has been correctly transcribed and formatted.[4]
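The ordering of rounds described above can be sketched in a few lines of Python. This is only an illustration of the sequence as the text states it, not code used by Distributed Proofreaders: the round labels come from the text, while the function names, the flags, and the assumption that each round simply consumes the output of the round before it are hypothetical.

```python
from typing import List

# Round labels as given in the text; everything else here is a hypothetical sketch.
FORMATTING_ROUNDS = ["F1", "F2"]
SMOOTH_READING_ROUND = "SR"


def round_sequence(include_p3: bool = True, include_sr: bool = True) -> List[str]:
    """Return the ordered rounds a document passes through.

    P1 and P2 always occur; P3 and SR are optional, per the text.
    """
    rounds = ["P1", "P2"]
    if include_p3:
        rounds.append("P3")
    rounds.extend(FORMATTING_ROUNDS)
    if include_sr:
        rounds.append(SMOOTH_READING_ROUND)
    return rounds


def input_of(round_name: str, rounds: List[str]) -> str:
    """Describe what a round takes as input (assumed: the previous round's output)."""
    position = rounds.index(round_name)
    if position == 0:
        return "raw OCR output"
    return f"output of {rounds[position - 1]}"


if __name__ == "__main__":
    sequence = round_sequence()
    for name in sequence:
        print(name, "<-", input_of(name, sequence))
```

Modeling the optional rounds as flags keeps the mandatory core (P1, P2, F1, F2) fixed while letting P3 and SR drop out without changing the order of the remaining rounds.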

Unlike wiki-based projects, not all users of Distributed Proofreaders may participate in each of the site’s activities. Distributed Proofreaders limits users’ eligibility to engage in various proofreading activities according to whether the user has created an account and the user’s prior history with the project.[5] Access to all the later proofreading and formatting stages is granted only via application to the site’s administrators based upon certain eligibility criteria, as follows:

  • Unregistered users of the site may participate only in SR rounds.[6]

  1. See DP: Proofreading Guidelines, at http://www.pgdp.net/c/faq/proofreading_guidelines.php (last visited Feb. 10, 2010). A pocket summary of the lengthy Proofreading Guidelines is available at http://www.pgdp.net/c/faq/proofing_summary.pdf (last visited Feb. 10, 2010).
  2. A diagram illustrating the full workflow of a Distributed Proofreaders project (including preparation and post-processing activities that occur largely “behind the scenes,” invisible to ordinary users of the site) is available at http://www.pgdp.net/c/faq/DPflow.php (last visited Feb. 10, 2010).
  3. See id.
  4. See id.
  5. See id.
  6. See id.