Page:Crowdsourcing and Open Access.djvu/30

620
SANTA CLARA COMPUTER & HIGH TECH. L.J.
[Vol. 26

far to at least demonstrate the viability of crowdsourced proofreading of legal texts, specifically works that would be unlikely to be included at Distributed Proofreaders.[1] Wikisource can also serve as a repository for legal scholarship that meets the site’s inclusion criteria, thus potentially bringing together scholarship and primary source materials in a way not presently replicated by any other open-access repository.[2]

By virtue of its design, Wikisource comports with many (although certainly not all) of Professor Ian Gallacher’s proposed design standards for open-access archives of primary legal source materials.[3] Wikisource’s collection is accessible worldwide. It can be presented in a variety of formats (or downloaded freely and further processed to meet a user’s specific presentation needs), and its contents are open to indexing by Google or other standard search engines. The output format of any work hosted on Wikisource is an XHTML web page, an open, vendor-neutral format that nevertheless enables preservation of a great deal of the original work’s formatting.[4] The Wikimedia Foundation’s globally distributed server architecture yields adequate response speeds in ordinary use. The site offers permanence in the form of downloadable snapshots of the full database as it existed at various points in time; if


    http://en.wikisource.org/wiki/Index:United_States_Statutes_at_Large_Volume_1.djvu (last visited Apr. 17, 2010). Clicking the volume links for any of the other scanned volumes in the Statutes at Large (all of which are linked from the page for Volume 1) will reveal the overwhelming predominance of page links that appear against a red background, signifying “not proofread.” See supra note 126.

  1. In addition to the sheer size of the dataset (the Statutes at Large scans alone presently account for over 20% of all the scanned pages available at Wikisource), the process of proofreading and correction is doubtless slowed by (1) the complex, multi-column page format employed in the original work; and (2) the poor quality of the raw OCR output from the software employed to date, which necessitates substantial human effort to proofread and correct a single page. There is nothing inevitable or irremediable about either of these problems; more technologically skilled users of the site may, in time, identify common OCR errors that may be auto-corrected across many pages at once using search-and-replace scripts, or may apply improved OCR software to the stored page images to yield a better baseline text that may be proofread more rapidly.
  2. See Timothy K. Armstrong, Fair Circumvention, 74 Brook. L. Rev. 1 (2008), http://en.wikisource.org/wiki/Fair_Circumvention. In the version of the article online at Wikisource, many citations to key statutes or cases appear as clickable hyperlinks that take the user directly to the work referenced by the citation. Links to explanatory content available on Wikipedia or other WMF wikis also appear throughout the document.
  3. See supra note 41 and accompanying text.
  4. When viewing any page within Wikisource (or any of the other Wikimedia Foundation wikis, such as Wikipedia), using the “View Page Source” function within one’s web browser will indicate, in the <!DOCTYPE> declaration on the first line of the page source, the type of document being viewed.
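
The batch correction that note 1 anticipates, in which common OCR errors are fixed across many pages at once with search-and-replace scripts, might look roughly like the following Python sketch. It is illustrative only: the table of misreadings (OCR_FIXES), the function names, and the sample page text are all assumptions made for this example, not errors or tools documented in the article or used on Wikisource.

```python
import re

# Hypothetical table of recurring OCR misreadings and their corrections.
# These entries are assumptions for illustration, not a catalogue of
# errors actually observed in the Statutes at Large scans.
OCR_FIXES = {
    r"\btbe\b": "the",
    r"\bUnitcd\b": "United",
    r"\bStatcs\b": "States",
}


def correct_page(text, fixes=OCR_FIXES):
    """Apply every search-and-replace rule to one page of raw OCR text.

    Returns the corrected text and the number of substitutions made.
    """
    total = 0
    for pattern, replacement in fixes.items():
        text, count = re.subn(pattern, replacement, text)
        total += count
    return text, total


def correct_pages(pages, fixes=OCR_FIXES):
    """Apply the rules to a mapping of page name -> raw OCR text.

    Returns two dicts keyed by page name: corrected text and change counts.
    """
    corrected = {}
    changes = {}
    for name, text in pages.items():
        corrected[name], changes[name] = correct_page(text, fixes)
    return corrected, changes


if __name__ == "__main__":
    # Made-up sample pages standing in for stored OCR output.
    sample = {
        "page_1": "An Act of tbe Unitcd Statcs",
        "page_2": "No errors here.",
    }
    fixed, counts = correct_pages(sample)
    for name in sample:
        print(name, counts[name], fixed[name])
```

Reporting the number of substitutions per page is a deliberate choice in this sketch: a human proofreader would still need to review each page, and the counts suggest where automated changes were concentrated.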