Help:Project Gutenberg

From Wikisource

This page discusses Project Gutenberg, a venerable free-text project parallel to Wikisource. The main purpose of this page is to define how the two projects complement each other, and to suggest ways we can cooperate.

Please formulate the policies and suggestions on this page based upon the wide-ranging discussion at the talk page.


Importing works

Quite a few Project Gutenberg works have been imported to Wikisource by copying and pasting a chapter at a time into the main namespace. While this works, it means that the work is not backed by scans. A better option is to use the PG work as a basis for a proper Wikisource transcription. This is slower, but still probably quicker than a standard transcription (because most typos/scannos have already been caught, and the work is mostly a matter of formatting).

The process is usually something like this:

  1. Set up the Index page as usual, with a correct pagelist etc.
  2. Download the HTML version of the work from Project Gutenberg.
  3. Open the HTML version in a text editor and strip the paragraph tags (<p>…</p>). Leave most other HTML in place, especially <i>, <u>, etc., as these will work without modification in the wikitext version.
  4. Copy and paste from the HTML, page-by-page.
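
Step 3 can also be done with a short script rather than by hand in a text editor. The following is only a minimal sketch in Python, assuming the downloaded HTML uses plain <p> tags (possibly with attributes); the function name strip_paragraph_tags is hypothetical and is not part of any Wikisource or Project Gutenberg tool. Each closing </p> is turned into a blank line, on the assumption that the result will be pasted as wikitext, where a blank line separates paragraphs.

```python
import re

# Opening <p> tag, with or without attributes. The optional group requires
# whitespace right after "<p", so tags such as <pre> are left alone.
_OPEN_P = re.compile(r"<p(?:\s[^>]*)?>", re.IGNORECASE)
# Closing </p> tag.
_CLOSE_P = re.compile(r"</p\s*>", re.IGNORECASE)


def strip_paragraph_tags(html: str) -> str:
    """Remove <p> and </p> tags, keeping other HTML such as <i> and <u>.

    Hypothetical helper for illustration only.
    """
    text = _OPEN_P.sub("", html)
    # Replace each closing tag with a blank line so paragraphs stay separate.
    text = _CLOSE_P.sub("\n\n", text)
    # Collapse runs of three or more newlines down to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

For example, calling strip_paragraph_tags on '<p class="x">A <i>b</i> c.</p>' followed by a second paragraph '<p>D.</p>' leaves the <i> markup untouched and separates the two paragraphs with a blank line. The output still needs to be checked against the scans and pasted page by page, as in step 4.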

The Clean up OCR script from Wikisource:TemplateScript is very helpful for this type of work.