Help:Beginner's guide to proofreading

How to proofread a book for Wikisource.

Proofreading is the foundation of Wikisource, providing the best quality texts in our library. The process involves two "namespaces" (sections of Wikisource, indicated by a prefix at the start of the page title) and a special piece of software. Together, these two namespaces (Index and Page) are sometimes called the "workspace". This is where the proofreading, editing and other "back room" processes are done.

The process is based on page scans of a physical book, usually in the form of a DjVu file. This is used to make an Index page, which is a page in the "Index" namespace with the same name as the DjVu file. Each individual page in the book is a separate page in the "Page" namespace. The Index page will link to the pages and each page needs to be proofread.

The following guide will explain how to proofread a page, with pointers to other pages with more detailed information. For a guide to the Index page portion of proofreading, see Help:Beginner's guide to Index: files.

How to proofread a page

Note: To get a feel for how this process works, try proofreading a few pages of the current Proofread of the Month.

Proofreading is based around the Index page and all of the connected Page-namespace pages.

  1. If you click on any of the numbers on an Index page, you will see an image of that page side-by-side with a text field. The text field may be blank or it might have been automatically filled with the text of that page.
    • If it is blank: write the text you see in the image into the text field.
    • If it is not blank: correct the text in the text field so that it matches the text in the image.
  2. Preview your work, set the status to "Proofread" (which is yellow), then save. See Help:Proofreading and Help:Page status for more information.
    • If you have not finished proofreading the page but you want to save it, set the status to "Not proofread" (which is red).
  3. Repeat the last two steps for every page in the scan.

The side-by-side layout

Screenshot from the Page namespace, showing the text field side-by-side with the scanned page image.
(Fig 1) Side-by-side layout in Page namespace

When you view a page in the Page namespace, the screen will be split into two sections (fig 1). This is the default side-by-side layout that allows users to proofread the text on Wikisource (left section) against the scanned text (right section). When you click "Show Preview" on a page in the Page namespace, the screen will then have three sections (fig 2). The text edit window and the scanned text section remain as they are, with the previewed text showing in an area above the other two sections.

To proofread a page, you should edit the text in the left section so that it matches the scan in the right section as much as possible.

You do not have to make an identical, photographic copy of the scan. Wikisource is a website, not a book, and the text is more important than the typography. You should just try to get as close as possible. Some things work in books but do not work on Wikisource. For example, columns of text are not necessary and do not work well on Wikisource; they should be ignored during proofreading. Remember that several pages will be added together in the main namespace when proofreading is finished. Things like columns will not be readable.

Page status

Screenshot from the Page namespace, showing the page status radio buttons.
(Fig 3) Page status buttons

When you save the page, you should also set the page status. You should see a row of color-coded radio buttons just above the save button (fig 3). If you have just started a page with no (or not many) changes, then select the red button (for "Not proofread"). If you have completely proofread the page and corrected every error you can find, then select the yellow button (for "Proofread").

Some pages will have been proofread already by other people. You can check these and upgrade the page status. Look through the page for any remaining errors or things that need to be changed. If there are no errors, or you have fixed everything that needs to be fixed, increase the page status by one level. "Not proofread" (red) pages become "Proofread" (yellow), which become "Validated" (green). Validated pages are finished and should not need any more editing. Blank pages (gray) and Problematic pages (blue) are special cases; see below for more information.
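
The upgrade rule above can be written as a small lookup. This is only an illustrative sketch: the status names come from this guide, but the list `PROGRESSION` and the function `next_status` are hypothetical and are not part of the Wikisource software.

```python
# Page statuses in the order described above. The names follow this guide;
# the list and the function are illustrative only.
PROGRESSION = ["Not proofread", "Proofread", "Validated"]


def next_status(current: str) -> str:
    """Return the status one level up, or the same status if no upgrade applies."""
    if current in PROGRESSION:
        i = PROGRESSION.index(current)
        # "Validated" is the top level, so it stays where it is.
        return PROGRESSION[min(i + 1, len(PROGRESSION) - 1)]
    # "No text" and "Problematic" are special cases and are left unchanged here.
    return current
```

A page is only moved up one level at a time, and only after it has been checked against the scan.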

Header and footer

Blank pages

Blank pages can be left blank and set to the "No text" (gray) page status. These pages will be ignored when pages are added to the main namespace.

This includes book covers, unless illustrated. This does not include pages with an illustration, which should be proofread as normal. If the illustration is unavailable at present, see Problematic pages.

Problematic pages

If you have a problem while proofreading a page and cannot finish it, you can set the page status to "Problematic" (blue). This will alert other people that a problem exists, which they may be able to solve.

Common problems include pages with illustrations (if no image file is available), pages with equations, pages with foreign text (especially text that does not use the Latin alphabet) and pages with special formatting. In some of these cases, special templates exist to identify the problem (see Problem templates, below). These are useful to anyone else looking at the page and they can attract the attention of people able to fix the problem.


Do include

  • Text formatting, such as bold or italics: using '''bold''' or ''italics''.
  • Different text sizes, using {{smaller}} or {{larger}}
  • Special typography, such as:
    • Dropped or raised initials
    • Capitalisation. If the capital letters are the same size as the normal text, use {{smallcaps}}
    • Horizontal lines: {{rule}}
    • Section breaks (rows of asterisks: * * * * * )

Do not include

  • Any marks or additions—including handwriting, ex libris bookplates, library stamps, stains, scratches, watermarks, dirt, etc.—that are not part of the original book.
  • Columns. These are not necessary; the text of each column should simply continue from the previous column on the page.
  • Spelling corrections. Do not correct spellings in the original text. Use the template {{SIC}} instead.


  • Line breaks. Webpages will normally ignore single linebreaks, so text broken into different lines (common with scanned text) will be seen normally by a reader. Line breaks can cause problems (especially with templates, links and tables, and italics/bold which are closed by the line ending) but removing them is a matter for the individual proofreader.
    For example:
      Original:   "Hello," said the example. This is
                  an example of a broken line.
      Corrected:  "Hello," said the example. This is an example of a broken line.
  • Pages that are not part of the work itself, such as adverts, do not need to be proofread or included in the main version. On the other hand, if a proofreader wants to proofread and include these pages, that is allowed.
  • Advanced typography. Creating a page that looks like the original is nice. However, the text itself is more important. Some typography can be difficult to produce. Some can cause problems with the website.
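
For proofreaders who do choose to remove line breaks, the joining step shown in the line-break example above could be sketched as follows. The function name `join_broken_lines` is hypothetical, and the sketch assumes plain prose: it does not rejoin words hyphenated across lines, and it would also merge lines that are meant to stay separate (such as poetry or lists), so the result still needs checking against the scan.

```python
import re


def join_broken_lines(text: str) -> str:
    """Join single line breaks with a space; keep blank-line paragraph breaks."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        # Drop empty fragments and rejoin the remaining lines with single spaces.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


broken = '"Hello," said the example. This is\nan example of a broken line.'
print(join_broken_lines(broken))
```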

Common OCR errors

Optical Character Recognition (OCR) is the function used by computers to read text. The resulting text is often saved within DjVu files and is extracted by the computer when a new page is started in proofreading. However, computers are not very good at reading printed text and errors (sometimes called "scanos") can be quite frequent. The table below shows some common errors made by computers that will need to be found and corrected during proofreading.

For example:

  OCR error      Correction
  tlie           the
  a11, aH, aU    all
  au             an
  \vas           was
  mc             me
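
As a rough aid, the pairs in the table above could be used to flag suspicious words before reading the page closely. The sketch below is an assumption-laden illustration, not a Wikisource tool: `COMMON_SCANOS` and `find_scanos` are hypothetical names, and the function only suggests corrections. Some entries (such as "au") can be legitimate words, so every hit still has to be checked against the scan.

```python
import re

# Pairs taken from the table above; the dictionary itself is illustrative.
COMMON_SCANOS = {
    "tlie": "the",
    "a11": "all",
    "aH": "all",
    "aU": "all",
    "au": "an",
    "\\vas": "was",
    "mc": "me",
}


def find_scanos(text: str) -> list:
    """Return (word, suggestion) pairs for words that match a known scano."""
    hits = []
    for token in re.findall(r"\S+", text):
        # Remove surrounding punctuation so "tlie," still matches "tlie".
        word = token.strip(".,;:!?\"'()")
        if word in COMMON_SCANOS:
            hits.append((word, COMMON_SCANOS[word]))
    return hits


print(find_scanos("It \\vas tlie end of a11 things."))
```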

Other common things to correct

  • Paragraph breaks. A blank line should be left between paragraphs, as standard for electronic and internet formatting.
  • Spaces before punctuation should be removed (when the mistake is due to the OCR, and not in the original text)
    For example:
      OCR error:  foo bar ; lorem ipsum
      Corrected:  foo bar; lorem ipsum
    The space before the semicolon has been removed.
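
The spacing fix above can be expressed as a single regular expression. This is a minimal sketch under stated assumptions: the function name `remove_space_before_punctuation` is hypothetical, and it should only be applied where the extra space comes from the OCR and not from the original text. It would also alter deliberately spaced punctuation such as ". . ." ellipses.

```python
import re


def remove_space_before_punctuation(text: str) -> str:
    """Delete spaces or tabs that directly precede common punctuation marks."""
    return re.sub(r"[ \t]+([;:,.!?])", r"\1", text)


print(remove_space_before_punctuation("foo bar ; lorem ipsum"))
```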


Some templates may be needed when proofreading a page.

Proofreading templates

Problem templates

These should be used if there is a problem that you cannot fix yourself. When using one of these, also set the page status to "Problematic" (blue).

  Template                    Used where..
  {{missing image}}           ..an image should be included.
  {{missing table}}           ..a table should be included.
  {{missing score}}           ..a musical score should be included.
  {{missing math formula}}    ..a mathematical formula should be included.
  {{illegible}}               ..the text cannot be read.
  {{arabic missing}}          ..Arabic characters are used.*
  {{chinese missing}}         ..Chinese characters are used.*
  {{greek missing}}           ..Greek characters are used.*
  {{hebrew missing}}          ..Hebrew characters are used.*
  {{russian missing}}         ..Russian characters are used.*
  {{symbol missing}}          ..unknown symbols are used.

  * Where you cannot read or write in these languages.