Help:Side by side image view for proofreading

From Wikisource

This help page is an introduction to proofreading in Wikisource. If you are looking for info about the Proofread Page extension, check the links in the "See also" section.
Shortcut:
H:Side

Introduction

Proofreading at Wikisource is reserved for registered users: contributors editing from an IP address cannot proofread. (This permits us to guarantee that at least two different sets of eyes have reviewed a document.)

Proofreaders are also expected to correct the errors they observe, which can be extensive for some OCR'd documents. This can involve complex template and markup usage, so proofreaders should be proficient at editing in a wiki environment.

Novice proofreaders can gain quick insight into the concept, and test their abilities, with the simple introductory tests on Distributed Proofreading's website. They are then encouraged to find a project in progress, perhaps through Wikisource:Proofread of the Month.

Overview of the process

Once you've found a project you want to work on, go to its index page. There you'll find links to the pages of the project, colored by their status. Select a page that needs work (any page that is not green), open it in the editor, make whatever changes are appropriate (either to the text or to the status), then preview and save.

Side-by-Side advantages

Wikisource uses the ProofreadPage extension, which renders text alongside its corresponding scanned image (DjVu) so that the text can easily be compared with the original. It has the following advantages:

  • credibility: it makes it possible for Wikisource to guarantee that the text corresponds to its scanned source.
  • improved collaboration: texts can be proofread and typos can be fixed by everyone, not just those who have access to the book. This restores the wiki way of collaborating.
  • security: text is better protected against vandalism (any falsification can be detected immediately; texts are not accessed directly, but through transclusion, which deters inexperienced vandals).
  • no limitations on rendering: a book can be rendered in two different ways, without duplicating data:
  1. as a set of pages. Each page is a column of OCR text beside a column of scanned image. This mode is meant for contributors.
  2. broken into its logical organization (such as chapters or poems) using transclusion. This mode is meant for readers.
  • fairness of comparisons: since book pages are not in the 'main' namespace, they are not included in the statistical count of text units. A count of pages is available here. This method of comparison uses the same unit of measure for all texts (the page), which puts an end to the temptation of slicing texts into arbitrarily small units in order to increase statistics.


Looking at documents

Page, Index and main Namespaces

  • "Namespaces" are used to differentiate page types at Wikisource. For example, the page you are reading now is in the "Help:" namespace.
  • The "Index:" namespace holds the indexes for projects, which tie all the pages of a text together. A sample index can be found at Index:Equitation.djvu. Basically, an index page is a chained list of all the Pages of a book. It is used by the ProofreadPage extension to build the navigation buttons: previous page, next page and up to index. Index pages on this wiki are listed in Category:Index.


  • The "Page:" namespace is used to display page text side-by-side with individual page images, and allows reproduction of the original formatting. A sample page can be found at Page:Equitation.djvu/1. You can zoom in on the page image by clicking on the image in the right-hand pane.
Text corresponding to a scanned image belongs on a new "Page:" namespace page whose name matches the image. For example, the page "Page:The Time Machine (page 1).jpg" will automatically display the image "Image:The_Time_Machine_(page_1).jpg" on the right. Page numbers for DjVu files are indicated by adding a slash followed by the page number to the file name. For example, Page:Sketch of Connecticut, Forty Years Since.djvu/27 displays the 27th page of Sketch of Connecticut, Forty Years Since.
Important: Although it is desirable to reproduce the layout of the page in the Page: namespace, formatting instructions which apply to the entire page must be placed above the body text in the header box. This box can be opened by clicking the plus sign in the upper left corner. The header and footer are automatically placed inside <noinclude> tags which prevent transclusion of their contents. For example, see Page:The Journal of Leo Tolstoy.djvu/7.
When you are looking at a page in the Page: namespace, you can get to its index by clicking the up-arrow tab on the page.


  • The main namespace is where readers go to see the finished product. Various directives in the source code for the book make it display differently in the Page: namespace than in the main namespace. For example, the original's per-page page numbers are hidden, allowing for more sensible numbering when the reader prints a copy.


Editing pages

Note: We recommend that editors/proofreaders do not use the enhanced editor toolbar available from Special:Preferences, due to its lack of the functional buttons required in this namespace.

Formatting conventions

The following conventions are considered best practices for pages in the Page: namespace (DjVu files and other files which use the ProofreadPage extension). For general article formatting conventions and guidelines see Wikisource:Style guide.

  • When a chapter heading and text appear on a DjVu page, place the heading in the "Header (noinclude)" section, which is opened by clicking the plus button that appears above the edit window while in editing mode. Example: Page:Wind in the Willows (1913).djvu/19.
  • Quite frequently, a book's header will include the page number at the left or right of the page and the title in the center of the page. The {{RunningHeader}} template is useful for formatting these headers. The RunningHeader template is used as follows:
{{RunningHeader|left|center|right}}
  • left is the number (or text) which should appear at the left of the page;
  • center is the number or text which should appear centered;
  • right is the number or text which should appear at the right of the page;
  • The left, center, or right parameters may be left blank as appropriate.
  • To make notes which are only visible to other users while editing the page in the Page: namespace, use <!--HIDDEN COMMENT HERE-->. Keep in mind that this hidden text will only be visible to others if they attempt to edit the page. It will often be better to make a note on the Index discussion page or post a question in the Scriptorium, rather than making hidden comments.
  • Many users proofread in edit mode, using newlines to make the source match the lines of the original. If you do this, you should also check the generated version for issues difficult to spot in edit mode, such as missing italics, etc.
  • When a word is hyphenated onto two different pages of the DjVu scans, use {{Hyphenated word start}} and {{Hyphenated word end}}. These templates will make the word appear hyphenated in the Page: namespace and remove the hyphen when the text is transcluded. Example: (Page:Personal Recollections of Joan of Arc.djvu/473 and Page:Personal Recollections of Joan of Arc.djvu/474, Personal Recollections of Joan of Arc/Book III/Chapter 14 "pretending")
{{Hyphenated word start|FIRST HALF OF WORD|WHOLE WORD}}
{{Hyphenated word end|LAST HALF OF WORD|WHOLE WORD}}
  • If the first word of a new DjVu page starts a new paragraph in the same chapter, add {{blank line}} at the beginning of the page, to force a break in the text. Otherwise, when the pages are transcluded the separation will be treated as single space rather than a new line. Example: (Page:Personal Recollections of Joan of Arc.djvu/463 and Personal Recollections of Joan of Arc/Book III/Chapter 13)
  • If you need to indent the initial line of a paragraph, use the {{gap}} template.
  • If you need to indicate that a word or phrase should be in small caps, use the {{sc}} template.
  • If you need to indicate that a word or phrase should be in a smaller (or larger) font, use the {{smaller}} template (or its relatives).
  • Using standard templates instead of other types of markup gives Wikisource protection from undesirable external changes. For example, use {{right}} instead of <p align="right">. More templates and information about them can be found at Typography templates and Category:Formatting templates.
  • Published notes in the scanned print version of the DjVu book that should be displayed in the Wikisource version should be displayed using the <ref></ref> and {{reflist}} markup. If the note is carried over onto more than one print page, it should all be entered on the page where the reference is made in the print version. Example: (Personal Recollections of Joan of Arc/Book III/Chapter 11, Page:Personal Recollections of Joan of Arc.djvu/451 and Page:Personal Recollections of Joan of Arc.djvu/452)
  • If you have made a lot of changes to a page, you should seriously consider leaving the status of the page unchanged when you save your edits. (You'll still be recorded in the history as having contributed.) If everyone could see their own mistakes (or were perfect!), there'd be no need for independent review. There is no status review board, just your fingers, so be careful that what you're doing is right.
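
As a rough sketch of how several of these conventions combine, a page in the Page: namespace might be entered as follows. The book title, page number, sentence text and the split of the hyphenated word are hypothetical, and placing {{reflist}} in the footer box is an assumption of this sketch rather than a documented rule.

```wikitext
<!-- Header (noinclude) box: hypothetical page number and title -->
{{RunningHeader|12|TITLE OF THE BOOK|}}

<!-- Page body: this page opens a new paragraph, so it starts with a forced break -->
{{blank line}}
{{gap}}{{sc|The}} first paragraph of the page begins here.<ref>A published note from the print edition.</ref> The last word on the page is {{Hyphenated word start|pre|pretending}}

<!-- Footer (noinclude) box: assumed location for the note list -->
{{reflist}}

<!-- The following page would then begin with the second half of the word -->
{{Hyphenated word end|tending|pretending}}
```

After saving, it is worth checking both the Page: view and the transcluded main-namespace view, since the hyphenation templates render differently in each.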

Statuses

The status of a page is reflected both in the color of its block on the index page, and by the banner on the page. The ProofreadPage Extension is used to implement the status.


The validation path of the ProofreadPage extension involves five levels: Without text, Not proofread, Proofread, Validated and Problematic.

Three of these, Not proofread, Proofread and Validated, form the normal pathway:

  • Not proofread is the default value. (See all pages.)
  • Proofread means proofread by one contributor. (See all pages.)
  • Validated means proofread by two contributors. The corresponding button is available only if the page has been already proofread by someone else. (See all pages.)

In addition,

  • Without text is for blank pages, or other pages that do not require double proofreading. (See all pages.)
  • Problematic indicates a problem that needs further discussion between contributors. (See all pages.)


The buttons used to indicate what you have done are found under the edit window. If a previous contributor has already proofread the page, five buttons are shown. If you are the only proofreader of the page, so that it must still be validated by another contributor, four buttons are shown.

Transclusion

After the text of the work is populated into each side-by-side image page, "transclusion" is used to display the text from the Page: namespace on pages in the main namespace. Transclusion displays the text of another page without having to copy it or manually update it later. The purpose of transcluding the text is to group it into logical, reasonably sized chunks, most frequently chapters or sections.

A completed example is The Wind in the Willows. Once all of the individual pages of Index:Wind_in_the_Willows_(1913).djvu had been typed up in the "Page:" namespace (one page can be seen at Page:Wind_in_the_Willows_(1913).djvu/19), the text was transcluded into the chapters of the book in the main namespace, beginning at The Wind in the Willows/Chapter 1. The following explains how to use transclusion to display your finished proofreading project as a final product in the Wikisource main namespace.

Full-page transclusion

There are two methods of full-page transclusion: the <pages> tag, which can display a series of pages, and the {{Page}} template, which can display individual pages or sections of pages. Most of the time, the <pages> tag will be the best method of transcluding into chapters. The syntax is as follows:

  • <pages index="file_name.djvu" from=20 to=40/>
  • from is the beginning page;
  • to is the ending page.
  • This does not currently work for tables. This method of transclusion is very new. If you experience problems, post a question on the Scriptorium.


The Page template can transclude a single page. The syntax is as follows:

  • {{Page|Wind in the Willows (1913).djvu/19|num=3}}
  • The page number is placed after a slash (/) following the file name.
  • num defines the physical page number, as numbered in the scanned text.

To transclude many pages, use the <pages/> command. The syntax is as follows: <pages index="Foo.djvu" from=a to=b />
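
As a minimal sketch, a main-namespace chapter page built with <pages> might contain nothing more than the following. The index name is the one used in the example above, but the page range 19 to 40 is a hypothetical choice for illustration, not the actual extent of any chapter.

```wikitext
<!-- Hypothetical content of a main-namespace chapter page -->
<pages index="Wind in the Willows (1913).djvu" from=19 to=40 />
```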

Partial transclusion

If two logical sections (for example the end of one chapter followed by the beginning of another) appear on the same page, it's necessary to transclude only the relevant part of the page.

This is accomplished using Labeled Section Transclusion (LST). The relevant parts of the page are marked with section tags, and then, when transcluding, only the relevant part of the page is called, rather than the entire page.

To mark sections in the "Page:" namespace, insert the following syntax into the typed proofreading text to label the end of one chapter and the start of the next (where both are on the same page):

  • <section begin=chapter1 />This is Chapter 1.<section end=chapter1 />
  • <section begin=chapter2 />This is Chapter 2.<section end=chapter2 />

To transclude a labeled section, the {{Page}} template is used as follows, replacing "djvu" with the DjVu file name and "#" with the page number where the text appears:

  • {{Page|djvu/#|section=chapter1}}
  • {{Page|djvu/#|section=chapter2}}

Alternatively, the following format can be used, replacing "article" with the article name (excluding namespace):

  • {{#section:Page:article|chapter1}}
  • {{#section:Page:article|chapter2}}
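
Putting the two steps together, a worked sketch might look like this. The file name Example.djvu, the page number 25 and the chapter page names are all hypothetical.

```wikitext
<!-- On the hypothetical page Page:Example.djvu/25 -->
<section begin=chapter1 />...the last lines of Chapter 1.<section end=chapter1 />
<section begin=chapter2 />CHAPTER 2. The first lines of Chapter 2...<section end=chapter2 />

<!-- At the end of the hypothetical main-namespace page Example/Chapter 1 -->
{{Page|Example.djvu/25|section=chapter1}}

<!-- At the start of the hypothetical main-namespace page Example/Chapter 2 -->
{{Page|Example.djvu/25|section=chapter2}}
```

Each chapter page then shows only its own part of page 25, even though both parts are stored on the same page in the Page: namespace.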

Combination multipage and partial transclusion

To transclude multiple pages where the first and/or last page requires partial transclusion, <pages> accepts two additional parameters, fromsection and tosection. Both are optional.

  • <pages index="file name.djvu" from=20 to=40 fromsection="choice A" tosection="choice B" />
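
For instance, assuming a hypothetical chapter that begins partway down page 25 and ends partway down page 40 of a hypothetical file Example.djvu, with both part-pages labeled "chapter2" using section tags, the whole chapter could be sketched as:

```wikitext
<!-- Hypothetical: only the "chapter2" section of the first and last pages is used -->
<pages index="Example.djvu" from=25 to=40 fromsection="chapter2" tosection="chapter2" />
```

Under these assumptions, only the first and last pages of the range are restricted to the named section.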

Page with image transclusion

The preferred means of placing a picture or other image in the main namespace is to upload the cropped image to Wikimedia Commons as a separate picture, for example as a JPEG file. A temporary means of displaying an image is available for users who are not able to do this. To display an image of a page in the DjVu file, as at Page:Personal Recollections of Joan of Arc.djvu/9, use:

  • {{use page image|caption=JOAN'S VISION}}

The page image can be displayed in the book's main-namespace pages, as at Personal Recollections of Joan of Arc/Book I/Chapter 2, using:

  • [[Image:Personal_Recollections_of_Joan_of_Arc.djvu|page=27|right|thumbnail|200px|THE FAIRY TREE]]

Issues

The poem tag does not work well because it adds a carriage return at the end of a block. It's also not possible to use <pre> formatting, since the line breaks are suppressed during transclusion. To solve this issue, add <br /> tags to the beginning of lines.
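
A minimal sketch of this workaround, using hypothetical verse text, puts a <br /> tag at the start of each line after the first:

```wikitext
First line of the verse
<br />Second line of the verse
<br />Third line of the verse
```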

To ease proofreading images that are rotated, the Rotate Image Firefox extension can be used.

See also