Wikisource:WikiProject OCR

Shortcut: WS:OCR
This project lets users request that scans be OCRed for Wikisource-related projects.

Instructions

The participants listed below are users who have access to some kind of OCR software and are willing to extract text from scanned documents.

Users who want a text OCRed should place their request under the Requests section in the following format:

[[Title of the book]] (year published) - Author. # of pages. [source where pages can be found]

Note: "year published" should be when it was published in the U.S. as this will make determining the copyright status easier.

These are the general instructions for requesting that a work be OCRed; individual participants may have more specific instructions for the projects they take on.
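The request format above can be sketched as a small helper that assembles one request line. This is only an illustration: the function name `format_request`, the reading of "# of pages" as "N pages", and the `http://example.org/scans` source are assumptions, not part of the project's instructions.

```python
def format_request(title: str, year: int, author: str, pages: int, source: str) -> str:
    """Build a request line in the general WikiProject OCR format.

    Assumption: "# of pages" is rendered as "<number> pages".
    """
    return f"[[{title}]] ({year}) - {author}. {pages} pages. [{source}]"


# Example using the work listed under Current projects.
# The source URL is a placeholder, not the real location of the scans.
line = format_request(
    "Historical Library",
    1814,
    "Diodorus Siculus (trans. G. Booth)",
    677,
    "http://example.org/scans",
)
print(line)
```

The year passed in should be the U.S. publication year, as the note above explains.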

Participants

Zhaladshar

Instructions

Preference given to:

  1. Smaller requests
  2. Requests where obtaining the scans is easier (such as downloading a ZIP file instead of having to access each scan and download them all individually)
  3. Works that are hard to find in text form elsewhere on the Internet
  4. Works that I have not proofread

I will only work on two large projects at a time (first come, first served) and will fit smaller projects in as I make time for them.

Current projects

Title | Year published | Author | Pages | Source | Completion
Historical Library | 1814 | Diodorus Siculus (trans. G. Booth) | 677 | | < 5%

Benn Newman

Instructions

Preference given to:

  1. Smaller requests
  2. Requests where obtaining the scans is easier (such as downloading a ZIP file instead of having to access each scan and download them all individually)
  3. Works that are hard to find in text form elsewhere on the Internet
  4. Works that I have not proofread

Current projects

World Revolution


User:Inductiveload

Instructions

Preference given to:

  1. Larger or non-standard requests, or where image batch-processing or DjVu conversion is needed
  2. English requests
  3. Requests where obtaining the scans is hard (batch-downloading is my favourite bot activity)
  4. Works that are hard to find in text form elsewhere on the Internet
  5. Works that are likely to be proofread soon
  6. Large reference works which, even if not proofread soon, provide a valuable reference resource.

Current projects

Requests

Done

OCR bot

An automatic tool is available for OCRing a single page at a time, which is useful for repairing pages where the text is missing or incomplete. It is available from the editing toolbar in the Page: namespace: click the OCR button. The edit box goes grey while the server processes the image, and the OCR text appears in the edit box within a few seconds (larger pages with more text take longer). You can check the status at http://toolserver.org/~phe/ocr.php. The tool also OCRs the next page automatically whenever a page is retrieved, so that page's text should be ready by the time you edit it.
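The look-ahead behaviour described above can be sketched as a small cache that OCRs the following page whenever a page is requested. This is a minimal illustration and not the tool's actual implementation: the class name `PrefetchingOcr`, the `ocr_page` callback, and the `last_page` limit are all hypothetical, and the prefetch runs synchronously here for simplicity.

```python
from typing import Callable, Dict


class PrefetchingOcr:
    """Return OCR text for a page and pre-compute the page after it."""

    def __init__(self, ocr_page: Callable[[int], str], last_page: int) -> None:
        self._ocr_page = ocr_page    # hypothetical OCR backend: page number -> text
        self._last_page = last_page  # no prefetch beyond this page
        self._cache: Dict[int, str] = {}

    def _ensure(self, page: int) -> None:
        # Run OCR only if this page has not been processed yet.
        if page not in self._cache:
            self._cache[page] = self._ocr_page(page)

    def get(self, page: int) -> str:
        self._ensure(page)
        if page < self._last_page:
            self._ensure(page + 1)   # look ahead so the next edit is fast
        return self._cache[page]
```

With this arrangement, requesting page 1 also processes page 2, so a later request for page 2 is served from the cache and only triggers OCR of page 3.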

See also
