Wikisource:WikiProject DNB/Walkthrough

From Wikisource
Wikisource:WikiProject DNB Walkthrough
First draft: the page mentions a worked example, so one should be added.

This project page tries to gather up the know-how that you need to participate in the project and benefit from past experience.

In a nutshell: to add an article, first create the text pages next to our scanned images of the original volume. When you first look, there may or may not be machine-created text to start from. Then create the article itself from the text you have proofread, by a process called "transclusion". Finally, link your new article into the DNB structure (the table of contents, the appropriate author page, and the prior and next articles). The rest of this walkthrough describes the process in context and in detail.

This is a ProofreadPage project: what does that mean in practical terms?

While few details of the software background are needed for most operations, the structure of the project is determined by the ProofreadPage method in use. Like other recent Wikisource projects, the DNB WikiProject is based on proofreading text opposite an image (technically a djvu file) of a given page of the work. There is a basic manual page about this at Help:Proofread. Since the first edition of the DNB ran to 63 volumes, there are 63 sets of scanned pages, each with its own Index page.[1] The individual pages live in the Page: namespace. The primary objective of the project is to produce the traditional "articles", or individual biographies, and those live in the main namespace.

What is ProofreadPage? It is a software extension to the basic MediaWiki engine. It is now in general use on Wikisource; if you are interested in the MediaWiki side of the work, you can read more at mw:Extension:Proofread Page.[2] The longest DNB biographies run to more than 20 pages in the original. In creating the corresponding articles by "transclusion" (the process of summoning up the content of one web page so that it is included in another), we are grateful for the "power user" features that allow this multiple transclusion to be done simply by putting two page numbers into the template {{DNBset}}. But such matters stay behind the scenes in most of the project's work. The main thing to note is that each text page has a status: it begins as "pink", becomes "yellow" for "proofread" at the click of a radio button below the edit window, and can become "green", or validated, once a second editor has agreed that it is proofread.
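As a rough illustration of the pink/yellow/green path just described, here is a toy Python model. It is only a sketch: the class and method names are hypothetical, it models only the three statuses mentioned on this page (ProofreadPage itself has further statuses, such as "problematic"), and the real rules are enforced by the extension, not by anything like this code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """The three page statuses described on this page, keyed by display colour."""
    NOT_PROOFREAD = "pink"
    PROOFREAD = "yellow"
    VALIDATED = "green"


@dataclass
class TextPage:
    status: Status = Status.NOT_PROOFREAD
    proofreader: Optional[str] = None  # the editor who set the page to "proofread"

    def mark_proofread(self, user: str) -> None:
        # Pink -> yellow: any editor, via the radio button below the edit window.
        if self.status is not Status.NOT_PROOFREAD:
            raise ValueError(f"cannot mark proofread from {self.status.value}")
        self.status = Status.PROOFREAD
        self.proofreader = user

    def validate(self, user: str) -> None:
        # Yellow -> green: needs a second editor, not the one who proofread it.
        if self.status is not Status.PROOFREAD:
            raise ValueError(f"cannot validate from {self.status.value}")
        if user == self.proofreader:
            raise ValueError("validation needs a second editor")
        self.status = Status.VALIDATED
```

For example, a page marked proofread by one editor can be validated by a different editor, but an attempt by the same editor raises an error.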

And it is still possible to contribute to this project by creating pages with the {{DNB00}} header followed by proofread article text. For the biographies in the 1901 Supplement volumes, there are no scans posted (as of early 2010), and {{DNB01}} with proofread text is the only available method.[3]

The work of the WikiProject

The DNB was a grandiose Victorian work on an ambitious scale (and lost money), but there is a huge amount to quarry out of it and make more accessible, and participants are motivated in different ways. The WikiProject is very much a work in progress. We are not working on the articles in alphabetical order; instead, we are working on articles of interest to editors and readers. The articles created day by day are driven by what interests people in the project, by the needs of Wikipedia, and by requests. The software is ideal for random access to any part of the DNB (see below for finding your way around).

Note that we originally worked on articles without working directly with the physical pages, and some articles are a legacy of those days (pre-ProofreadPage and even pre-Project). We now work by first proofreading the pages in the traditional Wikisource manner, creating one web page for each physical page of the original work. After creating the page or pages that cover an article, we then create the article by transclusion, which is implemented by a combination of markup on text pages and certain templates.[4] You need no technical knowledge of the transclusion mechanism to apply it.

To create a new article, you must first ensure that its source pages are available as text pages and in good order: you do this by creating them opposite the scanned images in the standard Wikisource fashion (as described below). The bulk of the work of creating a new article is spent on editing the text pages to be a faithful reproduction of the original page.

Objectives and current state

When the project is complete, there will be two separate ways to access and reference the Wikisource DNB: as articles, or as pages in a close replica of the 63 volumes.[5] Because the raw material from which we work is highly variable, the approach is a "patchwork quilt": not only will the end result be put together by many people, but the text will also come from different sources, bringing together whatever scanned and transcribed text is useful. There is no single good-quality source from which we can draw text for this work. Rather, the aim of this WikiProject is to supply that "good quality" source for everyone, and to add value with author details on Author pages, article listings, and hyperlinks in the text.

Currently, most pages are not yet proofread. Some may be missing or hopelessly garbled. In general, only pages that contribute to completed articles have been proofread.

All pages have a scanned photographic image (djvu). In a small but noticeable proportion of cases that image is useless or defective. Most pages also have a "text page"; technically, the "text layer" of a djvu can be summoned up by using the "create" tab, if the text page is not already present.[6] The text layer was created using Optical Character Recognition (OCR) software, and the OCR result for our DNB scans ranges from reasonably good down to complete gibberish. The different volumes use scans from different sources (drawn from archive.org), and their quality varies widely, as you will certainly notice. Upgrades across the approximately 30,000 pages are gradually addressing the issues of missing or corrupt images and text.[7]

Orientation

Where is everything?

There are multiple namespaces involved: main, Page:, Index: and Author: are all relevant to working on articles. There are also project-management pages in the Wikisource: namespace, hundreds of special templates, and several categories that belong to the project.[8]

How do I see if the article I'm interested in already exists in article space?

The safest method is to go to the volume listing, also known as the volume ToC, starting from Dictionary of National Biography, which links to all 63 volume listings. As of early 2010, some of these listings of red and blue links are complete, but the majority are not.[9][10]

If I'm about to work on a new article, how do I find the correct pages in page space?

There are several options.[11] Determine the volume first, naturally, from Dictionary of National Biography. Volumes are not all the same length, but have around 450 pages each. These days you should be able to find articles to 'bracket' yours: article A before it and article B after it in alphabetical order. Where articles are transcluded, you can get to the page from the article (either directly by clicking in the margin, or by editing the article, depending on the transclusion method). Then edit the page number in the browser's address line to move back and forth. With page numbers to 'bracket' your article, bisecting the range gets you there quite quickly.[12] Alternatively, use the "Access to scanned images" link from the volume's ToC. This in turn links to the scan index and to the first scanned page of the volume's original index.
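The bisection step can be sketched in Python as follows. This is a hypothetical helper, not a project tool: in practice you do the lookups by eye in the browser. It assumes `heading_at_top(page)` returns the heading of the article at the top of a djvu page (whether that article continues from the previous page or starts there), and it compares headings as plain lowercase strings, which only approximates the DNB's own alphabetical order.

```python
from typing import Callable


def find_page(target: str, low: int, high: int,
              heading_at_top: Callable[[int], str]) -> int:
    """Bisect djvu pages low..high (inclusive) for a page showing `target`.

    `low` should be a page holding article A (alphabetically before the target)
    and `high` a page holding article B (after it). Returns the last page in the
    range whose top heading sorts at or before `target`.
    """
    key = target.lower()
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if heading_at_top(mid).lower() <= key:
            low = mid       # the answer is `mid` or a later page
        else:
            high = mid - 1  # the answer is before `mid`
    return low
```

For an article that runs over several pages this lands on its last page; step back in the address line to find where it begins.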

Suppose I already know the hard-copy page numbers. How do the index numbers in page space relate to page numbers in the book?

You have to add a certain number, the 'offset', to convert the original book page numbers to the numbering of the djvu files found in the Index page for the volume. This offset varies from volume to volume, but the most common value is +6.[13]
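The conversion is simple arithmetic, sketched here in Python. The +6 default is only the most common value, so substitute the offset for your volume; the djvu file name passed to `page_title` is a placeholder for whatever the volume's Index page actually uses, and the function names are illustrative. Where the offset is not consistent across a volume (see note 13), no single fixed offset will work.

```python
# Most common offset between printed page numbers and djvu page numbers.
# It varies by volume, so check the real value for the volume you are using.
DEFAULT_OFFSET = 6


def djvu_index(book_page: int, offset: int = DEFAULT_OFFSET) -> int:
    """Printed page number -> djvu page number as listed on the Index page."""
    return book_page + offset


def printed_page(djvu_page: int, offset: int = DEFAULT_OFFSET) -> int:
    """djvu page number -> printed page number in the book."""
    return djvu_page - offset


def page_title(djvu_file: str, book_page: int, offset: int = DEFAULT_OFFSET) -> str:
    """Title of the Page: namespace page for a printed page of the volume."""
    return f"Page:{djvu_file}/{djvu_index(book_page, offset)}"


# Hypothetical file name; use the one shown on the volume's Index page.
print(page_title("Example volume.djvu", 100))  # Page:Example volume.djvu/106
```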

What are 'author subpage listings'?

Author:Thompson Cooper/DNB is an example. Cooper wrote over 1,400 articles for the DNB, so placing all these contributions on Author:Thompson Cooper would get out of hand. Such subpages will be created for authors responsible for 50 or more articles; they work through the volumes in order and give the original page numbers.

Where can I get help if I need it?

Try the project Talk page; User talk:Charles Matthews for text-related issues (including Greek); or User talk:Billinghurst for technical matters behind the scenes.

This is a big project - how ambitious should I feel?

There is plenty to do that is accessible and doesn't involve wrestling with very bad texts or difficult technical issues. If you encounter problems or mysteries, and don't want to give up on the particular article, you will probably save time by requesting help.

Working on an article

With a particular article in view, and having found your way to the right place in page space, what next?

(1) Determine if the scans are OK and if the OCRs are OK.

If not, try to find another source. The "Progress" subpage lists the so-called "best" scan at archive.org.[14]

(2) If there is no way forward because of corruption, add cleanup categories,[15] and request help if the article is urgently wanted.

(3) If the scans are acceptable, continue:

Proofread and edit the OCR'd pages in page space. For short passages where the OCR is hopeless but the scan is readable, type the text from scratch. But see the note above and the comment at the end of the first section. We emphasise that 'triage' makes sense in this project: there is plenty to do that is not so tough. If you face an article you really want that looks very hard to do, request help; otherwise your time is probably better spent in other ways.

(4) Use the manual of style to handle small caps, end-of-line hyphens, Greek letters, italics, ligatures, etc. The author template should be added at the end of the article, in the form {{DNB XYZ}}.[16]

(5) Mark up for transclusion by adding "section" tags, which identify the text used in step (8). The 'section begin' and 'section end' tags are at the end of the Wikisource-specific line below the editing box.

(6) Check your work. This includes disambiguation issues.

Disambiguation is easy to check when a full volume listing is available; otherwise you need to look two articles back and two articles ahead to check disambiguation for the article itself and for its "previous" and "next" links. Yes, it's a pain sometimes.

(7) For any whole page that you have proofread, advance the page status to "proofread": use the radio buttons below the editing box to move it from pink to yellow. If you are leaving part of a page not proofread, leave the status change to whoever finishes the page.

(8) Create your article in article space: transclude the marked sections from the pages in page space.

If a redlink for your article exists in the ToC for the volume, use that name exactly (check it for adherence to the style manual and fix if needed, but it's probably OK.) If your article title is not in the ToC, add it now as a redlink.
Another method is to go from the author template at the end of the article to the author page, and add the disambiguated redlink to that page. You can create the article from the author page, and then check "What links here". If the volume listing shows up, you're done; if not, you need to go and add or tweak a link.
The most technical but most efficient way is not to use the DNB00 header template directly, as in the "worked example", but to use DNBset (keep a copy in a text editor). If you are working sequentially through a volume (which cuts down overheads), you can update this template quickly, including the previous and next links.

(9) Good practice. There are certain final steps that are appreciated.

  • Check the link from the author page exists.
  • Add categories such as Category:DNB needs qv, or wikify the q.v. links.
  • Link to Wikipedia if an article exists on the same person.
  • Add the "previous" and "next" article titles to the ToC (perhaps as redlinks) if they are not already there.

Special text issues

Look below the editing box to find a selection of accented characters and other special symbols: they are sorted into sets, accessible where it says "Select" in the box to the right of "Please select a category from the menu". Mac users will also find a menu item, "Special characters...", under "Edit" in any application.

Some further comments may be helpful.

How to display poetry

There are numerous short pieces of verse in the DNB. Use <poem> and </poem>, which are not guaranteed to make what they contain good poetry (hah), but remove the need for a blank line to start a new paragraph. This allows lines of verse to be displayed, and indentation can be created with a leading colon. See also Help:Poetry.

A common ligature issue

Of the ligatures, the symbol æ is probably the most common in the DNB, because it is used often in Latin words. Find it from the "Ligatures and symbols" option in the window below the editing box or type: {{ae}} or {{AE}}. It is the source of many scanning errors, often being read as "se" or "œ".

Long dashes

Look for the long dash, —, on the "Ligatures and symbols" menu.

Superscripts

For example, the old abbreviation ye occurs; it can be created as y<sup>e</sup>.

Sums of money

Articles often mention sums in pounds sterling, written for example as 100l. for £100. That is an italic lower-case L, so the wikitext is 100''l''.; the OCR of some scans reads it as a slash (/). Note also the use of s. for shillings and d. for pence in sums that are not whole numbers of pounds.
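If you clean up OCR text in a text editor or script before pasting it in, the pounds fix can be partly automated. The following Python sketch is a heuristic, not a project tool: it assumes that a digit directly followed by "/." or by a plain "l." is the pounds abbreviation, so check each change against the scan. It leaves s. and d. untouched.

```python
import re

# Heuristic: a digit immediately followed by "/." (an OCR misreading) or by a
# plain "l." is taken to be the pounds abbreviation. Check every change by eye.
POUNDS = re.compile(r"(\d)[l/]\.")


def italicise_pounds(text: str) -> str:
    """Turn '100/.' or '100l.' into the wikitext 100''l''. (italic lower-case L)."""
    return POUNDS.sub(r"\1''l''.", text)


print(italicise_pounds("left him 100/. and 5l. 10s."))
# left him 100''l''. and 5''l''. 10s.
```

Text that is already marked up as 100''l''. is not matched, so running the function twice does no harm.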

How do I add Greek characters?

Either of these works:

(a) Go to the menu below the editing box and, where it says "Select", go down to Greek. This pulls up a whole listing of Greek letters, with the full range of accents, breathings and iota subscripts.
(b) Type the passage in a text editor that offers the full range of symbols, then paste it in.

The advantage of (a) is clearly that it is convenient as you go along. In practice it is slightly annoying for any passage of more than a word or so, because each time you add a letter (say α) the view jumps back up to the cursor in the edit box, and you have to scroll down again for the next letter. So for longer passages a separate pass in a text editor is better.

What else may I meet?

There is some Hebrew text, and unusual symbols, for example the Anglo-Saxon ð and Þ. These are uncommon, though. Positions given by latitude and longitude may require the symbols °′″ for degrees, minutes and seconds of arc. These are to be found on the "ligatures and symbols" menu.

Notes

  1. Wikisource:WikiProject DNB/Progress, second column of the table, links to each Index page. The color code for the Index entry indicates the state of the page, yellow and green being proofread.
  2. There is also technical detail here, with a useful diagram of the "proofreading path" for text advancement.
  3. There are later biographies from the DNB in the public domain, for example from the 1912 edition. So far the project has hardly dealt with them.
  4. Wikisource:WikiProject DNB/Transclusion
  5. There is a small percentage of material in the volumes that will not appear in the biography articles. This includes: lists of writers as front matter; various types of short "redirect" messages in the text of the books; indices at the ends of books; and some interesting history of the DNB at the beginning of volume 63.
  6. The Index page for a volume shows pages yet to be created as bare red numerals.
  7. Among the issues being fixed is the text layer being shifted relative to the djvu page images of which it is a scan. This still occurs in some volumes.
  8. Category:DNB gives access to many of the pages, not all of which can be mentioned in a short guide.
  9. Caveat: the listings are subject to human error, with some omissions, and the title conventions are not yet consistently applied in them. In other words they are not authoritative, and no warranty can be given that every article ever created has been listed. To double-check on the existence of an article in the main namespace, use the Search box also. To double-check the existence of an article in a given volume, use the Volume Index via its link from Wikisource:WikiProject DNB/Pagefinding, which can help with spelling variants.
  10. NB this further caveat: the 1901 Supplement has not yet been integrated into the project. If you have come looking from an article listed in the DNB Epitome or Concise Dictionary of National Biography, and it is not to be found here, the chances are it is in the Supplement. Such articles are currently created without transclusion, suffix (DNB01).
  11. There is much more detail on Wikisource:WikiProject DNB/Pagefinding.
  12. We are, though, moving towards being able to look pages up directly. Author subpage listings should have the page number in the original volume added, and knowing the "offset" you can go straight to the exact page.
  13. In some cases the conversion fails, because the offset is not consistent across the volume. This indicates a technical issue with the initial bot postings. These problems are gradually being fixed.
  14. See Wikisource:WikiProject DNB/Raw materials for much more detail. On occasion you'll need the full listings of scans. The ODNB option is good for particularly bad text.
  15. Category:Problematic DNB pages, text and/or Category:Problematic DNB pages, djvu.
  16. What do you do if the author template is red? It may need disambiguating, in which case there will be instructions on the page. In small numbers of cases, the author template may not link correctly to the author page, or the author page may not yet exist.