Wikisource talk:WikiProject DNB

Table of Contents formatting

Question around ToC (Moved from User talk:Arch dude) Some style questions

  • When replicating the OR entries, are we leading with the preferred name, or do we have an entry for both in the ToC? The issue is that some of the OR entries get quite long. Plus, if we use the full name, do we wikilink all of it or just the first component?
  • Dates of life. We have used some DoL to disambiguate; however, what guidance are we giving?

-- Billinghurst 03:53, 26 August 2008 (UTC)

I don't know. Do you have some examples? What seems to work best? The guiding principle is to preserve the look and feel of the original, but the original does not have a ToC in this sense: It's a navigational artifact that we added to replace the original's (physical page-based) navigation. Once we find a workable solution, we can put it in the "Style" section. -Arch dude 14:25, 26 August 2008 (UTC)
1) Example ToC
Look at a name like Waad/Wade as an example of alternate names. Gut feel is to enter under the first appearing name. It seems that anything more will get pretty ugly, especially around order and wikilinks. As you mentioned, as long as the transcription replicates the book, that is the important item.
2) I would think the guidance is to give DoL only where it is needed to disambiguate, though this then requires it in the DNB template for each entry, which is a level of complication. The complexity of wikilinks makes me hesitate to give a definitive statement. Whatever is more foolproof is my preference.
Billinghurst 14:52, 26 August 2008 (UTC)
Your example is very informative. It's an index from the original DNB, not a ToC from the Wikisource project. We should (eventually) create an "article" that is as close as possible to an exact duplicate of the index, with every single character (including the page numbers) duplicated. By our own rules, we are permitted to link the entries in this article to the relevant articles in our project. This is (theoretically) completely distinct from the ToC. The ToC is a modern navigational construct: a navigational tool that we added to replace the paper navigation of the original. Since the project is still new, we can elect to abandon the ToC and replace it with a faithful reproduction of the DNB index. Alternatively, we can keep our ToC and also reproduce the index. My inclination is to abandon the ToC and use the index instead, but please remember that I am only one of the (currently) three members of this project. If we do elect to use the index as our primary navigational tool, we should first agree on the exact format of each "volume" article. -Arch dude 02:28, 27 August 2008 (UTC)
I have made a start on some trial text in [Abbadaire - Anne], and on the talk page is the example with full dates added. Note that later in the index is:
Avershawe, Louis Jeremiah. see Abershaw.
While I can understand articles being word for word, I wonder whether an index page that we convert to a ToC should be an exact replica. -- Billinghurst 06:47, 27 August 2008 (UTC)
It's clear that the "(DNB00)" is not wanted or needed in the ToC. As an initial matter, let's remove it using pipe notation.
  • [[Abershaw, Louis Jeremiah (DNB00)|Abershaw, Louis Jeremiah]]
  • [[Abershaw or Avershawe, Louis Jeremiah (1773?-1795) (DNB00)|Abershaw or Avershawe, Louis Jeremiah (1773?-1795)]]
My vote would be for the second article title to be
  • Abershaw, Louis Jeremiah (1773?-1795) (DNB00)
The article title is a Wikisource navigational construct, not a part of the original text, so we are free to choose. -Arch dude 11:41, 27 August 2008 (UTC)
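The pipe convention proposed above can be sketched in a few lines of Python. This is only an illustration: the function name `toc_link` and the assumption that every article title ends in the literal suffix " (DNB00)" are made up for the example and are not part of any Wikisource tooling.

```python
SUFFIX = " (DNB00)"  # assumed project suffix on every article title


def toc_link(article_title: str) -> str:
    """Return a piped wikilink that hides the trailing ' (DNB00)' in the ToC."""
    if article_title.endswith(SUFFIX):
        display = article_title[: -len(SUFFIX)]
        return f"[[{article_title}|{display}]]"
    # Titles without the suffix are linked as they are.
    return f"[[{article_title}]]"


print(toc_link("Abershaw, Louis Jeremiah (DNB00)"))
# [[Abershaw, Louis Jeremiah (DNB00)|Abershaw, Louis Jeremiah]]
```

The link target keeps the full title, so the choice of article title stays independent of what the reader sees in the ToC.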
Given it a try, have a look at Dictionary_of_National_Biography,_1885-1900/Vol_58_Ubaldini_-_Wakefield, specifically looking at Wadd, William (1776-1829) -- Billinghurst 16:01, 27 August 2008 (UTC)
This raises a few interesting points. First, I think that if the DNB has an article, even if only a see reference, it should be a link. Second, the John Wadham situation is interesting because he is in the index, but what we have about him is embedded in another article in a way that cannot be easily isolated; perhaps the link to Nicholas Wadham there should be wikified. Third, Arthur and Felix Wakefield have identifiable paragraphs within another article. Would people consider it too much a breach of purity to add headings within the article so that we could have "See Wakefield, William Hayward (DNB00)#Arthur Wakefield"? Eclecticology 18:31, 27 August 2008 (UTC)
I think that adding headings to create anchors is in fact a "breach of purity." Fortunately, it is also unnecessary, since it is possible to add invisible anchors instead. Now I just need to remember the correct syntax... -Arch dude 00:19, 28 August 2008 (UTC)
Very spooky, I just happened to do one of those articles today, so I have gone and done as suggested.
Note that I have wl'd the component of the name after See under rather than the specific name itself. I am not wedded to that methodology. In the end, with many articles being short, I don't think that it will even be noticed. -- Billinghurst 06:53, 28 August 2008 (UTC)



Thanks Billinghurst for joining the project. It takes getting a few heads together to sort out the questions that are being raised. Several points have been raised by both of you that I want to address.

  1. ToC vs. DNB Index. I think it's important that we are recognizing the importance of Wikisource navigational constructs. Failing to do this can make work awkward. (I've already run into this over works that have quotation marks as part of the title.) I have no complaint about including the DNB index with page numbers as an additional group of pages, but what page numbers should it show? My own hard copy is the (originally) 1921 reprint edition which combined the original 63 volumes into 21. The pages themselves were essentially duplicates of the originals; right down to the same word breaks at the beginning and end of a page. That edition, however, did include the changes from the 1904 errata volume. (See Internet Archives for this one.) With the 1921 reprint the pagination was changed so the original separately paginated volumes 4, 5 and 6 became the new continuously paginated volume 2. The indexes for the new volumes also had footnotes to take into account those additions made in the First Supplement.
  2. DoLs. I very much support using these as disambiguators, but only when necessary. I know that Wikipedia favours disambiguation by what made a person famous in life, but that is more likely to require some sort of subjective determination than DoLs. To make it easy for a person who wants to cross link articles we should not require that he read the entire linked article just to make the link; the dates used should be exactly as they appear in parentheses at the beginning of each article. (At some point we may need to deal with same individuals who have different dates in another reference work, but I think that that problem can be deferred.)
  3. Using a pipe to suppress the "DNB00" from what is seen in the ToC is just fine.
  4. Honorifics. My preference is to suppress these from our article titles. We would, of course, continue to include this material in the article itself. It is to be noted that titled people usually have a see reference at the title, and these "see" articles should be kept as such (e.g. "ALEMOOR, Lord. [See Pringle.]").
  5. Alternative names. I agree with Arch dude's solution for the situation expressed in the Abershaw example, but without the dates since there is no ambiguity with some other person. We would maintain the see reference at "Avershawe". The alternative name should continue in the ToC since there will be an article, even if it is only a see reference which maintains continuity between other articles. Eclecticology 17:42, 27 August 2008 (UTC)
This looks like a consensus to me. Shall we now add it to the project page's "style" section and begin converting the ToCs? -Arch dude 00:19, 28 August 2008 (UTC)
I just found the syntax for an invisible anchor: go to the note on purity above. The syntax is {{anchor|my_hidden_anchor}}. -Arch dude 00:50, 28 August 2008 (UTC)
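As a rough sketch of how the invisible anchor and the matching "see" link fit together, the following Python builds both pieces of markup. The helper names `hidden_anchor` and `see_link` are hypothetical, and the `[[Page#anchor|label]]` form is the usual MediaWiki section-link syntax rather than anything specific to this project.

```python
def hidden_anchor(name: str) -> str:
    """Markup for an invisible anchor, using the {{anchor|...}} syntax quoted above."""
    return "{{anchor|" + name + "}}"


def see_link(article_title: str, anchor: str, label: str) -> str:
    """Piped wikilink that jumps to an anchor inside another article."""
    return f"[[{article_title}#{anchor}|{label}]]"


# Placed in the host article, just before the relevant paragraph:
print(hidden_anchor("Arthur Wakefield"))
# {{anchor|Arthur Wakefield}}

# Placed in the see-reference entry:
print(see_link("Wakefield, William Hayward (DNB00)", "Arthur Wakefield", "Wakefield, Arthur"))
# [[Wakefield, William Hayward (DNB00)#Arthur Wakefield|Wakefield, Arthur]]
```

Because the anchor is invisible, the host article's text stays as printed while the see reference still lands on the right paragraph.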

Disambiguation

I've drafted the disambiguation section on the project page. Let me know your comments. I'll attack some of the other topics in due course. Eclecticology 18:45, 28 August 2008 (UTC)

As part of how we can manage disambiguation, I have also (re)created this page and have loaded the pages of the same name. It is there for thoughts on whether that is a suitable means to help. -- billinghurst (talk) 10:22, 28 July 2009 (UTC)

List of authors

Some guidance to where the list of authors are in the book would be useful. Probably something that is worth giving guidance in scope. Thx.

-- Billinghurst 02:49, 30 August 2008 (UTC)
What I've been doing to rough this out is to simply take the number of pages in the volume and divide by four. This should give an idea of where to break up the list. I've been avoiding putting the break in the middle of a set of surnames, but this may not matter in the long run. The list can always be easily fine-tuned at some later stage.
With the first two vols., I have just been adding a page per column, and I have then been getting part of the way through a fifth page. I then even them up and, like you, split after a set of like surnames.
I've been working directly from the pages, but that will work too. Your way ensures that we add the see references, and mine does better with the cross-reference articles found in the text. Either way works, and having both operational will help one pick up what the other has missed. Eclecticology 18:34, 31 August 2008 (UTC)
I suspect that the reason for the breaks between list items is to avoid having the whole list wrap into one long text. Bulleted lists will avoid this (as would the less elegant addition of line breaks). Eclecticology 18:51, 30 August 2008 (UTC)
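The splitting rule described above (roughly equal parts, never breaking in the middle of a set of surnames) can be sketched in Python. This is a simplification under stated assumptions: it counts entries rather than printed pages, it treats everything before the first comma as the surname, and the names `surname` and `split_entries` are invented for the example. The sample names are simply ones mentioned on this page.

```python
def surname(entry: str) -> str:
    """Assume the surname is the text before the first comma."""
    return entry.split(",", 1)[0].strip()


def split_entries(entries, parts=4):
    """Split a sorted list into about `parts` chunks without splitting a surname group."""
    target = max(1, len(entries) // parts)
    chunks, start = [], 0
    for _ in range(parts - 1):
        end = min(start + target, len(entries))
        # Push the break forward until the surname changes.
        while 0 < end < len(entries) and surname(entries[end]) == surname(entries[end - 1]):
            end += 1
        chunks.append(entries[start:end])
        start = end
    chunks.append(entries[start:])
    return [chunk for chunk in chunks if chunk]


sample = [
    "Wadd, William",
    "Waddilove, Robert Darley",
    "Wadham, John",
    "Wadham, Nicholas",
    "Wakefield, Arthur",
    "Wakefield, Felix",
    "Wakefield, William Hayward",
]
for chunk in split_entries(sample, parts=2):
    print(chunk)
```

With two parts the naive midpoint falls between the two Wadhams, so the break moves forward and the Wakefields form the second chunk. The result can still be evened up by hand afterwards.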
I must have been less than clear. I was enquiring about the source location of the authors of the individual articles, e.g. A. H. M. At this point, I just list them with their initials, and leave them for others to modify if they see them. -- Billinghurst 11:12, 31 August 2008 (UTC)
Sorry if I misunderstood. I'm working from the "List of Contributors" in the first volume of the reprint edition, pp. xi-xx. It covers the original volumes plus the first supplement. If anyone anywhere has already scanned this it would make developing this dense material into a workable page of links much easier. The other option is for me to type this out the hard way. Eclecticology 18:34, 31 August 2008 (UTC)
Hey, absolutely no prob, thx for the direction. I have uploaded scans DNB contributors and will remove them in a week or so

BTW I type all that I do (my preference). Obviously I am a dinosaur in my ways. :-) So if I need to type these out, give us a yell, and I will do it when I get back later this week or next. -- Billinghurst 02:45, 1 September 2008 (UTC)

It's interesting to compare what you have scanned with my hard copy. They are essentially the same except that your Jan. 1932 printing (per the bottom of page 1) had the daggers, and they were omitted in mine from May 1942. I did find one name spelling variant. I'll try to at least work up what I have in mind for these pages in the next couple days.
Around here I too am from the age of dinosaurs! It's tedious, but at times there is no realistic other choice. It is good to know this when I'm proofreading something. The kinds of discrepancies that arise from typing material in have a different quality from those using an older uncorrected edition or from OCR errors. Eclecticology 08:47, 1 September 2008 (UTC)
I've started Dictionary of National Biography, 1885-1900/List of Contributors. Have a look and let me know what you think about it. Eclecticology 20:40, 1 September 2008 (UTC)

Proofs, authors and wikilinks

Is there some easy way to record and mark that something has been proofread, and probably that the wikilinks and author data have been attended to? I am finding articles that have [q. v.] components, and I want to come back to them later to do the right wikilink. -- Billinghurst 14:13, 30 August 2008 (UTC)

To be honest, I don't think there is a satisfactory answer for this. I have been clicking lately on the 75% box in the text advancement section under the summary box. This puts a message "Proofread and corrected" in the summary box and adds {{TextQuality|75%}} at the top of the edit page. The other stated magical effects are more obscure.
I might go and ask someone. -- ab
Another option is to use the template {{textinfo}} on the talk page. This would give more detail, but requires that much more work. This gets very tedious when you are working on a lot of short articles.
Neither of the above two techniques has a specific provision for documenting whether the wikilinks have been made. The ones with the [q. v.] are clearly the most significant, but there are plenty of other names that could be cross-linked. On top of that, you really need to check whether disambiguation is required; I've even found one reference to a biography that doesn't exist.
The other problem that I'm finding is what are we proofreading to. The original 63 volume edition may not be the best. I've been using the 21 + 1 volume reprint edition which incorporated the extensive 1904 errata volume even though it did not make other substantive changes. That creates an enormous challenge. I look forward to your comments. Eclecticology 22:32, 30 August 2008 (UTC)
(IMNSHO) I am interested in the most correct data, rather than the more sentimental replication of data at all cost, whether correct or not. That is probably the genealogist in me! So ideally it is more that we need to annotate the edition to which the proofing was done, though I would be happier with any proof than with none. -- Billinghurst 11:18, 31 August 2008 (UTC)
I don't disagree. Now it's a matter of finding a practical solution that is clear and does not generate excess work. !!!! Eclecticology 18:06, 31 August 2008 (UTC)

Using categories on talk pages to signify requirements

(I am obviously getting obsessive <g>) I have used {{textinfo}} at Talk:Waddilove, Robert Darley (DNB00) for a look-see. Plus I am wondering on the potential usefulness of adding categories (temporarily) to the article talk pages to signify work required, eg. __Category:DNB talk qv links__ and __Category:DNB author pending__ -- Billinghurst (talk) 16:04, 9 September 2008 (UTC)

There are plenty of challenges in these questions, not the least is the availability of manpower, and how much work does each person want to do. I even try to ask myself why I feel so disinclined to use {{textinfo}}. The principles behind that template are fine, but I can't approach it without a feeling of annoyance. At the same time I find it quite acceptable to spend a considerable amount of time in detailed proofreading, or in trying to track down basic information about authors.
I have been mulling over these questions as I have been sitting doing bits. For lack of anything better, I have been adding them. An old version of the ShortKeys application makes the text addition reasonably simple for me, though I emphasise 'for me'. I think that all I can hope for is that {{textinfo}} will become more useful in time.
The "for me" is not always appreciated by some of our colleagues. I too tend toward idiosyncratic work habits. If a technique works satisfactorily for the contributing individual, externally imposed efficiencies will be counterproductive.
Manpower. Well, I think that for transcriptions that I can find that. The genealogical community has been generous in their time for the right projects. The issue with that community can be more about the wiki learning curve. I have a plan, and just doing other groundwork first. It then might become more about management.
The genealogical community is a big untapped resource for wikis. There was a time when it was sufficient to know only a small handful of markups before a person could edit effectively. Now, the abundance of templates and other technical tricks scares people away. If in the course of recruiting from that community you can convince them that all the markup they will really need can fit on a single printed page, that campaign will be successful. The experienced ones among us then need to step in when the techies begin insisting on fancy procedures.
I have a 'secret' there and it is very non-Wiki. I am going to give them a choice of doing it in-wiki or in-mailing list. If they can transcribe they can post it to a list, and they can be copied from there and pasted here. That bit is ready to go. Many genies have been on the web and utilising lists for years. -- billinghurst (talk) 06:53, 18 September 2008 (UTC)
I too have some concerns about categories on the talk pages. We want volunteers to know what needs to be done, but these categories and "textinfo" are more likely to be put where the work has already been done instead of where it's needed. They can't go on articles that don't exist. I've been putting a "#" by the pages listed on the volume pages to show that it has been proofread. Is this useful?
As I am only transcribing at this point in time, the only answer that I can give is "probably". Backend solutions would seem preferable; it just seems that most of this community's vision is focused on the output rather than the outcome.
Outcome requires more farsightedness than output. I'll keep thinking about this problem.
The "q.v." entries are a good place to start adding cross-references, but I don't think that those will show their full value until we have a much greater proportion of articles written. Some people are mentioned in articles without a "q.v."; Thomas Falconer is one mentioned in the Waddilove article. The articles in the earlier volumes hardly used them at all. I skimmed through the long articles on Queen Anne and Francis Bacon, and didn't notice any. I'll continue with anticipatory links, but there are probably many more that could be in there.
Nice feedback, I hadn't noticed. They are the navigation constructs only, so I feel happy that we can do as and when. Similarly, I have been adding them, though feel that the most important aspect is the actual text.
I think that our current thoughts are about setting up the underlying framework, so that others can plug n play as required.
We agree on both points.
One way to judge the utility of techniques is the extent to which the technique's inventor actually uses it. Keep up the good work. We do have people who have these brilliant ideas, but never use them. Eclecticology (talk) 07:16, 15 September 2008 (UTC)
Plethora of tools, just many that don't do the right job or not in the right place. I probably could go and learn to make the tools (BTDT at RootsWeb), however, on this project, I was more looking to be front end minion rather than back end semi-genius. :-)
-- Billinghurst (talk) 01:45, 18 September 2008 (UTC)
Indeed, we occasionally need to be rescued from genius. Eclecticology (talk) 06:33, 18 September 2008 (UTC)
Back to the coalface.

Further suggestion on the above

If the potential recruits had the page already started would they be inclined to add the text directly to that article? We could go ahead and start articles with the headers only and put them in Category:DNB25%, but those articles would have their headers complete, including author and Wikipedia links. Once they're trained to add the text to these articles (and if they haven't yet been intimidated by a welcome message), maybe they could progress to proofreading each other's entries, and after that cross-referencing. Eclecticology (talk) 06:39, 19 September 2008 (UTC)

Nice solution. I do think that having the framework in place (without author text and WP links) would be very useful. Ability to "type" and "edit" and "save" would simplify things. I think that we may be able to sweeten the deal.
Negative: the categories may not get amended, and for interim viewers it is not evident that there is nothing behind the link.
Positive: direct links to each TO DO page, by task, from the project page.
The notion of being stuck with arbitrary percentages did cross my mind. If the current spectrum works, so would 20/40/60/80/100. Keyword categories are indeed more useful, and some of the issues that arise with shorter publications, are no longer a factor, especially sourcing issues which is a global one for the entire project. They should unambiguously specify what needs to be done instead of what has already been done. The problem with the word "proofread" is that the past and present tense are spelt identically. More than one category may be used on the same page.
May I suggest:
  • Category:DNB Add text - The article has a header and needs a text. This is roughly equivalent to the current 25% category.
  • Category:DNB Add header - May not be needed except in the occasional situation where we have a text but no header.
  • Category:DNB Verify - The text needs to be proofread - Roughly equivalent to the current 50% category
  • Category:DNB Re-verify - The text needs to be proofread by a second person to give greater quality assurance.
  • Category:DNB Link - Needs wiki-links added.
  • Category:DNB Stable - This page has been proofread by at least two people, the last of whom did not find any errors. The clearly required links to other DNB articles have been created, but optional ones may still need to be made.
  • Category:DNB See - Use this for articles that are little more than cross references to other articles.
  • Category:Needs info - This tries to address your negatives and the matter of {{textinfo}}. Perhaps it could involve a modified version of the box that could go right on the content page. This requires more thought.
Comments? Eclecticology (talk) 18:09, 19 September 2008 (UTC)
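To make the proposed scheme concrete, here is a minimal Python sketch of how a page's state might map onto the categories listed above. The `PageState` fields and the function `todo_categories` are hypothetical names introduced for illustration, the exact rules are one reading of the proposal rather than an agreed policy, and "Category:Needs info" is left out because the proposal itself says it requires more thought.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageState:
    has_header: bool = True
    has_text: bool = False
    proofreads: int = 0            # completed proofreading passes
    last_pass_clean: bool = False  # the latest pass found no errors
    links_done: bool = False       # required DNB cross-links are in place
    is_see_reference: bool = False


def todo_categories(page: PageState) -> List[str]:
    """Return the categories that say what still needs doing on a page."""
    cats: List[str] = []
    if page.is_see_reference:
        cats.append("Category:DNB See")
    if not page.has_text:
        if page.has_header:
            cats.append("Category:DNB Add text")
        return cats
    if not page.has_header:
        cats.append("Category:DNB Add header")
    if page.proofreads >= 2 and page.last_pass_clean and page.links_done:
        cats.append("Category:DNB Stable")
        return cats
    if page.proofreads == 0:
        cats.append("Category:DNB Verify")
    elif page.proofreads == 1 or not page.last_pass_clean:
        cats.append("Category:DNB Re-verify")
    if not page.links_done:
        cats.append("Category:DNB Link")
    return cats


print(todo_categories(PageState(has_text=True, proofreads=1)))
# ['Category:DNB Re-verify', 'Category:DNB Link']
```

A page can carry more than one category at once, which matches the remark that several categories may be used on the same page.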
Broad agreement. DNB Link can be added or checked. {{textinfo}} probably becomes redundant with proposed framework, beyond not knowing who undertook. That said they need to amend talk page, and that is less likely to happen anyway. About the only additional is potential for a hidden(?) DNB complete to allow for quiet cross-reference ability. -- billinghurst (talk) 14:51, 20 September 2008 (UTC)

pagescans

I have started Wikisource:WikiProject DNB/pagescans by scanning this list (using the "Flip Book" feature). John Vandenberg (chat) 00:50, 24 September 2008 (UTC)

Thanks. An issue that arises is whether these are the best ones to use for proofreading. The first combined edition might be better since it incorporates the 1904 errata. Eclecticology (talk) 04:18, 25 September 2008 (UTC)
With the change in the upload limit to 100MB, all the DNB files fall within the limit, so it seems opportune to get the DJVU files into Index: pages. I have started that process and you can see where I am up to with Index: pages at Wikisource:WikiProject DNB/Djvu files.
For where I am up to with the download process, see billinghurst archive.org bookmarks. The list will be completed as time allows. I am limiting myself so that I can keep track of where I am up to. -- billinghurst (talk) 12:45, 25 November 2008 (UTC)

DNB Author links

I just added a template: {{DNB contributor|x. y. x}} . I intend to add this template to each DNB contributor in the list of contributors. I also intend to add a new template for each DNB contributor. The worked example is {{DNB GCB}}.

The idea here is to simplify the creation of the last line of each biographical article. Whenever you add an article, just add the correct template as the last line. The template will add the right-justified initials with the proper author link. -Arch dude (talk) 01:17, 1 October 2008 (UTC)

Is it also worth having the template drop in Category:Contributors to DNB? That might make it easier in the long run to pull it, if/when that decision is made.
Not quite. The "contributors to DNB" category needs to be added to the template that is used in each of the (less than 100) "Author" pages. That template is template:DNB contributor. Eclecticology (I think) wrote that template. We could add the categorization to that template, which would allow us to remove the categorization later if someone does not like it. My template(s) will be added to each of the 50,000+ biography pages. We could elect to amalgamate these templates with the footer template if such a template exists. -Arch dude (talk) 00:05, 2 October 2008 (UTC)

I just added a template: Template:DNB footer initials. This is an "internal" template to be used to create the footer (100 or so) templates for each author: See template:DNB GCB as an example. -Arch dude (talk) 00:05, 2 October 2008 (UTC)
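For anyone generating article footers in bulk, the naming pattern can be sketched in Python. This rests on an assumption: that each per-author footer template is named "DNB " followed by the contributor's initials with the dots and spaces removed, as the worked example {{DNB GCB}} suggests. The function name `footer_template` is invented for the example.

```python
def footer_template(initials: str) -> str:
    """Turn initials such as 'G. C. B.' into footer markup such as '{{DNB GCB}}'.

    Assumes the template name is 'DNB ' plus the initials without dots or spaces.
    """
    compact = "".join(ch for ch in initials if ch.isalnum())
    return "{{DNB " + compact + "}}"


print(footer_template("G. C. B."))
# {{DNB GCB}}
```

If the real template names ever diverge from this pattern, a lookup table would be safer than a rule.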

I just edited all the articles to convert the author footers. -Arch dude (talk) 02:13, 5 October 2008 (UTC)

  • 100 authors? :-) It's over 600. The templates seem to work; they do keep me from needing to look up the same authors multiple times. Eclecticology (talk) 23:26, 5 October 2008 (UTC)
I have updated the {{DNB contributor}} template so that it also transcludes Category:Contributors to DNB. I am working through filling all the existing Author pages with the template. -- billinghurst (talk) 23:32, 26 January 2009 (UTC)

Progress? and variations

I am just wondering on how we are progressing with these author templates?

Also, when I was cruising through volume 1, I see that Sidney Lee had been assigned the initials "SLL" and in the later compiled List of Contribs he is "SL". We will need to review how to handle, and watch out for others. When AD does his template, should we just redirect one to the other? -- billinghurst (talk) 11:51, 30 November 2008 (UTC)

My original intent was to add the templates gradually, as needed. After your heroic effort to add the page scans, I'm now adding the templates pre-emptively, starting with the ones listed on the "list of writers" page for vol 1 and progressing slowly forward from there. I am finished with vol 1 and most of vol 2. During this effort, I added S. L. L. as a synonym for S. L. on the consolidated contributors page, and E. W. G. as a synonym for E. G. I also added H. R. L. I did not add the two Misses Clerke, because I could not figure out who they really are. (A. M. C. and E. M. C.) -Arch dude (talk) 12:10, 30 November 2008 (UTC)
Done through vol 5. Slow going here. I'm still finding about ten new authors per vol. -Arch dude (talk) 18:54, 30 November 2008 (UTC)
If I/we concentrate our transcriptions on vols. 1 through 8, or earlier, then that covers the templates for now. :-) billinghurst (talk)
Done through vol 8. I'll wait until the 'bot does a few more volumes, and then catch up with it. However, please don't feel constrained: if you need a template, just make it yourself or drop me a note and I'll do it. -Arch dude (talk) 00:23, 1 December 2008 (UTC)
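The synonym handling mentioned in this thread can be sketched as a small lookup in Python. Only the two pairs named above (S. L. L. for S. L., and E. W. G. for E. G.) are included; the table name `SYNONYMS` and the function `canonical_initials` are hypothetical, and a real list would need every variant found in the volumes.

```python
# Variant initials mapped to the form used on the consolidated contributors page.
SYNONYMS = {
    "S. L. L.": "S. L.",
    "E. W. G.": "E. G.",
}


def canonical_initials(initials: str) -> str:
    """Normalise spacing, then resolve a known variant to its canonical form."""
    key = " ".join(initials.split())
    return SYNONYMS.get(key, key)


print(canonical_initials("S. L. L."))
# S. L.
print(canonical_initials("H. R. L."))
# H. R. L.
```

Unknown initials pass through unchanged, so the table only needs entries for the exceptions.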

Variation between versions

The very useful material from Internet Archives is based on the first (63 + 3 volume) edition of the DNB. For proofreading I use my dead-tree version of the 22 volume reprint, which has incorporated the 1904 errata. This causes me some concern when the two versions differ. Thus in Brougham, Henry Peter (DNB00), what originally read

"As Lord Cleveland (Darlington) went over to the tories, Brougham felt bound in 1830 to vacate his seat for Winchelsea, and accordingly accepted the offer of the Duke of Devonshire to return him for Knaresborough."

became

"Brougham in 1830 vacated his seat for Winchelsea, the borough of the earl of Darlington (created Marquis of Cleveland in 1827), and accepted the offer of the Duke of Devonshire to return him for Knaresborough."

Or in another situation "1805" was simply changed to "1806". For one who would prefer precise reporting, these situations are maddening, and attempting to document them could take considerable time. Such attention to detail could severely limit the amount of material that could be included in Wikisource in the near future.

My own preference is to stick with the more recent version with the hope that at some point we will also add the 1904 errata. The purpose of those errata was, of course, to correct errors. Comments? Eclecticology (talk) 19:09, 6 October 2008 (UTC)

I'm very much against using any version later than the original 63-volumes published from 1885 to 1900 for this particular Wikisource. I feel that you should create another parallel Wikisource with a different title (e.g., "Dictionary of National Biography, 1904") in which you will be free to point back to the articles with (DNB00) suffixes when there are no differences, and you are free to replace the (DNB00) articles with (DNB04) articles as needed. We can also link forward to the (DNB04) (or later) articles from the "see also" section of the (DNB00) article header, since the header does not purport to exactly reflect the source. Furthermore, we can add a section to the "notes on reading the DNB" article to recommend that users who are looking for information rather than exact sources should start from the latest NB that we have: Indeed we can create a synthetic "Dictionary of National Biography" (with no date) as the front project. At an absolute minimum, if we intend to use the existing project as the synthesized DNB, then we need to do three things:
  • change the project name
  • add a note to the front matter of the project to explain that the project has mixed sources
  • explicitly provide the provenance for each individual article.
As I see it, there are at least two ways forward that can accommodate our competing goals:
  • Create multiple separate top-down sources (including a synthetic "best" source) within the existing project.
  • Fork the project.
If we cannot agree to preserve the original unmodified DNB00 source as a distinct source within the existing project, I will create a separate project for this. Even if I must do this, I would still happily contribute to this existing project also. However, I feel that it is better to continue to work as a single project team to capture all of the versions, each as a separate top-level source. Now that we have a worked example, I can create each separate source superstructure in about an hour. Which sources do we need in addition to the existing 1885-1900 and 1904? -Arch dude (talk) 12:57, 7 October 2008 (UTC)

Clarity of principles. I think that we need to decide what is the desired outcome and the reason for it. Reflect on why. After that we can decide on how to get to the goal. I don't want to get into a bunfight over this or that, or have a view on whether either is right or wrong. I simply wish to make the information available, and by the clearest and most factual means.
-- ab

So ... Principles

  • Make available, in a searchable text form, the DNB's information
  • To present historical data as accurately as possible

(add more ...) The point of difference seems to be

  • Identical reproduction of DNB00

OR

  • Reproduction to a later corrected DNBXX

I would like to consider the intent of the original authors in our project, and what would they have considered if they had the web available to them. I think that they would say correct information. The genealogist in me has concerns of propagation of incorrect data, even if it is available corrected on another page. The ongoing reproduction of incorrect data is a nightmare. So I am disinclined to multiple pages for the same person for the same publication.

Practically, the three of us are not going to get through DNB alone, and it seems that we are getting particularly fussy over first ed, or corrected ed. Others (hopefully) are going to come and type, and they may have whichever source available. I am not wishing to confuse the helper with the intricacies of this ed, that ed. Isn't it a matter of noting which ed., and if there is a difference just highlighting the difference?

A proposed solution

  • People type from whichever version they have available
  • The DNB00 version remains the base text
  • We proof against the available version. If the text differs in a later version, the difference is annotated in the article, and the corrected (differing) part is presented on the same page in a classy fashion that indicates which reading prevails and where the erratum occurred.

If we don't have a simple means to annotate, then can we import one? If it is going to take time, then we can use the Talk page to park the correction in Textinfo and get back to it when able to.

If we end up with articles with a major rewrite, or major difference between two versions, then how about we deal with those as they come up with the governing principles. -- billinghurst (talk) 23:06, 13 October 2008 (UTC)

Thanks for your comments. It should be pointed out that the errata volume is 300 pages long, and since the corrections for each of the 66 volumes always start on a right-hand page this results in a number of blank pages in those 300, and an average of four pages of corrections per DNB volume. Few of the corrections are more than two sentences long. Most frequently only a single word is changed in an article. The majority of articles remain completely unchanged.
I agree with using annotations or footnotes. When I began working on this project I scanned articles from my paper copies, and put them through OCR software. Now I find it much more efficient to use the OCR version from Internet Archives. That version used the original edition, but for proofreading it is still easier to work with my on-paper reprint edition. The OCR from Internet Archives has fewer OCR errors than my scans, but it is not exempt from them. Nevertheless, since the errata volume is not straight text, but more of a tabular list of corrections, the OCR version of that is more difficult to work with.
Distinguishing between errata, OCR errors, or human typos during proofreading makes that task far more time consuming than it should be. The most practical time for acknowledging the errata may be when all else in a volume has been entered, at which time it would be much easier to go through all the errata pages for that volume. Eclecticology (talk) 04:35, 14 October 2008 (UTC)


Thanks Ec. Ouch, a lot of pages. I have never had the pleasure of seeing the paper version in any means. Even with proofreading, I still bet that we will have more errors from transcription or OCR even with checking. I hope that we can reach a practical solution to the issue that meets all our needs. :-) -- billinghurst (talk) 10:45, 14 October 2008 (UTC)
It looks as though we are making progress towards defining project goals. My personal project goal remains the same: I want a faithful reproduction in Wikisource form of the exact contents of the original 63 volumes. Fortunately, it appears that this goal is not hugely incompatible with the goal of the other two participants: we just need to decide how to meet my goal while also meeting the other goals. My primary reason for my goal is esthetic: I see it as being very simple and objective. However, there is a major practical reason also, which is that I'm fairly sure that there is a large project over at Project Gutenberg to produce these articles from the original 63 volumes. If we get the correct infrastructure in place to accommodate these as a base, then we can build on the PG work once it is available to create the other desired forms for these texts. Note particularly that I do not disagree with the other goals: I am perfectly happy to create a new front project that we will encourage the casual user to use. This front project should be called "the DNB," and its pages should be the most recent or "best" pages we have for each article. In many cases these will be the DNB00 pages, but in other cases they will be updated pages. However, we should also retain a "hidden" DNB00 project, which will be accessible to anyone who desperately wants to see the original. Each of these pages can also forward-link to later updated pages. One nice thing about this approach is that we can dump the PG project into the "hidden" DNB00 and then update it at our leisure to produce the "updated DNB" "view" of the material. If this is acceptable, then adding an article proceeds as follows:
1. add the article to the DNB00 project in completely unmodified form
2. add the article's title to the appropriate DNB00 Vol table of contents.
3. see if there are 1904 corrections. If so, copy the article to create a DNB04 version: --e.g., copy "Harvey Smedlap (DNB00)" to "Harvey Smedlap (DNB04)"
4. add the article (or the 1904 version if it exists) to the DNB04 Vol TOC.
5. if you created a newer copy, add a pointer to the newer copy from the older copy and add a pointer to the older copy from the new copy.
6. repeat steps 3-5 for each newer version.
This scheme requires a new mainpage and new vol pages for each version. It's not elegant, but it's fairly straightforward. This scheme does not by itself aid users to see how an article evolves from version to version. If that is an important goal, we will need to be more creative. -Arch dude (talk) 22:35, 14 October 2008 (UTC)
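The title handling in steps 3 and 5 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `with_edition` and `cross_links` are hypothetical, and the only convention taken from the discussion is the trailing "(DNB00)" / "(DNB04)" suffix.

```python
import re

# Matches a trailing edition suffix such as " (DNB00)" or " (DNB04)".
SUFFIX = re.compile(r" \(DNB(\d{2})\)$")


def with_edition(title: str, edition: str) -> str:
    """Return the title with its trailing edition suffix swapped (step 3)."""
    if not SUFFIX.search(title):
        raise ValueError(f"no edition suffix in {title!r}")
    return SUFFIX.sub(f" (DNB{edition})", title)


def cross_links(title: str, newer_edition: str) -> dict:
    """Map each copy of the article to the copy it should point at (step 5)."""
    newer_title = with_edition(title, newer_edition)
    return {title: newer_title, newer_title: title}


print(with_edition("Harvey Smedlap (DNB00)", "04"))
print(cross_links("Harvey Smedlap (DNB00)", "04"))
```

Only the trailing suffix is touched, so a title that already carries dates in parentheses keeps them.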
That looks like a lot of work, and it may take a while for our current large horde of contributors to complete it. Look at Brereton, Thomas (1782-1832) (DNB00). In dealing with the 1831 Bristol riots the original edition referred in two places to the scene at the more intuitive name "Queen's Square", which could undoubtedly satisfy many readers who don't know anything about the local geography of Bristol. The 1904 errata, possibly reflecting the indignation of some Bristol resident, indicated that this should be corrected to "Queen Square". That was the only correction made to the article. Does this merit a whole new page just to deal with this one? I have documented the change with a footnote. Could this not be enough?
I should add that the 1904 publication was nothing more than a volume of errata. It did not include full texts of the corrected material. Instead, these corrections were incorporated when the work was first reprinted in 1911. Where more than a word was at issue additional rephrasing to make the corrections fit was required. To respect the pagination, additions to the text could not increase the number of lines on a page, and some rewording became necessary. Eclecticology (talk) 08:05, 15 October 2008 (UTC)
See also my treatment of removed material at Bradfield, Henry Joseph Steele (DNB00). Eclecticology (talk) 16:55, 16 October 2008 (UTC)

1904 reflections of Sidney Lee

Found letter to The Times from Sidney Lee that reflects on DNB and Thompson Cooper. Interesting info. The one thing that took my notice was the number of articles (1422) written by Cooper for the DNB. This is going to make a very long page! We will need plans for how to manage that many pages. -- billinghurst (talk) 06:39, 11 October 2008 (UTC)

Yes, and Gordon Goodwin is in second place with 1178 entries. At least 26 authors have over 200 articles. (See the "Statistical Account" in the revised volume 1.) I think this is a valid concern, but not an immediate one. We can probably devise an appropriate system of sub-pages in the way that Rudyard Kipling's poems have been split off from his main author page. Eclecticology (talk) 17:30, 13 October 2008 (UTC)

List of revised text to be uploaded

Hi guys,

I moved the list of text to be revised and uploaded to here. Your help adding to and modifying the list would be really appreciated. Once we get the list completed, I would like to start uploading the text.

Thanks for your help in advance. --Mattwj2002 (talk) 08:00, 31 December 2008 (UTC)

Slippage

I noticed some vol. 59 scanned text a number of pages adrift from the images. Page:Dictionary of National Biography volume 59.djvu/236 has the beginning of the article "Walsingham, Edward" if you want the page image, while Page:Dictionary of National Biography volume 59.djvu/242 has the text. Possibly an offset has been made the wrong way round. For the moment I'll just work on that one article, and paste corrected text on my user page. I'm presuming this is a bot error and can be best corrected by the bot. Could someone drop me a note about this on my talk page? I'm here on wikisource often enough, but basically just to correct text here before using it on enWP. Charles Matthews (talk) 16:41, 11 February 2009 (UTC)

Thanks Charles, we will see if we can get Matt to reload the text with his bot. It probably came about when I trimmed leading pages, and didn't correspondingly trim the text. -- billinghurst (talk) 01:40, 12 February 2009 (UTC)
Right now I am really busy with work, but I'll take a look when I get a chance. --Mattwj2002 (talk) 12:43, 12 February 2009 (UTC)

Another glitch: Page:Dictionary of National Biography volume 28.djvu/222 is for p. 235 of the original, while Page:Dictionary of National Biography volume 28.djvu/223 moves to p. 238: two pages missing. Charles Matthews (talk) 15:09, 19 February 2009 (UTC)

{{DNB00}} & Category:DNB biographies

Noticed that the DNB biographies were all appearing in Special:UncategorizedPages. This happens because they are not subpages of the main work. To alleviate the matter, I have created the hidden category Category:DNB biographies and embedded it into the {{DNB00}} header. It can be accessed via the Category:DNB. -- billinghurst (talk) 11:44, 20 March 2009 (UTC)

The supplements

Organizational point: there were three DNB supplements published in 1901, mainly catching up with folk who had died after the relevant volume was completed. It would make sense to handle these in parallel with the 1900 DNB: but how exactly? Charles Matthews (talk) 16:08, 6 May 2009 (UTC)

I believe that we are managing these with the later year of publication, eg. DNB01, and so on. I don't think that there was anything against extending, more focussing on something achievable(???) -- billinghurst (talk) 08:02, 7 September 2009 (UTC)

Vol 28 skips

Page:Dictionary of National Biography volume 28.djvu/149 is p. 151 of the original, while Page:Dictionary of National Biography volume 28.djvu/150 is p. 154. Charles Matthews (talk) 16:07, 17 May 2009 (UTC)

As a lash-up I have made Page:Dictionary of National Biography volume 28.djvu/148a, Page:Dictionary of National Biography volume 28.djvu/148b, and Page:Dictionary of National Biography volume 28.djvu/148c, to fill with text for the moment, since I want to work on the Michael Hudson article. I'm not up with how to add djvu's yet. As on a previous occasion, I'd be grateful to have bot assistance with sorting this all out. (Shouldn't be hard, given that vol 28 is untouched so far). Come to think of it, there is plenty to discuss about improving the posted text, also. Charles Matthews (talk) 16:20, 17 May 2009 (UTC)
My understanding is that we would need to pull it down, add the new pages, and then reload. Probably only want to do it once, so we should check first for illegible and other missing pages so we can insert these and do it once. -- billinghurst (talk) 08:05, 7 September 2009 (UTC)

I'm not treating these matters as being of urgency: the work of creating articles can go on, and "copying back" of text is essentially quick and trivial. What I am doing is to log all the glitches on my userpage. Logically there should be a project page that does that, though. And since this is going to be with us for a while, discussion on its talk page. Hey-ho. Yes, before implementing radical change, we should at least go through the volume seeing exactly where all the problems are. And I think that suggests priority for a "mapping subproject", identifying:

  • progress with adding text;
  • images so corrupt as to be useless and in need of replacement;
  • calculation of the "offset" (image number minus page number) which should be constant if there are no glitches;
  • progress with the master volume lists;
  • progress with adding titles to author listings.

Ouch. Plenty of work. But we really need some of these project management tools to open up areas of work. I have to say that adding decent text where possible is really my priority: once there is text on the page, it starts to show up on search engines, and I find that pretty useful anyway even before any proofing. Adding text can be done by bot, but (returning to the topic) I'm not sure why these bot glitches were there in the first place. Charles Matthews (talk) 19:54, 7 September 2009 (UTC)
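The "offset" check from the list above is easy to mechanise once image numbers and printed page numbers have been paired up. The following is a minimal sketch, assuming the pairs have been collected by hand; the function name `find_offset_glitches` is hypothetical, and the two sample pairs are the vol. 28 readings reported in the "Vol 28 skips" section.

```python
def find_offset_glitches(pairs):
    """Report every image where (image number - printed page number) changes.

    pairs: list of (image_number, printed_page) tuples in scan order.
    Returns a list of (image_number, previous_offset, new_offset).
    """
    glitches = []
    previous = None
    for image, page in pairs:
        offset = image - page
        if previous is not None and offset != previous:
            glitches.append((image, previous, offset))
        previous = offset
    return glitches


# Vol. 28 as reported: djvu/149 is p. 151, djvu/150 is p. 154.
print(find_offset_glitches([(149, 151), (150, 154)]))
```

A change in the offset from -2 to -4 between consecutive images corresponds to two printed pages missing from the scan.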

Done and resolved. Hopefully vol. 28 is all very special now. -- billinghurst (talk) 01:49, 27 September 2009 (UTC)

Titling

What is said so far on the project page doesn't give a complete style manual. (And I think the text conflicts with the example: John Holt (d.1415) has no trailing space after "d.") There seems to be disagreement, or at least a lack of clarity, on the disambiguation. And possibly format issues. Agreed we give dates to disambiguate. The "name" part though is not agreed to be an initial segment of the name as given at the start of the article, because "Newton, Sir Isaac" is (as I understand it) to become "Newton, Isaac". And then in the case of titles of nobility, Talbot, George, sixth Earl of Shrewsbury (DNB00), there are other George Talbots. It is arguably the right prompt to the reader to leave the title there. It is possible to disambiguate by dates here, but is it being said that the dates should override the title? (There is also the point that there is some latitude in using upper case where the DNB has caps, as shown by this example.)

Two points about this all:

(a) getting a Manual of Style worked out now when progress is 1% to 2% makes sense rather than waiting longer to discuss this all; (b) this may be vain talk, but I'm really not happy with the approach as initiated - I'm not a fan of inverted names, which double searching effort, at the best of times - and I can think of ways that would be more helpful to readers searching (which can be implemented perhaps by redirects, but in any case depend what we think we are aiming at).

The basic indication of the "name" of an article would be how the volume index names it. Perhaps we can get in from that end. Charles Matthews (talk) 12:52, 20 June 2009 (UTC)

So as can be seen at Page:Dictionary of National Biography volume 55.djvu/493, the volume index gives "Talbot, George, sixth Earl of Shrewsbury (1528?-1590)". We could call that the "official title". To meet several of my issues, we could agree the following: the header template to have a field which is to hold the uninverted form of the "official title". Charles Matthews (talk) 13:02, 20 June 2009 (UTC)
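For what it is worth, the uninverted form proposed above could be derived mechanically from the index entry in most cases. A rough sketch, assuming the entry always has the shape "Surname, Forenames[, rest]"; the function name `uninvert` is hypothetical, and unusual entries would still need checking by hand.

```python
def uninvert(official_title: str) -> str:
    """Turn 'Surname, Forenames, rest' into 'Forenames Surname, rest'."""
    parts = official_title.split(", ", 2)
    if len(parts) < 2:
        # Nothing to invert, e.g. a single-word entry.
        return official_title
    surname, forenames = parts[0], parts[1]
    name = f"{forenames} {surname}"
    # Anything after the second comma (a title of nobility, dates) is kept as is.
    return name if len(parts) == 2 else f"{name}, {parts[2]}"


print(uninvert("Talbot, George, sixth Earl of Shrewsbury (1528?-1590)"))
print(uninvert("Newton, Sir Isaac"))
```

The title of nobility and the dates survive untouched, so the header field would still carry the prompt to the reader discussed above.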

Milestone

User:Magnus Manske "left" a tool for the project on my usertalk: just so everyone knows, it's here. The headline statistic is the project has passed 1000 articles created. A bit more than a drop in the ocean, really. Charles Matthews (talk) 20:31, 21 August 2009 (UTC)

bonzer billinghurst (talk) 08:00, 7 September 2009 (UTC)

ODNB comparison

Quite an interesting view from an insider: [1]. Charles Matthews (talk) 06:47, 7 September 2009 (UTC)

Yeah. Different organisation, same problems. -- billinghurst (talk) 07:59, 7 September 2009 (UTC)

Our vol. 28 is not a good copy

Howard - Inglethorpe is in very poor condition. Pages missing, and the OCR of the pages fails miserably. :-( -- billinghurst (talk) 13:09, 19 September 2009 (UTC)

Looking at archive.org, there is
Done. New version of this volume has been implemented. -- billinghurst (talk) 23:04, 25 September 2009 (UTC)

{{DNB link}}

To let you know that awhile back I created {{DNB link}} which was designed to be used in Works about author section. On reflection, it may also be useful for those authors who created single articles for DNB, and we could use it rather than creating a Contributions to DNB section. -- billinghurst (talk) 23:03, 25 September 2009 (UTC)

Alignment issues: BIG statement

After having pummelled poor ThomasV unmercifully following the upgrade to ProofreadPage, I now have a much better understanding of page scans and the like.

  1. I have purged (?action=purge) all the File:s so that the text layers should now be available from the respective Index: pages. Didn't realise that it was a solution to the layer not appearing.
  2. Where the text is out of alignment with volumes, it will be due to reloaded data files. A couple of solutions.
    1. Delete the bad Page, and then touch it from the Index: and on the recreation of the page, the original text layer from the DjVu file will be imported.
    2. Copy and paste the text to align.
  3. Matt is playing with the original TIFF files, and would be prepared to regenerate DjVu files, probably upon request — not fully resolved at this stage. This will give us the opportunity to take a step back and look at the problems facing individual volumes. We would then need to look at how we wish to save existing work, and allow for replacements.
  4. I have worked out that we can load replacement .djvu files at WS, do the work on the front end as a local file takes precedence, and then move them over to Commons once we are happy with the condition. [Note that we need to have the Commons version deleted prior to a move.]

Charles has done a fair amount of work on existing volumes, and I am wondering on the means to work out what we need to do to each volume to get it up to speed. Do we need to have a page that reports on the condition and status of the volumes? Charles you are more around this aspect!

There was something more that I was sure that I was going to address, however, it escapes me.-- billinghurst (talk) 03:16, 27 September 2009 (UTC)

I have started a survey of progress for adding the text, at User:Charles Matthews/DNBProgress. I suppose I should finish that, and move it out into project space, so we have a single page addressing general progress. I also have a section on my usertalk where I record glitches in the djvu sequences. There are a fair number of these. I think we need some sense of priorities. Where there is good-quality text to add, it makes more sense to be fussing about getting the glitches straightened out, since certainly bad scans (where there is no alternative) aren't conducive to getting to work (unless like me you really want a given article right now). The scans I mark "good" are the Toronto scans, which (as a rule of thumb) are much superior to the Google scans. Charles Matthews (talk)

Amendments to {{DNB00}}

I have been making modifications to Template:DNB00

  • Added {{{overprev}}} and {{{overnext}}} which are used to overwrite the prev/next wikilinks. I envisage that these will only be used in the first and last biographies of a volume, and enables us to lead on or back.
  • Played with the {{{wikipedia}}} so that it doesn't give weird links if the variable is left empty. I have removed some extra parameters that were there, though we can re-add these if they are still needed; they seemed to be an unused artefact from our stealing EB's header.
  • Placed a copy and paste-able version of the script on template, and some instructions on the Talk page.

Note that I have not applied this to other derivatives of the same template. If we find the changes successful, then we can look to update them.

I was also thinking that it may be useful to add an optional(?) {{{volume}}} which would take people back to the specific volume that they may have come from, or possibly allow them to navigate towards. Thoughts? -- billinghurst (talk) 13:06, 29 September 2009 (UTC)

Some template documentation might be good ... Charles Matthews (talk) 12:07, 1 October 2009 (UTC)
There is, see up ʌ -- billinghurst (talk) 11:26, 2 October 2009 (UTC)
Ah, I'm used to a convention like Template:DNB00/doc. Charles Matthews (talk) 12:02, 2 October 2009 (UTC)
Fair enough, and it is quite welcome to be moved once proofed. I had zillions of bits on the go, and it was easier at the time, and as in first draft to paste to talk page. billinghurst (talk)
  • Added {{{volume}}}; note that this needs to be double digit, which means 01, 02, ... 09, 10, ... 63 for the early volumes. Usage = | volume = XX . At the moment this just adds a volume link to the header, though it will be used further.
-- billinghurst (talk) 09:54, 12 October 2009 (UTC)
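For anyone filling the volume field by script rather than by hand, the double-digit rule above amounts to zero-padding. A minimal sketch, assuming volumes run from 1 to 63 and that the scan files follow the "Dictionary of National Biography volume NN.djvu" pattern seen elsewhere on this page; the function names are hypothetical.

```python
def volume_field(volume: int) -> str:
    """Return the two-digit form expected by | volume = XX ."""
    if not 1 <= volume <= 63:
        raise ValueError(f"volume out of range: {volume}")
    return f"{volume:02d}"


def index_name(volume: int) -> str:
    """Return the matching scan file name for that volume."""
    return f"Dictionary of National Biography volume {volume_field(volume)}.djvu"


print(volume_field(8))
print(index_name(28))
```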

Progress table

I've now marked up Wikisource:WikiProject DNB/Progress with the most basic information on how we are doing, from the side of having text in place for proofing. A more honest title would be "progress and troubleshooting", since there are quite a number of legacy problems from the bot runs, as well as the inherent difficulties and constraints caused by bad scans. Obviously this page is for everyone to update, and can be expanded to include other issues (one obvious one being the per-volume article listings). Charles Matthews (talk) 15:34, 3 October 2009 (UTC)

Transcluding to main namespace

Traditionally we have basically #LST'd text in with {{#section:...}} tag, and not undertaken any formatting of the body of the work. Things have progressed with transclusion, and I wondered whether it was time to review our means of formatting for body of works, and how we transclude.

I would like to propose that we look to put our body inside

  • <div class=indented-page>

which gives a left margin and that we look to transclude with a means that includes page numbering, ideally the new nomenclature

  • <pages index="Dictionary of National Biography volume 08.djvu" from=25 to=26 fromsection="Bury, Arthur" tosection="Bury, Arthur" />.

which would give a page that looks like Bury, Arthur (DNB00)

If we think that even that class formats too wide, we can look to customise further, and specify our own class specifically. -- billinghurst (talk) 10:28, 11 October 2009 (UTC)
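Since the transclusion line above follows a fixed pattern, it could be generated rather than typed. A small sketch, assuming the same section name is used for both fromsection and tosection, as in the Bury, Arthur example; the function name `pages_tag` is hypothetical and the attribute layout simply copies the example.

```python
def pages_tag(index: str, first: int, last: int, section: str) -> str:
    """Build a <pages /> transclusion tag for one article."""
    return (
        f'<pages index="{index}" from={first} to={last} '
        f'fromsection="{section}" tosection="{section}" />'
    )


print(pages_tag("Dictionary of National Biography volume 08.djvu", 25, 26, "Bury, Arthur"))
```

The output matches the hand-written tag above, so the result can be pasted straight into the article page.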

The new way of doing it does still create a transclusion backlink from Page:Dictionary of National Biography volume 08.djvu/25 - one of my concerns, given the amount of maintenance still to do and need to cross-check side-effects. It would make my life marginally harder (a few keystrokes) from a navigational point of view, since I like to start from "Page:Dictionary of National Biography volume 08.djvu/25" or whatever in finding nearby pages I need; this is part of opening up the text, and is a temporary issue, you could say. I was wondering if the whole transclusion business could now be made part of Template:DNB00. Charles Matthews (talk) 13:28, 11 October 2009 (UTC)
We could look to make it part of {{DNB00}}. The additional data is pretty standard, about the only remembering issue will be 01,02,03... for initial vols. rather than 1,2,3. Let me have a go. If we do it that way, it becomes even easier, though I think that I will look to subst: most of these additional variables. -- billinghurst (talk) 14:02, 11 October 2009 (UTC)

Introducing Template:DNBset

As discussed above, we were looking at a templated means to typeset headers and transclude the relevant pages. I have coded {{DNBset}}, which should be used in a substituted form. In the easier form, if we are coding section ids with the same name use the code below. Further detail on template page. All you should need to do is complete the template and save.

{{subst:DNBset
 |article= 
 |previous= 
 |next= 
 |volume = 
 |wikipedia = 
 |extra_notes= 
 |from=
 |to=
 |section=    
}}

I have run a variety of testing, and it hasn't broken yet, though I wouldn't guarantee complete robustness. :-)

Feedback, comments, suggestions, all welcome. -- billinghurst (talk) 12:29, 12 October 2009 (UTC)

I can't speak for others; but I now always use the article name as the section id. (For one thing, unless you dab the section ids you will get strangeness in the transclusions, so two birds with one stone.) But perhaps it would be better not to get into that as a time-saver? I suspect we'll be moving articles quite a lot, eventually, as we get the titling conventions into a rigorous state. By the way, thanks for working on this. Charles Matthews (talk) 12:43, 12 October 2009 (UTC)
Welcome. It was designed so all articles still have the same output and use {{DNB00}} underneath the hood, so moving or whatever will be okay. I am sure that we will leave redirects anyway. AWB and Cats all will help us tidy underlying components, there just end up being a few to do.
Once we are settled on a preferred look, I will look to run SDrewthbot through and start tidying.-- billinghurst (talk) 13:56, 12 October 2009 (UTC)

Author field

Something that I have never asked, nor really thought about until now. I am wondering why we don't have and use the |author= field in DNB00. The field is actually there, it just isn't displayed, and yet we go through the process of adding the damn thing into notes. Doesn't really make sense to me. We could look to put it in, and even have it display in the same place by a little magic, or we can look to have it display as per normal articles.-- billinghurst (talk) 15:55, 13 October 2009 (UTC)

In the bigger context, the headers used by the Catholic Encyclopedia and 1911 Britannica projects are somewhat different (trying to do different things) - but not so very different. While obviously the page layout is to some extent determined by the original work, I see no reason why the header information shouldn't one day converge to a common style. The reader, after all, has similar expectations for the articles for these reference works. The CE style is to display the author field in the "blue" part, not the notes (even though it is typically then over-ridden by a generic message). The EB articles don't currently give authors. We do as a note. Certainly the whole issue can be reconsidered. Charles Matthews (talk) 17:51, 13 October 2009 (UTC)
When I started a separate header for the DNB articles I did adapt the EB header for the purpose. I would have added an author parameter to appear in the usual place, but that was beyond my coding abilities. If the field is actually there, it must have been added later. The headers for the CE were developed differently by someone else. The use of "multiple editors" is technically correct if we are referring to an encyclopedia as a whole. As long as the title in the header is that of the entire encyclopedia one could get the impression that an author listed immediately beneath it is the editor of the whole encyclopedia rather than the specific article. I absolutely agree that the author information would be better placed in the "green (blue?)" part.
For me this all relates back to some general questions about headers and their purpose. They are an important introduction to the material beneath them, and a way of connecting that article to the rest of Wikisource. For some others they are a place to store metadata about the page in the pursuit of a more standardized relation with the outside world. In the long term such metadata will be important, but putting that into the headers may be demanding too much of headers for anything other than the simplest of situations. Eclecticology - the offended (talk) 02:25, 18 October 2009 (UTC)
Ah, that must have happened when we aligned {{DNB00}} with {{header}}. What I am pretty certain that I will be able to do is to call it | Contributor and align it with the author field as a sort of overlay. Thanks for the feedback. -- billinghurst (talk) 07:37, 18 October 2009 (UTC)
I see you have done a contributor field now in {{DNBset}}; but is the author field in {{DNB00}} (into which that feeds) rendering as intended as of right now? Charles Matthews (talk) 10:00, 18 October 2009 (UTC)
All set and useable. -- billinghurst (talk) 10:05, 18 October 2009 (UTC)

So for Negus, Francis (DNB00), just created, the Author:Thomas Seccombe finished up after the subst in the author field of {{DNB00}}, as must have been intended; but I have just moved it to extra_notes so I can see it rendered? Something I'm not understanding, or is this about how {{DNB00}} builds on {{header}}? Charles Matthews (talk) 10:19, 18 October 2009 (UTC) I actually edited the template page in the end - seems to work as intended now. Charles Matthews (talk) 08:24, 25 October 2009 (UTC)

{{DNB00}} imports components of {{header}}. I have called it {{{contributor}}} replacing |author= . It is a cosmetic change though sends the message that we discussed.
I subst:DNBset and I use | contributor = (NOT author parameter), as per the instructions that MattBr copied to the documentation page. Been working fine for me.-- billinghurst (talk) 09:24, 25 October 2009 (UTC)

Sums of money

Seeing this one quite a lot. The DNB style is to write £100 as 100l., so that's an italic-l standing for pounds sterling. The scans very often make that 100/. instead (i.e. slash looking very like italic-l). Where you'd expect to see the slash in pre-decimal coinage UK money is for shillings, as in 6/- for six shillings, 6/8 for "six shillings and eightpence". But in the DNB this would always be styled as 6s. 8d., I think. Charles Matthews (talk) 08:24, 25 October 2009 (UTC)

Yep, though not sure what you are indicating.-- billinghurst (talk) 09:15, 25 October 2009 (UTC)
Gotcha. Not having dealt with funny money, I hadn't been confused and TWIS'd. It would be nice if there was a special symbol to use for this lira derivative. billinghurst (talk) 12:40, 25 October 2009 (UTC)
Closest that I can find would be ɭ billinghurst (talk)
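
The "100/." misreading described above is regular enough that it could be cleaned up mechanically before proofreading. What follows is only a minimal sketch, not anything the project uses: the function name is hypothetical, and it assumes that a digit directly followed by "/." is always a misread italic "l.", and that wikitext italics are the wanted markup.

```python
import re

# Assumption: a digit directly followed by "/." is a misread "l." (pounds).
# Shilling forms such as "6/-" or "6/8" are left alone, because the slash
# there is not followed by a full stop.
POUND_MISREAD = re.compile(r"(\d)/\.")


def fix_pound_sign(text: str) -> str:
    """Turn OCR'd '100/.' into '100l.' with the 'l.' in wikitext italics."""
    return POUND_MISREAD.sub(r"\1''l.''", text)


if __name__ == "__main__":
    print(fix_pound_sign("an annuity of 100/. a year"))
```

Any page treated this way would still need checking against the scan, since a genuine slash before a full stop would be changed as well.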

Articles "most needed" on enWP

I'd like to construct a listing for articles that are (a) not yet created here and (b) have WP articles showing the w:Template:DNB indicating the use of DNB text. Easy enough to take backlinks to the template page there. From the point of view of having a sensible list for checking, it would be best to order those backlinks in order of their "defaultsort" so we get a best-guess at the DNB article order. Any bright ideas at automation? Charles Matthews (talk) 13:41, 26 October 2009 (UTC)

While I think that I know what you are asking, can you give some specific examples of where the existing page is, and what you want to have done to it? There are bits that I can do with AWB about preparing lists, and there are probably some tricks that someone with access to the databases can do better with some comparisons. The issue will partly be the differing naming systems, and until we have some examples the complexity of the task is hard to envisage. billinghurst (talk) 04:54, 27 October 2009 (UTC)
OK, what I did yesterday was by hand, and resulted in a dozen additions to Wikisource:WikiProject DNB/Most wanted articles, which is hardly used. I checked the first few enWP pages that show up as linking to the Wikipedia template page, w:Template:DNB, and looked to see whether there was a Wikisource article to which the WP page should be linking through that template. I found a few, but those where the relevant article here had not yet been created I listed alphabetically. All I'm asking for is a way of generating a sorted list of those backlinks more automatically, with the aim of dividing it in time by the 63 volumes. The best guess at A-Z order by surname would be to use the defaultsort as on the pages. The backlinks now are in the range 1000 to 1500, so it is passing out of the sort of scale one wants to go through by hand; and in the future the numbers will be bigger. It looks to me (Magnus presumably knows all these things) that the WP backlinks do get sorted, but only as a very low priority task. So one way would be to ask whether they could be sorted for us by someone with the access to request that. Another way would be to ask Magnus for a tool to do that job, assuming it's relatively trivial in his terms. Charles Matthews (talk) 10:23, 27 October 2009 (UTC)
AWB can do some simple things, let me see what I can quickly scoop from both sides. It isn't going to be scientific, though it can do some broad scope matters, like show me the files in this category, and then tell me where they have, or do NOT have whichever the case, Template:YADDADA.
There are also tags that we can add to templates if fields are left empty, eg. at our end, we have the Category:DNB No WP. It is a matter of thinking through the logic that we wish to apply and to where. -- billinghurst (talk) 12:05, 27 October 2009 (UTC)
Is this the sort of data that you were after? ... Wikisource:WikiProject DNB/Data capture billinghurst (talk) 13:41, 27 October 2009 (UTC)
Thanks, yes, a listing to show where the links back might exist. I have ticked a couple that I put in immediately, but it turns out that {{tick}} doesn't exist here yet. One of my personal favourites. Charles Matthews (talk) 15:57, 27 October 2009 (UTC).
We have similar as {{done}}. billinghurst (talk) 21:07, 27 October 2009 (UTC)
... the links back might exist ...

—Charles Matthews

Do you mean from WS to WP? This doesn't do Category:DNB No WP? billinghurst (talk) 21:07, 27 October 2009 (UTC)
No, cross-purposes I think. 'Back' in the sense that someone has placed DNB text on WP without linking to the DNB article in its natural habitat over here. Charles Matthews (talk) 22:41, 27 October 2009 (UTC)
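
The sorting step asked for in this thread (ordering the backlinks by their "defaultsort") can be sketched as follows. This is only an illustration under stated assumptions: the list of backlinks and their wikitext is taken as already fetched by some other means, the function names are hypothetical, and a page without a DEFAULTSORT key simply falls back to its title.

```python
import re

# Matches {{DEFAULTSORT:Key}} (and the older {{DEFAULTSORT|Key}} form).
DEFAULTSORT = re.compile(
    r"\{\{\s*DEFAULTSORT\s*[:|]\s*([^}]+?)\s*\}\}", re.IGNORECASE
)


def sort_key(title: str, wikitext: str) -> str:
    """Return the page's DEFAULTSORT key, or its title if none is set."""
    match = DEFAULTSORT.search(wikitext)
    key = match.group(1) if match else title
    return key.casefold()


def sort_backlinks(pages):
    """pages: iterable of (title, wikitext) pairs; returns titles in sort order."""
    ordered = sorted(pages, key=lambda page: sort_key(page[0], page[1]))
    return [title for title, _ in ordered]


if __name__ == "__main__":
    sample = [
        ("Thomas Hobbes", "text {{DEFAULTSORT:Hobbes, Thomas}}"),
        ("Francis Negus", "{{DEFAULTSORT:Negus, Francis}} text"),
    ]
    print(sort_backlinks(sample))
```

The resulting order is only a best guess at DNB article order, as noted above, since the naming systems on the two sides differ.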

"DNB Archive" on ODNB

There has been a little recent discussion (not here) of the status of the text available on the ODNB website and called "DNB Archive", purporting to be the original article text from the first DNB edition. Having tried this out, and paid close attention, I found on the second article I tried a small difference from the djvu (it was the expansion of an abbreviation, not an erratum). Using this text is therefore a time-saver where the scans are bad, certainly, but this is something that still requires detailed proof-reading and is certainly not a total short-cut. Charles Matthews (talk) 16:32, 26 October 2009 (UTC)

I would presume that we are marking those shitty pages as problematic in the PAGE environment, so we can look to replace the pages individually or the volume collectively. That gives us the scope to then at least proof against the proper version. billinghurst (talk) 05:04, 27 October 2009 (UTC)
OK, that sounds like a plan. There are actually a number of cases that come up: you can have the djvu legible enough to use to proof text, but the scanned text is simply very corrupt, in which case this other source of text is very useful; the scan has interleaved lines caused by imperfect scanning across the two columns, ditto; and then there is the case where the djvu is effectively illegible. Where there is no replacement PDF download in the last of those cases we are still apparently stuck, but it would certainly be helpful to accumulate information on which those are, so that those with access to physical copies can know where we most need that help. I came across one such page in the Hobbes article yesterday. Charles Matthews (talk) 10:08, 27 October 2009 (UTC)
I can get access to jpg images of the 23 consolidated volumes from the 20s. Not perfect, however, may be better than nothing. Just need to mark it as such. billinghurst (talk) 11:58, 27 October 2009 (UTC)

Category:Problematic

I'm just getting up to speed on Category:Problematic, which currently has 17 DNB pages. If that is to be used as a general place for cleanup of scans, it is probably going to need its own subcategory subsystem and a bit of infrastructure. I suggest systematic use of discussion pages, in reference to the various kinds of issues such as are discussed under the previous topic. Charles Matthews (talk) 10:34, 27 October 2009 (UTC)

Sure, however, for specific editions, we can manage it by adding notes to the respective Index_talk pages. We can also look at DNB IndexPages to get an overall picture of where the problematic pages are situated. I also remember someone showing me how you could search for the union of two categories. billinghurst (talk) 11:29, 27 October 2009 (UTC)

Author page population

There is a good way to progress this side, and I thought I'd note some caveats. First, it seemed too good to be true that the ODNB site would list all contributions to the DNB by a given author; and, duly, it apparently gives about 30% of articles, representing I suppose those articles that were revised rather than rewritten from scratch. So there are 400-odd for Author:Thompson Cooper, rather than 1422 (some say 1423, no matter). Still it seems that this is all worth having. Secondly, it is impossible to guarantee disambiguation on the first pass (without time-consuming checking), and even the spelling of article titles may have changed in some cases. (In the longer term a reference list on WP will note authors by initials for the articles, and the volume lists here will be properly dabbed, and so confusion will be averted.) Charles Matthews (talk) 17:14, 29 October 2009 (UTC)

Image or images that epitomise the DNB project

Is anyone able to think of an image or some types of images that they think epitomise DNB/DNB project? Preferably something on Commons. I had thought that some of the Vanity Fair images give some aspect to what we are doing. -- billinghurst (talk) 02:09, 1 November 2009 (UTC)

To start with the obvious, File:George Smith by John Collier.jpg on Commons is the publisher, atmospheric in Victorian whiskers. But I think you want something more like a logo? Charles Matthews (talk) 09:16, 1 November 2009 (UTC)
No, I wasn't thinking logo, I was thinking emblematic, something characteristic and representative of the work and its time. Not a bad start. --billinghurst (talk) 09:46, 1 November 2009 (UTC)
Producing ... Here for comment billinghurst (talk) 12:35, 1 November 2009 (UTC)

George Smith by John Collier.jpg

Wikisource has a number of active Wikiprojects that could use
your help in tackling these large additions to our library.


Dictionary of National Biography Project
Work: Dictionary of National Biography


Alignment problems in vol.35

Not sure if this is the right place to bring this, and also whether somebody may have already flagged this elsewhere, but there are alignment problems in volume 35 between the OCRs and the image scans. Around page 300, they're off by about 6. Page 299 [Page:Dictionary of National Biography volume 35.djvu/305] and page 305 [2] have been cut & pasted to match the images; the corresponding OCRs were previously on the screens with the images for page 305 and 311. Presumably, somebody has a bot that can fix this? Jheald (talk) 15:43, 3 November 2009 (UTC)

It's the right discussion page: these things are logged on Wikisource:WikiProject DNB/Progress. Charles Matthews (talk) 16:20, 3 November 2009 (UTC)

access to scans

I would suppose that the most frequently used and important links for this project are to the djvu files, the only way I have found is the subpage/progress. I am keen to do the odd article, but I can not easily find or remember this. Could we put a page in main space with the volumes and indexes, if we don't have the page the user (and bewildered contributor) can still directly access the scans, a la The Botanical Magazine. The NLA newspaper project offers anyone viewing their scans the opportunity to tag and correct as much of the text as they feel like, apparently this is very successful (several users making around a quarter of a million corrections!). Cygnis insignis (talk) 11:04, 7 November 2009 (UTC)

I am currently working through and updating each volume (Index: <-> main ns), and will be adding scans links on the volume main ns pages. As each volume lists the available biographies, I believe (at this point) that this is a better solution (for the moment). billinghurst (talk) 12:37, 7 November 2009 (UTC)
I'd be happy to format the information required and put it anywhere considered more prominent. For the "casual" participant, I suppose the questions most needing an answer are: (i) how to locate the djvus relevant to a given biography you have in mind; (ii) how to replace the OCR text with better text, in the frequent case that the bot posting/text layer associated with the djvus is not the best available. The way things are set out on the Progress page recognises that there are various qualifications and caveats likely to be useful to someone approaching all this work, but in a reference format rather than an exposition starting from the basics. Charles Matthews (talk) 10:50, 11 November 2009 (UTC)

Upgrading scanned volumes

I believe that I now have a decent understanding of replacing the File: versions at Commons, and how we can upgrade, and the consequences of such a change.

  1. Replacing a volume with a better quality version, volume for volume, is eminently doable (and actually one we should keep a watch upon for the potential negatives and positives):
    1. if a complete volume is replaced with another complete volume, then things are okay as long as it has the same start page and the pages correspond thereafter.
    2. if an incomplete version is replaced with a complete version, while this is great, it may involve a bit of page moving, hence we should resolve these files before we advance much further with fixing the text in these beasts.

Background

A DjVu file has two layers (image and text). When a DjVu file is loaded to Commons, both are available to ThomasV's Proofread Page extension.

  • Creation of an Index: page, locks in the file at Commons, and use of pagelist reveals the images available.
  • Creation of a Page: page, shows the respective image from the Commons file, and grabs the text from what it sees as the corresponding text layer for the page. The text is what is imported to WS, and thereafter stored on WS.

Why do I tell you this?

When our Page: text is of a poor quality, and we upgrade the Commons file, the image will be upgraded, however, as the text is at WS, there is no change to the text displayed. So to grab the text layer from the new file, we need to delete our text (Page:yaddada.djvu/nn) and when we recreate the page the text layer from the upgraded file is now imported if available.

So, for example, in Vol. 57 a number of the scans were poor and I have subsequently replaced the DjVu file, though we already have a number of pages created. Some of the red NOT PROOFREAD pages will have poor OCR, and we may have to delete those existing pages and recreate them. Deleting the pages holus bolus is not advisable, as some are partially or fully corrected and just not advanced in their proofreading status.

Where I do replace a DjVu file at Commons, I will make a prominent note on the corresponding Index: page to alert people to this information, and direct them here to ask for admin to undertake requests identified.

So?

If a page is of poor quality, check to see if we have reloaded the image file (see Index: page), and we may be able to help out. Leave any such request on this page, and an admin will deal with it. Questions? -- billinghurst (talk) 13:17, 7 November 2009 (UTC)

Your delving into these issues is much appreciated, and, yes, the sooner the better as far as sorting out the obvious page glitches is concerned, since the work involved is only going to increase over time. Coming at it from the end of identifying the cleanup to do, I'm now proactively using Category:Problematic to report dud djvus, and I see there are now 56 there. That is probably only the tip of the iceberg for illegible djvus, though. I assume we're agreed on priorities? Gaps in the continuity of pages are the worst issue, because the knock-on effects of a numbering change are large. Illegible djvus where there is no alternate scan available at all (say case "poor" in the Progress table) are the next worst case, since that means that proofing can really only be done at present from a physical book. Bad text can be got round using the ODNB-hosted scans for those with that access (now includes me), as long as there is any sort of readable djvu.
We need ... oh, lots of things (the project as a whole is fairly complex, as we are finding out) but given the request in the previous topic, can we have a look at just a few? (A) Mapping of the offsets (i.e. number in pagespace − page number in volume) as a way of finding those pesky gaps more systematically, by sampling through all 63 volumes; (B) Documentation that will make sense to those wishing to come in and help out; (C) Central page for requests - I mean a forum that is not so much about these project-level issues, but where people can simply post "I want to create/have created for me the biography of X" and state the issue they have, for an answer and attention. Charles Matthews (talk) 11:13, 11 November 2009 (UTC)
I've done a basic survey on (A) now and entered the findings at Wikisource:WikiProject DNB/Progress; and I'm working on (B), which means writing down for the first time systematically numerous bits of know-how (a couple of pages still need to be created, see redlinks on the main project page). As for (C): I can think of various kinds of requests. There may well be a need for admin recreation of pages that have been deleted only for the sake of sorting out the djvu sequences, and for those you can just contact Billinghurst or me. Either of us will try to handle requests for general help, also. I can imagine people unclear about getting decent scanned text for a particular article or section in pagespace. The answer may still be "difficult to do", but please raise such matters here, so that we can start to log the worst places. Any time you find you are leaving a gap deliberately in creating articles, we really should be aware of the issue. /Most wanted articles is effectively unused, except by me. I don't see that this page is redundant, though. Charles Matthews (talk) 12:11, 13 November 2009 (UTC)
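
The offset mapping in (A) can be sketched roughly as below. This is an illustration only, not the method used for the survey on the Progress page: the function name is hypothetical, the input is assumed to be hand-collected samples of (number in pagespace, printed page number), and a gap or duplicated page is taken to lie somewhere between two neighbouring samples whose offsets differ.

```python
def find_offset_changes(samples):
    """Report where the pagespace-minus-printed-page offset changes.

    samples: iterable of (djvu_index, printed_page) pairs for one volume.
    Returns a list of (from_index, to_index, old_offset, new_offset) tuples;
    the missing or extra pages lie between from_index and to_index.
    """
    changes = []
    previous = None  # (djvu_index, offset) of the last sample seen
    for djvu_index, printed_page in sorted(samples):
        offset = djvu_index - printed_page
        if previous is not None and offset != previous[1]:
            changes.append((previous[0], djvu_index, previous[1], offset))
        previous = (djvu_index, offset)
    return changes


if __name__ == "__main__":
    # Hypothetical samples; only the pair (305, 299) comes from the vol. 35
    # report above, where Page:...djvu/305 holds printed page 299.
    samples = [(10, 4), (100, 94), (305, 299), (400, 392)]
    print(find_offset_changes(samples))
```

Sampling more densely between the two indexes of a reported change narrows down where the gap actually sits.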

Working over the Project's pages

I have been updating the main WikiProject page, which needed it, and have placed the Style Manual at Wikisource:WikiProject DNB/Style Manual. Manuals of style tend to grow, and to be a bit contentious at the margins. Watchlist it, and bring up stuff on the Talk page there that needs hammering out: there are quite a number of grey areas still, and all I'm trying to do as of late 2009 is to bring into focus what we currently (some of us) do. Charles Matthews (talk) 16:26, 12 November 2009 (UTC)

Fantastic. We are building a nice bit of momentum, and it would be great to hear the opinions and thoughts of our recent newcomers. Welcome to those who have joined and are watching. billinghurst (talk) 19:37, 12 November 2009 (UTC)