Wikisource talk:WikiProject DNB/Archives

Wikisource talk:WikiProject DNB (Archives)
This is an index of archives; for recent discussion, see Wikisource talk:WikiProject DNB. Please do not post new comments on these pages; if you wish to revive a discussion, either move it back to the main page or link to it. Dates correspond to when the discussions were archived. Archiving may have been irregular with some months absent altogether.

For help archiving pages, see m:standard archival system.

Archived in 2009

Expanding our use of {{textinfo}}

I just looked deeper at {{textinfo}} and can see that we should be using more of its components rather than ticking the little boxes at the bottom: {{textinfo | edition = | source = | contributors = | progress = | notes = | proofreaders = }}. Can I suggest that we put the full box and dice into all articles when we create them (preferable), or when we first edit them if others transcribe? I am not a template expert; is there a possibility that these parts could be included within the DNB template? -- Billinghurst (talk) 08:01, 9 September 2008 (UTC)

Readers' Page

I fail to see the need for a separate page of information for the reader. The only reason given for this page is that it was copied from the EB11 project. It is not a general practice for other long works; why should the DNB project be an exception? There is no need to give one person's interpretation of copyright rationale on that page when such matters are already dealt with on a more general basis elsewhere. Eclecticology (talk) 06:25, 21 October 2008 (UTC)

Here is the material I added and you removed:
This project has not been endorsed by the Oxford University Press or any agent, editor, or subsidiary thereof. The Oxford University Press has been the publisher of the Dictionary of National Biography since 1917. Modern derivatives and supplements, now known as the Oxford Dictionary of National Biography, continue to be protected by copyrights. The 1900 DNB, the first two supplements, and the early reprints are in the public domain because their copyrights have expired in the United States. Based on a commonly-accepted interpretation of copyright law in the United States, any work that was made available to the general public in the United States prior to 1923 is in the public domain. The Wikisource servers are in the United States. Other jurisdictions have other rules. Your jurisdiction's rules may extend the copyright to the date of the "death of the author plus 70 years," or to the "death of the author plus 100 years." If this is important to you, then you should refer to the author page for each specific author of the DNB article in question.
I do not understand your objection to the material. It is not a personal essay. Instead, it is a summary of a great deal of commentary and discussion relating to copyright law that has occurred over the last year over at Wikipedia. If the UK "life plus 70 years" rule is used, then some of this material is still copyrighted under UK law, and the reader may need to know that. I live in the US, and for me the material is in the public domain, so I don't care all that much. -Arch dude (talk) 00:55, 22 October 2008 (UTC)
Although I still have reservations about the various copyright templates, they at least save us the problems associated with reinventing that wheel any time that a project like this is undertaken. I don't at all doubt that there has been a lot of discussion of copyright law at Wikipedia; that sort of discussion has been a constant presence ever since I became involved in 2002, and I'm sure that that will continue as long as there is a Wikipedia. Be that as it may, this project is not bound by the discussions on Wikipedia since it is to be presumed that those who primarily spend their time on Wikisource do not take a lot of time following chronic debates on Wikipedia. We here too have ongoing copyright discussions based on those aspects of copyright that are relevant to Wikisource. Your summary of the Wikipedia discussions remains your summary of your interpretation.
We obviously have no differences about the status of this material under US law, but I would not be so quick to draw conclusions about copyrights under UK law, or to speculate about the law of third countries. Remember that the material that is being considered in this project was originally published before the major revisions to UK law in the 1911 Copyright Act. On any given article the author was only identified by his initials; properly recrediting those authors is our innovation. The DNB is a collective work; what evidence do we have that any of the authors retained their copyrights instead of having produced works for hire? Eclecticology (talk) 05:00, 22 October 2008 (UTC)
FWIW, the last time I read the US legislation associated with (IIRC) its acceptance of the Berne Convention, it led me to believe that the Life+70 years rule applies under US law for UK books not published in the US before 1923.
A comment I made - in the context of discussing the copyright of Punch Magazine on Gutenberg - was:
US copyright law - specifically an amendment made on Nov. 13, 1997 to 17 USC 104A (Public Law No. 105-80) - effectively restored in the US copyrights on foreign works which were copyrighted in their country of origin. http://en.wikisource.org/wiki/United_States_Code/Title_17/Chapter_1/Section_104A#Amendment_history
There is not - that I know - any other mechanism by which one could assert that a 1920 Punch is PD in the US if it is copyright in the UK, given 17 USC 104A.
Equally, all of the DNB authors I've researched to date - five or so - died before 1938 and so their work is in the clear. But there is scope, in my analysis, for some of the information still to be under copyright. --Tagishsimon (talk) 16:05, 28 November 2008 (UTC)
Not if you treat them as works for hire. Eclecticology (talk) 00:17, 6 December 2008 (UTC)
But the DNB was published (i.e., printed) in the US in addition to being published in the UK. -Arch dude (talk) 01:02, 6 December 2008 (UTC)
Before 1923. Eclecticology (talk) 18:01, 6 December 2008 (UTC)
Exactly. As far as I can tell, the 1997 law that Tagishsimon was referring to does not affect works that were published in the US before 1923: those works, including the DNB, are in the public domain. Since the title page for the DNB volumes says "New York", and a date prior to 1923, we are OK.

CotW

The EB1911 project often transcribes the EB1911 article for our weekly collaboration project. This week's author is Author:Isaac Brock, and the EB1911 article already exists, and is featured on Wikipedia. DNB should do the same! John Vandenberg (chat) 23:57, 24 September 2008 (UTC)

Will do. Eclecticology (talk) 04:19, 25 September 2008 (UTC)
John, is there a list of forthcoming CotW? -- billinghurst (talk) 23:08, 13 October 2008 (UTC)

Category:

Biographers?

Would you agree that all of these authors should be classified as biographers? It currently exists.

I have no problem with this as it is directly relevant to that person's written work. I admit though that I have been doing nothing about putting categories on author pages. I do find things like Category:Knights of the Elephant somewhat over the top, but it does have ardent defenders.
There definitely should be a clear purpose for Categorisation. This was a coin toss. Nights of the Heffalumps? Pass! -- billinghurst (talk) 14:27, 20 September 2008 (UTC)
A good category system requires being able to take a global view on how it's structured, and how one's own interests fit in to something bigger. Eclecticology (talk) 00:32, 21 September 2008 (UTC)
Somewhat tangentially to this I would like to see more generally done about categorizing texts than authors. Wikisource is after all about the texts. Eclecticology (talk) 18:57, 19 September 2008 (UTC)
That would mean Wikisource:Biography would be ugly, though from Category:Biographies it would have some value. DNB is already at the former, and in a subset of the latter. -- billinghurst (talk) 14:27, 20 September 2008 (UTC)
Wikisource:Biography is probably a lost cause anyway. It's unmanageable. Eclecticology (talk) 00:32, 21 September 2008 (UTC)


Authors to works

As per my question to the Scriptorium, I was thinking of putting within the category tree Category:Authors >> ...Authors by Works >> ...Authors in DNB and having that as a category for all authors. Yes? No? Maybe? -- billinghurst (talk) 11:37, 19 September 2008 (UTC)

Somewhat lukewarm, but again it's not likely to be the sort of place where I will participate. For me it begs the question: When is a category useful? We currently have 2083 entries in Category:Early modern authors, and I can't imagine any reason why I would want to look anything up there. Sorry if I sound a little testy over this matter of categories. Eclecticology (talk) 18:57, 19 September 2008 (UTC)
Most of my reasoning was systems for us.
  1. With temporarily empty Author slots, it may lessen people adding superfluous flags
  2. Allows for a cross-check, in that every page would be expected to have one or more DNB pages

Your other reflections are based upon the lack of usefulness of much categorisation. -- billinghurst (talk) 14:32, 20 September 2008 (UTC)

I wasn't objecting completely, and I think I understand what you're getting at. The intermediate category, "Authors by works," is probably too broad, and gives too many people the opportunity to fulfil their natural tendency to get things wrong. Maybe "Encyclopedia contributors?" Go ahead, and we can see how it works out. Eclecticology (talk) 22:17, 20 September 2008 (UTC)

Done → Category:Contributors to DNB. Full notification to Wikisource:Scriptorium. -- billinghurst (talk) 05:44, 21 September 2008 (UTC)

Poor Quality Scans

I really hate to bring this up, but I think it is important. I am wondering if we shouldn't replace the current scans with higher quality ones from Archive.org instead of the Google ones we currently have. The Google scans are missing pages, have unreadable pages, are hard to read and are in general poor quality. In order to make this change, we would need community approval. Here is a list of the higher quality scans. Please note the good scans all end in (Volume volume number). I think with the importance of this project we need to have the highest quality possible. Please let me know what you think. --Mattwj2002 (talk) 12:56, 4 December 2008 (UTC)

Simple answer. Yes. I was obviously opportunistic and enthusiastic in prior uploads. Better is better. No need to hate bringing it up, truth is truth.  :-) -- billinghurst (talk) 13:00, 4 December 2008 (UTC)
I think you will find that the University of Toronto texts at archive.org are generally better. Unfortunately, I never found all of the volumes from there, so we will end up with a fair number of the Google volumes also. Do you intend to start with the list on the Wikipedia project page? If you have not already completed your list, I can check my list to see if I found one or two that are not on that list. -Arch dude (talk) 01:07, 5 December 2008 (UTC)

To look at the 70 non-Google DNB's one can do the search "creator:(sidney lee) OR creator:(leslie stephen)" AND national -description:(google). There are other compiled lists Wikipedia:WikiProject Missing encyclopedic articles/DNB, though I can compile a fresh one if that is preferred.

If it is a case of replacing pages, then I have access to a good series of individual JPGs (latter rendition however) that I can trawl to locate and download. -- billinghurst (talk) 03:03, 5 December 2008 (UTC)

Billinghurst, where did you get the images from? I think we should create a page with a link to each volume we are going to use. I agree the Toronto texts at archive.org are generally better. I think we should use those first and use whatever other source is best for the missing volumes. My biggest objective is that we have these volumes as close to perfect as possible. --Mattwj2002 (talk) 12:41, 5 December 2008 (UTC)

From the same place, I just eliminated them from the search results. I have a table existing for what I have been using already Wikisource:WikiProject DNB/Djvu files. We just need to munge it to reflect (deflect?) previous attempts. -- billinghurst (talk) 14:54, 5 December 2008 (UTC)
While I can't legitimately complain about any of the techniques outlined above, we still need to be mindful of excessive perfectionism. We are accumulating a large assortment of scanned pages, and a large number of OCR pages are quickly following. Ultimately though, what we want is a series of articles that have been reliably proofread, and that is a tremendous challenge. The mini-projects to proofread single-volume works are given a full month to be accomplished. At that rate it should take more than five years to bring the DNB to completion....but only if we don't do anything else. Eclecticology (talk) 00:53, 6 December 2008 (UTC)
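
As a rough check on the "more than five years" estimate above, here is a minimal sketch. It assumes the 63 volumes of the original edition as the volume count and one volume proofread per month with no parallel work; both figures are assumptions for illustration, not a project plan.

```python
# Rough estimate only. Assumptions (not from any project plan):
# - 63 volumes, the size of the original edition
# - one volume proofread per month, one at a time
VOLUMES = 63
MONTHS_PER_VOLUME = 1

total_months = VOLUMES * MONTHS_PER_VOLUME
years = total_months / 12

print(f"{total_months} months, or about {years:.2f} years")
# prints: 63 months, or about 5.25 years
```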

Archived in 2010

Disambiguation

I've drafted the disambiguation section on the project page. Let me know your comments. I'll attack some of the other topics in due course. Eclecticology 18:45, 28 August 2008 (UTC)

As part of how we can manage disambiguation, I have also (re)created this page and have loaded the pages of the same name. It is there for thoughts on whether that is a suitable means to help. -- billinghurst (talk) 10:22, 28 July 2009 (UTC)

List of authors

Some guidance to where the list of authors are in the book would be useful. Probably something that is worth giving guidance in scope. Thx.

-- Billinghurst 02:49, 30 August 2008 (UTC)
What I've been doing to rough this out is to simply take the number of pages in the volume and divide by four. This should give an idea of where to break up the list. I've been avoiding putting the break in the middle of a set of surnames, but this may not matter in the long run. The list can always be easily fine-tuned at some later stage.
With the first two vols., I have just been adding a page per column, and I have then been getting part the way through a fifth page. I then even them up, and like you, split after a set of like surnames.
I've been working directly from the pages, but that will work too. Your way ensures that we add the see references, and mine does better with the cross-reference articles found in the text. Either way works, and having both operational will help one pick up what the other has missed. Eclecticology 18:34, 31 August 2008 (UTC)
I suspect that the reason for the breaks between list items is to avoid having the whole list wrap into one long text. Bulleted lists will avoid this (as would the less elegant addition of line breaks). Eclecticology 18:51, 30 August 2008 (UTC)
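
The divide-by-four rule of thumb described above can be sketched as follows. This is only an illustration: the function name `rough_breaks` and the 460-page volume are hypothetical, and the break points would still be nudged by hand so that a set of like surnames is not split.

```python
def rough_breaks(total_pages, parts=4):
    """Return the approximate last page of each part except the final one.

    Uses integer division, so the results are rough starting points only.
    """
    return [total_pages * i // parts for i in range(1, parts)]


# Hypothetical volume of 460 pages, split into four list pages.
print(rough_breaks(460))  # prints: [115, 230, 345]
```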
I must have been less than clear. I was enquiring about the source location of the authors of the individual articles, e.g. A. H. M. At this point, I just know and list them with their initials, and leave them for others to modify, if they see them. -- Billinghurst 11:12, 31 August 2008 (UTC)
Sorry if I misunderstood. I'm working from the "List of Contributors" in the first volume of the reprint edition, pp. xi-xx. It covers the original volumes plus the first supplement. If anyone anywhere has already scanned this it would make developing this dense material into a workable page of links much easier. The other option is for me to type this out the hard way. Eclecticology 18:34, 31 August 2008 (UTC)
Hey, absolutely no prob, thx for the direction. I have uploaded scans DNB contributors and will remove them in a week or so

BTW I type all that I do (my preference). Obviously I am a dinosaur in my ways. :-) So if I need to type these out, give us a yell, and I will do it when I get back later this week or next. -- Billinghurst 02:45, 1 September 2008 (UTC)

It's interesting to compare what you have scanned with my hard copy. They are essentially the same except that your Jan. 1932 printing (per the bottom of page 1) had the daggers, and they were omitted in mine from May 1942. I did find one name spelling variant. I'll try to at least work up what I have in mind for these pages in the next couple days.
Around here I too am from the age of dinosaurs! It's tedious, but at times there is no realistic other choice. It is good to know this when I'm proofreading something. The kinds of discrepancies that arise from typing material in have a different quality from those using an older uncorrected edition or from OCR errors. Eclecticology 08:47, 1 September 2008 (UTC)
I've started Dictionary of National Biography, 1885-1900/List of Contributors. Have a look and let me know what you think about it. Eclecticology 20:40, 1 September 2008 (UTC)

Proofs, authors and wikilinks

Is there some easy way to record and mark that something has been proofread, and probably that the wikilinks and author data have been attended to? I am finding this with articles that have the [q. v.] components, where I want to come back to them later to do the right wikilink. -- Billinghurst 14:13, 30 August 2008 (UTC)

To be honest, I don't think there is a satisfactory answer for this. I have been clicking lately on the 75% box in the text advancement section under the summary box. This puts a message "Proofread and corrected" in the summary box and adds {{TextQuality|75%}} at the top of the edit page. The other stated magical effects are more obscure.
I might go and ask someone. -- ab
Another option is to use the template {{textinfo}} on the talk page. This would give more detail, but requires that much more work. This gets very tedious when you are working on a lot of short articles.
Neither of the above two techniques have a specific provision for documenting whether the wikilinks have been made. The ones with the [q. v.] are clearly the most significant, but there are plenty of other names that could be cross-linked. On top of that, you really need to check out whether disambiguation is required; I've even found one reference to a biography that doesn't exist.
The other problem that I'm finding is what are we proofreading to. The original 63 volume edition may not be the best. I've been using the 21 + 1 volume reprint edition which incorporated the extensive 1904 errata volume even though it did not make other substantive changes. That creates an enormous challenge. I look forward to your comments. Eclecticology 22:32, 30 August 2008 (UTC)
(IMNSHO) I am interested in the most correct data, rather than the more sentimental replication of data at all cost, whether correct or not. That is probably the genealogist in me! So ideally it is more that we need to annotate the edition to which the proofing was done, though I would be happy with a proof than none. -- Billinghurst 11:18, 31 August 2008 (UTC)
I don't disagree. Now it's a matter of finding a practical solution that is clear and does not generate excess work. !!!! Eclecticology 18:06, 31 August 2008 (UTC)

Further suggestion on the above

If the potential recruits had the page already started would they be inclined to add the text directly to that article? We could go ahead and start articles with the headers only and put them in Category:DNB25%, but those articles would have their headers complete, including author and Wikipedia links. Once they're trained to add the text to these articles (and if they haven't yet been intimidated by a welcome message), maybe they could progress to proofreading each other's entries, and after that cross-referencing. Eclecticology (talk) 06:39, 19 September 2008 (UTC)

Nice solution. I do think that having the framework in place (without author text and WP links) would be very useful. Ability to "type" and "edit" and "save" would simplify things. I think that we may be able to sweeten the deal.
N-ve: it may become that the categories don't get amended; for interim viewers it is not evident that there is nothing behind the link
P+ve: direct links to each TO DO page by task from the project page
The notion of being stuck with arbitrary percentages did cross my mind. If the current spectrum works, so would 20/40/60/80/100. Keyword categories are indeed more useful, and some of the issues that arise with shorter publications, are no longer a factor, especially sourcing issues which is a global one for the entire project. They should unambiguously specify what needs to be done instead of what has already been done. The problem with the word "proofread" is that the past and present tense are spelt identically. More than one category may be used on the same page.
May I suggest:
  • Category:DNB Add text - The article has a header and needs a text. This is roughly equivalent to the current 25% category.
  • Category:DNB Add header - May not be needed except in the occasional situation where we have a text but no header.
  • Category:DNB Verify - The text needs to be proofread - Roughly equivalent to the current 50% category
  • Category:DNB Re-verify - The text needs to be proofread by a second person to give greater quality assurance.
  • Category:DNB Link - Needs wiki-links added.
  • Category:DNB Stable - This page has been proofread by at least two people, the last of whom did not find any errors. The clearly required links to other DNB articles have been created, but optional ones may still need to be made.
  • Category:DNB See - Use this for articles that are little more than cross references to other articles.
  • Category:Needs info - This tries to address your negatives and the matter of {{textinfo}}. Perhaps it could involve a modified version of the box that could go right on the content page. This requires more thought.
Comments? Eclecticology (talk) 18:09, 19 September 2008 (UTC)
Broad agreement. DNB Link can be added or checked. {{textinfo}} probably becomes redundant with proposed framework, beyond not knowing who undertook. That said they need to amend talk page, and that is less likely to happen anyway. About the only additional is potential for a hidden(?) DNB complete to allow for quiet cross-reference ability. -- billinghurst (talk) 14:51, 20 September 2008 (UTC)


pagescans

I have started Wikisource:WikiProject DNB/pagescans by scanning this list (using the "Flip Book" feature). John Vandenberg (chat) 00:50, 24 September 2008 (UTC)

Thanks. An issue that arises is whether these are the best ones to use for proofreading. The first combined edition might be better since it incorporates the 1904 errata. Eclecticology (talk) 04:18, 25 September 2008 (UTC)
With the change in the upload limit to 100MB, all the DNB files fall into the limit, it seems opportune to get the DJVU files into Index: pages. I have started that process and you can see where I am up to with Index: pages at Wikisource:WikiProject DNB/Djvu files.
For where I am up to with the download process, see billinghurst archive.org bookmarks. The list will be completed as time allows. I am limiting myself so that I can keep track of where I am up to. -- billinghurst (talk) 12:45, 25 November 2008 (UTC)

DNB Author links

I just added a template: {{DNB contributor|x. y. x}} . I intend to add this template to each DNB contributor in the list of contributors. I also intend to add a new template for each DNB contributor. The worked example is {{DNB GCB}}.

The idea here is to simplify the creation of the last line of each biographical article. Whenever you add an article, just add the correct template as the last line. The template will add the right-justified initials with the proper author link. -Arch dude (talk) 01:17, 1 October 2008 (UTC)

Is it also worth having the template drop in Category:Contributors to DNB? That might make it easier in the long run to pull it, if/when that decision is made.
Not quite. The "contributors to DNB" needs to be added to the template that is used in each of the (less than 100) "Author" pages. That template is template:DNB contributor. Eclecticology (I think) wrote that template. We could add the categorization to that template, which would allow us to remove the categorization later if someone does not like it. My template(s) will be added to each of the 50,000+ biography pages. We could elect to amalgamate these templates with the footer template if such a template exists. -Arch dude (talk) 00:05, 2 October 2008 (UTC)

I just added a template: Template:DNB footer initials. This is an "internal" template to be used to create the footer (100 or so) templates for each author: See template:DNB GCB as an example. -Arch dude (talk) 00:05, 2 October 2008 (UTC)

I just edited all the articles to convert the author footers. -Arch dude (talk) 02:13, 5 October 2008 (UTC)

  • 100 authors? :-) It's over 600. The templates seem to work; they do keep me from needing to look up the same authors multiple times. Eclecticology (talk) 23:26, 5 October 2008 (UTC)
I have updated the {{DNB contributor}} template so that it also transcludes Category:Contributors to DNB. I am working through filling all the existing Author pages with the template. -- billinghurst (talk) 23:32, 26 January 2009 (UTC)

Progress? and variations

I am just wondering on how we are progressing with these author templates?

Also, when I was cruising through volume 1, I see that Sidney Lee had been assigned the initials "SLL" and in the later compiled List of Contribs he is "SL". We will need to review how to handle, and watch out for others. When AD does his template, should we just redirect one to the other? -- billinghurst (talk) 11:51, 30 November 2008 (UTC)

My original intent was to add the templates gradually, as needed. After your heroic effort to add the page scans, I'm now adding the templates pre-emptively, starting with the ones listed on the "list of writers" page for vol 1 and progressing slowly forward from there. I am finished with vol 1 and most of vol 2. During this effort, I added S. L. L. as a synonym for S. L. on the consolidated contributors page, and E. W. G. as a synonym for E. G. I also added H. R. L. I did not add the two Misses Clerke, because I could not figure out who they really are. (A. M. C. and E. M. C.) -Arch dude (talk) 12:10, 30 November 2008 (UTC)
Done through vol 5 Slow going here. I'm still finding about ten new authors per vol. -Arch dude (talk) 18:54, 30 November 2008 (UTC)
If I/we concentrate our transcriptions on vols. 1 through 8, or earlier, then that covers the templates for now. :-) billinghurst (talk)
Done through vol 8. I'll wait until the 'bot does a few more volumes, and then catch up with it. However, please don't feel constrained: if you need a template, just make it yourself or drop me a note and I'll do it. -Arch dude (talk) 00:23, 1 December 2008 (UTC)

1904 reflections of Sidney Lee

Found letter to The Times from Sidney Lee that reflects on DNB and Thompson Cooper. Interesting info. The one thing that took my notice was the number of articles (1422) written by Cooper for the DNB. This is going to make a very long page! We will need plans for how to manage that many pages. -- billinghurst (talk) 06:39, 11 October 2008 (UTC)

Yes, and Gordon Goodwin is in second place with 1178 entries. At least 26 authors have over 200 articles. (See the "Statistical Account" in the revised volume 1.) I think this is a valid concern, but not an immediate one. We can probably devise an appropriate system of sub-pages in the way that Rudyard Kipling's poems have been split off from his main author page. Eclecticology (talk) 17:30, 13 October 2008 (UTC)

List of revised text to be uploaded

Hi guys,

I moved the list of text to be revised and uploaded to here. Your help adding to and modifying the list would be really appreciated. Once we get the list completed, I would like to start uploading the text.

Thanks for your help in advance. --Mattwj2002 (talk) 08:00, 31 December 2008 (UTC)

Slippage

I noticed some vol. 59 scanned text a number of pages adrift from the images. Page:Dictionary of National Biography volume 59.djvu/236 has the beginning of the article "Walsingham, Edward" if you want the page image, while Page:Dictionary of National Biography volume 59.djvu/242 has the text. Possibly an offset has been made the wrong way round. For the moment I'll just work on that one article, and paste corrected text on my user page. I'm presuming this is a bot error and can be best corrected by the bot. Could someone drop me a note about this on my talk page? I'm here on wikisource often enough, but basically just to correct text here before using it on enWP. Charles Matthews (talk) 16:41, 11 February 2009 (UTC)

Thanks Charles, we will see if we can get Matt to reload the text with his bot. It probably came about when I trimmed leading pages, and didn't correspondingly trim the text. -- billinghurst (talk) 01:40, 12 February 2009 (UTC)
Right now I am really busy with work, but I'll take a look when I get a chance. --Mattwj2002 (talk) 12:43, 12 February 2009 (UTC)

Another glitch: Page:Dictionary of National Biography volume 28.djvu/222 is for p. 235 of the original, while Page:Dictionary of National Biography volume 28.djvu/223 moves to p. 238: two pages missing. Charles Matthews (talk) 15:09, 19 February 2009 (UTC)
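
A gap like the one in vol. 28 could in principle be found mechanically. The sketch below is hypothetical: it assumes someone has already recorded, by hand or otherwise, which printed page number appears on each scan page, and the function name `find_gaps` is invented for illustration. The two data points are the ones reported above.

```python
def find_gaps(printed_by_scan):
    """Report adjacent scan pages whose printed page numbers do not follow on.

    printed_by_scan maps a scan (djvu) page index to the printed page number.
    Returns a list of (scan_index, expected_printed, actual_printed) tuples.
    """
    gaps = []
    scans = sorted(printed_by_scan)
    for prev, cur in zip(scans, scans[1:]):
        if cur - prev != 1:
            continue  # only compare directly adjacent scan pages
        expected = printed_by_scan[prev] + 1
        actual = printed_by_scan[cur]
        if actual != expected:
            gaps.append((cur, expected, actual))
    return gaps


# Vol. 28 as reported: djvu/222 is p. 235 and djvu/223 is p. 238.
print(find_gaps({222: 235, 223: 238}))  # prints: [(223, 236, 238)]
```

Here the result says that scan page 223 should have carried p. 236 but carries p. 238, which matches the two missing pages.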

Titling

What is said so far on the project page doesn't give a complete style manual. (And I think the text conflicts with the example: John Holt (d.1415) has no trailing space after "d.") There seems to be disagreement, or at least a lack of clarity, on the disambiguation. And possibly format issues. Agreed we give dates to disambiguate. The "name" part though is not agreed to be an initial segment of the name as given at the start of the article, because "Newton, Sir Isaac" is (as I understand it) to become "Newton, Isaac". And then in the case of titles of nobility, Talbot, George, sixth Earl of Shrewsbury (DNB00), there are other George Talbots. It is arguably the right prompt to the reader to leave the title there. It is possible to disambiguate by dates here, but is it being said that the dates should override the title? (There is also the point that there is some latitude in using upper case where the DNB has caps, as shown by this example.)

Two points about this all:

(a) getting a Manual of Style worked out now when progress is 1% to 2% makes sense rather than waiting longer to discuss this all; (b) this may be vain talk, but I'm really not happy with the approach as initiated - I'm not a fan of inverted names, which double searching effort, at the best of times - and I can think of ways that would be more helpful to readers searching (which can be implemented perhaps by redirects, but in any case depend on what we think we are aiming at).

The basic indication of the "name" of an article would be how the volume index names it. Perhaps we can get in from that end. Charles Matthews (talk) 12:52, 20 June 2009 (UTC)

So as can be seen at Page:Dictionary of National Biography volume 55.djvu/493, the volume index gives "Talbot, George, sixth Earl of Shrewsbury (1528?-1590)". We could call that the "official title". To meet several of my issues, we could agree the following: the header template to have a field which is to hold the uninverted form of the "official title". Charles Matthews (talk) 13:02, 20 June 2009 (UTC)
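
To illustrate what an "uninverted form" might look like in practice, here is a minimal sketch. The function name `uninvert` is hypothetical, and the rule it applies (swap the first two comma-separated parts, keep anything after the second comma as a trailing title) is an assumption about the intended convention, not an agreed style.

```python
def uninvert(title):
    """Turn 'Surname, Forename, rest' into 'Forename Surname, rest'.

    Assumes the surname is everything before the first comma and the
    forename(s) everything up to the second comma; anything after that
    (a title of nobility, dates) is kept as-is.
    """
    parts = [p.strip() for p in title.split(",", 2)]
    if len(parts) == 1:
        return parts[0]  # nothing to uninvert
    name = f"{parts[1]} {parts[0]}"
    if len(parts) == 3 and parts[2]:
        return f"{name}, {parts[2]}"
    return name


print(uninvert("Talbot, George, sixth Earl of Shrewsbury"))
# prints: George Talbot, sixth Earl of Shrewsbury
print(uninvert("Newton, Isaac"))
# prints: Isaac Newton
```

Names that do not fit the simple "Surname, Forename" shape would need handling by hand.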

ODNB comparison

Quite an interesting view from an insider: [1]. Charles Matthews (talk) 06:47, 7 September 2009 (UTC)

Yeah. Different organisation, same problems. -- billinghurst (talk) 07:59, 7 September 2009 (UTC)


{{DNB link}}

To let you know that a while back I created {{DNB link}}, which was designed to be used in the Works about author section. On reflection, it may also be useful for those authors who created single articles for DNB, and we could use it rather than creating a Contributions to DNB section. -- billinghurst (talk) 23:03, 25 September 2009 (UTC)

Amendments to {{DNB00}}

I have been making modifications to Template:DNB00

  • Added overprev and overnext which are used to overwrite the prev/next wikilinks. I envisage that these will only be used in the first and last biographies of a volume, and enables us to lead on or back.
  • Adjusted the wikipedia parameter so that it doesn't give weird links if the variable is left empty. I have removed some extra parameters that were there, which seemed to be unused artefacts from our stealing EB's header; we can re-add these if they are still needed.
  • Placed a copy and paste-able version of the script on template, and some instructions on the Talk page.

Note that I have not applied this to other derivatives of the same template. If we find the changes successful, then we can look to update them.

I was also thinking that it may be useful to add an optional(?) volume which would take people back to the specific volume that they may have come from, or possibly allow them to navigate towards. Thoughts? -- billinghurst (talk) 13:06, 29 September 2009 (UTC)

Some template documentation might be good ... Charles Matthews (talk) 12:07, 1 October 2009 (UTC)
There is, see up ʌ -- billinghurst (talk) 11:26, 2 October 2009 (UTC)
Ah, I'm used to a convention like Template:DNB00/doc. Charles Matthews (talk) 12:02, 2 October 2009 (UTC)
Fair enough, and it is quite welcome to be moved once proofed. I had zillions of bits on the go, and it was easier at the time, and as in first draft to paste to talk page. billinghurst (talk)
  • Added volume; note that this needs to be double digit, which means 01, 02, ... 09, 10, ... 63 for the early volumes. Usage = | volume = XX . At the moment this just adds a volume link to the header, though it will be used further.
-- billinghurst (talk) 09:54, 12 October 2009 (UTC)
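
The double-digit rule for | volume = can be sketched as a small Python helper. This is only an illustration: the function name is hypothetical, and the 1 to 63 range is an assumption taken from the volume count mentioned in this discussion, not something the template itself is known to enforce.

```python
def volume_param(volume: int) -> str:
    """Return the two-digit string expected by | volume = XX (e.g. 1 -> "01")."""
    # Assumption: only the 63 original volumes are valid here.
    if not 1 <= volume <= 63:
        raise ValueError("expected a volume number between 1 and 63")
    return f"{volume:02d}"
```

For example, `volume_param(9)` gives the string to paste after `| volume =` for volume 9.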

Introducing Template:DNBset

As discussed above, we were looking at a templated means to typeset headers and transclude the relevant pages. I have coded {{DNBset}}, which should be used in a substituted form. In the easier form, if we are coding section ids with the same name, use the code below. Further detail is on the template page. All you should need to do is complete the template and save.

{{subst:DNBset
 |article= 
 |previous= 
 |next= 
 |volume = 
 |wikipedia = 
 |extra_notes= 
 |from=
 |to=
 |section=    
}}

I have run a variety of testing, and it hasn't broken yet, though I wouldn't guarantee complete robustness. :-)

Feedback, comments, suggestions, all welcome. -- billinghurst (talk) 12:29, 12 October 2009 (UTC)

I can't speak for others; but I now always use the article name as the section id. (For one thing, unless you dab the section ids you will get strangeness in the transclusions, so two birds with one stone.) But perhaps it would be better not to get into that as a time-saver? I suspect we'll be moving articles quite a lot, eventually, as we get the titling conventions into a rigorous state. By the way, thanks for working on this. Charles Matthews (talk) 12:43, 12 October 2009 (UTC)
Welcome. It was designed so all articles still have the same output and use {{DNB00}} underneath the hood, so moving or whatever will be okay. I am sure that we will leave redirects anyway. AWB and Cats all will help us tidy underlying components, there just end up being a few to do.
Once we are settled on a preferred look, I will look to run SDrewthbot through and start tidying.-- billinghurst (talk) 13:56, 12 October 2009 (UTC)


Author field

Something that I have never asked, nor really thought about until now. I am wondering why we don't have and use the |author= field in DNB00. The field is actually there, it just isn't displayed, and yet we go through the process of adding the damn thing into notes. Doesn't really make sense to me. We could look to put it in, and even have it display in the same place by a little magic, or we can look to have it display as per normal articles.-- billinghurst (talk) 15:55, 13 October 2009 (UTC)

In the bigger context, the headers used by the Catholic Encyclopedia and 1911 Britannica projects are somewhat different (trying to do different things) - but not so very different. While obviously the page layout is to some extent determined by the original work, I see no reason why the header information shouldn't one day converge to a common style. The reader, after all, has similar expectations for the articles for these reference works. The CE style is to display the author field in the "blue" part, not the notes (even though it is typically then over-ridden by a generic message). The EB articles don't currently give authors. We do as a note. Certainly the whole issue can be reconsidered. Charles Matthews (talk) 17:51, 13 October 2009 (UTC)
When I started a separate header for the DNB articles I did adapt the EB header for the purpose. I would have added an author parameter to appear in the usual place, but that was beyond my coding abilities. If the field is actually there, it must have been added later. The headers for the CE were developed differently by someone else. The use of "multiple editors" is technically correct if we are referring to an encyclopedia as a whole. As long as the title in the header is that of the entire encyclopedia one could get the impression that an author listed immediately beneath it is the editor of the whole encyclopedia rather than the specific article. I absolutely agree that the author information would be better placed in the "green (blue?)" part.
For me this all relates back to some general questions about headers and their purpose. They are an important introduction to the material beneath them, and a way of connecting that article to the rest of Wikisource. For some others they are a place to store metadata about the page in the pursuit of a more standardized relation with the outside world. In the long term such metadata will be important, but putting that into the headers may be demanding too much of headers for anything other than the simplest of situations. Eclecticology - the offended (talk) 02:25, 18 October 2009 (UTC)
Ah, that must have happened when we aligned {{DNB00}} with {{header}}. What I am pretty certain that I will be able to do is to call it | Contributor and align it with the author field as a sort of overlay. Thanks for the feedback. -- billinghurst (talk) 07:37, 18 October 2009 (UTC)
I see you have done a contributor field now in {{DNBset}}; but is the author field in {{DNB00}} (into which that feeds) rendering as intended as of right now? Charles Matthews (talk) 10:00, 18 October 2009 (UTC)
All set and useable. -- billinghurst (talk) 10:05, 18 October 2009 (UTC)

So for Negus, Francis (DNB00), just created, the Author:Thomas Seccombe finished up after the subst in the author field of {{DNB00}}, as must have been intended; but I have just moved it to extra_notes so I can see it rendered? Something I'm not understanding, or is this about how {{DNB00}} builds on {{header}}? Charles Matthews (talk) 10:19, 18 October 2009 (UTC) I actually edited the template page in the end - seems to work as intended now. Charles Matthews (talk) 08:24, 25 October 2009 (UTC)

{{DNB00}} imports components of {{header}}. I have called it contributor replacing |author= . It is a cosmetic change though sends the message that we discussed.
I subst:DNBset and I use | contributor = (NOT author parameter), as per the instructions that MattBr copied to the documentation page. Been working fine for me.-- billinghurst (talk) 09:24, 25 October 2009 (UTC)

Sums of money

Seeing this one quite a lot. The DNB style is to write £100 as 100l., so that's an italic-l standing for pounds sterling. The scans very often make that 100/. instead (i.e. slash looking very like italic-l). Where you'd expect to see the slash in pre-decimal coinage UK money is for shillings, as in 6/- for six shillings, 6/8 for "six shillings and eightpence". But in the DNB this would always be styled as 6s. 8d., I think. Charles Matthews (talk) 08:24, 25 October 2009 (UTC)

Yep, though not sure what you are indicating.-- billinghurst (talk) 09:15, 25 October 2009 (UTC)
Gotcha. Not having dealt with funny money, I hadn't been confused and TWIS'd. It would be nice if there was a special symbol to use for this lira derivative. billinghurst (talk) 12:40, 25 October 2009 (UTC)
Closest that I can find would be ɭ billinghurst (talk)
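
A minimal sketch of how the "100/." misreading described above might be cleaned up in bulk, in Python. The function name is hypothetical, and the choice of wiki italics (''l'') for the pound sign is an assumption rather than an agreed project convention. Shilling forms such as 6/- or 6/8 are left alone because no full stop follows the slash directly.

```python
import re

# A digit followed directly by "/." is assumed to be a misread italic "l." (pounds).
_MISREAD_POUNDS = re.compile(r"(?<=\d)/\.")


def fix_pounds(text: str, italic: bool = True) -> str:
    """Replace OCR'd '100/.' with '100l.', optionally italicising the l."""
    replacement = "''l''." if italic else "l."
    return _MISREAD_POUNDS.sub(replacement, text)
```

Any result should still be proofread against the scan, since a genuine slash followed by a full stop would also be rewritten.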

Articles "most needed" on enWP

I'd like to construct a listing for articles that are (a) not yet created here and (b) have WP articles showing the w:Template:DNB indicating the use of DNB text. Easy enough to take backlinks to the template page there. From the point of view of having a sensible list for checking, it would be best to order those backlinks in order of their "defaultsort" so we get a best-guess at the DNB article order. Any bright ideas at automation? Charles Matthews (talk) 13:41, 26 October 2009 (UTC)

While I think that I know what you are asking, can you give some specific examples of where the existing page is, and what you want to have done to it? There are bits that I can do with AWB about preparing lists, and there are probably some tricks that someone with access to the databases can do better with some comparisons. The issue will partly be the differing naming systems, and until we have some examples the complexity of the task is hard to envisage. billinghurst (talk) 04:54, 27 October 2009 (UTC)
OK, what I did yesterday was by hand, and resulted in a dozen additions to Wikisource:WikiProject DNB/Most wanted articles, which is hardly used. I checked the first few enWP pages that show up as linking to the Wikipedia template page, w:Template:DNB, and looked to see whether there was a Wikisource article to which the WP page should be linking through that template. I found a few, but those where the relevant article here had not yet been created I listed alphabetically. All I'm asking for is a way of generating more automatically a sorted list of those backlinks, with the aim of dividing it in time by the 63 volumes. The best guess at A-Z order by surname would be to use the defaultsort as on the pages. The backlinks now are in the range 1000 to 1500, so it is passing out of the sort of scale one wants to go through by hand; and in the future the numbers will be bigger. It looks to me (Magnus presumably knows all these things) that the WP backlinks do get sorted, but only as a very low priority task. So one way would be to ask whether they could be sorted for us by someone with the access to request that. Another way would be to ask Magnus for a tool to do that job, assuming it's relatively trivial in his terms. Charles Matthews (talk) 10:23, 27 October 2009 (UTC)
AWB can do some simple things, let me see what I can quickly scoop from both sides. It isn't going to be scientific, though it can do some broad scope matters, like show me the files in this category, and then tell me where they have, or do NOT have whichever the case, Template:YADDADA.
There are also tags that we can add to templates if fields are left empty, eg. at our end, we have the Category:DNB No WP. It is a matter of thinking through the logic that we wish to apply and to where. -- billinghurst (talk) 12:05, 27 October 2009 (UTC)
Is this the sort of data that you were after? ... Wikisource:WikiProject DNB/Data capture billinghurst (talk) 13:41, 27 October 2009 (UTC)
Thanks, yes, a listing to show where the links back might exist. I have ticked a couple that I put in immediately, but it turns out that {{tick}} doesn't exist here yet. One of my personal favourites. Charles Matthews (talk) 15:57, 27 October 2009 (UTC).
We have similar as {{done}}: Done billinghurst (talk) 21:07, 27 October 2009 (UTC)
... the links back might exist ...

—Charles Matthews

Do you mean from WS to WP? This doesn't do Category:DNB No WP? billinghurst (talk) 21:07, 27 October 2009 (UTC)
No, cross-purposes I think. 'Back' in the sense that someone has placed DNB text on WP without linking to the DNB article in its natural habitat over here. Charles Matthews (talk) 22:41, 27 October 2009 (UTC)
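
The sorting step asked for in this thread can be sketched in Python as follows. This is an illustration only: it assumes the backlinks and their defaultsort keys have already been fetched by some other means, the function names are hypothetical, and the sample titles in the usage note are made up. Pages with no defaultsort fall back to their title as the sort key.

```python
def pending_sorted(backlinks, already_linked=frozenset()):
    """Return (title, sort_key) pairs for unlinked pages, ordered by sort key.

    backlinks: iterable of (wp_title, defaultsort_or_None) pairs.
    already_linked: WP titles that already link to a Wikisource article.
    """
    pending = [(title, key or title)
               for title, key in backlinks
               if title not in already_linked]
    pending.sort(key=lambda pair: pair[1].casefold())
    return pending


def group_by_initial(backlinks, already_linked=frozenset()):
    """Group the sorted pending titles by the first letter of their sort key."""
    groups = {}
    for title, key in pending_sorted(backlinks, already_linked):
        groups.setdefault(key[:1].upper(), []).append(title)
    return groups
```

Given sample pairs such as ("Isaac Newton", "Newton, Isaac") and ("Francis Negus", "Negus, Francis"), both would land in the "N" group in surname order, which is a first step towards dividing the list by volume.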


Image or images that epitomise the DNB project

Is anyone able to think of an image or some types of images that they think epitomise DNB/DNB project? Preferably something on Commons. I had thought that some of the Vanity Fair images give some aspect to what we are doing. -- billinghurst (talk) 02:09, 1 November 2009 (UTC)

To start with the obvious, File:George Smith by John Collier.jpg on Commons is the publisher, atmospheric in Victorian whiskers. But I think you want something more like a logo? Charles Matthews (talk) 09:16, 1 November 2009 (UTC)
No, I wasn't thinking logo, I was thinking emblematic, something characteristic and representative of the work and its time. Not a bad start. --billinghurst (talk) 09:46, 1 November 2009 (UTC)
Producing ... Here for comment billinghurst (talk) 12:35, 1 November 2009 (UTC)

[Image: George Smith by John Collier.jpg]

Wikisource has a number of active Wikiprojects that could use
your help in tackling these large additions to our library.


Dictionary of National Biography Project
Work: Dictionary of National Biography


Working over the Project's pages

I have been updating the main WikiProject page, which needed it, and have placed the Style Manual at Wikisource:WikiProject DNB/Style Manual. Manuals of style tend to grow, and to be a bit contentious at the margins. Watchlist it, and bring up stuff on the Talk page there that needs hammering out: there are quite a number of grey areas still, and all I'm trying to do as of late 2009 is to bring into focus what we currently (some of us) do. Charles Matthews (talk) 16:26, 12 November 2009 (UTC)

Fantastic. We are building a nice bit of momentum, and it would be great to hear the opinions and thoughts of our recent newcomers. Welcome to those who have joined and are watching. billinghurst (talk) 19:37, 12 November 2009 (UTC)


Stats to see in the New Year

I have updated Wikisource:WikiProject DNB/Statistics: we have just passed the milestone of 2000 articles.

I think we could venture to discuss collaborations within this project. One that has been set up is at Wikisource:Magna Carta. There are obviously other ideas, based around taking a volume, a letter that could be finished quickly, or some further topics. (There is the intention to create topical lists of redlinks on WP, but there is still a great deal to do with the "first pass" checking over there, before sorting out the redlinks.) Charles Matthews (talk) 10:22, 1 January 2010 (UTC)


Category split at Wikipedia

I have again been bold, and perhaps even stupider. Over at Wikipedia, I changed w:Template:DNB so that it no longer directly includes WP articles into w:Category:Wikipedia articles incorporating text from the Dictionary of National Biography. Instead, it now includes an article into one of two subcategories:

In my opinion, this is intrinsically good, but on sober(?!) reflection, I am now concerned that it will break some of the valuable internal tools that the project is currently using. Can someone competent please check that my modification basically works? Can someone even more competent please assess the damage to our tools and whether or not it is easily repairable? In the abstract, my changes make things easier, but in practice, they may force changes that would otherwise be unnecessary. I raise this here rather than at WP because the changes affect the WS project more than the WP project. Note: it will take a while for WP to actually update all of the articles to reflect this change. -Arch dude (talk) 02:04, 24 January 2010 (UTC)

The content of the "without" category should come to match (largely) the list of unticked at Wikisource:WikiProject DNB/Data capture, except that some new instances of {{DNB}} will have been added since the data was captured. As you rightly say, WP takes time to populate categories in this kind of case, because some low priority is given for server time allocated to updates of templates. So in a few days, maybe, the categories will reflect better the actual situation over there.
I would then find the "without" category particularly useful (because it is alphabetical, hence easily mapped onto the 63 volumes, so that WS links that can simply be added right now will be much more apparent). So I hope you are wrong about the possible unintended side-effects. If those are not serious or absent, the same idea ought to be ported to EB 1911 and Catholic Encyclopedia, to give some momentum to WS referencing as a whole. So thank you for the innovation. Charles Matthews (talk) 09:27, 24 January 2010 (UTC)

Volume 1

Is nearly done. The two remaining articles are long ones on royals: Prince Albert and Queen Anne. I'll get round to these. Those who like working from the beginning should probably look at the A's in volume 2. Charles Matthews (talk) 10:25, 24 May 2010 (UTC)

A big cheer on Volume I, then. Nice push, last time I looked there were still 80-some pages to transcribe. MLauba (talk) 10:56, 26 May 2010 (UTC)
To facilitate progress, I shall copy the transcriptions of missing pages onto Wikisource as subpages of User:Longfellow/DNB. Please note that these pages may lack some formatting, and may be slightly revised from the original printing. Thus they need to be checked against the DNB djvus. However, they will save time when using the djvus.--Longfellow (talk) 14:37, 31 May 2010 (UTC)
Thanks. We could think about a mechanism for an intra-project "collaboration of the month" or suchlike: getting all our ducks in a row (ToCs, author listings, and good text) for some definite objective. Charles Matthews (talk) 20:08, 31 May 2010 (UTC)


{{DNB lkpl}} given a second parameter

I have just now added a second parameter to the template {{DNB lkpl}}. The second parameter is optional, and has been added to allow names to be shown in the body text format, eg.

I hope that is of use, and I probably should have done it earlier, I just hadn't given my full attention when doing links in ye olde fashionne. — billinghurst sDrewth 15:08, 31 May 2010 (UTC)

Using categories on talk pages to signify requirements

(I am obviously getting obsessive <g>) I have used {{textinfo}} at Talk:Waddilove, Robert Darley (DNB00) for a look-see. Plus I am wondering on the potential usefulness of adding categories (temporarily) to the article talk pages to signify work required, eg. __Category:DNB talk qv links__ and __Category:DNB author pending__ -- Billinghurst (talk) 16:04, 9 September 2008 (UTC)

There are plenty of challenges in these questions, not the least is the availability of manpower, and how much work does each person want to do. I even try to ask myself why I feel so disinclined to use {{textinfo}}. The principles behind that template are fine, but I can't approach it without a feeling of annoyance. At the same time I find it quite acceptable to spend a considerable amount of time in detailed proofreading, or in trying to track down basic information about authors.
I have been mulling over these questions as I have been sitting doing bits. I have added them, and for lack of anything better, I have been adding them. An old version of ShortKeys application makes the text addition reasonably simple, for me, though, I emphasise 'for me'. I think that all I can hope for is that {{textinfo}} will become more useful in time.
The "for me" is not always appreciated by some of our colleagues. I too tend toward idiosyncratic work habits. If a technique works satisfactorily for the contributing individual, externally imposed efficiencies will be counterproductive.
Manpower. Well, I think that for transcriptions that I can find that. The genealogical community has been generous in their time for the right projects. The issue with that community can be more about the wiki learning curve. I have a plan, and just doing other groundwork first. It then might become more about management.
The genealogical community is a big untapped resource for wikis. There was a time when it was sufficient to know only a small handful of markups before a person could edit effectively. Now, the abundance of templates and other technical tricks scares people away. If in the course of recruiting from that community you can convince them that all the markup they will really need can fit on a single printed page, that campaign will be successful. The experienced ones among us then need to step in when the techies begin insisting on fancy procedures.
I have a 'secret' there and it is very non-Wiki. I am going to give them a choice of doing it in-wiki or in-mailing list. If they can transcribe they can post it to a list, and they can be copied from there and pasted here. That bit is ready to go. Many genies have been on the web and utilising lists for years. -- billinghurst (talk) 06:53, 18 September 2008 (UTC)
I too have some concerns about categories on the talk pages. We want volunteers to know what needs to be done, but these categories and "textinfo" are more likely to be put where the work has already been done instead of where it's needed. They can't go on articles that don't exist. I've been putting a "#" by the pages listed on the volume pages to show that it has been proofread. Is this useful?
As I am only transcribing at this point in time, the only answer that I can give is probably. Backend solutions would seem preferable; it just seems that most of this community's vision is focused on the output rather than the outcome.
Outcome requires more farsightedness than output. I'll keep thinking about this problem.
The "q.v." entries are good places to start adding cross-references but I don't think that those will show their full value until we have a much greater proportion of articles written. Some people are mentioned in articles without a "q.v." Thomas Falconer is one mentioned in the Waddilove article. The articles in the earlier volumes hardly used them at all. I skimmed through the long articles on Queen Anne and Francis Bacon, and didn't notice any. I'll continue with anticipatory links, but there are probably many more that could be in there.
Nice feedback, I hadn't noticed. They are the navigation constructs only, so I feel happy that we can do as and when. Similarly, I have been adding them, though feel that the most important aspect is the actual text.
I think that our current thoughts are about setting up the underlying framework, so that others can plug n play as required.
We agree on both points.
One way to judge the utility of techniques is the extent to which the technique's inventor actually uses it. Keep up the good work. We do have people who have these brilliant ideas, but never use them. Eclecticology (talk) 07:16, 15 September 2008 (UTC)
Plethora of tools, just many that don't do the right job or not in the right place. I probably could go and learn to make the tools (BTDT at RootsWeb), however, on this project, I was more looking to be front end minion rather than back end semi-genius. :-)
-- Billinghurst (talk) 01:45, 18 September 2008 (UTC)
Indeed, we occasionally need to be rescued from genius. Eclecticology (talk) 06:33, 18 September 2008 (UTC)
Back to the coalface.

Variation between versions

The very useful material from the Internet Archive is based on the first (63 + 3 volume) edition of the DNB. For proofreading I use my dead-tree version of the 22 volume reprint, which has incorporated the 1904 errata. This causes me some concern when the two versions differ. Thus in Brougham, Henry Peter (DNB00), what originally read

"As Lord Cleveland (Darlington) went over to the tories, Brougham felt bound in 1830 to vacate his seat for Winchelsea, and accordingly accepted the offer of the Duke of Devonshire to return him for Knaresborough."

became

"Brougham in 1830 vacated his seat for Winchelsea, the borough of the earl of Darlington (created Marquis of Cleveland in 1827), and accepted the offer of the Duke of Devonshire to return him for Knaresborough."

Or in another situation "1805" was simply changed to "1806". For one who would prefer precise reporting, these situations are maddening, and attempting to document them could take considerable time. Such attention to detail could severely limit the amount of material that could be included in Wikisource in the near future.

My own preference is to stick with the more recent version with the hope that at some point we will also add the 1904 errata. The purpose of those errata was, of course, to correct errors. Comments? Eclecticology (talk) 19:09, 6 October 2008 (UTC)

I'm very much against using any version later than the original 63 volumes published from 1885 to 1900 for this particular Wikisource. I feel that you should create another parallel Wikisource with a different title (e.g., "Dictionary of National Biography, 1904") in which you will be free to point back to the articles with (DNB00) suffixes when there are no differences, and you are free to replace the (DNB00) articles with (DNB04) articles as needed. We can also link forward to the (DNB04) (or later) articles from the "see also" section of the (DNB00) article header, since the header does not purport to exactly reflect the source. Furthermore, we can add a section to the "notes on reading the DNB" article to recommend that users who are looking for information rather than exact sources should start from the latest DNB that we have. Indeed we can create a synthetic "Dictionary of National Biography" (with no date) as the front project. At an absolute minimum, if we intend to use the existing project as the synthesized DNB, then we need to do three things:
  • change the project name
  • add a note to the front matter of the project to explain that the project has mixed sources
  • explicitly provide the provenance for each individual article.
As I see it, there are at least two ways forward that can accommodate our competing goals:
  • Create multiple separate top-down sources (including a synthetic "best" source) within the existing project.
  • Fork the project.
If we cannot agree to preserve the original unmodified DNB00 source as a distinct source within the existing project, I will create a separate project for this. Even if I must do this, I would still happily contribute to this existing project also. However, I feel that it is better to continue to work as a single project team to capture all of the versions, each as a separate top-level source. Now that we have a worked example, I can create each separate source superstructure in about an hour. Which sources do we need in addition to the existing 1885-1900 and 1904? -Arch dude (talk) 12:57, 7 October 2008 (UTC)

Clarity of principles. I think that we need to decide what is the desired outcome and the reason for it. Reflect on why. After that we can decide on how to get to the goal. I don't want to get into a bunfight over this or that, or have a view on whether either is right or wrong. I simply wish to make the information available, and in the clearest and most factual means.
-- ab

So ... Principles

  • Make available, in a searchable text form, the DNB's information
  • To present historical data as accurately as possible

(add more ...) The point of difference seems to be

  • Identical reproduction of DNB00

OR

  • Reproduction to a later corrected DNBXX

I would like to consider the intent of the original authors in our project, and what would they have considered if they had the web available to them. I think that they would say correct information. The genealogist in me has concerns of propagation of incorrect data, even if it is available corrected on another page. The ongoing reproduction of incorrect data is a nightmare. So I am disinclined to multiple pages for the same person for the same publication.

Practically, the three of us are not going to get through DNB alone, and it seems that we are getting particularly fussy over first ed, or corrected ed. Others (hopefully) are going to come and type, and they may have whichever source available. I am not wishing to confuse the helper with the intricacies of this ed, that ed. Isn't it a matter of noting which ed., and if there is a difference just highlighting the difference?

A proposed solution

  • People type from whichever version they have available
  • The DNB00 version remains the base text
  • We proof against available version, if there is any differentiation from text in a later version, it is annotated in the article and the corrected (differing) part of the article presented on the same page in a classy fashion to indicate, that it is the predominant fact and where the errata occurred.

If we don't have a simple means to annotate, then can we import one? If it is going to take time, then we can use the Talk page to park the correction in Textinfo and get back to it when able to.

If we end up with articles with a major rewrite, or major difference between two versions, then how about we deal with those as they come up with the governing principles. -- billinghurst (talk) 23:06, 13 October 2008 (UTC)
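
As a rough illustration of "just highlighting the difference", the word-level changes between two transcriptions of the same passage can be listed with Python's standard difflib module. This is a sketch under stated assumptions, not a proposed project tool: the function name is hypothetical, and splitting on whitespace is a simplification that treats punctuation as part of the adjacent word.

```python
from difflib import SequenceMatcher


def changed_passages(original: str, corrected: str):
    """Return (old, new) word runs that differ between two versions of a text."""
    old_words, new_words = original.split(), corrected.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An insertion has an empty old run; a deletion has an empty new run.
            changes.append((" ".join(old_words[i1:i2]),
                            " ".join(new_words[j1:j2])))
    return changes
```

A single-word correction of the "1805" to "1806" kind would come back as one pair, which could then be turned into a footnote or a Textinfo note on the Talk page.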

Thanks for your comments. It should be pointed out that the errata volume is 300 pages long, and since the corrections for each of the 66 volumes always start on a right-hand page this results in a number of blank pages in those 300, and an average of four pages of corrections per DNB volume. Few of the corrections are more than two sentences long. Most frequently only a single word is changed in an article. The majority of articles remain completely unchanged.
I agree with using annotations or footnotes. When I began working on this project I scanned articles from my paper copies, and put them through OCR software. Now I find it much more efficient to use the OCR version from the Internet Archive. That version used the original edition, but for proofreading it is still easier to work with my on-paper reprint edition. The OCR from the Internet Archive has fewer OCR errors than my scans, but it is not exempt from them. Nevertheless, since the errata volume is not straight text, but more of a tabular list of corrections, the OCR version of that is more difficult to work with.
Distinguishing between errata, OCR errors, or human typos during proofreading makes that task far more time consuming than it should be. The most practical time for acknowledging the errata may be when all else in a volume has been entered, at which time it would be much easier to go through all the errata pages for that volume. Eclecticology (talk) 04:35, 14 October 2008 (UTC)


Thanks Ec. Ouch, a lot of pages. I have never had the pleasure of seeing the paper version in any means. Even with proofreading, I still bet that we will have more errors from transcription or OCR even with checking. I hope that we can reach a practical solution to the issue that meets all our needs. :-) -- billinghurst (talk) 10:45, 14 October 2008 (UTC)
It looks as though we are making progress towards defining project goals. My personal project goal remains the same: I want a faithful reproduction in Wikisource form of the exact contents of the original 63 volumes. Fortunately, it appears that this goal is not hugely incompatible with the goal of the other two participants: we just need to decide how to meet my goal while also meeting the other goals. My primary reason for my goal is esthetic: I see it as being very simple and objective. However, there is a major practical reason also, which is that I'm fairly sure that there is a large project over at Project Gutenberg to produce these articles from the original 63 volumes. If we get the correct infrastructure in place to accommodate these as a base, then we can build on the PG work once it is available to create the other desired forms for these texts. Note particularly that I do not disagree with the other goals: I am perfectly happy to create a new front project that we will encourage the casual user to use. This front project should be called "the DNB," and its pages should be the most recent or "best" pages we have for each article. In many cases these will be the DNB00 pages, but in other cases they will be updated pages. However, we should also retain a "hidden" DNB00 project, which will be accessible to anyone who desperately wants to see the original. Each of these pages can also forward-link to later updated pages. One nice thing about this approach is that we can dump the PG project into the "hidden" DNB00 and then update it at our leisure to produce the "updated DNB" "view" of the material. If this is acceptable, then adding an article proceeds as follows:
1. add the article to the DNB00 project in completely unmodified form
2. add the article's title to the appropriate DNB00 Vol table of contents.
3. see if there are 1904 corrections. If so, copy the article to create a DNB04 version: --e.g., copy "Harvey Smedlap (DNB00)" to "Harvey Smedlap (DNB04)"
4. add the article (or the 1904 version if it exists) to the DNB04 Vol TOC.
5. if you created a newer copy, add a pointer to the newer copy from the older copy and add a pointer to the older copy from the new copy.
6. repeat steps 3-5 for each newer version.
This scheme requires a new mainpage and new vol pages for each version. It's not elegant, but it's fairly straightforward. This scheme does not by itself aid users to see how an article evolves from version to version. If that is an important goal, we will need to be more creative. -Arch dude (talk) 22:35, 14 October 2008 (UTC)
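The naming and cross-linking in steps 3 to 5 can be sketched in Python. This is only an illustration: the function names are hypothetical, and any edition suffixes beyond DNB00 and DNB04 would be whatever the project settles on.

```python
def versioned_titles(name, editions):
    """Return one page title per edition, e.g. 'Harvey Smedlap (DNB00)'."""
    return [f"{name} ({edition})" for edition in editions]


def cross_links(titles):
    """Pair each version with its neighbour, in both directions (step 5)."""
    links = []
    for older, newer in zip(titles, titles[1:]):
        links.append((older, newer))  # pointer from the older copy to the newer
        links.append((newer, older))  # pointer from the newer copy back
    return links


titles = versioned_titles("Harvey Smedlap", ["DNB00", "DNB04"])
for source, target in cross_links(titles):
    print(f"{source} -> {target}")
```

Each later version adds one title and two pointers, which is why the scheme grows linearly with the number of versions kept.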
That looks like a lot of work, and it may take a while for our current large horde of contributors to complete it. Look at Brereton, Thomas (1782-1832) (DNB00). In dealing with the 1831 Bristol riots the original edition referred in two places to the scene by the more intuitive name "Queen's Square", which could undoubtedly satisfy many readers who don't know anything about the local geography of Bristol. The 1904 errata, possibly reflecting the indignation of some Bristol resident, indicated that this should be corrected to "Queen Square". That was the only correction made to the article. Does this merit a whole new page just to deal with this one? I have documented the change with a footnote. Could this not be enough?
I should add that the 1904 publication was nothing more than a volume of errata. It did not include full texts of the corrected material. Instead, these corrections were incorporated when the work was first reprinted in 1911. Where more than a word was at issue additional rephrasing to make the corrections fit was required. To respect the pagination, additions to the text could not increase the number of lines on a page, and some rewording became necessary. Eclecticology (talk) 08:05, 15 October 2008 (UTC)
See also my treatment of removed material at Bradfield, Henry Joseph Steele (DNB00). Eclecticology (talk) 16:55, 16 October 2008 (UTC)

Milestone[edit]

User:Magnus Manske "left" a tool for the project on my usertalk: just so everyone knows, it's here. The headline statistic is the project has passed 1000 articles created. A bit more than a drop in the ocean, really. Charles Matthews (talk) 20:31, 21 August 2009 (UTC)

bonzer billinghurst (talk) 08:00, 7 September 2009 (UTC)

Our vol. 28 is not a good copy[edit]

Howard - Inglethorpe is in very poor condition. Pages missing, and the OCR of the pages fails miserably. :-( -- billinghurst (talk) 13:09, 19 September 2009 (UTC)

Looking at archive.org, there is
Done. New version of this volume has been implemented. -- billinghurst (talk) 23:04, 25 September 2009 (UTC)

Alignment issues: BIG statement[edit]

After having pummelled poor ThomasV unmercifully following the upgrade to ProofreadPage, I now have a lot better understanding of page scans and the like.

  1. I have purged (?action=purge) all the File:s so that the text layers should now be available from the respective Index: pages. Didn't realise that it was a solution to the layer not appearing.
  2. Where the text is out of alignment with volumes, it will be due to reloaded data files. A couple of solutions.
    1. Delete the bad Page, and then touch it from the Index: and on the recreation of the page, the original text layer from the DjVu file will be imported.
    2. Copy and paste the text to align.
  3. Matt is playing with the original TIFF files, and would be prepared to regenerate DjVu files, probably upon request — not fully resolved at this stage. This will give us the opportunity to take a step back and look at the problems facing individual volumes. We would then need to look at how we wish to save existing work, and allow for replacements.
  4. I have worked out that we can load replacement .djvu files at WS, do the work on the front end as a local file takes precedence, and then move them over to Commons once we are happy with the condition. [Note that we need to have the Commons version deleted prior to a move.]
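The purge in point 1 is simply a request for the page with action=purge in the query string. A minimal Python sketch of building such a URL follows; the base address is an assumption (the English Wikisource script path), the function name is hypothetical, and depending on the wiki's configuration the site may ask for confirmation before purging.

```python
from urllib.parse import urlencode

# Assumption: the English Wikisource script path; adjust for other wikis.
BASE = "https://en.wikisource.org/w/index.php"


def purge_url(title):
    """Build a URL that asks MediaWiki to purge the cached copy of a page."""
    return BASE + "?" + urlencode({"title": title, "action": "purge"})


print(purge_url("File:Dictionary of National Biography volume 08.djvu"))
```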

Charles has done a fair amount of work on existing volumes, and I am wondering on the means to work out what we need to do to each volume to get it up to speed. Do we need to have a page that reports on the condition and status of the volumes? Charles you are more around this aspect!

There was something more that I was sure that I was going to address, however, it escapes me.-- billinghurst (talk) 03:16, 27 September 2009 (UTC)

I have started a survey of progress for adding the text, at User:Charles Matthews/DNBProgress. I suppose I should finish that, and move it out into project space, so we have a single page addressing general progress. I also have a section on my usertalk where I record glitches in the djvu sequences. There are a fair number of these. I think we need some sense of priorities. Where there is good-quality text to add, it makes more sense to be fussing about getting the glitches straightened out, since certainly bad scans (where there is no alternative) aren't conducive to getting to work (unless like me you really want a given article right now). The scans I mark "good" are the Toronto scans, which (as a rule of thumb) are much superior to the Google scans. Charles Matthews (talk)
With the recent mediawiki update, a fix has been put in place for the issue of certain djvu files not purging and updating the underlying text layer. I know that this issue existed for some of our existing DjVu files, though which ones I do not remember at this point. Anyway, if you see that the text and images are out of alignment, try again to purge the File: version and see if it updates back in the Page: namespace. — billinghurst sDrewth 16:15, 22 April 2010 (UTC)

Transcluding to main namespace[edit]

Traditionally we have basically #LST'd text in with {{#section:...}} tag, and not undertaken any formatting of the body of the work. Things have progressed with transclusion, and I wondered whether it was time to review our means of formatting for body of works, and how we transclude.

I would like to propose that we look to put our body inside

  • <div class=indented-page>

which gives a left margin and that we look to transclude with a means that includes page numbering, ideally the new nomenclature

  • <pages index="Dictionary of National Biography volume 08.djvu" from=25 to=26 fromsection="Bury, Arthur" tosection="Bury, Arthur" />.

which would give a page that looks like Bury, Arthur (DNB00)

If we think that even that class formats too wide, we can look to customise further, and specify our own class specifically. -- billinghurst (talk) 10:28, 11 October 2009 (UTC)

The new way of doing it does still create a transclusion backlink from Page:Dictionary of National Biography volume 08.djvu/25 - one of my concerns, given the amount of maintenance still to do and need to cross-check side-effects. It would make my life marginally harder (a few keystrokes) from a navigational point of view, since I like to start from "Page:Dictionary of National Biography volume 08.djvu/25" or whatever in finding nearby pages I need; this is part of opening up the text, and is a temporary issue, you could say. I was wondering if the whole transclusion business could now be made part of Template:DNB00. Charles Matthews (talk) 13:28, 11 October 2009 (UTC)
We could look to make it part of {{DNB00}}. The additional data is pretty standard, about the only remembering issue will be 01,02,03... for initial vols. rather than 1,2,3. Let me have a go. If we do it that way, it becomes even easier, though I think that I will look to subst: most of these additional variables. -- billinghurst (talk) 14:02, 11 October 2009 (UTC)
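As a rough illustration of what a template would have to assemble, here is a Python sketch that builds the pages tag quoted above, including the two-digit volume number (01, 02, 03...). The function name and parameters are hypothetical, and it assumes an article sits in a single named section.

```python
def pages_tag(volume, first, last, section):
    """Build the <pages /> transclusion tag for one DNB article."""
    # Volume numbers are zero-padded to two digits in the file names.
    index = f"Dictionary of National Biography volume {volume:02d}.djvu"
    return (
        f'<pages index="{index}" from={first} to={last} '
        f'fromsection="{section}" tosection="{section}" />'
    )


print(pages_tag(8, 25, 26, "Bury, Arthur"))
```

With those arguments the output matches the Bury, Arthur example given earlier in this thread.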

"DNB Archive" on ODNB[edit]

There has been a little recent discussion (not here) of the status of the text available on the ODNB website and called "DNB Archive", purporting to be the original article text from the first DNB edition. Having tried this out, and paid close attention, I found on the second article I tried a small difference from the djvu (it was the expansion of an abbreviation, not an erratum). Using this text is therefore a time-saver where the scans are bad, certainly, but this is something that still requires detailed proof-reading and is certainly not a total short-cut. Charles Matthews (talk) 16:32, 26 October 2009 (UTC)

I would presume that we are marking those shitty pages as problematic in the PAGE environment, so we can look to replace the pages individually or the volume collectively. That gives us the scope to then at least proof against the proper version. billinghurst (talk) 05:04, 27 October 2009 (UTC)
OK, that sounds like a plan. There are actually a number of cases that come up: you can have the djvu legible enough to use to proof text, but the scanned text is simply very corrupt, in which case this other source of text is very useful; the scan has interleaved lines caused by imperfect scanning across the two columns, ditto; and then there is the case where the djvu is effectively illegible. Where there is no replacement PDF download in the last of those cases we are still apparently stuck, but it would certainly be helpful to accumulate information on which those are, so that those with access to physical copies can know where we most need that help. I came across one such page in the Hobbes article yesterday. Charles Matthews (talk) 10:08, 27 October 2009 (UTC)
I can get access to jpg images of the 23 consolidated volumes from the 20s. Not perfect, however, may be better than nothing. Just need to mark it as such. billinghurst (talk) 11:58, 27 October 2009 (UTC)

Author page population[edit]

There is a good way to progress this side, and I thought I'd note some caveats. First, it seemed too good to be true that the ODNB site would list all contributions to the DNB by a given author; and, duly, it apparently gives about 30% of articles, representing I suppose those articles that were revised rather than rewritten from scratch. So there are 400-odd for Author:Thompson Cooper, rather than 1422 (some say 1423, no matter). Still it seems that this is all worth having. Secondly, it is impossible to guarantee disambiguation on the first pass (without time-consuming checking), and even the spelling of article titles may have changed in some cases. (In the longer term a reference list on WP will note authors by initials for the articles, and the volume lists here will be properly dabbed, and so confusion will be averted.) Charles Matthews (talk) 17:14, 29 October 2009 (UTC)

Alignment problems in vol.35[edit]

Not sure if this is the right place to bring this, and also whether somebody may have already flagged this elsewhere, but there are alignment problems in volume 35 between the OCRs and the image scans. Around page 300, they're off by about 6. Page 299 [Page:Dictionary of National Biography volume 35.djvu/305] and page 305 [2] have been cut & pasted to match the images; the corresponding OCRs were previously on the screens with the images for page 305 and 311. Presumably, somebody has a bot that can fix this? Jheald (talk) 15:43, 3 November 2009 (UTC)
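For anyone scripting a fix, the shift described here can be modelled as follows. This is a sketch under an assumption: that within the affected range the correct text for the image at position i currently sits at position i + shift (shift being about 6 around page 300). The function name is hypothetical, and a real bot would also need to read and save the Page: texts.

```python
def realign(text_layers, shift):
    """Return a copy of text_layers moved so each text sits with its image.

    Positions whose text would come from outside the list are left empty.
    """
    count = len(text_layers)
    return [
        text_layers[i + shift] if 0 <= i + shift < count else ""
        for i in range(count)
    ]


# Toy example with a shift of 2: each text appears two positions too late.
misaligned = ["", "", "p0", "p1", "p2"]
print(realign(misaligned, 2))
```

Because the offset may vary within a volume, the shift would have to be checked against the scans for each range before applying it.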

It's the right discussion page: these things are logged on Wikisource:WikiProject DNB/Progress. Charles Matthews (talk) 16:20, 3 November 2009 (UTC)

Greek letters?[edit]

I recall that I had a major problem figuring out how to do Greek letters properly, and I do not think I ever really got it right. Can someone please add a guide on editing Greek letters? -Arch dude (talk) 20:40, 3 January 2010 (UTC)

I am no expert either, though I find this page useful http://www.fileformat.info/info/unicode/block/greek_and_coptic/images.htm or if you use Firefox, there is the add-on International Sideboard
Either of these works:
(a) go to the menu below the editing box, and where it says "Select" go down to Greek. This pulls up a whole listing of Greek letters, with full range of accents and breathings, iota subscripts.
(b) do it in a text editor that has symbols with the full range, paste in.
The advantage of (a) is clearly that it is convenient as you go along. In practical terms it is slightly annoying for any passage of more than a word or so, because when you add say α you flick back up to the prompt in the edit box, needing to scroll down for the next letter. So for longer passages I do a separate pass in a text editor. (It can be hard to figure out the exact text with accents when the scan is bad. Two work-arounds are these: Google to check that the thing you've typed is real Greek - snag is that Google is sensitive to the accents in Greek, so if you have a wrong accent nothing may show up; with access to ODNB text you will get a correct romanization, which is a good starting point. Neither is quite failproof.) Charles Matthews (talk) 09:32, 4 January 2010 (UTC)
Or ask Zyephyrus ever so nicely if he would be so kind to give it the once over. He does Greek. billinghurst (talk) 15:34, 4 January 2010 (UTC)
Another potential snag: It is, I believe, possible for the djvu to display one character as another, particularly if the scan is poor. A jpg image, such as those in the online viewer (found at archive.org's pages as "Read Online"), would be a better source in some circumstances. I will mention that other file types are also available, if only to justify the following: 'Be aware of gifs bearing Greek' — Cygnis insignis (talk) 16:30, 4 January 2010 (UTC)
Yes, some of the djvus we have are scarcely legible. (My O-level in Ancient Greek would undoubtedly be trumped by any native speakers, were there any; but I do have a clue). Charles Matthews (talk) 17:00, 4 January 2010 (UTC)

Proposal: Add Volume index "articles"[edit]

I propose to add all 63 volume index articles. This will take me approximately forever.

I intend to start with the index for volume 1, and use it as an experiment. My approach will be to first proofread the scanned pages in page space, and then create an "article" in article space, just as we do with biographical articles. There are two major differences:

  • There are a huge number of mainspace links (i.e., one per line!)
  • There are a huge number of pagespace links (i.e., also one per line.)

I intend to link to the resulting "article" from the volume page. The intent is to give readers access to the raw images for articles that we have not yet converted. Please comment if you think this is a bad approach. Otherwise, please prepare to comment after I complete the experiment with the first volume.

If this experiment yields a positive result, I intend to then quickly add a "stub" index article for each of the 63 volumes. The stub will link to the "raw" image of the index and to the djvu index page, and will give instructions on how to find an article based on the offset. This is simply a restructuring of the information you guys have already done. The intent is to allow the casual reader to easily find an unconverted article in page space. I will then gradually replace these stub pages with proper index pages. -Arch dude (talk) 02:03, 10 January 2010 (UTC)

I think it is controversial to have text that is not proofread available in the main namespace. Pagespace is for that. But let me work with your idea a little. With the state-of-the-art transclusion method, how best to set up "dummy" articles, in the main namespace? If the point is this, that the reader can access the djvus for an article from the volume ToCs, I think this works: sectionise and transclude as usual, but comment out text that is not proofread by applying <!-- --> opposite the djvus. Then, as far as I can see, dummy articles will present themselves simply as the small page numbers from the left margin. The reader can click on any page to navigate to the DNB text as image. This would allow the mass creation of articles that are of some use. Trivially there can be a template on the page of any dummy article explaining that "text for this DNB article that is proofread is not yet available, but you can read the original pages". It is also quite attractive to think that we could complete the author listings with such dummy articles, just flagging them there in some way. This approach would seem to address a point Magnus was making, about separating "listing" and "proofing" as activities: proofing would go on at its own pace.
A further comment about volume 1 is that this is one of the volumes with only a "bad" scan at archive.org - one of the tough nuts. My intention is to fill it in myself, since I'm not dependent on that scan (and generally to give priority to the 25% of volumes in the same state). So I'd suggest a different starting point, e.g. vol. 7 since we have ToCs up already for the first six volumes, and moving ahead there would be all plus. Charles Matthews (talk) 09:51, 10 January 2010 (UTC)
My idea is to give readers access to the photocopied page images, not the non-proofread OCRs. The fact that this also gives access to the non-proofread OCRs is an unavoidable consequence of our chosen display method. After all, what would the reader do if we did not have such a navigational feature? The reader would need to go to the non-Wikisource images, probably the ones that we recommend. My idea is to have a link from the VOL article to an "image description" article that tells the reader where to find the images, and specifically the original index pages within the images, and that describes how to navigate and what to expect. This provides the reader with much better access to the images than they can get at the original locations at e.g. archive.org. Remember that our goal is to make these sources available to readers, and our project has a much longer time to completion than most, so a worthwhile interim step is to provide easy access to the images. -Arch dude (talk) 17:03, 10 January 2010 (UTC)
I suspect there is a way to generate pages such as you want by transclusion of some flavour. My interest is in having such pages that satisfy a specification that does what you want (index the scanned pages for article A), and also what I'd want to do the "packaging" work for article A: namely provide a page pre-A that is (i) linked reciprocally with the author page, (ii) linked reciprocally with the volume ToC, (iii) carries the previous and next page information as links. I hope we are not talking past each other. Your interest in pre-A is to link to the scanned pages involved with A, my interest is in what you could call header information residing on pre-A. I'm thinking of pre-A as a page that is easily upgraded to A by whoever proofreads the text opposite the scanned pages. Rather than "image description", which is going to be an overloaded term in discussions anyway, I'm just thinking of the small numbers in the left-hand margin that our new-style transcluded articles have, without any text actually transcluded. Charles Matthews (talk) 17:31, 23 January 2010 (UTC)
I finally read this carefully enough to understand your idea. Yes, we are talking past each other slightly. My intent was to create a total of 63 articles ("source description pages") that give the casual reader some hints as to how to find the source for an un-transcribed article. I have now started this (see below). My idea would then follow through with an additional 63 "index" articles, each of which is a transcription of the index pages of a DNB volume. Your idea is to create 27,000+ "empty" articles that each point to the correct image pages. I do not think that these ideas are incompatible. I can build my "source description pages" very quickly, and these can be used as a very crude stopgap. The index articles and the "empty" articles are both major efforts. The justification for the index articles is that they will eventually be needed to complete the transcription effort, since the index pages are after all part of the original source. The "empty" articles are worthwhile because they complete the ToCs and provide a consistent in-place article naming system. -Arch dude (talk) 02:27, 24 January 2010 (UTC)

Index navigation template[edit]

For those who wish to readily access the Indexes, I have created {{DNB indexes}}, which does this. I will be putting it into the new VOLUMES field in the Index: namespace for each work. I am considering having a hover to display the names in the vol. but haven't gone further yet. If you think that it adds value, please get back to me. NOTE that if you use it anywhere BUT the index pages, then it should be used like {{DNB indexes|category=}} billinghurst (talk) 02:45, 10 January 2010 (UTC)

Help Ashley, John (DNB00)[edit]

Can't get this fellow, Ashley, John (DNB00), to behave properly; some help would be appreciated. --P. S. Burton (talk) 16:00, 7 September 2010 (UTC)

I renamed the section to s4 (section 4), dif and comment, and reckon this style should be used unless there is a compelling reason to use the name. The user has to visit the page anyway, to see what form the name as label has taken. cygnis insignis 16:31, 7 September 2010 (UTC)
Officially we should have tags wrapped inside quotes, eg. <section begin="Ashley, John James"/> so it is explicit, whereas here we had a case of foreshortened and ambiguous labelling (and for which I am equally guilty). — billinghurst sDrewth 17:02, 7 September 2010 (UTC)
In answer to Cyg, we have aligned the tag name with the article so we can more easily transclude names. Useful practice and one that I have used for a number of biographical works when you can be proofing and transcluding at very different times and after intervals. — billinghurst sDrewth 17:06, 7 September 2010 (UTC)
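To make the quoting convention concrete, here is a small Python sketch that wraps an article's text in explicitly quoted section tags. The function name is hypothetical; the tag form follows the example given above.

```python
def wrap_section(label, text):
    """Wrap text in labelled section tags, with the label in double quotes."""
    return f'<section begin="{label}"/>{text}<section end="{label}"/>'


print(wrap_section("Ashley, John James", "ASHLEY, JOHN JAMES ..."))
```

Using the full article name as the label keeps the tag aligned with the title used when transcluding, while a short label such as s4 is quicker to type but has to be looked up on the page.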

Acquiring a better scan?[edit]

If we encounter a few poorly-scanned pages, is it acceptable to go to a library that has a copy of the original DNB and simply take pictures with a digital camera? The answer is surely "yes." If we ask someone to do this, how can we incorporate the result into the project? We can surely put the pictures in Commons. Can we somehow replace individual pages of the current "best" volumes, in place? If not, how can we indicate within those documents that we used a different source?

I ask because I'm fairly sure that multiple copies of the original 63-volume set are available and some are within commuting distance of one of us. We don't have fancy automated scanning equipment, but we are not attempting to do high-volume scanning, either. -Arch dude (talk) 02:52, 9 January 2010 (UTC)

I have access to electronic images of the combined 22 vol. set where necessary, and we can perfectionise later. As an example of a temporary fix I did do Index:Dictionary of National Biography volume 60.djvu.
Where necessary, I have been rebuilding volumes from the mix of available sources; however, it then requires a reload of a volume to Commons, and that is a slow old task, and probably a full-time repetitive task. I will do it, though once we have identified all the problematic pages, so it becomes a once only task. billinghurst (talk) 09:59, 9 January 2010 (UTC)

Difference between these two categories?[edit]

What is the difference between Category:DNB No author and Category:DNB no contributor? The former seems hardly to be used; is the latter an equivalent replacement for it? Is this the right place to ask? Jan1naD (talk • contrib) 17:16, 21 January 2010 (UTC)

Category:DNB No author has certainly been placed by hand, and I suppose those are some of the 320 unsigned articles in the DNB. These days I make those Author:Anonymous. Category:DNB no contributor, on the other hand, I suspect of being populated by means of a template, so that it operates as a cleanup category. It has some non-article pages in it. The former category could be emptied. Charles Matthews (talk) 22:49, 21 January 2010 (UTC)
I would suggest that they be left for the moment, as they are not transcluded works, and I will get to them as part of the tidy up of pasting to Page: ns, and transclude back, and with that process they will get assigned. billinghurst sDrewth 04:27, 22 January 2010 (UTC)
But in general, you both agree that Category:DNB No author is deprecated. Jan1naD (talkcontrib) 09:27, 22 January 2010 (UTC)
Yes. billinghurst sDrewth 13:18, 23 January 2010 (UTC)

External/internal links to sources[edit]

Most DNB articles have both inline citations (usually indicated with parentheses) and a list of sources at the end [in brackets.]

A great many of these sources are used in multiple DNB articles. By definition (almost) all of these sources are now in the public domain. Many of them have been scanned and are available on the web, and eventually we will want to add them to Wikisource.

And now for the problem: I think that it is clear that in a perfect world, all such reference works would be added to Wikisource, and citations in the DNB articles would be linked to the Wikisource work. But we are not there yet. What should we do? We can create an external link from each citation, but in many cases we will have multiple (from 1 to many hundreds) of separate external links to the same external work. If a better external source becomes available, or if we eventually create the proper internal wikisource, we will be forced to change all of the links. Not good. As an alternative, we can create "placeholder" articles here at Wikisource. Each placeholder will point to the external link and will indicate that we eventually want to add the source to Wikisource. If we do this, then all citation links from within the DNB articles will be "internal" links, and will not need to change if we actually create the new Wikisource projects or if we find better external sources. Does this make sense? I ask this question after creating a few external links from DNB-derived WP articles to various Google books. -Arch dude (talk) 03:34, 29 January 2010 (UTC)

My tentative idea is to create a template "Missing Wikisource." The template takes a parameter: the url of the external source. You use the template to create a new mainspace page. The page says "Wikisource does not yet include this source. The source is available at: <url>" -Arch dude (talk) 03:52, 29 January 2010 (UTC)
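The point of the placeholder is indirection: every citation links to one internal page, and only that page holds the external address. A Python sketch of the idea follows; the table, the work name and the example.org addresses are all hypothetical, and only the message wording comes from the proposal above.

```python
# Hypothetical placeholder table: one entry per cited work.
placeholders = {
    "Example cited work": "https://example.org/scan-of-cited-work",
}


def placeholder_text(url):
    """Render the message a 'Missing Wikisource' page would show."""
    return (
        "Wikisource does not yet include this source. "
        f"The source is available at: {url}"
    )


def resolve(citation):
    """Every citation of a work goes through its single placeholder entry."""
    return placeholders[citation]


# Changing the one entry updates what every citation resolves to.
placeholders["Example cited work"] = "https://example.org/better-scan"
print(placeholder_text(resolve("Example cited work")))
```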

On the whole, for "placeholder" pages, I think we should give priority to linking the author (rather than the work) to an author page. Because "placeholder" pages in the Author namespace are a recognised part of WS, it seems: if there are no works yet posted here by author A, Author:A can still be posted, and have {{populate}} added. And any links on to online versions (e.g. at archive.org) sit comfortably on such author pages, being added to WS once there rather than indiscriminately. Now, this will not cover all references in the DNB or any of the other major reference works we are working on. There are standard works that are written by "committee" (such as the DNB itself). So only those really need the "own page in mainspace solution", it seems. So, for example, w:Edward Hasted wrote a history of Kent, and it is online at "British History Online"; this is a typical DNB reference to older local history. There shouldn't be an objection to linking Hasted wherever cited in the DNB to Author:Edward Hasted, which then links on. This happens to be an author mentioned hundreds of times on the site. Charles Matthews (talk) 08:37, 29 January 2010 (UTC)
Author page created for Hasted, and links added to the works. — billinghurst sDrewth 05:00, 22 August 2010 (UTC)
Addendum: BHO also has Fœdera vols 8/9/10, so I have created Author:Thomas Rymer and a linking template {{BHO link}} — billinghurst sDrewth
One thing that comes to mind (I only state it, I have no real solution) is that many particularly older sources will have been published in several editions, with significantly varied interpretations of the text. In addition, articles may refer to, for example, the Anglo-Saxon Chronicle as a source, without amplification, whereas the ASC exists as various original documents, even before later editors have got their hands on them. A URL to a source might point to a version significantly different from that used by the original contributor. Any thoughts? Jan1naD (talkcontrib) 09:40, 29 January 2010 (UTC)
The DNB is usually good about specific editions, e.g. Wood, Athenæ Oxon. (Bliss) meaning the Philip Bliss edition of Anthony à Wood's Athenæ Oxonienses. The general issue seems to lead a long way (free-content bibliographic information); from the perspective of this project the point is to make available what the DNB said. The Scriptorium-level points keep coming up (what degree of wikification and to where?; what to do about annotation, (including for us the Errata volume)?; disambiguation and how to treat articles deserving {{similar}}?; how to apply categories by topic, not just classifying texts?; portals?; and so on). It's not plausible that there are snappy answers. Charles Matthews (talk) 10:05, 29 January 2010 (UTC)
(ec) Just so. This is an example from Blake, William (1757-1827) (DNB00) by Anne Gilchrist, an author who helped produce standard, but far from definitive, works on the poet/painter: "Reprints of Blake's works have appeared as follows: '[[Songs of Innocence and Experience (1839)|Songs of Innocence and Experience]],' edit. by Dr. [[Author:Garth Wilkinson|G Wilkinson]] (much altered), 1839. 'Selections,' emendated, comprising nearly everything except 'Prophetic Books, edited by [[Author:Dante Gabriel Rossetti|D. G. Rossetti]]. forming vol. ii. of Gilchrist's [[Life of William Blake|'Life of Blake,' 1863 and 1880]]. 'Songs of Innocence and Experience, with other Poems' (verbatim), 1866. '[[Poetical sketches by William Blake now first reprinted from the original edition of 1783|Poetical Sketches]],' edit. by [[Author:Richard Herne Shepherd|R. H. Shepherd]] (verbatim), 1868.
My dabbing of these links is far from complete, notice that I cheated with Gilchrist's Blake and linked the versions page for now. I hope this helps to illuminate the issue. Guidelines to our titles of works are currently being developed, when firmed up they will give us a means to 'pre-disambiguate' a work. Cygnis insignis (talk) 10:14, 29 January 2010 (UTC)

Archived in 2013[edit]

Table of Contents formatting[edit]

Question around ToC (Moved from User talk:Arch dude) Some style questions

  • Replicating the OR entries, are we leading with preferred name, or do we have an entry for both in ToC? The issue is that some of the OR get quite long. Plus if we use full, do we full wikilink or just first component?
  • Dates of life. We have used some DoL, to disambiguate, however, what guidance are we giving?

-- Billinghurst 03:53, 26 August 2008 (UTC)

I don't know. Do you have some examples? what seems to work best? The guiding principle is to preserve the look and feel of the original, but the original does not have a ToC in this sense: It's a navigational artifact that we added to replace the original's (physical page-based) navigation. Once we find a workable solution, we can put it in the "Style" section. -Arch dude 14:25, 26 August 2008 (UTC)
1) Example ToC
Look at name like Waad/Wade as an example of alternate names. Gut feel is to enter under first appearing name. It seems that more will get pretty ugly especially around order, and wikilinks. As you mentioned as long as the transcription replicates the book, that is the important item.
2) I would think guidance is give DoL only where it is needed to disambiguate, though this requires it then in the DNB template, for each entry, which is a level of complication. The complexity of wikilinks makes me hesitate to give definitive statement. Whatever is more foolproof is my preference.
Billinghurst 14:52, 26 August 2008 (UTC)
Your example is very informative. It's an index from the original DNB, not a ToC from the Wikisource project. We should (eventually) create an "article" that is as close as possible to an exact duplicate of the index, with every single character (including the page numbers) duplicated. By our own rules, we are permitted to link the entries in this article to the relevant articles in our project. This is (theoretically) completely distinct from the ToC. The ToC is a modern navigational construct: a navigational tool that we added to replace the paper navigation of the original. Since the project is still new, we can elect to abandon the ToC and replace it with a faithful reproduction of the DNB index. Alternatively, we can keep our ToC and also reproduce the index. My inclination is to abandon the ToC and use the index instead, but please remember that I am only one of the (currently) three members of this project. If we do elect to use the index as our primary navigational tool, we should first agree on the exact format of each "volume" article. -Arch dude 02:28, 27 August 2008 (UTC)
I have had a start at some trial text in [Abbadaire - Anne] and in the talk page is the example with full dates added. Note
later in the index is
Avershawe, Louis Jeremiah. see Abershaw.
While I can understand articles being word for word, I wonder whether an index page where we convert to a ToC should be an exact replicate. -- Billinghurst 06:47, 27 August 2008 (UTC)
It's clear that the "(DNB00)" is not wanted or needed in the ToC. As an initial matter, let's remove it using pipe notation.
  • [[Abershaw, Louis Jeremiah (DNB00)|Abershaw, Louis Jeremiah]]
  • [[Abershaw or Avershawe, Louis Jeremiah (1773?-1795) (DNB00)|Abershaw or Avershawe, Louis Jeremiah (1773?-1795)]]
My vote would be for the second article title to be
  • Abershaw, Louis Jeremiah (1773?-1795) (DNB00)
The article title is a Wikisource navigational construct, not a part of the original text, so we are free to choose. -Arch dude 11:41, 27 August 2008 (UTC)
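The pipe notation suggested above is mechanical enough to sketch in code. The following is only an illustration, assuming every article title ends in the literal suffix " (DNB00)"; the function name toc_link is made up for this example and is not part of any project tooling.

```python
# Hypothetical helper: build a piped ToC link that hides the "(DNB00)" suffix.
DNB_SUFFIX = " (DNB00)"


def toc_link(article_title: str) -> str:
    """Return a piped wikilink whose visible label omits the trailing '(DNB00)'."""
    if article_title.endswith(DNB_SUFFIX):
        label = article_title[: -len(DNB_SUFFIX)]
    else:
        # Titles without the suffix are linked unchanged.
        label = article_title
    return f"[[{article_title}|{label}]]"


print(toc_link("Abershaw, Louis Jeremiah (DNB00)"))
```

The printed line should match the first bulleted link above; titles carrying dates of life pass through the same way, with only the final "(DNB00)" hidden.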
Given it a try, have a look at Dictionary_of_National_Biography,_1885-1900/Vol_58_Ubaldini_-_Wakefield, specifically looking at Wadd, William (1776-1829) -- Billinghurst 16:01, 27 August 2008 (UTC)
This raises a few interesting points. First, I think that if the DNB has an article, even if only a see reference, it should be a link. Second, the John Wadham situation is interesting because he is in the index, but what we have about him is embedded in another article in a way that cannot be easily isolated; perhaps the link to Nicholas Wadham there should be wikified. Third, Arthur and Felix Wakefield have identifiable paragraphs within another article. Would people consider it too much a breach of purity to add headings within the article so that we could have "See Wakefield, William Hayward (DNB00)#Arthur Wakefield"? Eclecticology 18:31, 27 August 2008 (UTC)
I think that adding headings to create anchors is in fact a "breach of purity." Fortunately, it is also unnecessary, since it is possible to add invisible anchors instead. Now I just need to remember the correct syntax... -Arch dude 00:19, 28 August 2008 (UTC)
Very spooky, I just happened to do one of those articles today, so I have gone and done as suggested.
Note that I have wl'd the component of the name after See under rather than the specific name itself. I am not wedded to that methodology. In the end, with many articles being short, I don't think that it will even be noticed. -- Billinghurst 06:53, 28 August 2008 (UTC)



Thanks Billinghurst for joining the project. It takes getting a few heads together to sort out the questions that are being raised. Several points have been raised by both of you that I want to address.

  1. ToC vs. DNB Index. I think it's important that we are recognizing the importance of Wikisource navigational constructs. Failing to do this can make work awkward. (I've already run into this over works that have quotation marks as part of the title.) I have no complaint about including the DNB index with page numbers as an additional group of pages, but what page numbers should it show? My own hard copy is the (originally) 1921 reprint edition which combined the original 63 volumes into 21. The pages themselves were essentially duplicates of the originals; right down to the same word breaks at the beginning and end of a page. That edition, however, did include the changes from the 1904 errata volume. (See Internet Archives for this one.) With the 1921 reprint the pagination was changed so the original separately paginated volumes 4, 5 and 6 became the new continuously paginated volume 2. The indexes for the new volumes also had footnotes to take into account those additions made in the First Supplement.
  2. DoLs. I very much support using these as disambiguators, but only when necessary. I know that Wikipedia favours disambiguation by what made a person famous in life, but that is more likely to require some sort of subjective determination than DoLs. To make it easy for a person who wants to cross link articles we should not require that he read the entire linked article just to make the link; the dates used should be exactly as they appear in parentheses at the beginning of each article. (At some point we may need to deal with same individuals who have different dates in another reference work, but I think that that problem can be deferred.)
  3. Using a pipe to suppress the "DNB00" from what is seen in the ToC is just fine.
  4. Honorifics. My preference is to suppress these from our article titles. We would, of course, continue to include this material in the article itself. It is to be noted that titled people usually have a see reference at the title, and these "see" articles should be kept as such (e.g. "ALEMOOR, Lord. [See Pringle.]")
  5. Alternative names. I agree with Arch dude's solution for the situation expressed in the Abershaw example, but without the dates since there is no ambiguity with some other person. We would maintain the see reference at "Avershawe". The alternative name should continue in the ToC since there will be an article, even if it is only a see reference which maintains continuity between other articles. Eclecticology 17:42, 27 August 2008 (UTC)
This looks like a consensus to me. Shall we now add it to the project page's "style" section and begin converting the ToCs? -Arch dude 00:19, 28 August 2008 (UTC)
I just found the syntax for an invisible anchor: go to the note on purity above. The syntax is {{anchor|my_hidden_anchor}}. -Arch dude 00:50, 28 August 2008 (UTC)

Two questions

  1. What do we do with junior and senior? See Dictionary of National Biography, 1885-1900/Vol 2 Annesley - Baird: "Armstrong, John, senior (1784-1829)" and "Armstrong, John, junior (1813-1856)".
  2. Should it be "Armstrong, John, of Gilnockie" or "Armstrong, John (d. 1528)"?

P. S. Burton (talk) 20:04, 19 August 2010 (UTC)

We use dates of life to disambiguate people. So that would respectively be Armstrong, John (1784-1829) (DNB00) & Armstrong, John (d.1528) (DNB00) for your two examples. That said, do feel welcome to create redirects for Armstrong, John, senior (DNB00), junior and Armstrong, John, of Gilnockie (DNB00). I would also encourage you to create the page John Armstrong using {{disambiguation}} and list/link all the John Armstrong pages on WS. It all helps findability. — billinghurst sDrewth 06:26, 20 August 2010 (UTC)
Wouldn't it be logical to do what the DNB itself does? That would be as Billinghurst says for the first case but "Armstrong, John, or JOHNIE, of Gilnockie (d 1528)" or maybe "Armstrong, John, of Gilnockie (d 1528)" in the second case. Incidentally, ODNB corrects his date of death to 1530, but of course we follow the original.--Longfellow (talk) 17:12, 22 August 2010 (UTC)
I think the basic ideas we have worked on come down to this: (i) there is no "official title" to follow in the books, since there is only a "lead sentence"; (ii) we have chosen to make the titles minimal; (iii) we have chosen not to make the titles "informative", relying on the articles themselves to inform. Only (iii) here really requires defending. And that is perhaps only a discussion about how we think people will search the site, or otherwise find the material. I don't think such discussions are ever conclusive: my own first-order search would probably be "John Armstrong"+Gilnockie in Google, so I'd want the name in natural order to appear somewhere (such as a dab page, for example). Since I'm currently typing up the author listings of redlinks, I certainly appreciate the use of short titles. Charles Matthews (talk) 10:18, 1 September 2010 (UTC)

{{DNB00}} & Category:DNB biographies

Noticed that the DNB biographies were all appearing in Special:UncategorizedPages. This happens because they are not subpages of the main work. To alleviate the matter, I have created the hidden category Category:DNB biographies and embedded it into the {{DNB00}} header. It can be accessed via Category:DNB. -- billinghurst (talk) 11:44, 20 March 2009 (UTC)


Vol 28 skips

Page:Dictionary of National Biography volume 28.djvu/149 is p. 151 of the original, while Page:Dictionary of National Biography volume 28.djvu/150 is p. 154. Charles Matthews (talk) 16:07, 17 May 2009 (UTC)

As a lash-up I have made Page:Dictionary of National Biography volume 28.djvu/148a, Page:Dictionary of National Biography volume 28.djvu/148b, and Page:Dictionary of National Biography volume 28.djvu/148c, to fill with text for the moment, since I want to work on the Michael Hudson article. I'm not up with how to add djvu's yet. As on a previous occasion, I'd be grateful to have bot assistance with sorting this all out. (Shouldn't be hard, given that vol 28 is untouched so far). Come to think of it, there is plenty to discuss about improving the posted text, also. Charles Matthews (talk) 16:20, 17 May 2009 (UTC)
My understanding is that we would need to pull it down, add the new pages, and then reload. Probably only want to do it once, so we should check first for illegible and other missing pages so we can insert these and do it once. -- billinghurst (talk) 08:05, 7 September 2009 (UTC)

I'm not treating these matters as being of urgency: the work of creating articles can go on, and "copying back" of text is essentially quick and trivial. What I am doing is to log all the glitches on my userpage. Logically there should be a project page that does that, though. And since this is going to be with us for a while, discussion on its talk page. Hey-ho. Yes, before implementing radical change, we should at least go through the volume seeing exactly where all the problems are. And I think that suggests priority for a "mapping subproject", identifying:

  • progress with adding text;
  • images so corrupt as to be useless and in need of replacement;
  • calculation of the "offset" (image number minus page number) which should be constant if there are no glitches;
  • progress with the master volume lists;
  • progress with adding titles to author listings.

Ouch. Plenty of work. But we really need some of these project management tools to open up areas of work. I have to say that adding decent text where possible is really my priority: once there is text on the page, it starts to show up on search engines, and I find that pretty useful anyway even before any proofing. Adding text can be done by bot, but (returning to the topic) I'm not sure why these bot glitches were there in the first place. Charles Matthews (talk) 19:54, 7 September 2009 (UTC)
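The "offset" check in the list above can be sketched in a few lines. This is only an illustration, not an existing project tool: the function names are invented here, and the sample data are the two Vol 28 pages reported at the top of this thread (djvu/149 is p. 151, djvu/150 is p. 154).

```python
# Hypothetical sketch: detect scan glitches by watching the offset
# (image number minus page number), which should stay constant.

def offset(image_number: int, page_number: int) -> int:
    """Offset as defined above: djvu image number minus printed page number."""
    return image_number - page_number


def find_offset_breaks(samples):
    """samples: (image_number, page_number) pairs sorted by image number.

    Returns the consecutive pairs between which the offset changes,
    i.e. the stretches where pages are missing or duplicated.
    """
    breaks = []
    for prev, curr in zip(samples, samples[1:]):
        if offset(*prev) != offset(*curr):
            breaks.append((prev, curr))
    return breaks


vol28 = [(149, 151), (150, 154)]  # the two pages reported for Vol 28
print(find_offset_breaks(vol28))
```

With sparse samples taken through a volume, each reported pair narrows down where a gap sits, and sampling more pages inside that pair locates it exactly.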

Done and resolved. Hopefully vol. 28 is all very special now. -- billinghurst (talk) 01:49, 27 September 2009 (UTC)

Progress table

I've now marked up Wikisource:WikiProject DNB/Progress with the most basic information on how we are doing, from the side of having text in place for proofing. A more honest title would be "progress and troubleshooting", since there are quite a number of legacy problems from the bot runs, as well as the inherent difficulties and constraints caused by bad scans. Obviously this page is for everyone to update, and can be expanded to include other issues (one obvious one being the per-volume article listings). Charles Matthews (talk) 15:34, 3 October 2009 (UTC)


access to scans

I would suppose that the most frequently used and important links for this project are to the djvu files; the only way I have found to reach them is the subpage/progress. I am keen to do the odd article, but I cannot easily find or remember this. Could we put a page in main space with the volumes and indexes? If we don't have the page, the user (and bewildered contributor) can still directly access the scans, a la The Botanical Magazine. The NLA newspaper project offers anyone viewing their scans the opportunity to tag and correct as much of the text as they feel like; apparently this is very successful (several users making around a quarter of a million corrections!). Cygnis insignis (talk) 11:04, 7 November 2009 (UTC)

I am currently working through and updating each volume (Index: <-> main ns), and will be adding scans links on the volume main ns pages. As each volume lists the available biographies, I believe (at this point) that this is a better solution (for the moment). billinghurst (talk) 12:37, 7 November 2009 (UTC)
I'd be happy to format the information required and put it anywhere considered more prominent. For the "casual" participant, I suppose the questions most needing an answer are: (i) how to locate the djvus relevant to a given biography you have in mind; (ii) how to replace the OCR text with better text, in the frequent case that the bot posting/text layer associated with the djvus is not the best available. The way things are set out on the Progress page recognises that there are various qualifications and caveats likely to be useful to someone approaching all this work, but in a reference format rather than an exposition starting from the basics. Charles Matthews (talk) 10:50, 11 November 2009 (UTC)

Upgrading scanned volumes

I believe that I now have a decent understanding of replacing the File: versions at Commons, and how we can upgrade, and the consequences of such a change.

  1. Replacing a volume with a better-quality version of the same volume is eminently doable (and something we should keep a watch on for the potential negatives and positives):
    1. if a complete volume is replaced with another complete volume, then things are okay as long as the two have the same start page and correspond page for page thereafter.
    2. if an incomplete version is replaced with a complete version, this is great but may involve a bit of page moving; hence we should resolve these files before we advance much further with fixing the text in these beasts.

Background

A DjVu file has two layers (image and text). When a DjVu file is loaded to Commons, both are available to ThomasV's Proofread Page extension.

  • Creation of an Index: page, locks in the file at Commons, and use of pagelist reveals the images available.
  • Creation of a Page: page, shows the respective image from the Commons file, and grabs the text from what it sees as the corresponding text layer for the page. The text is what is imported to WS, and thereafter stored on WS.

Why do I tell you this?

When our Page: text is of a poor quality, and we upgrade the Commons file, the image will be upgraded, however, as the text is at WS, there is no change to the text displayed. So to grab the text layer from the new file, we need to delete our text (Page:yaddada.djvu/nn) and when we recreate the page the text layer from the upgraded file is now imported if available.

So, for example, in Vol. 57 a number of the scans were poor and I have subsequently replaced the DjVu file, but we already have a number of pages created. Some of the red NOT PROOFREAD pages will have poor OCR, so we may have to delete those existing pages and recreate them. Deleting the pages holus bolus would not be advisable, as some are partially or fully corrected and just not advanced in their proofreading status.

Where I do replace a DjVu file at Commons, I will make a prominent note on the corresponding Index: page to alert people to this information, and direct them here to ask for admin to undertake requests identified.

So?

If a page is of poor quality, check to see if we have reloaded the image file (see Index: page), and we may be able to help out. Leave any such request on this page, and an admin will deal with it. Questions? -- billinghurst (talk) 13:17, 7 November 2009 (UTC)

Your delving into these issues is much appreciated, and, yes, the sooner the better as far as sorting out the obvious page glitches is concerned, since the work involved is only going to increase over time. Coming at it from the end of identifying the cleanup to do, I'm now proactively using Category:Problematic to report dud djvus, and I see there are now 56 there. That is probably only the tip of the iceberg for illegible djvus, though. I assume we're agreed on priorities? Gaps in the continuity of pages are the worst issue, because the knock-on effects of a numbering change are large. Illegible djvus where there is no alternate scan available at all (say case "poor" in the Progress table) are the next worst case, since that means that proofing can really only be done at present from a physical book. Bad text can be got round using the ODNB-hosted scans for those with that access (now includes me), as long as there is any sort of readable djvu.
We need ... oh, lots of things (the project as a whole is fairly complex, as we are finding out) but given the request in the previous topic, can we have a look at just a few? (A) Mapping of the offsets (i.e. number in pagespace − page number in volume) as a way of finding those pesky gaps more systematically, by sampling through all 63 volumes; (B) Documentation that will make sense to those wishing to come in and help out; (C) Central page for requests - I mean a forum that is not so much about these project-level issues, but where people can simply post "I want to create/have created for me the biography of X" and state the issue they have, for an answer and attention. Charles Matthews (talk) 11:13, 11 November 2009 (UTC)
I've done a basic survey on (A) now and entered the findings at Wikisource:WikiProject DNB/Progress; and I'm working on (B), which means writing down for the first time systematically numerous bits of know-how (a couple of pages still need to be created, see redlinks on the main project page). As for (C): I can think of various kinds of requests. There may well be a need for admin recreation of pages that have been deleted only for the sake of sorting out the djvu sequences, and for those you can just contact Billinghurst or me. Either of us will try to handle requests for general help, also. I can imagine people unclear about getting decent scanned text for a particular article or section in pagespace. The answer may still be "difficult to do", but please raise such matters here, so that we can start to log the worst places. Any time you find you are leaving a gap deliberately in creating articles, we really should be aware of the issue. /Most wanted articles is effectively unused, except by me. I don't see that this page is redundant, though. Charles Matthews (talk) 12:11, 13 November 2009 (UTC)

problem page, better index?

The page Page:Dictionary of National Biography volume 49.djvu/22 was missing a bit at the bottom, I found another scan at dictionaryofnati49stepuoft and corrected it from that. The source of the current index is sometimes problematic, in my experience, the latter may be a better version. Cygnis insignis (talk) 11:16, 18 December 2009 (UTC)

Thanks, yes, in general the "best" scan is listed at Progress, and the initial postings may not have used it. Charles Matthews (talk) 15:57, 18 December 2009 (UTC)

We need a "HowTo" for new contributors

Thanks for the fantastic work over the last year. I dropped off of Wikipedia and Wikisource about a year ago and I stopped by last week to see how the project is going. I see 2000+ articles, plus major improvements in the infrastructure, particularly the scanned images.

I tried to read the project page with the eyes of a newcomer, and I found it difficult to see how to get started. I think we need a specific "howto" section that enumerates the steps a potential contributor should take. If there are multiple methods of working, we might need a separate howto for each method.

In particular, it appears that the current default method of working is:

  • Understand the difference between page space and article space (or whatever we call them)
  • see if the article exists in article space (how exactly)?
  • Find the correct pages in page space (how exactly?)
  • determine if the scans are OK and if the OCRs are OK.
    • If not, try to find another source (how?)
    • If scans are acceptable, continue
  • Proofread and edit the OCR'd pages in pagespace. Start from scratch if the OCR is hopeless but the scan is readable.
    • use the manual of style to handle small caps, end-of-line hyphens, greek letters, Italics, ligatures, etc.
    • add "section" templates to identify stuff for the "transclusion" step below. (how exactly?)
  • check your work.
  • advance the state of the page to "proofread." (How, exactly?)
  • create your article in article space
    • If a redlink for your article exists in the ToC for the volume, use that name exactly (check it for adherence to the style manual and fix if needed, but it's probably OK.) If your article title is not in the ToC, add it now as a redlink.
    • use the DNB00 header template, using a "worked example" such as xxxx (How, exactly?)
    • Transclude the sections from the pagespace article you created earlier, using the "worked example."
    • Add the "previous" and "next" article titles to the ToC (perhaps as redlinks) if they are not already there.
  • Ask for help if you need it (Where?)

And Happy New Year! -Arch dude (talk) 20:37, 3 January 2010 (UTC)

Welcome back AD. A trickle of info, not clasping the whole at the moment. See {{DNBset}} it pulls together the info and makes page creation easy in main ns. Slowly working through upgrading and updating the scans where possible, and better integrating the scans to pages and vols. I would also think that we would want to be smart about how our instructions can blend in with the general instructions for the site, even if we quilt pages together with transclusions. The instructions need to basically say TYPE WHAT YOU SEE. billinghurst (talk) 04:37, 4 January 2010 (UTC)
Yes, welcome back, and I've seen you active on the WP end too. I can work up a "how-to" guide, but I wouldn't want to be off-putting, either by putting in a huge amount of detail, or by prescribing a way of working too closely.

Commenting:

In particular, it appears that the current default method of working is:

  • Understand the difference between page space and article space (or whatever we call them)
Yes, there are multiple namespaces involved: main, Page:, Index: and Author: all relevant.
  • see if the article exists in article space (how exactly)?
The safest method is the volume listing.
  • Find the correct pages in page space (how exactly?)
Several options. Determine the volume first, naturally, from Dictionary of National Biography. These days you should be able to find articles to 'bracket' yours, article A before and article B after in alphabetical order. Where articles are transcluded, you can get to the page from the article (either directly by clicking in the margin, or by editing the article, depending on the transclusion method). With page numbers to 'bracket', bisecting the range gets you there quite quickly. We are, though, moving towards the ability to look this up. Author subpage listings should have the page number in the original volume added, and knowing the "offset" you can go to the exact page directly. (I have been lent a handbook, but unfortunately it is based on the 22-volume edition so I can go directly, but at the cost of some mental arithmetic.)
  • determine if the scans are OK and if the OCRs are OK.
    • If not, try to find another source (how?)
The "Progress" subpage lists the so-called "best" scan at archive.org. On occasion you'd need the full listings of scans. The ODNB option is good for particularly bad text; so I'd recommend adding cleanup categories to such pages, and requesting help for articles urgently wanted.
    • If scans are acceptable, continue
  • Proofread and edit the OCR'd pages in pagespace. Start from scratch if the OCR is hopeless but the scan is readable.
See note before. I think we need to emphasise that 'triage' makes sense in this project. There is plenty to do that is not so tough. Faced with an article you really want and is apparently very hard to do, request help. Otherwise your time is probably better spent in other ways.
    • use the manual of style to handle small caps, end-of-line hyphens, greek letters, Italics, ligatures, etc.
    • add "section" templates to identify stuff for the "transclusion" step below. (how exactly?)
It's the 'section begin', 'section end' tags at the end of the Wikisource-specific line below the editing box.
  • check your work.
  • advance the state of the page to "proofread." (How, exactly?)
Radio buttons below editing box, advance to status yellow from status pink.
  • create your article in article space
    • If a redlink for your article exists in the ToC for the volume, use that name exactly (check it for adherence to the style manual and fix if needed, but it's probably OK.) If your article title is not in the ToC, add it now as a redlink.
My method is to go from the author template at the article end to the author page, and add the disambiguated redlink to that page. Disambiguation is easy to check with a full volume listing, otherwise you need to look back to two articles before, two articles ahead, to check disambiguation for the article itself, previous and next. Yes, it's a pain sometimes. I create from the author page, and then check "links to". If the volume listing shows up, you're done. If not, you need to go and add or tweak a link.
    • use the DNB00 header template, using a "worked example" such as xxxx (How, exactly?)
I now always use DNBset, and I keep it in a text editor. If you are working through sequentially in a volume (which cuts down overheads) you can update the template quickly, including the previous and next links.
    • Transclude the sections from the pagespace article you created earlier, using the "worked example."
    • Add the "previous" and "next" article titles to the ToC (perhaps as redlinks) if they are not already there.
  • Ask for help if you need it (Where?)
Here, me, Billinghurst. I'll take queries on text matters, but am not very competent on what happens behind the scenes technically.

Charles Matthews (talk) 10:00, 4 January 2010 (UTC)

Continuing work on this at Wikisource:WikiProject DNB/Walkthrough. Charles Matthews (talk) 10:29, 4 January 2010 (UTC)
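The 'bracket and bisect' search described in the commentary above can be sketched as a binary search over pages whose first headword is already known. This is an illustration only: find_page is a made-up name, the sample djvu numbers and headwords are invented, and plain case-folded string order is assumed to be close enough to the DNB's alphabetical order.

```python
from bisect import bisect_right


def find_page(first_headwords, target):
    """first_headwords: (djvu_number, first headword on that page), in page order.

    Returns the djvu number of the last sampled page whose first headword
    sorts at or before the target, i.e. where to start looking.
    """
    keys = [headword.casefold() for _, headword in first_headwords]
    i = bisect_right(keys, target.casefold())
    if i == 0:
        raise ValueError("target sorts before the first sampled page")
    return first_headwords[i - 1][0]


# Invented sample: a handful of pages with their first headwords.
sample = [(10, "Howard"), (60, "Hubbard"), (110, "Hudson"), (160, "Hughes")]
print(find_page(sample, "Hudson, Michael"))
```

Repeating the search with more pages sampled between the returned page and the next one narrows the range in the same way as bisecting by hand.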


Article space and page space

We seem to have settled on a specific set of scanned volumes to serve as our basic source. We also seem to have settled on transclusion from these sources as our basic method of operation. I think we should expose this to our readers, by adding a pointer to the index of each volume to the volume's ToC header. This will allow a reader to (try to) access the scan itself when the desired article has not been written. This may also entice readers to become editors. Eventually (in approximately the year 2025 at the current rate of progress) we will have the entire DNB00 in both page form and article form, and the ToC can become an easy way to navigate in both forms. -Arch dude (talk) 10:12, 5 January 2010 (UTC)

Not quite clear about this. I think we are settled on the books (out of the various editions); though there is e.g. still the issue of how to incorporate the 1904 Errata. We are not yet settled on the precise scans, since one scan can replace another, and sometimes should. The numbering of the djvus is settled in many cases, not all (not all volumes can stay as they are for all time, since for example there is missing text). We could indeed try to set up links anchored somewhere and linking to text somewhere else. Though adding 27000 links (or anything) is a major undertaking. I think you are suggesting that the volume ToCs should link directly to the starting djvu of an article, by a link sitting by the article name in that listing. OK, can be considered. I would like to have page numbers (from the book) added to various listings. This is the same concept, except that you are thinking of a clickable page number that takes you to the djvu (in volume v , djvu.(page number + offset)). Which could probably all be put together with the article name in a fancy template. Since I basically approve of a template over a piped link on the ToCs, let's kick this around some more.
(As far as extrapolating to the project finish goes, seems like a mug's game to me, but we'll discuss this again in 2011.) Charles Matthews (talk) 15:19, 5 January 2010 (UTC)
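The clickable page number idea above (djvu number = page number + offset) can be sketched as follows. This is only an illustration of the arithmetic: page_link is a hypothetical name, not an existing template, and the Page: title pattern is copied from the Vol 28 links quoted elsewhere on this page.

```python
# Hypothetical sketch of a "clickable page number" for a volume listing.

def page_link(volume: int, page_number: int, offset: int) -> str:
    """Build a piped link from a printed page number to its djvu page.

    offset is the image number minus the printed page number for the volume,
    so the djvu number is page_number + offset.
    """
    djvu_number = page_number + offset
    target = f"Page:Dictionary of National Biography volume {volume}.djvu/{djvu_number}"
    return f"[[{target}|{page_number}]]"


# Vol 28: djvu/149 was reported as p. 151, giving an offset of -2 at that point.
print(page_link(28, 151, -2))
```

A single offset per volume only works where the numbering has no gaps, so a real template would need the per-volume (or per-range) offsets recorded somewhere such as the Progress page.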

There are several things called index here. The page in Index: space will display the original page numbers of the document in the djvu file; once adjusted, these also appear in main (or article) space when transcluded. Any other information is manually added, as with the indexes (the listing of existing articles) for the DNB volumes in main space. The original index, several pages at the end of each volume, is much more powerful. I fiddled around with some of these (Page:Dictionary of National Biography volume 11.djvu/478 is an example) and found they not only give page numbers for the entries, but also cross-reference alternate spellings and people mentioned in others' entries. I also discovered widowed entries in main space (not appearing at the volume listings) from the blue links that appeared, and found mis-titled pages with the names given. Advantages to having these linked from (or placed at) the volume's page include: applying the naming conventions to redlinks; users being able to discover if someone is mentioned, and which volume to "see", rather than looking for one they know exists; having the previous and next entry in one spot; navigation for browsing, q.v. links, and where to look in the scan if the entry is not in main space. This latter index would be useful for users, but even more so for the building and maintenance of the project. Doing this half-done, no-format, no-proof page was not as interesting as doing an actual entry, but I will try to add some more every now and then.

The errata could be created separately, then linked to and from the entries; making it even more useful than the original format was. Cygnis insignis (talk) 20:52, 5 January 2010 (UTC)

Yes, I would dearly like to see the indices in the books proofed: because starting from those, other working listings can be created. We have in practice mostly worked from the other end. (Picking up orphans can also be done using Magnus Manske's tool, by the way, if they link to their author page but are not linked back.) Charles Matthews (talk) 22:13, 5 January 2010 (UTC)
I have added a new project page on navigation at Wikisource:WikiProject DNB/Pagefinding. It lists both types of indexation for convenient reference, and explains what to do with page numbers. It still needs the conversion information on the 22-volume edition added (of particular interest to me, but probably not urgently needed by others). Charles Matthews (talk) 12:39, 6 January 2010 (UTC)


Source description pages

I have been bold, or perhaps stupid, and started adding "source description pages." Look at any of the first eight volume pages, (e.g. Dictionary of National Biography, 1885-1900/Vol 1 Abbadie - Anne). You will see a link from the header: "Access scanned source of Volume x," that links to a new page, e.g. Dictionary of National Biography, 1885-1900/Vol 01 source description.

These new pages are generated manually, using a new template: {{DNB volume source description}}. This means we can tweak the wording of all of the source description pages by changing the template.

These pages occupy an uneasy space between the article space (purely intended for readers) and project space (intended for editors.) The audience for these pages is the reader. When our project is complete, the casual reader will not need these pages, but our project is far from complete, so the reader may be forced to use un-transcribed material. These pages are intended to permit the casual reader to access the un-transcribed material in the least painful manner.

Please comment. -Arch dude (talk) 01:14, 24 January 2010 (UTC)

Thank you for the work put in on these pages; treating them as extra documentation doesn't seem problematic to me. When we finally "complete" a volume in terms of all biographies, perhaps we should then discuss what happens next (completing the front matter, index, the "redirects" if we are doing those, wikilinking the index and rationalising the volume ToCs, and treating these pages as scaffolding of some sort).
To resume on the "Proposal: Add Volume index "articles" " thread, and while we are looking at categories, there is Category:DNB Add text that gives an older form of "dummy" article. That just has a patch of articles from the start of one volume, and I've added text to a few that were there. I'll get round to the others, if no one else does. When I thought you were talking about "dummy articles that at least link to scanned pages", I was envisaging such articles that also actually made available the specific pages. There is more than one way such pages could operate. In the common case that a given article covers no more than two pages, I see an attractively simple way to create them (<section begin="marker"/> <section end="marker"/>) on a page transcluding nothing but bringing up the page number in the left margin to click. This requires no commenting-out. With what I know about transclusion if there is a whole transcluded page in at least three, the "middle" pages would need to be commented out to give the intended "dummy" effect. But then what I know about transclusion is not much. Charles Matthews (talk) 09:16, 24 January 2010 (UTC)
This is problematic in its existing form.
  1. The introduced text is not part of the work, it is all commentary, hence it does not belong in the main namespace. Such text belongs on a Talk: page, if anywhere.
  2. I would argue that the text itself belongs on the Talk page of the Index anyway, not of the main work in the main namespace, as the text is specific to the version of the scan, not to the volume itself.
  3. the template creates subpages to non-existent parent pages, which is far from ideal
  4. too much text to get to the information, a table would better present the data
  5. it duplicates an already existing template called {{edition}} which works perfectly fine
  6. it removes the direct link to the index pages from the notes field which is eminently useful on each page
While it was an idea, it really doesn't float with me, and needs to go back to the drawing board. billinghurst sDrewth 22:15, 24 January 2010 (UTC)
The question here is "who is the customer?" I maintain that the customer is the reader who is looking for information from the DNB. Project members have little or no need for these pages, because we already know all this stuff, but a casual reader has no clue. If we are not trying to help the casual reader, then why are we here? These pages are currently in main-space, not because they are sourced from the DNB, but because they are intended as navigational aids. This is similar to the volume pages themselves. As to the direct links, these are useful to contributors, but are fairly scary for a reader who has never seen our strange deja vu system, and this is precisely why I want to interpose a simple-minded explanation. For a direct link, I propose to add a trivial transclusion. This (apparently) causes the MediaWiki software to add the "source" tab to the tab set at the top of the page, and will get that function back for those who want it. I just blew away all the "index to scan" links (less than 25% of the volumes had them) because I did not see your note in time. Sorry. I will go back and add a trivial transclusion to all 63 of my entries. In fact, I will convert my "note =" into a template so we can do a bulk change as needed to meet your objections or even completely revert this effort. -Arch dude (talk) 22:55, 24 January 2010 (UTC)
[responding to Arch Dude mainly, everyone else has heard this rave.] I'm concerned that efforts are being put into solutions that will beget more problems. Doing the following would be tedious, but it would greatly reduce the need for notes, guidelines and disclaimers: proof, redlink and display the index pages of the original. Readers and contributors may want the same thing when the entry has not been created: the ability to navigate to a scan of the original. If the indexes replace [!] the ToC at the parent page we get the following advantages: a page number in the vol., a cross reference to a different volume (as "Pseudonym, Jane. See Surname, Jane"), the spelling of the subject's name, a pre-disambiguated title, and whether an entry was even written.
The Index: namespace's page may be confusing for those who haven't used them, but no less so than any other method of organizing the navigation. Full information about the integrity of the scan is evident from that page's display, though it is summarised elsewhere in project space. Bear in mind that this project existed well before the adoption of page scans. For those who don't know, the offset can be noted at these as
<pagelist
15=1
/>
It should be possible to generate metadata from these to a project maintenance page, but reiterating the information seems confusing and redundant to me. The direct link from mainspace to these (Wikisource) Indexes was a better solution. Any improvement to the functionality of these would benefit other projects, so, rather than attempting to compensate for perceived shortcomings, efforts would be better directed toward that. Cygnis insignis (talk) 05:11, 25 January 2010 (UTC)
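As a rough illustration of generating metadata from the pagelist, here is a Python sketch that reads the simple `scan position = printed page` entries and works out the printed number for any later scan position. The function names are hypothetical, and it deliberately ignores every other kind of pagelist entry (ranges, roman numerals and so on); it is not a description of how the extension itself computes page numbers.

```python
import re
from typing import Dict, Optional


def parse_pagelist(tag: str) -> Dict[int, int]:
    """Return {scan position: printed page number} for plain numeric entries only."""
    return {int(pos): int(num) for pos, num in re.findall(r"(\d+)\s*=\s*(\d+)", tag)}


def printed_page(scan_pos: int, anchors: Dict[int, int]) -> Optional[int]:
    """Printed number for a scan position, counting on from the nearest anchor at or before it."""
    earlier = [pos for pos in anchors if pos <= scan_pos]
    if not earlier:
        return None  # front matter before the first numbered page
    start = max(earlier)
    return anchors[start] + (scan_pos - start)


# The example from the comment above: scan position 15 is printed page 1.
anchors = parse_pagelist("<pagelist\n15=1\n/>")
print(anchors)                      # {15: 1}
print(printed_page(114, anchors))   # 100
```

With `15=1` the offset between printed page and scan position is 14, which is the single number a maintenance page would need to record per volume.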
A confusing discussion, to be sure. The issue addressed is navigation, and the trouble is coming, forgive me, from self-imposed Wikisource limitations on navigational pages (namespace usage issues). There are also debates going on here on short-term, medium-term and long-term priorities. The index pages from the DNB volumes number around 200 in all; and some of the scans of those suffer from generic bad-scan issues. Proofing them is not a short job, and likely can't be completed in the short term. I like the idea of doing something with the DNB volumes' pages in the Index namespace. This has the advantage of logic (look in that namespace for navigational information, seems intuitive), and perhaps transcluding some project pages into those Index pages could help centralise information on one page that is now spread over several. Charles Matthews (talk) 09:39, 25 January 2010 (UTC)
I concur: the "page source descriptions" are a poor substitute for the proper navigational solution. Their only justification is that we (I) can complete them in a very short time, and for the unsophisticated reader they are a great deal better than nothing at all. As Charles says, this is a short-term solution. As we complete the transcription of the index pages, we can (and should) add a link to our transcribed index page from within the source description page and of course from the volume page. As we complete the transcription of each volume, we can entirely remove the "page source description" for that volume. To Billinghurst's point about mainspace: I am in no way wedded to this. I will experiment with moving the pages to talk space. I'm thinking about moving each to a sub-page of the talk page of its index page. -Arch dude (talk) 15:31, 25 January 2010 (UTC)
Update: Please look at the vol 1 ToC. I am now using a template and pointing to a subpage of the talk page. The template lets us make mass changes to the display text as we converge on consensus. I added a small superscript link (SI) for power users. This links directly to the scan index instead of the verbose Source description page. I am now changing all 63 ToCs to use the template, and moving the source description pages to the new locations. -Arch dude (talk) 00:47, 26 January 2010 (UTC)
I changed all 63 ToCs. This repairs the damage I did by removing the old "links to scanned pages." Instead of 25%, we now have 100% of the ToCs linked to the scan indexes, via the "(SI)" superscript link. All 30 of the "source description" pages I created have been moved out of mainspace. Change the template {{DNB sdp}} to change all 63 ToCs as desired. I will create the remaining 33 source description pages later, now that the damage is repaired. -Arch dude (talk) 02:47, 26 January 2010 (UTC)

(Outdent). I have completed the initial phase of this effort. All 63 "source descriptions" are now created. I have attempted to address Billinghurst's concerns: there is a tiny superscript (SI) link on each ToC page for power users, and each "source description page" now starts with a terse summary, also for power users. The bloated verbiage is still present, but is now in a section called "Explanation for new readers." I feel very strongly that this must be present, because it is likely to be the very first exposure the casual reader has to the raw sources, and we need a way to convey this basic information. Comments? -Arch dude (talk) 00:08, 31 January 2010 (UTC)

If we are trying to address the reader of the DNB, rather than workers on the project, there should be a clear link on the main WikiProject page of the type "if you are here to access the DNB, start by reading this". Maybe a hatnote. It should lead to an expository page explaining what is to be done about reading, given the work in progress. Charles Matthews (talk) 08:21, 31 January 2010 (UTC)
See the recently-added paragraph in the header info section of the main page. -Arch dude (talk) 15:32, 31 January 2010 (UTC)

improved scans

I didn't look very deeply so pardon my laziness and delete this post if it is redundant. Has anyone looked into creating new djvu files from the same source data? Most items at archive.org contain much larger files, such as a tif.zip, that might produce a better file (and ocr) than the one made available. The settings used by others to create our current djvu files may have compromised the integrity of the data, such as favouring compression over resolution. Cygnis insignis (talk) 10:34, 29 January 2010 (UTC)

I may be on to something here, compare the line of the index for " Cole, Thomas" in the djvu at Page:Dictionary of National Biography volume 11.djvu/479 with the online view of a jpg from the same scan data. The algorithm of the djvu conversion decided that the 6 was an 8, but the actual dates are evident from the jpg of the second link. Now the bad news, the source data file is huge and pushing them around requires a lot of bandwidth. A work-around is using what we have and referring to the online scan. Resampling and tweaking the conversion may be possible from the smaller flippy.zip that is the source of the 'online viewer'. Cygnis insignis (talk) 11:34, 29 January 2010 (UTC)

I can imagine this conception being very useful, particularly for WS:CEU (a project that has barely started, but never mind), where the key proof-reading step is for reference listings that are omitted from all the online Catholic Encyclopedia versions. There it is small blocks of tiny type that really need close attention. Charles Matthews (talk) 16:03, 9 February 2010 (UTC)

Questions

Hi, I have three questions regarding Wordsworth,_Christopher_(1774-1846)_(DNB00) (Page:Dictionary_of_National_Biography_volume_63.djvu/31 to Page:Dictionary_of_National_Biography_volume_63.djvu/33): A) Is there a possibility to avoid the line break in the small text at the bottom of the article? B) The article has a subentry about the subject's son. Is it better to split the article or to leave it intact? C) I have created the article with a hyphen between the years in its name; however, I later noticed link(s) to it which use an &ndash; instead. Shall I move the article now to the version with the &ndash; or shall I redirect the latter to the original version with the hyphen? ~~ Phoe talk ~~ 12:48, 23 January 2010 (UTC)

  1. Use <div style="font-size:smaller"> and <noinclude></div></noinclude> on the last page, and then <noinclude><div style="font-size:smaller"></noinclude> on the second page. Ensure that the div is inside the tags. I have done it for this article.
  2. Put an {{anchor}}, and we can link to it #son's name from the vol's Toc
  3. We have an n-dash in the url? Gee, I normally have hyphens, they give nicer urls. In answer, redirects are cheap for the server, so don't feel too concerned about creating one. billinghurst sDrewth 13:16, 23 January 2010 (UTC)
I think we should stay away from endashes in the titles. At some later stage we could decide to use endashes, but the MoS has always said hyphen? The gain would be small, I believe, while the possibility for confusion is large. (The DNB scans in general use hyphens in the text. In the text I feel it doesn't matter, more important things to worry about. The ODNB transcriptions do use endashes.) I know there are a few titles in the volume ToCs that do use endashes, but the volume ToCs in general may not conform to the MoS (in which case they are wrong, for current purposes). Charles Matthews (talk) 17:11, 23 January 2010 (UTC)
(ec - help! ;-) To 1.: Thanks, I will keep it in mind. To 2.: See 1 :-). To 3. and as explanation: After I had seen your validation of my first proofread page [3], I also inserted cross-references using hyphens in the next pages. When I had created the mentioned article, however, I noticed that the link you had inserted in your validation in fact didn't link to the article. On closer inspection I saw that you had used en dashes, which confused me a little bit since I had assumed that hyphens were correct. Therefore I also checked Category:DNB_biographies, where generally the articles had hyphens, which confused me a little bit more. :-) ...
By the way there are a few articles with en dashes (for example Hunter,_William_(1755–1812)_(DNB00) and Langley, Thomas (1769–1801) (DNB00)). Furthermore, as information: apparently some articles are categorised under the wrong letter, see [4]. ~~ Phoe talk ~~ 18:00, 23 January 2010 (UTC)
Just move articles that are non-compliant titles at present. We don't usually worry so much about it: the title conventions aren't even completely codified in all cases, so we should be reasonably relaxed, tighten up the manual when there is a particular issue, and just admit that some moving is a minor price to pay for having people contribute.
The other point is a known bug in the underlying software, and will get sorted out by an upgrade some time. Apparently the transclusion by {{DNBset}} gets an illicit defaultsort. Charles Matthews (talk) 19:54, 23 January 2010 (UTC)
Aye, thanks for the information. ~~ Phoe talk ~~ 22:17, 23 January 2010 (UTC)
To 1) if you want someone else to do it, that is okay, either mark it or leave it and we will get to it. To the rest.
  • if Charles and I were any more laid back, we would fall over,
  • content is king/queen
  • we try to keep simple wherever possible
  • so far good teamwork, contained egos and reasonable expectations have worked a treat
  • we have mop and bucket skills, and are NOT afraid to use them :-)
billinghurst sDrewth 23:08, 23 January 2010 (UTC)
Do you mean this kind of leaning back [5] ? ... If you want to use the mop and the bucket properly once, I guess I can give you some work [6]. ~~ Phoe talk ~~ 00:49, 24 January 2010 (UTC)
For the hyphen, Wikisource:WikiProject DNB/Style Manual is clear: the article title uses a hyphen, not an ndash. Yes, you may add a redirect, but you may also change the article that links to the "wrong" title. Here, the problem is the typography of the original: if in the opinion of one or more proofreaders, an ndash was used in the original, then we need to display an ndash in the linking article, but this has no bearing on conventions we use to create an article's title. Speaking personally, with regard to the text, not the Wikisource article titles, I cannot determine by examination the difference between a hyphen and an ndash in the original in most cases, and as proofreaders we are not supposed to convert from what the typesetter DID do to what the typesetter was SUPPOSED TO do. -Arch dude (talk) 03:42, 24 January 2010 (UTC)
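For anyone tidying titles in bulk, the convention above (hyphen in the title, redirect from the en-dash form if wanted) could be sketched like this in Python. The function names are invented for the example; only the `#REDIRECT [[...]]` syntax is standard MediaWiki, and the sample title is one of the en-dash articles mentioned earlier in this thread.

```python
from typing import Optional, Tuple

EN_DASH = "\u2013"


def compliant_title(title: str) -> str:
    """Replace any en dash in a title with a plain hyphen, per the Style Manual."""
    return title.replace(EN_DASH, "-")


def redirect_for(title: str) -> Optional[Tuple[str, str]]:
    """Return (en-dash title, redirect wikitext) if the title needs moving, else None."""
    target = compliant_title(title)
    if target == title:
        return None  # already compliant, nothing to do
    return (title, f"#REDIRECT [[{target}]]")


# One of the en-dash titles noted above.
old = "Hunter, William (1755\u20131812) (DNB00)"
print(compliant_title(old))
print(redirect_for(old))
```

This only touches titles; the dash used in the body text of a linking article is a separate proofreading question, as discussed above.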

On that last point, take another example, the [q. v.] or [q.v.] references. I believe that both forms 'occur' (i.e. spaced or not spaced) and the reason is that these books were hand typeset by compositors in right-justified lines. Therefore the spacing in the qv's was used as a way to right-justify: it was elastic. It would not surprise me at all to find that both hyphens and endashes were used, in the dates, for exactly the same reason (and there were likely other things of the sort). For me it is a bridge too far to be peering at the hyphen/endash in the text and worrying about it when it may have been arbitrary anyway. Which is not to say that others can't worry. I always put a space in the [q. v.] by the way, for aesthetic reasons. I don't think we should be bothered about these matters in validating text; but if the mission requires it, some post-validation checking can go on. Frankly, if we had better scans to start with, it would make more sense to me. Charles Matthews (talk) 08:59, 24 January 2010 (UTC)

Author subpages

These seem to be catching on, so this is the moment to move out the basic "template" as Wikisource:WikiProject DNB/Author subpage template. Not all authors need them, but maybe up to 100 DNB authors would benefit from not having their page dominated by the DNB list. I made a list on my userpage.

Also time to explain a little, so that this method gets documentation. The column for page numbers is intended for the page numbers in the paper DNB, not the djvu numbers. For one thing, those djvu numbers may change as the result of maintenance work that still has to happen on the uploads. Also allowing people to give or check those page numbers easily, for citations, is going to be useful for some readers. The fourth column was initially intended as a Y/N for WP links, but it can be for unpiped links (possibly with comment) if we agree on that.

In the third column, and generally to create listings, the "plain link" {{DNB lkpl}} (as opposed to the full citation link {{DNB Link}}) is apparently now the accepted way (certainly by me, since I find a column of those to be more legible than the [[Bloggs, Joe (DNB00)|Bloggs, Joe]]), and it must have real advantages for machine reading also. Charles Matthews (talk) 16:41, 2 February 2010 (UTC)


Creating good pages in "bad" volumes?

I have created or proofread some "bad" pages, getting reasonable results. I leave some of these in the "not proofread" state, since I'm only using a part of the page. The question is: what happens to this work if someone later decided to reload or otherwise re-work the volume? Is there a chance that this work will be lost? What mechanisms are in place to preserve such pages?

A related problem is page number transclusion. We know that some volumes have missing pages, and presumably these pages will be inserted later. But transclusion depends on the scan-index numbering, not the original volume numbering, so if a page is moved to a different scan number, any articles that transclude the page will be messed up, right? How do we address this? -Arch dude (talk) 16:45, 15 February 2010 (UTC)
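To make the concern concrete, here is a small Python sketch of what inserting a missing page does to the scan ranges that articles transclude. The data structure, function name and article names are invented for illustration; this is not how the actual repairs are carried out.

```python
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[int, int]]


def shift_after_insert(ranges: Ranges, inserted_at: int, count: int = 1) -> Ranges:
    """Renumber (first, last) scan positions after `count` pages are inserted.

    The new pages occupy positions starting at `inserted_at`; every old page
    at that position or later moves up by `count`.
    """
    fixed = {}
    for article, (first, last) in ranges.items():
        if first >= inserted_at:
            first += count
        if last >= inserted_at:
            last += count
        fixed[article] = (first, last)
    return fixed


# Hypothetical articles and scan ranges; one missing page is restored at position 32.
before = {"Article A": (10, 12), "Article B": (30, 33), "Article C": (40, 41)}
after = shift_after_insert(before, inserted_at=32)
print(after)
```

Articles wholly before the insertion are untouched, articles after it shift by one, and an article spanning the gap grows by one page, which is why each transclusion after the gap needs checking once a volume is fixed.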

It could be worse than just part of a page. Both Charles and I (among others, I'm sure) have created text using various external sources, and taken it at least as far as Proofread, or even Validated. There would be some unhappy bunnies if that work got trashed. Jan1naD (talkcontrib) 17:10, 15 February 2010 (UTC)
This is one of the downsides to a bot extracting the ocr layer; if left as redlinks then 'not proofread' would show any activity and fully proofread sections. The upside is that I can link an entry to a work or author when they get added, because it sometimes turns up in a search of the site, but I have become wary of doing this at DNB for the reasons given above. Expecting others to detect my more trivial edits seems like an unkind burden to those who undertake this complex task; at least I can see the changes and deletions turn up on my watchlist. Strictly speaking, and certainly from the pov of politeness, we should move any valid contributions. I have done this when an error in the file was realised; a dozen or so pages was fiddly enough. This project does valuable work and is a good starting point for newer users, so urgency and caution need to be applied. I don't expect others to leap at the opportunity, but I'll put my hand up to assist in pushing the data around. Cygnis insignis (talk) 17:38, 15 February 2010 (UTC)
Various volumes have in fact been rectified, and the fallout hasn't been so great. What I have noticed has included a small percentage of transclusions that have not been modified correctly; and some pages that need to be recreated. This is a tribute to the care with which Billinghurst has implemented changes, in fact - mostly people don't notice much. So, while it is accurately said that the system is not completely robust, our experience has been far from traumatic. Charles Matthews (talk) 18:12, 15 February 2010 (UTC)
From this response, I infer that Billinghurst is the only(?) person who fixes structural problems in the page scans. Is this correct? If so, then the rest of us need to adhere to whatever rules or guidelines Billinghurst wishes to promulgate in this regard. What are these rules? -Arch dude (talk) 21:23, 15 February 2010 (UTC)
Billinghurst is the only person who has undertaken that task, he may provide some suggestions or guidelines that derive from his efforts. As it stands, after Charles' reassurance, this is unlikely to affect future contributions. I imagine that the only 'rule' is that if one ignores the red warning about a file needing fixing before proofreading, one is risking making work for the user who fixes it and needlessly sweating over a crummy text layer. Cygnis insignis (talk) 22:02, 15 February 2010 (UTC)
I believe Billinghurst is the only project member who has done the "behind the scenes" work on ProofreadPage to know how to change over scans, and cure the "bot hiccups" that apparently account for the gaps and duplications in the posted scans. I doubt he would mind training up someone else :-) Apparently it all starts with backing up what is already there, which should be of some reassurance. Other Wikisource people may be the ones who run bots and do the "heavy lifting" with the large djvu files. I just don't have the technical chops to get involved in such things, but there is no reason that others can't be involved if they do. I think much of the sweated labour is checking over 450 pages afterwards. Sooner rather than later, really. Charles Matthews (talk) 22:52, 15 February 2010 (UTC)
I really need to make myself clear. I think that we need to encourage Billinghurst's efforts and make sure we do not make his work harder. This is a primary goal, because the project now depends fundamentally on improving the scans. I will do whatever it takes to support that goal. Given that, I need some guidance on how to avoid losing the work I am doing on "bad" pages. I do not want Billinghurst to have to worry about trashing a single trivial edit when upgrading a horrible OCR to a good OCR page, but I also do not want to lose an entire proofread page as a side-effect of a multipage upgrade. I want to be able to find the pages I need for an article and proofread them, even if they are problematical, and I want that work preserved. But I also do not want to get in the way of a desperately-needed multi-page upgrade. Tell me what to do. -Arch dude (talk) 02:39, 16 February 2010 (UTC)
I think it is wrong to say that the project depends "fundamentally" at this time on the remedial work. The approach that has operated (well) so far is like this: get on with the work, carry a machete, help map out the problems. For example, there are pages that are simply missing: my guess is that this may be around 60 out of 30,000, or 0.2%. There is a "bad quartile" of scans that have no alternative at archive.org, and are the Google-sourced ones that are basically a disgrace to the human race. That represents 16 volumes last time I counted. Obviously the latter problem is much more of an obstruction, but there is a way round (for me and those in my position). This being a wiki, we don't have project management as such; given the scale of the project, I'm not surprised that the lack may be felt.
We do have a consensus that the volumes will be fixed, and many have been, in reverse numerical order. I don't see who could give warranties about the work anyway, but I have said what I can about the actual effects of the upgradings. So people working on the later volumes of the DNB have no reason to be anxious anyway, and the second half of the alphabet has been neglected, so that there is a practical way to avoid hassle. Charles Matthews (talk) 08:56, 16 February 2010 (UTC)


Author subpages: a bit confused

Just for clarity's sake, what is the "master plan" for the author subpages and how should individual articles be handled currently as a best practice?

  1. If a subpage exists, only ensure that the article we transclude is present on the subpage and do nothing more?
  2. If a subpage exists, do the above but also add the newly transcluded article to the author's main page?
  3. Ignore subpages for now and only add to the author page?
  4. None of the above?

Also, not directly connected, I'm wondering if we couldn't build a template for each individual row of the subpage rather than navigate huge tables that I have trouble reading in wikitext. Is anyone working on something like that? Otherwise I could give it a shot. MLauba (talk) 11:16, 5 May 2010 (UTC)

It's confused, but not that serious I think. My own current practice is just to add titles to the author pages; that is because I'm doing batches of articles, several dozen a day, and adding each title is an overhead of around a minute. Adding to a table could be an extra minute, and I'm not doing that because I don't have an extra half hour in my schedule. It would be much more efficient to add a group of titles to the subpages, from time to time.
Table syntax seemed to be a natural way to do this, but that is because I wanted the (original paper) page numbers included. Those page numbers allow us to check and create references to the paper version.
In normal wiki workflow, I would say that if A adds title T to the author page, that is good, and if B moves T to the author subpage, that is also good. A and B need not be the same person and this doesn't need to happen at the same time.
There should certainly be a bigger plan, and it looks like this: there are about 650 authors for the DNB. As of now there are just over 300 of them who have a complete DNB listing on their author page already. There will be in the future I think around 100 who should have an author subpage, though only about 50 of those are really long lists. My initial idea was to fill up the author subpages for volume 1, then volume 2, ... etc., ahead of systematically working through. This alphabetical way of working is only one part of the project, though. We have volume ToCs for the first six volumes, so the idea was to create the listings for volume 1, to accelerate the work on volume 1 (and also to correlate with checking on enWP the presence of articles for the first few volumes).
As it is, volume 1 is about 80% done, but not much is done in volume 2 yet. I have been working to do more complete author listings, because I have the reference work that means I can do that. There are about 250 more to do, before I would start systematically listing for the authors who will need subpages. I'm working currently to do the letter S, which will take at least a month more.
So ... the point really is that all of these listings (volume ToCs, author page and author subpage listings) are auxiliary work. If they are done in their own right, it is much more convenient after that to proofread and create articles. But if anyone stops working on articles just to do them, there is a cost in the number of articles. I think the way is to have the listings compiled by everyone as part of the process, which means some tolerance of "confusion" is necessary. It is all made more complicated by the interaction of where articles are listed and the Magnus maintenance tool, which finds articles created but not listed on the author page. You could add in that the format on author pages isn't standard (there are some alphabetical and volume subsections, depending on which page you look at).
I would welcome a discussion on "format", namely how we could be more tidy. I don't think we'll succeed in a discussion of how the work ought to be done (when the real problem is the size of the task). Charles Matthews (talk) 13:14, 5 May 2010 (UTC)
If it is easiest to add them to the author page then do so; it is not a problem to copy and paste them to subpages and to use regular expressions to build the table. Ten names or one thousand take the same time with a regex replacement. As usual, do whatever you are comfortable with and we can come and fix behind you. The major task is the bios, the rest is wikignome territory. With regard to standardisation, we can get a bot to do such pages, we can readily find the author pages by a number of means, so that is the easy part.— billinghurst sDrewth 13:44, 5 May 2010 (UTC)
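As a rough illustration of the regex approach described above, here is a minimal Python sketch. The input format (one "Title; volume; page" line per article), the "(DNB00)" link suffix and the row layout are assumptions made for the example, not the project's settled listing format.

```python
import re

# Hypothetical input: one article per line, written as "Title; volume; page".
LINE = re.compile(
    r"^[ \t]*(?P<title>[^;\n]+?)[ \t]*;[ \t]*(?P<vol>\d+)[ \t]*;[ \t]*(?P<page>\d+)[ \t]*$",
    re.MULTILINE,
)

def build_rows(listing: str) -> str:
    """Turn every matching line into a wikitable row in a single substitution."""
    # The row layout (link, volume, page) is an assumed format for illustration.
    return LINE.sub(
        r"|-\n| [[\g<title> (DNB00)|\g<title>]] || \g<vol> || \g<page>",
        listing,
    )

sample = "Abbot, George; 1; 5\nAbbot, Robert; 1; 24"
print(build_rows(sample))
```

The same substitution handles ten lines or a thousand, which is the point being made about the cost of building the table.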
But I'm thinking, now the question has been asked, whether the subpage is the correct solution. An alternative would be a template that you could collapse, on the author page itself. Charles Matthews (talk) 13:48, 5 May 2010 (UTC)
For clarity's sake, I meant to say that *I* was a bit confused, not the system itself :). Re: subpages vs non subpages, my line of thinking is, with transclusion the point is moot: provided we have a uniform format, any author subpage can be transcluded back to the author's page (and collapsed there if we want).
Regarding the format proper, while the table listing itself isn't an issue, I was thinking that having a template like '''{{DNB auth|article=XYZ|vol=N|pp=123|pedia=y/n}}''' in lieu of the complete table rows might be easier to handle visually (and for consistency's sake, it would include {{subst:DNB lkpl|XYZ}}).MLauba (talk) 14:10, 5 May 2010 (UTC)
If you think that would be easier, then I am happy to build it. Do you think that we need to subst: DNB lkpl, or is there benefit in having it as a template? — billinghurst sDrewth
Nah, that was a faulty reasoning, we don't need to subst it. MLauba (talk) 15:40, 5 May 2010 (UTC)
FWIW we would probably need DNB Auth-top, DNB Auth (multiple), DNB Auth-bottom
As an aside, I believe the scan index of Vol I now accurately displays every single page that has an illegible scan (went through them a while ago), so we now know exactly where our gaps are. MLauba (talk) 14:10, 5 May 2010 (UTC)
Nice! I do need to get back to those, so many balls in the air, so few hands. — billinghurst sDrewth 14:34, 5 May 2010 (UTC)
Good thing we have no deadline, eh? :) I have to say, I'm not too keen on transcribing the endless Annes at the end of Vol 1 that go on for several dozen of pages anyway :) MLauba (talk) 15:40, 5 May 2010 (UTC)

(outdent): I've gone ahead and tinkered a bit. The result is at User:MLauba/Sandbox, containing the drafts for DNB auth top, DNB auth and DNB bottom. Feel free not only to comment but to fix. In particular, my syntax is a bit rusty and in User:MLauba/DNB auth, the conditional check on the 'w' parameter (whether a wikipedia article is present or not) is pretty weak atm. And for User:MLauba/DNB auth top, I dimly remember that there's a trick to make the collapsebox header a proper wiki l3 header but I just cannot remember how to do it atm (or perhaps it isn't implemented here, dunno). MLauba (talk) 09:15, 6 May 2010 (UTC)

Any feedback? MLauba (talk) 10:02, 11 May 2010 (UTC)
Apologies, was thinking that CM would comment about it on style and functionality. Just now I had a fiddle in case no parameters were passed for the first two. I am not sure what you are trying to check with the w at the moment; if you want to have w being present to say yes, and no value entered to say no, it is not quite there. You could try {{#ifeq:{{{w|{{{4|}}}}}}|y|yes|no}} which says if w=y or parameter 4 = y, then yes, otherwise = no. — billinghurst sDrewth 11:40, 11 May 2010 (UTC)
Need a conditional volume. {{#if:{{{volume|}}}|add WIKILINK CODE here}} which will test for the existence, and if it exists, then wikilink, otherwise fail gracefully. — billinghurst sDrewth
Geez, this is meant to be easy but I keep getting tricked by details I overlook. OK, the volume parameter works. For the w param, I want to streamline what appears in the table: no if unset, 0 or "no", "yes" otherwise. And thanks for looking at it :) MLauba (talk) 12:33, 11 May 2010 (UTC)
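For reference, the behaviour wanted for the w parameter can be written out as a small Python sketch. This is plain Python mirroring the stated rule, not template syntax; the function name `show_w` is hypothetical, and treating an empty value like an unset one is an assumption.

```python
def show_w(w=None):
    """Return "no" if w is unset, empty, "0" or "no"; return "yes" otherwise."""
    if w is None:
        return "no"
    value = str(w).strip().lower()
    # Assumption: an empty string counts as unset.
    return "no" if value in ("", "0", "no") else "yes"

for given in (None, "0", "no", "y", "William Blake"):
    print(repr(given), "->", show_w(given))
```

Any other non-empty value, including an article title, counts as "yes" under this reading.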

Despite anything written above, I've been busy with author page listings for a couple of days. See next item - there is some interaction with the subpages issue.

[q. v.] or name

It is extremely discouraging to see the controversy regarding [q. v.] or name. I wanted to immediately undo the edit recently done at [7] but I realize the necessity of discussion and ingenuity to reach a superior end result. Personally I prefer to link the name. Further, until we are all on the same page I will probably "undo" future editing of my preference on pages I create. It is clear, however, that links should be made or they could get lost. I just hope a reasonable solution is in the near future. Daytrivia (talk) 08:24, 11 June 2010 (UTC)

These stylistic things shouldn't be allowed to become a major distraction, firstly. (Every hard-and-fast "rule" becomes a barrier to entry on the work.) I happen to prefer the logic of linking a [q. v.] where it is there: quod vide being the Latin for "click here". The idea of linking the name seems to come from general experience of wikification. There is an argument that the presence of links distracts the reader, so that a shorter link is in some ways better. I haven't myself done much of the linking on the DNB, intending to make later passes at it when there is more to link to. Where we are according to Wikisource:WikiProject DNB/Wikification is simply "No consensus so far on whether to link the name or the [q. v.]". So we need to talk this through. Charles Matthews (talk) 09:05, 11 June 2010 (UTC)
Controversy? Sheerly a difference of preference, and I don't think that it is worth bringing on discouragement nor worth undoing. A link is a link, and we will sort it out in time.

Importing section that started a conversation, unknown whether there was more elsewhere

Hi according to this edit [8] I will have to redo everything I have done. That's the breaks I guess. Daytrivia (talk) 00:56, 10 June 2010 (UTC)

Well, ummm, I hyperlink the names, not the [qv], especially as they are too hard to see, and that has been our style from the beginning. Take it to Wikisource talk:WikiProject DNB for discussion. BTW, do not redo. Surprisingly they didn't have hyperlinks in their books, and needed a way to identify that, isn't that just weird and so old-fashioned. (wink) billinghurst sDrewth 06:45, 10 June 2010 (UTC)
The way I see it, a name, generally given, is an opportunity to a general author page. A "q. v." is specifically a link to another location within the same document, and should be hyperlinked as such. Hesperian 06:55, 10 June 2010 (UTC)
With this work, it is most likely linking to a non-author, and should link to the respective article, and the link is to put it into context as the work intended not to the author page, plus it is not evident that there are two different links and it will be confusing. [qv] is archaic and redundant when we can link the name, and a damn sight more obvious. If the person is an author, then we hyperlink their introductory name to the author page quite appropriately, as well as other appropriate linking. Plus we are well into the work, and to start raising that as a concern at this point seems an inappropriate reversal.— billinghurst sDrewth 07:03, 10 June 2010 (UTC)
You make some reasonable points, mate, but the Argument From Inertia ain't one of them. ;-) Hesperian 10:18, 10 June 2010 (UTC)
(ec) In the context of my talk page, it is not inertia. This was discussed early in the start-up of the project (where? no specific memory, it occurred somewhere), and we have extensively progressed without any specific issue being raised, and there still has been no particular case presented for a change, so the status quo should be maintained until an appropriate consensus otherwise prevails, and one that is not undertaken on my talk page. We should not have a smattering of each way bets through the works. — billinghurst sDrewth 12:00, 10 June 2010 (UTC)
I touched on this issue at talk:WPDNB/style. If the name is linked to a DNB article, there would be very few links to the author:ns. This isolates DNB from the site's SOP, a linked name goes to that namespace. If this is the case, that needs to be decided and explained to the user and User: and some consideration given to what other works this will apply to. And do we link articles when there is no [q.v.], a person as subject might have other legitimate references.
If we maintain the author:link, the solution is to use [q.v.] for the link — Cygnis insignis (talk) 10:59, 10 June 2010 (UTC)

So eventually we are going to need a pass through all the articles (preferably done volume-by-volume as the work becomes more complete). There will be various things that will need to be done at that time (I can think of transclusion style and artefacts, validation, check WP link+status, categorisation at least). It has just been uncovered to us that the original style at the start of the article is like SMITH, JOHN.

One approach is to say that we shall standardise wikification only at that point; being at that future date in possession of a more worked-out scheme. At present, it seems, we should only actually avoid overlinking, in the WS sense of going "over the top" in adding links. Now we could also work some on the style guide now, to lessen future efforts by getting it right first time. This is laudable as intention, but we should also notice that there has been a "moving target" throughout, with innovations being taken up. I have no profound feelings about the detail of a wikification guide, but it generally (i.e. making Wikisource look more like hypertext in its reference areas) does seem to be a quite knotty discussion, if it is a question of designing hypertext rather than just laying down style guides, work by work. Charles Matthews (talk) 10:37, 11 June 2010 (UTC)

I have no preference. However, the lack of a consensus causes me to simply not link at all, since we will need to normalize at some point in the future when consensus is reached. If we can reach a consensus, then I might start linking. If I were to start from scratch with no knowledge of other editors' preferences, I would probably include both the name and the [q.v.] in the link text. Perhaps we need a template: {{DNB QV|John Doe|John Doe (fl.1500)}}. Then we can change the effect globally. -Arch dude (talk) 11:04, 11 June 2010 (UTC)
I think that's a good point: if we agree now to place qv links in a template with the right features, we can postpone the ultimate decision. Also we can track those links, and maybe compile an automated list of redlinks that are wanted for qvs, so there are other advantages. Charles Matthews (talk) 12:58, 11 June 2010 (UTC)
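To make the tracking idea concrete, here is a minimal Python sketch of how templated qv links could be collected and compared against existing pages to produce a wanted list. It assumes the first template parameter is the link target and the second the display text, and that the set of existing titles is supplied by the caller; both are assumptions for illustration, as are the function name and the sample text.

```python
import re

# Matches {{DNB qv|target|display}}; the display part is optional.
# Assumption: parameter 1 is the target article, parameter 2 the shown name.
QV = re.compile(
    r"\{\{\s*DNB qv\s*\|\s*([^|{}]+?)\s*(?:\|\s*([^|{}]*?)\s*)?\}\}",
    re.IGNORECASE,
)

def wanted_qv_targets(wikitext, existing_titles):
    """Return the sorted qv targets that have no existing page yet."""
    targets = {match.group(1) for match in QV.finditer(wikitext)}
    return sorted(targets - set(existing_titles))

page = (
    "He was the son of {{DNB QV|Doe, John|John Doe}} "
    "and a friend of {{DNB qv|Roe, Richard|Richard Roe}}."
)
print(wanted_qv_targets(page, {"Roe, Richard"}))
```

Because every qv link goes through one template name, the same scan works whichever linking style is eventually chosen.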

Now we're talking. One more thing, I have started working on Hugh Burgoyne and am now curious as to the "see" link before him [9]? Daytrivia (talk) 13:57, 11 June 2010 (UTC)

Or even the next page here [10] name or "see" as hyperlink? Name seems more appropriate than "see" but it's a very similar variable that I presume will eventually need addressed. Daytrivia (talk) 14:55, 11 June 2010 (UTC)
Caution: Outsider's opinion: Makes sense to me to link the name to the Author: NS page (if we have one, and we should), and the [q. v.] to the DNB page: Jonathan Swift [q. v.]. This means we maintain the WS trend to link names to Author: NS pages, and the original "linking", such as it is, in the work. Inductiveloadtalk/contribs 16:21, 11 June 2010 (UTC)
However, the simple fact is that you are going to see an extended blue link, and two urls together (not neat practice), and unless people can and are watching where the link is they will be on one or the other. The major purpose of the links for the work is to take them to the existing biographical detail, in the qv links they are not looking for Author pages and that should maintain the priority, over our cross namespace links. Linking to the Author page clearly from the biography is perfectly adequate. Now if we were in the references for the works, especially the smaller links at the end and often a Foster Alumni Oxonienses then that may be a different case and perfectly suitable to link to author pages, and books.
Examples, with faked links

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. George Berkeley [q.v.] Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. George Berkeley [q.v.] Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. George Berkeley [q.v.] Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

With these examples, where do you go to click the link, and do you normally look where it is going? — billinghurst sDrewth 16:44, 11 June 2010 (UTC)

I'm not aware of a prohibition on two links being adjacent, happens all the time at the other place and is unavoidable here. However, here is an example, with separation: George Berkeley [q.v.]

Cygnis insignis (talk) 17:13, 11 June 2010 (UTC)

I also suppose that q.v. is used judiciously, by the author, editor, or indexer, or else it would be appended to every name with an entry. The [see Surname, Jane] format is an explicit reference to something of relevance in another entry. What smallcaps only means seems ambiguous, but it seems to indicate a particular authority is being cited; the entry on Blake uses it twice, only one has a DNB entry. Cygnis insignis (talk) 17:26, 11 June 2010 (UTC)

OK, the new template is [[Template:DNB qv]]. It currently makes the whole name+qv into a single link, but anyone is free to change it, or ask me to change it. Here: {{DNB qv|Tone, Theobald Wolfe|Wolf Tone}} yields {{DNB qv|Tone, Theobald Wolfe|Wolf Tone}}. -Arch dude (talk) 18:59, 11 June 2010 (UTC)

Further opinion: the original intent of the [q. v.] in the DNB was to reference another DNB article. In my opinion, we should honor this intent. I think we should link name+qv as a single link to the DNB article. Sometimes, but not always, the linked person also happens to be an author: When this is the case we should have a link in the "info" section of the named DNB article to the author page. This is metadata, not wikisource transcription. On the other hand, a [see...] link, or other inline ref, or an entry in the list of authorities, was clearly intended to guide the DNB reader to the correct reference material outside of DNB. In these cases, we should link the author name to the author page, because the original intent is clearly to provide the name of an author, not the name of a DNB article. Frequently, but not always, the reference in the DNB is to an author who has a bio in the DNB: We should treat this by ensuring that that our wikisource author page links to the wikisource DNB article for that author. -Arch dude (talk) 01:32, 12 June 2010 (UTC)
Umm, HOLD UP, this template is superfluous, it performs no different function than the existing {{DNB lkpl}}, so please don't start using it.
But that's not unknown: {{DNB contributor complete}} and {{DNB contributor done}} show the same text, while having different tracking functions. It's more a question of whether the distinction is likely to prove useful at some point. Charles Matthews (talk) 09:35, 12 June 2010 (UTC)
That is tail wagging dog stuff. We try to make things as simple as possible for newbies, and this doesn't. If there is complexity required, then let us plan for it and layer it into the templates, not start up something that confuses and replicates. — billinghurst sDrewth 10:41, 12 June 2010 (UTC)
OK, the Wikification subpage has been up around four months now. It clearly needs an overhaul so that it would be newbie-friendly and also a reference for the project. Perhaps we should adjourn to the talk page. Charles Matthews (talk) 11:34, 12 June 2010 (UTC)
Thanks Arch dude. It may take some getting used to but it presently seems like the most logical thing to do. I'm on board. Daytrivia (talk) 05:22, 12 June 2010 (UTC)
"This [Author:ns] is metadata, not wikisource transcription" Keeping metadata out of the sites documents is good practice, however, making a link to this metadata is done everywhere, appropriately I think: "... though it has Basire's name affixed, is, on the authority of Stothard, from Blake's hand." [emphasis] A mention of an author is often, in one sense, a reference to their works, especially in the context of a library. Nevertheless, I am swayed by the opinion above that this is a reference work and linking from those, rather than to them, is somewhat incongruous.
With regard to the current form of the template, I'm reckoning that if another style is adopted the template could not simply be modified because it has appropriated the characters that would make a link to the author:ns. Wouldn't this require a script to fish out the characters, losing the named advantage of templating the link. The only reason I can see to avoid the semantically correct form, James Barry [q.v.], is that someone looking at the text might miss it; I've read hundreds of the articles and think I notice when that form of reference is given. The 'outsider'[?] view is that one expects a name to link to Author:ns, and that q.v. fulfils its function. The inimitable 'insider's view is that we move forward very cautiously, avoid conflicting approaches with a style guide, and focus on the primary task of adding the articles. Cygnis insignis (talk) 07:17, 12 June 2010 (UTC)

Summary so far - Wikisource:WikiProject DNB/Wikification will need editing to reflect this discussion.

  • We appear to have semi-settled one issue, namely that a template for the qv links exists and is now the preferred way to create such links, other ways being deprecated by the project, and the form of the template being something that can be discussed further at need.
  • Not sure about the "See" links yet. I had assumed they were functionally equivalent to qv.
  • Other small-caps names: I believe this syntax is used for all author names that are in parenthesised inline citations. There is no presumption that either the author or the work needs a link, but where enWS has already posted the work it would seem to add value to the reader to link to it. There doesn't seem to be a reason why the format alone would change the normal linking considerations.
  • Priority of linking. This hits the broader issue of what the hypertext is trying to do: to lead the reader to specific information, to a navigation page for enWS, or the "best" information (which might for example be a good enWP article). I propose a slogan like "everything within two clicks" as a rule of thumb. A link to a specific (I mean mainspace) page is OK, provided that page is itself provided with links to the relevant Author:, Portal: or Wikipedia pages. If not, some questions should be asked.

Charles Matthews (talk) 08:18, 12 June 2010 (UTC)

    • Priority of linking: We are trying to reproduce the original document and the authors' intentions. The authors clearly meant to link to another DNB article, not the WP article. Also, it is possible that there is a reference to some specific info in the DNB article that is not in the WP article.--Longfellow (talk) 18:26, 12 June 2010 (UTC)

I have found myself wandering between linking styles, as apparently did the original authors in their attempt to guide the reader to related material. From the discussion thus far, it would seem that the inclusion of a person's title in the link represents a departure from this group's established "norm." I have three examples I would like the group to consider, hybrid qv, 1st name only, and blind DNB link together with the [q.v.] linked to valid wiki article. The hybrid qv seems rarer than the number of blind links I have seen to date. The links containing only first names are usually found in the discussion of the subject's family. It is my intention to continue providing piped links to DNB articles, and including the title only in those cases when the full name is not given, as this path will facilitate the final edit upon reaching consensus. JamAKiska (talk) 15:13, 21 June 2010 (UTC)

The discussion is largely related to where the name should link, rather than how, to the author:page or the DNB article. What part of the name would be a finer point of style issue, except that would exclude the possibility of linking the author:namespace. Three users think linking the characters "q.v." is logical, linking the name after see is an explicit reference to another part of the work. Linking the author page is, however, a site wide preference; not linking any part of the name allows that to be implemented. Cygnis insignis (talk) 15:55, 21 June 2010 (UTC)
There are several occasions when [q. v.] appears in the text that it takes a moment or two, depending on the wording, to figure out who exactly the q.v. is meant for. Once the editor has made this determination it seems they would endeavor to make it a no doubter for "newbies" or general reference and use the name. There have been a number of opinions presented above and they all have valid points; I am amazed at the brainstorming that went on here. The general user, however, when referencing one of the DNB articles and seeing a name linked, should immediately know where it is going. The problem with just the [q. v.] linked is that they would have to notice the "mouse-over" before clicking. Daytrivia (talk) 19:16, 9 July 2010 (UTC)

Here is a page [11] for which the name (actually two names) has to be used as the link not the "see" or the "and." Daytrivia (talk) 00:40, 30 July 2010 (UTC)

Author template needed

Hi Billinghurst, I am curious to know if there is an author template for author:Arthur Herbert Church as I can't seem to get it here [12], thanks. Daytrivia (talk) 18:17, 22 June 2010 (UTC)

Having same problem here [13] what's going on I wonder? Perhaps I'm not doing something? Daytrivia (talk) 18:57, 22 June 2010 (UTC)

I just added both templates. My original semi-systematic effort to add all author templates fizzled out about halfway through the volumes. Others have filled in most of the remaining templates, but about 50 remain. When you find a missing one, feel free to add it. -Arch dude (talk) 19:51, 22 June 2010 (UTC)

Many thanks. Daytrivia (talk) 21:20, 22 June 2010 (UTC)
If they are a onesie or a twosie level contributor, it is probably just worth using the underlying template to create the link. — billinghurst sDrewth 10:44, 23 June 2010 (UTC)
There will be nearly 100 more to create, since there are 685 author pages (not quite complete), and 604 templates with some double counting in there. But they are not hard to create from any existing one. I now log the total done in the monthly stats, just to remind us all that this is one task to bear down on among the many. Charles Matthews (talk) 12:08, 23 June 2010 (UTC)
There are 23 "dab" templates, so there are 604-23=581 "real" templates for 685 authors. A few authors have two templates, so there are slightly more than 100 missing templates, and they will almost certainly not be for big contributors. However, I recommend that we complete all of the templates, merely as a matter of consistency. -Arch dude (talk) 14:14, 23 June 2010 (UTC)
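The count above can be checked with a short Python sketch. The figures 685, 604 and 23 come from this thread; the number of authors holding two templates is not stated, so the value used for it in the second call is purely hypothetical.

```python
def missing_templates(authors, templates, dab, duplicates=0):
    """Estimate how many authors still lack a contributor template."""
    real = templates - dab          # templates that point at an actual author
    covered = real - duplicates     # distinct authors that already have one
    return authors - covered

print(missing_templates(685, 604, 23))                # 104
print(missing_templates(685, 604, 23, duplicates=3))  # 107, with a hypothetical 3 duplicates
```

Each author with a second template raises the estimate by one, which is why the total comes out at slightly more than 100.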
Working through the second half of the alphabet at Wikisource:WikiProject DNB/Listings#Singletons, checking what links to the author page, should catch most of them. Charles Matthews (talk) 14:35, 23 June 2010 (UTC)
I have now done a trawl through the singletons, and the number of templates has reached 705. Which should therefore be most of them. (I'm not quite clear about counting the DNB01-authors.) I met a few disambiguation issues as I went through, which I dealt with ad hoc; when we get serious about DNB01, we should probably revisit the cases where ABC is an author abbreviation in DNB00 and also for a different author in DNB01 - it's something to look out for. Charles Matthews (talk) 20:08, 24 June 2010 (UTC)

Update: Category:Dictionary of National Biography contributor templates now has 721. There must be 15 or so cases where the same author has more than one template in different places (which happens), to account fully for the number. Anyway I expect only a handful missing now. Charles Matthews (talk) 07:16, 25 June 2010 (UTC)

Only one template didn't have the category. All should now be listed. — billinghurst sDrewth 02:58, 1 July 2010 (UTC)

Generic nature

Tommy Jantarek (talkcontribs) brought up an interesting point about these footer templates based on {{DNB footer initials}} in whether they could be used for other encyclopaedic material. The answer is yes, presuming of course they are the same person. What makes the initials specific for the project is that a set of initials (which of course can be widely used) identify specific individuals (or numbers of people as we have found) working for a specific publication. Apart from making it easy and standard to link with the template, the value is that we can use it for audit.

In reality, this underlying template probably should be quite generic so it can be more widely used by all the projects, and probably the case with {{DNB contributor}}. So I will look to see if we can free the base templates from the projects, while still maintaining the project components. Before I do, does anyone envisage or plan for other use for these templates. — billinghurst sDrewth 02:40, 1 July 2010 (UTC)

The templates generate a specific text in a specific format for a specific reason. Unless another work was originally printed using this particular format (initials, right-justified) these specific templates will not be useful. Furthermore, unless these specific authors use these identical initials in the other work, then these templates will not be useful. Also, the larger the number of works we try to cover with these templates, the higher the chance of ambiguities. Finally, these templates are in the "DNB author templates" category. Given all of these constraints, a non-DNB project will probably find it more useful to create its own templates. The biggest advantage of these templates is that they create a super-easy way to figure out who the author is when transcribing. Much of this advantage will be lost in a new work if we increase the possibility of ambiguity. In general, the other work is likely to have at least a slightly different footer style. If so, then create an analogue of {{DNB footer initials}} and modify it to implement that style. Then, go to the author list of the work in question and start making your templates. I am a slow typist, but I was managing about 30 an hour, even though the situation with the 63-volume DNB is horribly messy. Please feel free to contact me for help. -Arch dude (talk) 12:52, 1 July 2010 (UTC)
Most of the encyclopaedic works of that period utilise the right hand side initials for contributors, and the prime example of these works is EB1911 where there was a crossover of contributors. Anyway, I was more looking to create something like {{footer initials}} as a base template and convert {{DNB footer initials}} to utilise it with an allocated parameter for DNB. So there should be no extra work required, and no changes to any of the DNB XX templates. Also, if someone does use a DNB footer for EB1911, we will pick it up and should be able to create the other templates as necessary. It shouldn't be a biggy. — billinghurst sDrewth 15:48, 1 July 2010 (UTC)
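The layering proposed above (a generic base with a thin DNB wrapper) can be sketched in Python as follows. This only illustrates the design and is not the template code; the function names, the output markup and the category name are all hypothetical.

```python
def footer_initials(initials, author, project=None):
    """Generic base: link printed initials to the author page.

    If a project is given, add a tracking category for audit purposes.
    The category name used here is a made-up placeholder.
    """
    link = f"[[Author:{author}|{initials}]]"
    category = f"[[Category:{project} contributor footers]]" if project else ""
    return link + category

def dnb_footer_initials(initials, author):
    """DNB wrapper: same call signature as before, project fixed to DNB."""
    return footer_initials(initials, author, project="DNB")

print(dnb_footer_initials("L. S.", "Leslie Stephen"))
print(footer_initials("L. S.", "Leslie Stephen"))
```

Since the wrapper keeps its existing signature, the per-author DNB templates that call it would not need to change.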

Replacing crap scans

Hi to all,

I am in a reasonably good time and space to get to and undertake some file replacements at Commons for DNB. Though such a task is going to need more than one head (brain and eyes) to address the tasks. As I see it the processes are:

Identify the type of replacement we are doing:
  • Volumes that require substitution of problematic pages (i.e. will result in the same number of pages).
    • If it is not a straight vol. for vol. swap, this will require rebuilds of djvu files and uploads, and we will need to identify the pages that we wish to replace (presumably all marked problematic) and the source that we wish to use to replace them.
    • If volumes have a dud page that is rescanned subsequently on the following page, we can more than likely live with that outcome.
  • Volumes that require pages added or subtracted and also have poor scans (here I know that vol. 20 is a candidate).
    • This will require working out, from the respective Index: pages, the pages that we are looking to keep.
    • Are we straight reinserting a better copy or do we need to construct a version from components?
    • Uploading the respective file (and neutering the old file at Commons).
    • Deleting (identified) pages that are not worth saving (admin task).
    • Moving (identified) pages worth saving.
      • Subsequent fixing of any main ns pages that transclude moved files.

Think that is all. — billinghurst sDrewth 05:15, 2 July 2010 (UTC)

I'd like to see the big gap in vol.23 fixed - there is a better scan anyway. Working from the front, vol. 3 has missing pages and would hold us up. Charles Matthews (talk) 20:40, 2 July 2010 (UTC)
  • Vol. 23 Done
With volume 3, is the text that bad? Moving that many pages is going to be problematic (mega painful). We can extract the two missing pages and upload those separately if it is those two alone. Otherwise would we just be moving the pages that are NOT transcluded and delete the rest? — billinghurst sDrewth 04:57, 3 July 2010 (UTC)
Sure, transcluding those two pages from userspace is possible right now. My "roadmap" tends to be built up from known "obstructions" to getting various areas done. Charles Matthews (talk) 08:07, 3 July 2010 (UTC)
Umm, is that a yes or no for a vol. 3 reload? — billinghurst sDrewth 13:24, 3 July 2010 (UTC)
You can give other things greater priority. Charles Matthews (talk) 09:19, 4 July 2010 (UTC)
vol. 26 matches your second description, as it requires several 1:1 swaps, while majority of remaining djvu pages are blurry. I suspect vol. 27 is in similar condition. Thanks for your help on 23. JamAKiska (talk) 15:47, 3 July 2010 (UTC)

Vol 25

I did some work on this volume fixing several pages by transposing several pages as this volume has a numbering problem as it is now. This volume is pretty bad and I also had some discussion about it here. I specifically mentioned that this Google image seems to be far superior to the one we have right now. Maybe we can use that instead. Ww2censor (talk) 16:34, 3 July 2010 (UTC)


Manually transcribed - a way to tag[edit]

We still have a number of manually typed or pasted DNB bios, and I am proposing that if we stumble across them that we at least mark with {{migrate to djvu}} as a means to track them and get to converting them. Crude, but it is better than nothing. — billinghurst sDrewth 04:35, 25 August 2010 (UTC)

OK. Particularly for volume 6, I have in mind to "batchify" the process of conversion, i.e. to apply a version of my current standard method with pre-prepared text. This will be a time-efficient way to do any longer runs of articles. Given that, I would suggest that others working to convert pick off the isolated examples where there is no great efficiency to be had by more industrial methods. Charles Matthews (talk) 10:06, 1 September 2010 (UTC)


Straight text biographies[edit]

MAINTENANCE TASK (manual) Produced a list of biographies that are text alone in the main namespace listed at Wikisource:WikiProject DNB/non-transcluded. These are those that need to be migrated to the relevant places in the respective volumes. There is also the need for {{DNB00}} improvements through each. — billinghurst sDrewth 00:59, 14 September 2010 (UTC)

NB that #6 to #333 on the list come from volume 6. As I mentioned above, this volume deserves a more systematic push. Charles Matthews (talk) 08:39, 14 September 2010 (UTC)

#section transclusion[edit]

MAINTENANCE TASK (bot/semi-auto) List of files that utilise #LST, Wikisource:WikiProject DNB/section transclusion and need to be converted to <pages>, and probably need |volume = xx added in {{DNB00}} and to be wrapped in <div class="indented-page"></div>. — billinghurst sDrewth 03:39, 14 September 2010 (UTC)

Done. billinghurst sDrewth 13:42, 14 September 2010 (UTC)
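
For anyone repeating this conversion by hand, the target markup described above can be sketched roughly as follows. This is only an illustration, not the bot that did the job: the function names are made up, and the section name and page numbers in the usage note are hypothetical. It assumes the usual `index`, `from`, `to`, `fromsection` and `tosection` attributes of the `<pages>` tag.

```python
def pages_tag(index, first, last, section=None):
    """Build a <pages> transclusion tag for a run of Page: namespace scans."""
    parts = [f'index="{index}"', f"from={first}", f"to={last}"]
    if section:
        # Start and end on the same labelled section name.
        parts.append(f'fromsection="{section}"')
        parts.append(f'tosection="{section}"')
    return "<pages " + " ".join(parts) + " />"


def wrap_article(index, first, last, section=None):
    """Wrap the tag in the indented-page div mentioned in the task above."""
    tag = pages_tag(index, first, last, section)
    return f'<div class="indented-page">\n{tag}\n</div>'
```

Calling `wrap_article("Dictionary of National Biography volume 02.djvu", 202, 203, "Example")` would give the body text for an article whose labelled section "Example" starts on djvu page 202 and ends on 203; the `|volume = xx` parameter of {{DNB00}} still has to be filled in separately.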


Milestone: Vol 2 articles are done[edit]

I completed the last few articles in Vol 2. Other editors had already done almost all of the articles, leaving just a few associated with three problematic pages. Two pages, Page:Dictionary of National Biography volume 02.djvu/202 and Page:Dictionary of National Biography volume 02.djvu/458, are true problems, with one or two characters of each line cut off on the right. The other page, Page:Dictionary of National Biography volume 02.djvu/255, was merely a poor OCR requiring manual input.

For the cut-off pages, I interpolated the missing characters. This was trivial except when the missing character was part of a date. For these, I guessed, and added a [?] to the text. Only two articles are affected. My reasoning is that having an article with very slight damage is better than not having an article, but it would be better if someone can use a paper copy to fix this. -Arch dude (talk) 06:27, 26 September 2010 (UTC)

Proofed first two pages and corrected dates. — billinghurst sDrewth 02:29, 27 September 2010 (UTC)
It's great to have another volume done. We don't currently have a list of volumes "all articles added"; but we should have a page for listing that and other progress, such as completed letters.
The "issues" bring up a general point, which is how is the validation effort going to be managed? The page status "traffic light" system ought to be the prime tracking tool, of course: it is better to mark pages as "problematic" status blue if there is any doubt about the proofing. There are paper copies around for the finishing of the validation (I know of one participant with access to all 63 original volumes, not a given in that later editions do have updates). But I think we need to be patient and simply work round the blue pages, currently.
The other point is that volume 3 is now very much on the agenda. There are two missing pages, as logged on Index:Dictionary of National Biography volume 03.djvu. Now bridging the gap is not itself a major problem, in that I have "patched" other such gaps with transclusion from my userspace, and this will do for the time being. That solution doesn't mix well with the "ribbon" monitoring system, and obviously something has to be done eventually, for validation.
I'm led to remark that we are short of tools that could handle the maintenance issues, which as I understand it require making mash-up djvu files on Commons, and handling here on Wikisource the preservation of the existing pages of text, for selective restoration once a new djvu is uploaded on WS. The pages may well have to go back to newly-numbered places. (Please anyone correct my understanding here: I've not done this work.) I think we really should be asking if such tools can be written, because we have many volumes to sort out, and they could benefit the work of others too. Charles Matthews (talk) 09:39, 26 September 2010 (UTC)
Two ways to view this. 1) Are we trying to make a copy for the web where the text is complete; or 2) are we trying to build complete djvu files for download. If the former, then we can do some level of mix and match, as has been done in Index:Dictionary of National Biography volume 60.djvu where there are additional pages added (see top left) and transcluded directly. This has been done in a couple of other places too. If it is the latter, then all the transcription is not relevant, as the text and images only work in the Page: namespace as that is the only place they are pulled together. — billinghurst sDrewth 03:28, 27 September 2010 (UTC)
My priority, certainly, is to have all the DNB biographies available as articles at a standard where they can act as references. It looks like we can get there for the first edition, at current rate of progress, in 2012. Then there is the issue of making the DNB into a piece of hypertext. I'd estimate about 50,000 qv links, so that this is substantial (perhaps the main business will take 150,000 to 200,000 edits). Making the text for the djvus complete requires another substantial chunk of work; probably of the same order (a couple of edits to each page). That would be most of it; but for utility we should do the Errata, and also categorise the pages. Categorisation would be a priority for the needs of the sister project on WP. Then some sort of mopping up, but this gets over the horizon, at least for me. Charles Matthews (talk) 11:30, 27 September 2010 (UTC)

I spoke too soon. I realized that my previous "milestone" merely turned all TOC entries blue without verifying that all articles are present in the TOC. I just quickly ran through all pages in the volume using the "next page" link. I did not find any missing articles, but I did find and fix a bunch of bad "next page" links. I also converted all(?) old-style articles to transclusion. I also found one article that depended on yet another problematic "cut-off" page, so I interpolated that page also. It would be helpful if someone with access to an alternate source could please proofread Page:Dictionary of National Biography volume 02.djvu/336. -Arch dude (talk) 21:27, 26 September 2010 (UTC)

Checked last page against alternate source. — billinghurst sDrewth 02:45, 27 September 2010 (UTC)
Wikisource:WikiProject DNB/Completions is now the place to flag up volumes and letters for which all the biographies have been created. Charles Matthews (talk) 08:20, 1 October 2010 (UTC)

Volume 3[edit]

Have completed review of links on TOC page using online 1885 edition and adjusted lateral links as required. There were a few extras that I removed as I could not find them in 1885 edition. Should finish page checking the volume in next couple of days (currently up to page 387). Am replacing text from alternate source which should accelerate the editing process, currently through djvu page 26 (will try to stay about 15 pages in front of current editing). Have replaced text on the two missing pages and included headers to facilitate editing from alternate source (so text is available for every page in this volume through page 387). Have also replaced text on all problematic pages and recategorised page 308 as not proofread. The two missing pages currently contain duplicates of text pages 126 and 127 which should facilitate a swap when these pages become available. While the text replacement is not optimum, it will help until good quality djvu pages become available for the entire volume. It would be beneficial to come to closure on qv links if that has not already been accomplished. JamAKiska (talk) 13:20, 29 September 2010 (UTC)
Volume 3 page review is complete. Found only one more page that needs replacement. I added text to transform existing characters into something more easily recognized. JamAKiska (talk) 12:45, 30 September 2010 (UTC)

Thanks! What do you mean by "online 1885 edition," and "alternate source?" Please provide links so we can maintain provenance. To the extent that you are using an outside source to repair or help interpret a somewhat poor scan, there is no problem, since the scan should still be usable as the source of record. However, if our existing scan is actually illegible and you are replacing the text from another source, then I think we need some indication of where the material came from. The idea here is that it should be possible for some third party to validate your source: this is equivalent to the "verifiability" requirement at Wikipedia. Of course, we have already asserted that these pages are from the original edition, so a third party is free to go to a library that has the originals, but I feel that it is better to provide a web-accessible source if at all possible. Forgive me if I am being too picky. -Arch dude (talk) 15:18, 29 September 2010 (UTC)
Am trying to support without intruding... The documents found at Wikisource:WikiProject DNB/Progress are equivalent to other DNB volumes available via the internet. Through trial and a few edit revisions and working with feedback from Charles, discovered it was best to verify the provided DNB text was from an 1885-1900 edition. These "on-line" resources have been scanned in a variety of locations around the globe; those that I reference are being made available through US libraries collaborating with Google in their ongoing effort to mainstream access. The authenticity can be verified using the side-by-side review process with the existing djvu pages. The vagueness of my language was to help distinguish this material from some of the existing fuzzy djvu scans. The volume 3 link on the provided page should help you authenticate the missing pages. JamAKiska (talk) 16:36, 29 September 2010 (UTC)
w:User:Charles Matthews/DNB scans for the complete list (probably) of the scans as they appear on archive.org. The "Progress" page links to the so-called "best" scan, but it may not be the best for any given page. There probably are many more Google scans than have reached archive.org: I have tried to post as many Google Books "keys" as possible (at User:Charles Matthews/DNB referencing data) for scans of the DNB of many editions that are relevant; but there are two or sometimes three Google scans of the 1885-1900 books. It's actually a fantastically complicated picture. Here in the UK I cannot actually read any of the Google Books postings, one reason why I somewhat obsessively want to replace Google Books DNB links on WP by links to our own versions. In short, Google has raw material above and beyond what is easily apparent to us. Charles Matthews (talk) 20:48, 29 September 2010 (UTC)
This is not a complaint. I am in awe of the collective progress being made as different individuals take the initiative to attack our various problems in different ways. If everyone just keeps doing whatever seems to be useful, we will continue to make progress. There is no particular reason for anyone else to cater to my obsessions about provenance unless you just want to. Let me summarize the situation as Charles outlined it:
  • Scans of many volumes from many different sets of volumes, by Google and others, are available on the web.
  • Many of these are in fact the correct edition of the DNB 1885-1900.
  • Wikisource editors/contributors/project people/whoever (thanks!) have uploaded the "best" instance of each of the 63 volumes to Commons as .djvu files. Each upload generally has a comment with a link back to the source web site.
  • Some of these volumes have subsequently been replaced with better copies, with or without updating the backlink to the new source.
  • The "best" volume is not necessarily the best for any particular page.
  • Some page-by-page repair work has been done or is being contemplated: this would involve downloading the whole volume, using tools to replace image pages, and uploading the result. There (is or is not?) a system for logging the changes made by this method, including which pages came from which alternate sources.
  • Other repair work is being done by modifying the text pages using the alternate sources in conjunction with the normal "pageview" tools. There (is, is not) a system for logging these changes including which alternate sources were used for each page.
  • Charles has some information about the locations for alternate sources for each volume.
Is this a valid summary? -Arch dude (talk) 22:30, 29 September 2010 (UTC)

Not quite it. The initial choices of djvu upload were before my time, but even with charity those weren't always best. The "logging" issue comes in two parts. I think Billinghurst once said that we don't really care about the provenance of text from which we proofread. There are multiple sources at archive.org of the correct edition, and there are sources for later editions. It's all grist to our mill: text layer or external source. We do it patchwork. Changes to the djvu sequence have been more disciplined, and are much more of an effort. It's an admin task. I have not participated. I believe changes have been logged to the Progress page. Charles Matthews (talk) 09:53, 30 September 2010 (UTC)

Lead required[edit]

Hello, I've just stumbled upon the project page and my thoughts were that it needs a lead at the top of the project page outlining the basics of the project. One or two sentences is all that's needed. It should cover such basics as the country (Dictionary of National Biography of New Zealand? England? USA?) and the aim of the project. Just a suggestion :) Schwede66 (talk) 23:41, 30 September 2010 (UTC)

I've added text to the header, which has a field to describe the project. Charles Matthews (talk) 07:08, 1 October 2010 (UTC)

I added a new lead, because the text in the header is not very prominent. -Arch dude (talk) 17:29, 1 October 2010 (UTC)


simplified 'LST'[edit]

As this project uses LST, the labelled sections in the Page: namespace, a great deal, I thought I would bring attention to the changes introduced and discussed at Wikisource:Scriptorium#Easy_LST. cygnis insignis 21:37, 19 October 2010 (UTC)

Thanks, I'm trying to test this as it affects my normal way of working. The idea is that if a section runs over multiple pages, then only the starting and closing section marks are then needed? Didn't work for me first time I tried that. Charles Matthews (talk) 11:40, 20 October 2010 (UTC)
hmm no that’s not the idea ; the internal syntax is unchanged. The script is only a preprocessing.
I fixed Page:Dictionary of National Biography volume 56.djvu/148 because the end mark was missing. Note that this cannot happen when you use the script.
to use it, reload your javascript. You’ll see that section tags are replaced with ## titles ## during editing.
ThomasV (talk) 11:44, 20 October 2010 (UTC)
Well, I'm going to have to try to understand this. Page:Dictionary of National Biography volume 56.djvu/148 has this diff you made: [14]. You added a section end. But the change in wikitext, for me, is nothing there. Charles Matthews (talk) 12:04, 20 October 2010 (UTC)
There is a spooky effect caused by what was termed a pseudo labelling. Thomas outlined how defining the end is not needed in our practices. In theory the code is synonymous, it adds the same thing, but only the start of section code 'appears' to the user (as ## section title ##). The section end is 'there', but does not appear, the end is implied by the start of a new section. Does that help conceptualise what is going on? cygnis insignis 12:16, 20 October 2010 (UTC)
You can continue as you were, and there is an opt out if you don't want that display. cygnis insignis 12:19, 20 October 2010 (UTC)
I'm going to have to think hard to understand how robust this now is. I can't guarantee not to make mistakes in markup: I certainly make mistakes 10% of the time. My first reaction is that this upgrade is not helpful to me personally. Charles Matthews (talk) 12:38, 20 October 2010 (UTC)
By the way, if the opt-out is anything to do with the Vector skin (perhaps implied by what is written at the Scriptorium about this), I don't use it and have no intention of doing so (it breaks my way of creating articles with two windows side-by-side, amongst other things). Charles Matthews (talk) 12:47, 20 October 2010 (UTC)
I will add a gadget to simplify opt-out ; currently you need to change your javascript manually.
However, this upgrade should be useful to you personally, because it guarantees that you cannot make mistakes such as unbalanced or misnamed tags, such as the page that I fixed.
That was a test of mine, in fact, not an error, and you made the change while I was creating an article to check what happened. A typical mistake I make, in adding text from a long strip in a text editor, is to copy the wrong beginning or end section up or down (I have the correct names at the beginning and ends of sections, but the page break may come in the middle, and so I need to copy). With the new system I suppose I couldn't see if I have copied down the wrong end section tag by mistake. But that is why I said I needed to understand better. Charles Matthews (talk) 13:12, 20 October 2010 (UTC)
I’d like to understand why it breaks your way of creating articles. could you clarify this point ?
ThomasV (talk) 12:52, 20 October 2010 (UTC)
That's purely about dimensions and tabs when I have two windows open: something I found inconvenient when I last tried it. Charles Matthews (talk) 13:12, 20 October 2010 (UTC)
I understand the upgrade better now: thank you for the time you have put in. A way of switching back to the view without the preprocessing might prove helpful in troubleshooting. Charles Matthews (talk) 07:51, 21 October 2010 (UTC)
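
For readers of this archive, the behaviour described in this thread (only a `## title ##` marker is shown while editing, and the end of a section is implied by the start of the next one) can be illustrated with a rough sketch. This is not ThomasV's script; it is a guess at the idea, the function name is made up, and it assumes that the last section on a page simply runs to the end of the page.

```python
import re

# A marker line looks like: ## section name ##
MARKER = re.compile(r"^##\s*(.+?)\s*##\s*$")


def expand_sections(text):
    """Turn '## name ##' marker lines into explicit section begin/end tags."""
    out = []
    current = None  # name of the section that is currently open
    for line in text.splitlines():
        m = MARKER.match(line)
        if m:
            if current is not None:
                # A new marker implies the end of the previous section.
                out.append(f'<section end="{current}" />')
            current = m.group(1)
            out.append(f'<section begin="{current}" />')
        else:
            out.append(line)
    if current is not None:
        # Assumption: the last open section is closed at the end of the page.
        out.append(f'<section end="{current}" />')
    return "\n".join(out)
```

Because every end tag is generated from the marker that opened the section, the begin and end names cannot disagree, which is the guarantee against unbalanced or misnamed tags mentioned above.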


archived posts[edit]

I have archived a slab of posts, though the page is still big. If you think that a section above is complete and can be archived, then please mark it as such and I will get back to copy and paste them to the archive. Of course feel free to do it yourself. — billinghurst sDrewth 08:49, 1 November 2010 (UTC)

Yes, when posts from two years ago are still deemed relevant, it feels like time for some FAQ-like material. Charles Matthews (talk) 13:35, 1 November 2010 (UTC)


Change to coding across WS, modified template[edit]

I have found that much of the <div> formatting is now redundant on transcluded pages; accordingly I have removed it from its application within {{DNBset}}. At some point I will get a bot to run through and tidy the DNB biographies to remove that redundancy. — billinghurst sDrewth 11:45, 4 November 2010 (UTC)

Page numbering for volume 33[edit]

Can someone please "fix" the numbering for volume 33? For example Livingstone, David (DNB00) the first page link says 390 and that links to Page:Dictionary of National Biography volume 33.djvu/390 but it is 384 in the physical volume. I would do it myself but I do not know how. So if it can be done please explain it here or please provide a link to a page with an explanation. -- Philip Baird Shearer (talk) 05:14, 6 November 2010 (UTC)

Renumbered the pages of volume 33 to align page one of text with that number. The preamble pages have been renamed, the final two using roman numerals that align with the original text. This statement can be found in the "pages" section while editing the index to this volume. JamAKiska (talk) 12:30, 6 November 2010 (UTC)

Thanks volume 16 seems to have a similar problem. -- Philip Baird Shearer (talk) 01:51, 9 November 2010 (UTC)

That should do it…let me know if you see any others… JamAKiska (talk) 02:31, 9 November 2010 (UTC)

Volume 6 seems to have a similar problem: page 125 links to 113. Is there any documentation on what you do to solve this? If not could you give an explanation here so that if I find any more I can fix them myself rather than having to trouble others by asking for it to be done. -- Philip Baird Shearer (talk) 22:19, 16 February 2011 (UTC)
I’ll give it a go…
When the djvu file images are transformed into an index page (Wikisource namespace Index), the index page reflects the total number of images starting from the first page of the file (can be the front cover), image 1 and also djvu 1, and continues to the last image. As most texts have pages prior to page 1 (prefatory notes and the like) it is possible to align the image for page one with the page: ns (where the edited text is stored). Using your example V. 6 p.113, the image is of page 113 and yet it is the 125th image on the index page. The difference between these numerical values is referred to as the offset, and corresponds to the number of images (pages) prior to the first page of text. If you edit the Index Vol. VI you will observe on the left hand margin a box titled "pages:" that contains a statement <Pagelist…> which allows a renaming (realignment) of the page: ns. In volume 6, page: ns 13 was reset to align with the image for page one of volume 6. So prior to the upgrade yesterday, this statement functioned properly. Since the upgrade, this statement no longer functions as intended, and all pages reflect the djvu image number…which can be a little confusing when trying to locate pages using the index page…as you noticed…when the bugs get worked out of the upgrade this statement should function as intended… JamAKiska (talk) 12:45, 17 February 2011 (UTC)
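
The arithmetic in the explanation above can be written out as a small sketch. The function names are made up for illustration, and it assumes the simple case where every image from printed page 1 onwards is one numbered page (no unnumbered plates, duplicates or gaps). With image 13 set as page 1, as in volume 6, the offset is 12.

```python
def page_offset(first_text_image):
    """Number of scan images that come before printed page 1."""
    return first_text_image - 1


def image_for_page(printed_page, first_text_image):
    """DjVu image number that holds a given printed page."""
    return printed_page + page_offset(first_text_image)


def page_for_image(image_number, first_text_image):
    """Printed page number shown on a given DjVu image."""
    return image_number - page_offset(first_text_image)
```

For volume 6, `image_for_page(113, 13)` gives 125, which matches the example: printed page 113 sits on the 125th image. When the pagelist statement is not being honoured, the link shows the image number instead of the printed page, hence the 125 versus 113 confusion.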


Vol. 11 replaced with good version[edit]

I have replaced volume 11 with the identified "better version". I have deleted those NOT PROOFREAD (red) pages from the scan where it did not look as though there had been text transcluded from them. I did that by looking at what was proofread and checking pages around the edges. If you find pages deleted that should not have been, they will need to be recovered. Talk to anyone with admin rights, though primarily Charles or myself among the DNBizens.

I have found that this introduced version from UofT does not have the listing of biographies at the rear, so we may have to add that somewhere else. Will need to mark that as a TO DO task. — billinghurst sDrewth 05:57, 29 November 2010 (UTC)


Site to reOCR a page[edit]

I have found the website http://www.free-ocr.com/ very useful to reOCR a page, especially where the centre line seems to have been ignored in our scans.

  • To use Wikisource image, click into edit mode and save the image (right click, save as, etc.); alternatively
  • To use archive.org online version, find image, zoom in to 100% and right click and save as ...

Then just need to upload the image to the site above. Note that the image needs to be <2MB, and pages of our text tend to be about 1MB. — billinghurst sDrewth 22:19, 17 December 2010 (UTC)

Volume 44[edit]

Volume 44 index page is available for proofreading after switching to the alternative source. All 14 of the problematic pages have been identified and marked accordingly. The djvu pages that correspond to text pages 284-291 need adjusting, as this source has duplicates of pages 282-3 in the first two pages of this sequence and no images for text found on pages 290-291. The text for pages 284-289 is offset from the respective images by two frames (hence the +2 ID on label). If someone is familiar with removing and replacing a small section of the djvu file, that would be preferred to downloading and building up a fresh image file. JamAKiska (talk) 22:33, 28 December 2010 (UTC)

All volume 44 images and text are now aligned. Pages 290-1 have images, leaving 12 problematic text pages in this volume. JamAKiska (talk) 22:36, 29 December 2010 (UTC)

Vol. 6 all text transferred[edit]

From the many pages that were manually transcribed in the main namespace, I have finally completed the transfer of these pages to the Page: namespace from Index:Dictionary of National Biography volume 06.djvu. A big sea of orange, some splashes of green, and some pockets of redlinks. George Burgess (talkcontribs) has been following through picking off the red pockets. Now that is out of the way, I can look at some of the other problems that people have left me through DNB. %-) — billinghurst sDrewth 15:02, 29 December 2010 (UTC)

Transfer now complete through volume 11, my load was much lighter:^). JamAKiska (talk) 17:29, 31 December 2010 (UTC)

Volume 07 djvu file replace.[edit]

Replaced volume Index:Dictionary of National Biography volume 07.djvu from 2008 IA source. Will need to append index pages as they become available. With some care, the previous images remained in place for some of them. JamAKiska (talk) 17:31, 31 December 2010 (UTC)


Volume 6[edit]

The Index page shows this volume is close to complete. Around a dozen articles need to be created from proofread text now; and there is just one page marked "problematic" that actually needs attention. Charles Matthews (talk) 10:12, 14 January 2011 (UTC)

OCR'd text and inserted, back to "not proofread". Others done. — billinghurst sDrewth 13:50, 14 January 2011 (UTC)

I have created the remaining articles. Charles Matthews (talk) 20:03, 20 January 2011 (UTC)

Volume 24 has been updated to best available scan[edit]

All pages moved, page transclusions updated, buffed and polished and paint black on the tyres. — billinghurst sDrewth 16:39, 19 January 2011 (UTC)


Vol. 1 - transcluded the peripheral pages to end[edit]

To the end of Dictionary of National Biography, 1885-1900/Vol 1 Abbadie - Anne I have transcluded the lead pages to the work, including the contributors, and added a link to the anchor within the header. Of the trailing pages [dinner for George Smith (1894)], I have included a couple, though I think that they may have been bound in at a later time, and wonder whether they are from another volume. Those pages are incomplete anyway, and we need to work out whether to cull them or what. Note that we can probably dig up the newspaper articles themselves and add them rather than rely on these as standalone. Anyway, there is food for thought. — billinghurst sDrewth 12:51, 26 January 2011 (UTC)

Replaced Vol. 19 scans[edit]

I have updated vol. 19 scans as that was the recommendation from the /progress page. I trimmed the version to align precisely, so no page moves are required, and I am now deleting the bot applied pages (by checking validation status and transcluded articles). If there are any that were incorrectly deleted, please get back to me and I will return them. — billinghurst sDrewth 02:58, 27 January 2011 (UTC)

Seeking opinions on how we manage vols with missing pages.[edit]

Volume 30 has two pages missing and is basically held up until we replace them. We can either wait until there is a new scan of the whole volume (for which I am not holding my breath) or we can try and get the two missing pages (p.28&29). Now I have ready access to copies of those pages from a later edition, or someone can go and scan the pages. While the former is the less pure option, it is readily achievable; however, to progress that way is a community decision, not one person's. We can always replace the two pages at a later time as required. — billinghurst sDrewth 23:12, 30 January 2011 (UTC)

The long term solution is a complete volume with all of the pages intact. The interim solution needs to provide proofread text for article creation. In the past those text pages were archived in another location and made available for transclusion.
The following reflect preferred options to meet the interim goal.
(Plan A) If IA or Commons could splice in the missing pages, that would provide completed djvu files for those volumes. Moving the existing pages to match the revised djvu file is fairly straight-forward. Once completed we have achieved the long-term goal.
Support as preferred methodology, though you want to make that call early, as moving a completed volume is going to be très ugly as it means so much work and surely something breaks.
(Plan B) Volume 60 created an extra index page for pages 18 & 19 that provides a great location to store these extra pages as an interim step. We could organize this index page along the lines of the Errata volume to provide partitions for only those volumes that need pages stored.
Support as secondary fix, to be used where there are small blocks of missing pages, e.g. 1-2, once or twice through a work, though I don't really feel that it is necessary to retrospectively insert them into the work; see plan A: ugly, for supposed purity of little value.
(Plan C) Form a small working group that stays in place until all of the problematic or missing pages are validated for volume 30 as a starting position and then complete the remaining volumes as needed. The two of us could finish volume 30 this week…once we decide where to store the proofed text. JamAKiska (talk) 01:18, 31 January 2011 (UTC)
Not my preference nor how I want my involvement
(Plan D) Works where there are major problems/gaps in a work, prepare a work with the best effort, insert dummy pages where there are missing pages and when/if they become available then to insert these at a point in time.

Does anyone have access to a library that has the physical volume? If so, it should be acceptable to go to that library and take photographs of the pages. Clearly, this is not a copyright violation. Then, upload the photographs to Commons and use them. It is not strictly necessary for all of the pages of the djvu file to be readable as long as we have a data trail of where the proofread data actually came from. In this case, the proofreader could simply place a note on the talk page of the proofread page in pagespace to point to the picture in Commons. The volume is (probably) physically available at the Stanford University library. I don't know if it is available at the Library of Congress. -Arch dude (talk) 23:30, 31 January 2011 (UTC)

OK, the LoC claims to have a copy of the whole DNB00 plus the DNB01 and also the later editions, and they are open on Saturday. I have never been there, even though I have lived in the area since 1970. I am willing to try this myself. Do we have a list of all pages that we need across all volumes? According to their web site, it is also possible to online-order digital images of specified items. The charges are somewhat high, so I'll try to do the bulk of the work in a few in-person visits. I'll try this if everyone agrees that this is an acceptable approach to repairing our problematic pages. -Arch dude (talk) 00:07, 1 February 2011 (UTC)
Yes, I know that is possible; however, my point was about how pure we were looking to be. There has never really been anyone who said that they have ready access to the volumes, though that was part of my implicit question in case someone was going to say that they did have access. The known problematic pages should be able to be determined without mega issues, though it may need someone who can better query the API. — billinghurst sDrewth 04:58, 1 February 2011 (UTC)

I downloaded clean copies of volumes 13, 21, 58, & 60 earlier. Is there a preference as to the order in which to begin their replacement? if not will proceed in order and should have these four complete by the weekend if I work alone on this one. Volumes 20 and 41 are complete volumes, but with a few pages where the fringes are ripped and missing small groups of (1-4) characters. The overwhelming majority of these torn pages are completely legible and have the text quality found in volumes 23 & 44. In volume 20 these pages are 95, 96, & 97. If the existing text of these pages were validated using the scans in place (already been proofed), then replaced by text images from this fresh volume, very few if any, would question the authenticity of these rough pages until such time as the replacement pages became available. If time is not an issue, we wait until the images are ready. Volume 41 needs pages 145, 146, 175, 176, 177 & 178. These volumes are all downloads from AI. If Arch dude can work these images we should have a way forward through these tough volumes. I’ll map volumes 27 and 30 tonight to provide their requisite pages. Volume 30 needs at least pages 28 & 29. Our fallback position includes public domain scans from Google to fill our existing void with a text quality currently found in volume 3. JamAKiska (talk) 02:14, 1 February 2011 (UTC)

I don't see any real value in doing volume 60 (see plan B comment); the others all look ripe for it. I would suggest working first on volumes where there are the fewest deletions to undertake, so we can work on the deletions separately from the upload on those volumes; they are separate actions. — billinghurst sDrewth 05:15, 1 February 2011 (UTC)

I’ll save 60 for last in case of 2nd thoughts (this is a complete volume)…take another look in the 80s of this index page. Would like to splice in some details from above into my volume 20 request below…believe it should be replaced with new file that is a 99.9% solution (easier on the proofreader) and will only require at most three images to be replaced (as they become available), or linked to a "reference" volume (perhaps in Greenwich :^)

I'm somewhat confused by all this, so here are some simple questions:
  • Should I go to the Library of Congress and get copies of certain individual pages?
  • Which pages? (I assume at least pages p28&29 of Vol 30.)
  • Should I get the title page of each volume from which I get other pages? ( I assume yes, as part of attesting to the provenance.)
  • Which other specific pages should I get as part of this initial effort?
-Arch dude (talk) 21:10, 1 February 2011 (UTC)

Pages for Volumes 20 & 41 specified above. JamAKiska (talk) 22:44, 1 February 2011 (UTC)

ArchDuke. There is no exact answer without forensic analysis to compile an exact list from the Index: pages, beyond saying MISSING PAGES and PROBLEMATIC PAGES. With regard to provenance, if you are taking digital photos, having a copy of the title page is probably pretty useful, especially if it becomes a string of files, as that would make it easy to determine where to split. It may even be worthwhile converting the string of images into a PDF file to upload, and then we can either pull it apart or separate a page to put through an external OCR. We can (quietly) delete the PDF from Commons at a later time. — billinghurst sDrewth 04:18, 2 February 2011 (UTC)
Thanks. I know that there is not yet an exact answer for the general question of "which pages?" I was asking for guidance on which pages to try for during my preliminary scouting expedition this Saturday. I will use the lists provided above for vols 30, 20, and 41. I also infer from these responses that both you and JamAKiska feel that this approach is worth trying. With regard to formatting: The LoC has a "copying services" department that apparently does the actual copying, and part of this service is e-mailing a .tiff file. I hope they are willing to perform this service (and charge for it) without also charging the research and retrieval fee if I am physically present with the actual books in my hand. The research and retrieval fee is the expensive part of this mess, but cost is not the main objection (I have the money and I'm more than happy to spend it on my hobby.) My reason to go in person is to be able to check the results instantly. Also, I would like to finally lay hands on the physical volumes after more than three years of working with them online. -Arch dude (talk) 14:36, 2 February 2011 (UTC)
I would think that we can work from TIFF to DJVU, though I don't have one available to play. But no photos of books? Wow that is just bloody rough. Yes, it is worth trying. — billinghurst sDrewth 14:46, 2 February 2011 (UTC)
I'm back from my scouting expedition. See User:Arch dude/DNB at the US Library of Congress. Short version: I can access the DNB, and I'm going back next week with a camera. -Arch dude (talk) 20:32, 5 February 2011 (UTC)
Excellent news. Probably we'll see how the first expedition proceeds, and what we can do with the text, and then we can then work out what is a reasonable schedule going forward, the number of pages that you can manage, the number of trips that we think that it might take, and our priority list. — billinghurst sDrewth 00:15, 6 February 2011 (UTC)


Help request...volume 20.[edit]

Have replacement volume (contains all pages of text and index with good quality images) ready to upload. Need to validate existing pages 105, 106 & 107 (text pages 95-97) with existing scan prior to upload as new volume has portions of these pages torn off. JamAKiska (talk) 18:14, 31 January 2011 (UTC)

We should be replacing the pages in the incoming volume with the decent pages from the present volume. Otherwise we may as well just have two volumes at Commons and call the requisite pages individually. There doesn't seem much point in going from an old bad to perpetuating a new less bad. — billinghurst sDrewth 23:36, 31 January 2011 (UTC)


Broken section markers[edit]

The sections for the underlying pages seem to be broken for Boyle, Michael (1609?-1702) (DNB00): a couple of articles and a large fragment of another article are appearing on that page. Please could someone fix it and be kind enough to explain here what needed to be done (teach a man to fish) -- Philip Baird Shearer (talk) 22:08, 16 February 2011 (UTC)

Troubleshooting is underway…believe it to be related to the MediaWiki upgrade which was initiated this morning. Refer to several central discussions related to this topic. If in the course of your editing you experience any additional "difficulties", post them at that location. Thanks…there is a bugzilla report filed this morning that we should amend as we make additional observations in this new environment. JamAKiska (talk) 22:26, 16 February 2011 (UTC)

Section begin/end problem?[edit]

Something's gone wrong here: How (DNB00). Please help. --P. S. Burton (talk) 21:36, 17 February 2011 (UTC)

See reply directly above 'Broken section markers' — George Orwell III (talk) 21:48, 17 February 2011 (UTC)

Vol. 35 cleansed[edit]

Vol. 35 had the text and page images out of kilter. As the pages were bot applied, I have deleted the pages that were not used in articles or were only showing as /*not proofread*/. — billinghurst sDrewth 14:58, 20 February 2011 (UTC)

Recent Author pages created.[edit]

Author:Philip Norman from supplements was not created and now is, haven't checked what else may have been missing for this bloke. Charles: I also didn't run through your checklist for authors, articles … — billinghurst sDrewth 02:19, 3 March 2011 (UTC)

Antiquarian remembered for his artisan vantage point regarding the buildings and architecture of London. Added links to those works I could find on AI as a means to preview any which should be included in WS collection. Located 2 DNB Supplemental articles, one each in DNB01 and DNB12. The work about him was a brief 1906 review of one of his works that gave some insights into who he was. I would consider it a placeholder pending the substitution of a more suitable alternative [see BAL Biography file]. JamAKiska (talk) 18:57, 10 March 2011 (UTC)


Bad scans[edit]

Hi folks...

While going through Category:Index - File to fix and attempting to resolve some of the issues with the listed Index files I noticed there are about 5 or 6 DNB volumes in there. After trying to do some back tracking/investigating at Commons, InArc, etc. I get the feeling that most, if not all, of these have been "updated" at GoogleBooks since the originals were scanned for InArc. For instance, Volume 25 on Commons gives the source as InArc and the URL there citing GooBoo was

... with a creation date sometime in 2007 but since updated in 2009.

Upon visiting that URL now, it is obvious the scan has been re-freshed since 2009 -- signified by an additional URL w/ apparently same (now fixed) content as the above URL of...

Now at first I was thinking of slowly removing and inserting the "bad" pages in the existing djvu, but it dawned on me the OCR'd text is also 4+ years old and probably could be better by using today's utilities to (re)extract that too. Somebody just tell me the best option, patch the old or pull down a fresh version, for the project and I will try to fulfill that need. — George Orwell III (talk) 21:26, 5 April 2011 (UTC)

As I scan the various sources and I find a good file I store a link at Progress. See Managing Vols with missing pages discussion for background as we continue to gather consensus. Have yet to discover the path to inserting specific page images for volumes that are almost complete. Hope to spend time in that direction soon. JamAKiska 23:14, 5 April 2011 (UTC)
We'd certainly be grateful for help. Charles Matthews 06:54, 6 April 2011 (UTC)
Sure but as I peel back the proverbial onion one thing is clear -- Google has made a real mess of this series. Each volume has at least 3 separate file names (URLs) but it is hit or miss if the content is the same for all 3 (or even just 2 of 3). Once you compare the 2 or 3 at a glance with each other - hardly any of them have the same number of pages. Upon closer inspection, none of them are perfect. Some of them are slightly bad with double or missing pages while others are FUBAR with half-scans, "folded" pages and faded text throughout. I'm going to start organizing my findings into something that I can make a better informed decision with for best possible results in the meantime. Let me guess - most of you don't have access to the U.S. Google section where most of these are found? — George Orwell III (talk) 08:30, 6 April 2011 (UTC)
You have clearly identified our issues with the files. We started from a premise of uploading volumes; that became "bad volume, let us find another"; which became "oh, that is bad too, let's mark the pages problematic, and come back to it". We are finally at a level of maturity where we can identify volumes well enough to construct them where there is no perfect volume available. And you are correct that most of us (all?) cannot see them from Google.
I'll gladly up the PDFs not "viewable" to you folks as we determine which volume to address and in what order so that those more familiar with this series than I can determine which revision or version is best to convert to a bundled .djvu (and how). Once this clear path of "what" to address first is laid out before me, I can get my hands dirty almost immediately. — George Orwell III (talk) 22:37, 6 April 2011 (UTC)
Plus we find that the bot applied pages for a bad volume makes it difficult to clean up as we don't know which page has edited text unless it is completely proofread, and I am trying to get some time to learn some jquery to work out a schema to run a query on the api that will give us that result. — billinghurst sDrewth 13:10, 6 April 2011 (UTC)
Well that is an issue that goes well beyond just this project and one that I am least likely to have an answer for. My "gut" tells me that some across-the-board type of refinement might be best -- one that lets us toggle? away from the current defaults of list displays of oldest-creation-date first or the straight alphabetical at the top. If we can pull contribution lists indicating "top" for being currently the last one to make an edit, then there must be a way to pull a list of edits made by a User (the bot in this case) that does not show "top" for a range of articles (Pages in this case). Like I said, I'm not that technically advanced but it seems possible for somebody who is. — George Orwell III (talk) 22:37, 6 April 2011 (UTC)
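The question raised above (which bot-applied pages have since been edited by a person?) can be sketched as a simple filter, once the revision history of each page is in hand. This is only an illustration: the function name, the bot name and the page titles are hypothetical, and the sketch assumes the list of editors per page has already been fetched by some other means (for example from the MediaWiki API's revisions query).

```python
def pages_only_edited_by_bot(histories, bot_name):
    """Return the titles whose every revision was made by the bot.

    histories: dict mapping a page title to the list of user names
               that edited it, one entry per revision (assumed input).
    bot_name:  the account that bulk-applied the OCR text (hypothetical).
    """
    return sorted(
        title
        for title, users in histories.items()
        if users and all(user == bot_name for user in users)
    )


# Illustrative data only; these titles and names are made up.
example = {
    "Page:Example volume.djvu/10": ["ExampleBot"],
    "Page:Example volume.djvu/11": ["ExampleBot", "SomeEditor"],
    "Page:Example volume.djvu/12": ["ExampleBot", "ExampleBot"],
}
untouched = pages_only_edited_by_bot(example, "ExampleBot")
```

Pages in the returned list carry only bot-applied text and could be deleted or moved without losing anyone's proofreading; any page missing from the list has been touched by a person and needs a closer look.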
For reference: w:User:Charles_Matthews/DNB_scans, my page about what there is at archive.org. The Google scans are by no means the pick of the bunch. Charles Matthews (talk) 17:47, 6 April 2011 (UTC)
Thanks. That will surely help. I beg to differ though - even if it isn't made clear at IA's description page, a good portion of IA hosted works originated at or in partnership with GooBoo with the occasional dedicated individual tweaking the scans a bit before archiving it at IA. While GooBoo is slow to place copyright-free works properly in the public domain no matter where in the world you are logging on to the internet, once they do (full-view), the work is more prone to being re-freshed/replaced and such is the case with this series from what I have gathered so far. — George Orwell III (talk) 22:37, 6 April 2011 (UTC)

I have created a few DNB articles. They have very good depth of treatment for the most part, which makes them useful for Wikipedia references, but the scans I have run into seem to be very poor quality. I found Hartlib, Samuel (DNB00) the most difficult. Details are on the talk page of that article. I resorted to Google for a scan of one of the pages. It must be the subject of an erratum, because the OCR text I was working with had a significant difference on one item from what I was finding in the Google text image. Judging from the Hartlib article, the text here seems to be the most up to date for v. 25. The Hartlib article has a numbered list of his works, and an item listed as a footnote here as a republication of someone else's work, appears in the list there as one of his own works. Bob Burkhardt (talk) 18:05, 20 April 2011 (UTC)

Several locations to help locate better scans…DNB Progress has a few files stored in that location to include volume 25. Other times they will be found as a link on the respective volume index page or the source description page. The better scans have been the priority for some time, and as they are located, they are placed in easy to find locations. As previously mentioned, AI and GooBoo have provided most if not all of the readable image files for the project thus far. JamAKiska (talk) 22:08, 20 April 2011 (UTC)
Thank you. These are good to know about. The Google copy is different from the Google copy I was using, but the text appears to be the same, and again slightly different from the text layer of the djvu page I edited which appears to be the result of correcting errata, though not by following the instructions from 1904 Errata which I have now appended to the article. Bob Burkhardt (talk) 21:42, 23 April 2011 (UTC)
You are most welcome. Glad you could join us on this lengthy effort. Some of the contributing editors used material from the re-issued 1908 and 1909 edition that incorporated the 1904 Errata. Appending the 1904 Errata to revised articles helps readers verify the authenticity of the original or slightly revised text through 1904. Upon transfer from the original publisher, Smith Elder and Company, to Oxford University Press in 1917, some of the articles underwent revisions based upon additional research. The discussion found at w:Talk:Henry Grey, 10th Earl of Kent provides some insight into the subtleties involved in the interpretation. JamAKiska (talk) 14:58, 25 April 2011 (UTC)
Found an intact volume 41 in Toronto and replaced existing file…unable to locate good replacements from that location for the remaining six files. Volume 25 in that location is actually volume 24 mislabeled by year and volume on the library cover pages. JamAKiska (talk) 22:19, 4 May 2011 (UTC)

End of vol. 3[edit]

Looks like there are issues with created articles, from around p. 380 onwards. I have fixed up a couple where the page range was shifted (needed increment of 2). But Bastwick, John (DNB00) might not be so simple. Charles Matthews (talk) 18:39, 10 May 2011 (UTC)

Adjusted links for article alignment. JamAKiska (talk) 23:15, 11 May 2011 (UTC)


A Plantagenet oddity[edit]

Plantagenet, Family of (DNB00) is a non-standard article. Nowhere else have I seen the kind of listing of cross-references that occurs after {{DNB JT-t}} on Page:Dictionary of National Biography volume 45.djvu/407. I haven't transcluded them into the article, partly for reasons of time, and partly because this is like "see" material we usually put on its own page. Charles Matthews (talk) 21:22, 17 August 2011 (UTC)

I think that it is a case of "it is what it is", and as we are just reproducing the work, we just reproduce it and maintain the integrity, and let others judge it separately. I would think that {{tl|DNB lkpl}} would be sufficient and that if it is not fully transcribed and linked, it remains non-proofread until otherwise undertaken. — billinghurst sDrewth 23:07, 17 August 2011 (UTC)


Terminating dead [q. v.] links[edit]

I cannot remember where/what we decided to do with [q. v.] links that ended up with a dead link, ie. no biography was written for that person. I vaguely remember that we made no decision previously. At this point in time I see that we have three options:

  1. to leave the created links and create a terminating page under the name and the DNB header that has standard terminating text
    Positives -(not much) keeps all qv links as coloured links, can allow respective onward referencing
    Negatives - almost becomes misleading, and takes us away from the 'true book reproduction'
  2. to leave the created links but redirect them to a singular (generic) terminating page that says that no biography was created
    Methodology - creates standard landing page that has explanatory text
    Positives - keeps all qv links coloured; enables all names to be tracked; standard explanation, and can point elsewhere
    Negatives - almost misleading
  3. to undo the links, and use something like {{tooltip}} that underlines the text and has pop up text that says that no biography was created.
    Methodology - create a specific DNB template with standard text based on tooltip
    Positives - we are not creating non-DNB pages; undoing redlinks; we can track pages of the referring biography
    Negatives - we don't know the terminating links, without otherwise referencing the originating pages

I think that I favour option 3. Cleaner and simpler and remains truer to the original publication.

Do others see other options? — billinghurst sDrewth 01:09, 21 August 2011 (UTC)

There is an old thread, yes. I think I'd favour #2 at least as a first step. I like the flexibility of it, and once the links are there we can reconsider. There are numerous options really for wikifying; it's going to take some time to get a definitive style in place, if we ever do, and sending all the problems to one page initially seems OK. We can for example offer links onwards by doing section-anchored places on the page. Charles Matthews (talk) 09:25, 21 August 2011 (UTC)

Volume 46 duplication[edit]

I was looking forward to finishing volume 46 shortly, but there is a glitch in the page images I have just run into (entered on Wikisource:WikiProject DNB/Progress: a couple of duplicate pages not previously logged). So I'll stop working forward at this point, after today, and move to something else (so as not to create extra text that needs to be warehoused when the scan is updated). This is the first really bad alignment problem I have seen in a while. Charles Matthews (talk) 11:29, 24 October 2011 (UTC)

OCR digit errors[edit]

It has been pointed out over at Wikipedia that some digits are not being proofread carefully enough, and so errors can be propagated. I'm inviting User:Fram to contribute here with any examples, so we can learn more. Charles Matthews (talk) 10:25, 27 October 2011 (UTC)

In my experience "5" and "6" are often misinterpreted by the OCR software. -- Philip Baird Shearer (talk) 02:27, 28 October 2011 (UTC)
Definitely the 5 <-> 6, but also see more broadly S <-> 8 <-> 5; 1 <-> I <-> l <-> !; 7 <-> y, 11 <-> n +++. — billinghurst sDrewth 03:11, 28 October 2011 (UTC)
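Some of the confusions listed above can be hunted for mechanically. The following is a minimal sketch, not an established project tool: it flags tokens that mix digits with the look-alike characters S, I, l, ! and y, and the function name and sample sentence are made up for illustration. Digit-for-digit swaps such as 5 <-> 6 or 8 <-> 5 cannot be caught this way and still need a check against the page image.

```python
# Characters that OCR commonly produces in place of a digit (from the list above).
LOOKALIKES = set("SIl!y")


def suspect_tokens(text):
    """Return tokens made up only of digits and look-alike characters,
    containing at least one of each (e.g. '16S9' for '1659')."""
    hits = []
    for token in text.split():
        core = token.strip(".,;:()[]")  # drop surrounding punctuation
        has_digit = any(ch.isdigit() for ch in core)
        has_lookalike = any(ch in LOOKALIKES for ch in core)
        only_mixed = all(ch.isdigit() or ch in LOOKALIKES for ch in core)
        if core and has_digit and has_lookalike and only_mixed:
            hits.append(core)
    return hits


sample = "born in 16S9 and died l702, aged 43"
flagged = suspect_tokens(sample)
```

Here `flagged` holds the two damaged years, while ordinary words and the clean number 43 are left alone.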

Wildman, John (DNB00)[edit]

The article Wildman, John (DNB00) consists of several pages. The joins between the second and third pages and between the fourth and fifth are causing an unintentional paragraph break. Could someone have a look and fix it, and please let me know here how it was done so that I can fix problems like this myself in future. -- Philip Baird Shearer (talk) 02:30, 28 October 2011 (UTC)

There is a generic problem recently introduced through all of Wikisource (see Scriptorium) with the addition of a terminating line feed. I am running a bot through several times a week to fix it, so it will get captured and resolved then, and we are trying to hurry WMF to fix the problem. — billinghurst sDrewth 02:58, 28 October 2011 (UTC)

Index missing from current page images[edit]

Vol 7, this is a bit of a nightmare image, here and the previous few pages do not match the text. Not sure what's going on. Rich Farmbrough, 22:12 6 November 2011 (GMT)

When our scan at commons is bad, you should access the "scans" page for the volume to see if we know of an alternate scan. For volume 7, we know of an alternate scan: see this page. To find it, go to the volume TOC (Dictionary of National Biography, 1885-1900/Vol 7 Brown - Burthogge). From there, go to Access scanned source of Volume 07, and from there, try the alternates. -Arch dude (talk) 00:50, 3 January 2012 (UTC)


General update[edit]

The first six volumes are now complete. In terms of working from the front, that means the next big milestone is finishing letter B, which would take us to the early part of volume 8.

It also raises the question of the volume ToCs, given that listings for vols. 1 to 6 were posted some time ago. These days the raw material for a complete listing is in Wikisource:WikiProject DNB/Messy lists. As the name implies, there is a bit of work to do in creating a checked listing with the help of the existing listing, the messy list, and flicking through the pagespace version. But much less work than doing it all from scratch. There are 25 incomplete volumes of the first edition now, so finishing the listing work is within reach, really.

I have forward-link and backlink checked volumes 1 to 6. One minute error found and corrected. The respective talk pages have been edited to record that the audit has been undertaken.

There are 300 authors done now: the state of play on authors can be seen with this search. I have just checked the list, and there are numerous authors needing just one more article to finish. The template {{DNB contributor complete}} (which refers to listing the DNB00 and DNB01 articles) should be updated to {{DNB contributor done}} to change the tracking. Charles Matthews (talk) 11:09, 18 April 2012 (UTC)

By the way, 20,000 articles done now. Well done us. Charles Matthews (talk) 19:44, 3 May 2012 (UTC)
Shimmy us, shake that booty CM. :) billinghurst sDrewth 10:43, 4 May 2012 (UTC)


Transclusion issue in vol. 8[edit]

I have marked vol. 8 complete, but there is actually a patch of bad transclusion running from Cadwaladr Vendigaid to Cædmon and I can't say I understand why right now.

Since the latest MediaWiki upgrade there have been a few new phenomena. Double rows #### #### occur at the bottom of pages, and produce a characteristic "1. 1. 1. 1." when transcluded. I suppose these could be found by bot. I'm currently unclear, speaking of the bottom of pages, whether the forced newline after </small> is still with us.

One reason some things are hard to see is (apparently) very aggressive caching of pages. Might depend on browser, I suppose: I use Chrome. Does some Chrome user know how to force refreshing in this context?

I'm going to look again at the particular vol. 8 issues. Some transclusion problems - and I thought I knew most of the tricks of the trade by now - can be caused by odd syntax earlier in the page. I've not seen this happen so as to affect several pages, though.

BTW for a completed volume I now run two checks, one based on the "messy lists", and another simply clicking right through using "next". They usually bring up some issues to fix. Charles Matthews (talk) 07:13, 30 June 2012 (UTC)

I retranscluded it [15] and it took (note that I did lots of other attempts at fixes first). It all looks exactly the same in the code, though the diff sees something different, so all I can think of is a unicode issue. I will run my bot through to see if that is a helpful catch-all. If we have other transclusion issues in a volume or feral code, the bot will get it if you show me an example. FWIW I have retranscluded them all with a fresh template, and it resolved, so who knows what the hell it was.
Re the #### as I use old tag form, I am seeing nothing different. I would have thought that the empty tags (####) would have been innocuous. Can you point to some examples?
Here's one from today: diff. Charles Matthews (talk) 19:48, 3 July 2012 (UTC)
Another anomaly: Page:Dictionary of National Biography volume 12.djvu/69. Lower text doesn't show up on that page, and the transclusion has some white space. Charles Matthews (talk) 08:18, 21 July 2012 (UTC)
As a note, I have been replacing <small> with {{smaller block|}} as it formats nicely and gets around some of the authority split issues. — billinghurst sDrewth 08:59, 1 July 2012 (UTC)

Thanks. I'll try to remember to move on from <small>, if you say {{smaller block|}} has positive advantages. Charles Matthews (talk) 06:50, 2 July 2012 (UTC)

To note that when an authority splits, one needs to use the {{smaller block/s}}{{smaller block/e}} combination one in and one out of the header/footer. However, I am catching some as I trail and validate pages. — billinghurst sDrewth 10:30, 2 July 2012 (UTC)

Termination of the Jones, John (d.1660) article[edit]

I have added a page "Jones, John (d.1660) (DNB00)", but for some reason the terminator

tosection="Jones, John (1645-1709)"

does not work as I expect. I have checked that it is not a hyphen ndash issue, but could someone else have a look, fix it and tell me what is the mistake I have made. It would also be nice if someone can check the text and if it is of sufficient quality set the two pages to Proofread. -- Philip Baird Shearer (talk) 14:11, 6 July 2012 (UTC)

You opened with one Jones and ended with start of the following Jones, leaving out <section end="Jones, John (d 1660)" /> altogether. Take another look at the Page: in edit mode to see where I added it.
...and fwiw - it still has a couple of typos that need fixing before I'd call it "Proofread". -- George Orwell III (talk) 22:19, 6 July 2012 (UTC)
Thanks, I'll keep it in mind for the next one I do. -- Philip Baird Shearer (talk) 18:57, 8 July 2012 (UTC)
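The mistake described above (a section that is opened but never closed) is easy to check for before saving a page. The sketch below is an illustration only: it assumes the quoted `<section begin="…" />` / `<section end="…" />` form of the markers and that each page closes every section it opens; the function name and the sample text are hypothetical.

```python
import re

# Matches <section begin="Name" /> and <section end="Name" /> (quoted form only).
SECTION_TAG = re.compile(r'<section\s+(begin|end)\s*=\s*"([^"]*)"\s*/>')


def unbalanced_sections(wikitext):
    """Return (opened but never closed, closed but never opened) section names."""
    begins, ends = set(), set()
    for kind, name in SECTION_TAG.findall(wikitext):
        if kind == "begin":
            begins.add(name)
        else:
            ends.add(name)
    return sorted(begins - ends), sorted(ends - begins)


# Made-up page text: section "B" is opened but its end marker is missing.
page_text = (
    '<section begin="A" />First article.<section end="A" />\n'
    '<section begin="B" />Second article, with no end marker.'
)
missing_end, missing_begin = unbalanced_sections(page_text)
```

Any name reported in `missing_end` is a candidate for the kind of runaway transclusion seen here, because a `tosection=` value naming the following article has nothing to stop at.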


Volume 25[edit]

Happily volumes 1 to 24 are now complete. Volume 25, however, is a bad scan: Wikisource:WikiProject DNB/Progress may not even detail all the problems. Working from the front now has its attractions, but I'll cease to add to vol. 25 for the moment, in the hope that someone would care to replace the scan. Charles Matthews (talk) 06:28, 11 October 2012 (UTC)

You'd have a replacement source file if IA would get around to processing it:
I uploaded that newly compiled PDF over two weeks ago. Somebody needs to move things along at IA so I can trim the resulting DjVu to match the existing source file on commons (that way hundreds of page moves are not needed). -- George Orwell III (talk) 03:13, 17 November 2012 (UTC)

Notes:

  • scan page 79 duplicated at positions /85 & /86
  • scan pages 80 & 81 missing from source
Everything created and/or transcluded after position /85 will need to be bulk moved/adjusted. -- George Orwell III (talk) 10:57, 21 November 2012 (UTC)
Update.

Internet Archive finally got their act together; the replacement is of decent scan quality and has a better text layer than the previous one(s). The important thing is all the pages are present now with no duplicates. The downside is everything between scan page numbers 280 to 375 (or positions /286 thru /381) has the text content offset by anywhere between -1 to -3 pages compared to the thumbnails. Once the text content in that range is aligned with their matching thumbnails, adjustments still need to be made to anything & everything previously transcluded into the main namespace.

It's a real mess (no surprise when your original was 3 pages short of the total 463 needed). -- George Orwell III (talk) 00:46, 22 November 2012 (UTC)

Thanks anyway. Charles Matthews (talk) 09:11, 22 November 2012 (UTC)

Update II

Bugzilla: 42466 has been rectified and the 'no-extraction of embedded text-layer' issue has been resolved. I've moved all the pages in the Page namespace to match the thumbnails now being generated from the replacement source file, except for a handful that were so incomplete or so scrambled that I just left them where they were for the project to deal with. Again, anything previously transcluded to the mainspace before this 2 month long mess will probably need adjustment to the pages command.

I can't wait to start replacing Volume 26 <sarcasm off /> -- George Orwell III (talk) 06:31, 7 December 2012 (UTC)

Thanks for everything. Charles Matthews (talk) 05:48, 9 December 2012 (UTC)

Volume 30 issues[edit]

After a run of Mpaabot in September, most of this volume is in the following state: the scan you see for a page is not the scan you see when you edit the page, which is the scan that matches the text opposite. I'm hoping that this problem can be fixed, so that the anomaly disappears, and there is just one scan per page. For example Page:Dictionary of National Biography volume 30.djvu/100 should show page 94 of the volume. The offset is six pages. From Page:Dictionary of National Biography volume 30.djvu/36 to Page:Dictionary of National Biography volume 30.djvu/38 there is an offset of four. There is an awkward skip from Page:Dictionary of National Biography volume 30.djvu/51 to Page:Dictionary of National Biography volume 30.djvu/52, bringing the offset down to four again. There is another glitch on Page:Dictionary of National Biography volume 30.djvu/59 and Page:Dictionary of National Biography volume 30.djvu/60. Charles Matthews (talk) 07:39, 14 November 2012 (UTC)

Not seeing any of that here. Page thumbnails match the original content generated back in 2008 even after the insertion of missing scan pages 28 & 29 (/34 & /35) in September 2012 into the source file. The Mpaa bot run of that same September just moved the 2008 generated content to match the new page progression after my source file fix (though many other issues still remain with the source file from 2008, the total page count for relevant content is now correct).

You probably suffer from the typical 'dead cache refresh issue' seen with large files uploaded 2 ~ 2.5 years ago or more. I'm sure if you entered edit mode on any of those pages you mentioned, you'll see the content matches that thumbnail just fine. -- George Orwell III (talk) 03:08, 17 November 2012 (UTC)

I'll have to take your word for it. Thanks for your efforts with vol. 25. The project needs to take DNB first edition to a state of complete biographies in a matter of a few months now, whether or not they are all transcluded (they won't be, as far as I can see). Charles Matthews (talk) 09:49, 17 November 2012 (UTC)

Volume 30, among some others, could use a full source file replacement/upgrade as well but I hesitate to "open another door" with the volume 25 issue still unresolved. I'm not at liberty to use email as freely as most folks thanks to work restrictions but someone still needs to poke & prod them along or none of this will ever get done. -- George Orwell III (talk) 10:39, 17 November 2012 (UTC)


Volume 26

Well I had volume 26 ready to go for some time now but I held off until the bug affecting volume 25 had been resolved.

As of today, volume 26 has been replaced with a "better" scanned source file on Commons; duplicate scan page nos. 129 & 130 are no longer an issue & previously missing scan page nos. 15 & 287 are also now present. Mpaa has already answered my BOT request for the series of bulk page moves needed to re-align content to their new thumbnails in the Page: namespace & has BOT-ed all the needed changes to the main namespace <pages> lines for each affected work (the changes per affected range(s) being simple +1 or -1 throughout in this case). I cycled through and spot checked some of his results (I realigned previously out-of-order pages for example) and everything seems the same if not better now than before the file swap took place. A 3rd opinion still might be in order.

Vol. 26 was in far, far better shape than vol. 25 was structure-wise, but both also have a good number of pages marked 'Problematic' - mostly because the previous source files had blurred or cut-off pages. These need to be re-checked and at least statused back to 'Not-Proofread' but I leave that up to the Project's discretion.

Once I determine the best available copy of Volume 27, its swap-out should be done by the end of the week, and then I'm off to address Volume 32. Prost. -- George Orwell III (talk) 21:52, 10 December 2012 (UTC)
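
As a rough illustration of the "+1 or -1" adjustments to the main namespace <pages> lines mentioned above, the following Python sketch shifts the from/to values inside <pages> tags. It is not the code Mpaa's bot actually ran; the function name shift_pages_tags, its parameters, and the page numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import re

# Matches a whole <pages ... /> tag so that only its attributes are touched.
PAGES_TAG = re.compile(r"<pages\b[^>]*>", re.IGNORECASE)
# Matches from=N / to=N (optionally quoted); fromsection/tosection do not match.
FROM_TO = re.compile(r"\b(from|to)\s*=\s*(\"?)(\d+)(\"?)")


def shift_pages_tags(wikitext, delta, first_affected=1, last_affected=None):
    """Return wikitext with from/to page numbers in <pages> tags shifted by delta.

    Only numbers within [first_affected, last_affected] are changed
    (last_affected=None means no upper bound). All names are hypothetical.
    """

    def shift_attr(match):
        name, open_quote, number, close_quote = match.groups()
        n = int(number)
        in_range = n >= first_affected and (last_affected is None or n <= last_affected)
        if in_range:
            n += delta
        return f"{name}={open_quote}{n}{close_quote}"

    def shift_tag(match):
        return FROM_TO.sub(shift_attr, match.group(0))

    return PAGES_TAG.sub(shift_tag, wikitext)


# Hypothetical usage: pull everything from scan page 131 onwards back by one.
line = ('<pages index="Dictionary of National Biography volume 26.djvu" '
        'from=135 to=136 fromsection="A" tosection="A" />')
shifted = shift_pages_tags(line, -1, first_affected=131)
print(shifted)
```

Restricting the shift to a page range mirrors the "per affected range(s)" wording: ranges on either side of an inserted or removed scan page may need different deltas, so each would be handled in a separate call.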

Volume 27

As of today, volume 27 has been replaced with a "better" scanned source file on Commons. Mpaa has already answered my BOT request for the series of bulk page moves needed to re-align content to their new thumbnails in the Page: namespace & has BOT-ed all the needed changes to the main namespace <pages> line for each affected article.

This volume also has a good number of pages marked 'Problematic' - mostly because the previous source file had many blurred or cut-off page scans. These need to be re-checked and at least statused back to 'Not-Proofread'; again, I'll leave that up to the Project's discretion. -- George Orwell III (talk) 21:10, 14 December 2012 (UTC)

Volume 35

As of today, volume 35 has been replaced with an improved quality source file on Commons. No need for a series of bulk-moves or tweaks to the main namespace <pages> line as the less-than-a-hundred pages currently created (out of some 450 total) lined up perfectly. This was more about replacing a 5-year old embedded text-layer with one generated with the software available today. No OCR routine will ever produce perfect results but the number of flaws per text-layer page was most certainly reduced based on my spot checks. That should save some time & effort at some point down the road. Prost. -- George Orwell III (talk) 14:33, 18 December 2012 (UTC)

Thanks. We're getting there! Charles Matthews (talk) 07:37, 19 December 2012 (UTC)

Volume 32

As of today, volume 32 has been replaced with an improved quality source file on Commons. A series of bulk-moves and bulk-deletions have been executed on the page namespace. This was due to the fact that at some point the page scan thumbnails no longer matched the default text-layer dumps. In the interim, many contributors took it upon themselves to copy & paste content to where they erroneously thought it should belong, while in other instances they trimmed the beginning or end of adjacent entries completely out, losing that needed content in the process.

Every mainspace article present before the source file replacement has been salvaged with the appropriate tweaks to the main namespace <pages> line included.

All the unedited page namespace articles have been moved and/or deleted as needed in order to recover the improved text layer moving forward. Generally, the majority of the pages that were worked by contributors but never transcluded have not been lost in this process.

If anybody knows of another volume suffering from the same scan-thumbnails-don't-match-the-dumped-text problem, please let me know ASAP. -- George Orwell III (talk) 05:39, 22 December 2012 (UTC)

Many thanks. Charles Matthews (talk) 07:12, 22 December 2012 (UTC)


Something amiss

Page:Dictionary of National Biography volume 25.djvu/108: I can't for the life of me see why the final section isn't showing up. Charles Matthews (talk) 08:11, 22 December 2012 (UTC)

Done - you forgot to close one of the 'section end' statements with the needed " / " -- George Orwell III (talk) 01:09, 23 December 2012 (UTC)

I am permanently in your debt ... Charles Matthews (talk) 08:13, 23 December 2012 (UTC)
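
For anyone hitting the same symptom, a small check along the following lines might help spot a section marker that is missing its closing slash. This is only a sketch: it assumes the markers are written in the self-closing form <section begin="..." /> and <section end="..." />, and the function name unclosed_section_tags and the sample text are hypothetical.

```python
import re

# Matches any <section begin=...> or <section end=...> marker, closed or not.
SECTION_TAG = re.compile(r"<section\s+(begin|end)\s*=\s*[^<>]*>", re.IGNORECASE)


def unclosed_section_tags(wikitext):
    """Return the section markers in wikitext that lack the trailing '/'."""
    missing = []
    for match in SECTION_TAG.finditer(wikitext):
        tag = match.group(0)
        # Drop the final '>' and any spaces, then look for the closing slash.
        if not tag.rstrip(">").rstrip().endswith("/"):
            missing.append(tag)
    return missing


# Hypothetical page text: the first 'section end' marker is missing its slash.
sample = ('<section begin="A" />first entry<section end="A">\n'
          '<section begin="B" />second entry<section end="B" />')
print(unclosed_section_tags(sample))
```

The check only reports markers with a missing slash; it does not verify that every begin has a matching end.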

Volume 42

As of today, volume 42 has been replaced with a "better" scanned source file on Commons. This volume has a good number of pages marked 'Problematic' - mostly because the previous source file had many blurred or cut-off page scans. These need to be re-checked and at least statused back to 'Not-Proofread'; again, I'll leave that up to the Project's discretion. -- George Orwell III (talk) 08:43, 23 December 2012 (UTC)

Volume 36

As of December 31st, 2012, volume 36 has been replaced with a "better" scanned source file on Commons. This volume has a good number of pages yet to be created - I guess mostly because the previous source file had many blurred or cut-off page scans. Again, I'll leave simple Page: creation up to the Project's discretion. -- George Orwell III (talk) 08:43, 31 December 2012 (UTC)


Volume 33

As of today, volume 33 has been replaced with a "better" scanned source file on Commons. This volume has a good number of pages yet to be created - I guess mostly because the previous source file had many blurred or cut-off page scans. Again, I'll leave simple Page: creation up to the Project's discretion. -- George Orwell III (talk) 22:16, 11 January 2013 (UTC)

Thanks again for your efforts. And excellent timing - vol. 32 was finished yesterday, and I'll be getting on with vol. 33 today. By the way, for future reference, Andrew Gray was prompting me yesterday re the scans available via the British Library (see http://en.wikipedia.org/wiki/Wikipedia:GLAM/BL/Books); he says the DNB scans there are pretty good. Charles Matthews (talk)
You can replace whichever scans you like as you like - I wasn't planning on doing any more volumes unless I found something wrong with one. There aren't any more marked as needing replacement. -- George Orwell III (talk) 12:08, 12 January 2013 (UTC)