Wikisource talk:WikiProject DNB/Archive 3

Warning: Please do not post any new comments on this page. This is a discussion archive; the comments it contains were likely posted both before and after the date on which it was created.
See current discussion.

15,000 milestone

According to Magnus's gadget the 15000th DNB biography here is Wilmot, John (1647-1680) (DNB00). Category:DNB biographies has slightly different ideas; but let's not quibble about numbers. I've thought for a little while that this milestone is the best one we'll have to mark the midpoint of the project—as far as the creation of the biographies goes, that is.

So this is a good time for a chat about where the project goes from here as it consolidates. Certain things about the desired final state have not been clarified, and some idea of strategy would probably help. I have various points to bring up, but would someone else like to start? Charles Matthews (talk) 20:01, 22 June 2011 (UTC)


I'm putting up a list of the unsigned (i.e. anonymous) DNB articles at Wikisource:WikiProject DNB/Unsigned. Most of these are in early volumes, where they went unsigned because Leslie Stephen was being coy, and there are over 300 in all.

There are a couple of definite uses for the list (checking that all articles are present as a complement to author pages; making up volume ToCs). But there is another issue: we currently have two styles in play, namely linking to Author:Anonymous and placing "no contributor recorded" in the contributor field. A third option would be to create a dedicated page in the author namespace for such a list, and link to that. I wonder what people think: for some purposes having such links would be a plus. Charles Matthews (talk) 09:36, 30 July 2011 (UTC)

I thought that we had decided to identify them as "no contributor recorded". They should appear in the category that we set up, though we can do something different if required. I probably need to go back and poke it, as a number of the pages identified there were only there because the field had been omitted (i.e. early transcripts). — billinghurst sDrewth 15:12, 30 July 2011 (UTC)
Category:DNB no contributor has the bits, and there is a subset, Category:DNB See, which I also remember discussing, though again I was waiting until we had moved all the pages that were not backed by scans to scans. — billinghurst sDrewth 15:34, 30 July 2011 (UTC)

Author pages

I have a couple of points to raise about what we do on author pages, one for now and one for the future.

Firstly, there was my theory that very long lists of DNB articles on author pages would prove unacceptable. Author:Thompson Cooper has, however, had 1400 DNB links now for a while, and I'm not aware of complaints. So it seems my caution was unnecessary, and the proposed solution of subpages of author pages isn't required. So I suggest that those subpages now be dismantled. NB that there is another solution, in fact what the Germans do for the ADB, their equivalent: a dedicated category. I wouldn't enjoy this while there were still redlinks to fill in.

Okay, we can start dismantling them and putting them back at the root level. I will ask someone to run an SQL query to generate a list.
I have repatriated the remaining subpages to the root level. Are we right to kill Template:DNBauthorsubpage, or would you like it to remain? — billinghurst sDrewth 15:29, 17 August 2011 (UTC)

Second, there is the issue of the separation of the DNB00, DNB01 and DNB12 parts of the work on author pages. Having thought about it, in the long term, I believe it would help the reader not to maintain the distinction, but to have a single DNB list, alphabetical by name. There is a small twist to this, in that some amount of disambiguation might need to go on; this actually could be handled by piping, and needn't involve a new layer of disambiguation on the page. But I would see this as part of the final tidying of the work, when all redlinks are filled: the separation currently serves a useful function as we work through the volumes. (Also it comes to mind that the DNB12 effort will need fresh application, to get author pages up and list the articles to create on them.) Charles Matthews (talk) 06:51, 31 July 2011 (UTC)

Probably right, though one wonders whether a numerical count of biographies undertaken by volume would be helpful. Note that we can probably get someone to do an SQL run once the volumes are complete. Re 1912: do we have a list of authors? That probably means we should review the templates against the requirements for all the linking templates. It is probably worth running a pilot to get our heads around it. Also, we can probably grab the author list from the volume and work from it. — billinghurst sDrewth 08:52, 31 July 2011 (UTC)

"Messy" lists

In case anyone wonders what is happening at /Messy lists, it is an attempt to do something about what is written at /Master lists by means of a CatScan search. In other words to scrape redlinks off about 450 author pages. I believe this is going to be useful in a few ways; but on the other hand there are also a few caveats to enter. I have been quietly working with some other "messy" lists over at letter P. There is the potential to cut down the time needed to complete the DNB00 volume ToCs, and also to pick up on some dab work in an economical fashion. I can explain more if required. Charles Matthews (talk) 13:36, 16 August 2011 (UTC)

So the longer lists by letter have been put on subpages of that page, to avoid the slowness of manipulating a page of nearly 400K. There are a few tasks that are immediate and in the nature of tidying. Where there are complete letters and volumes, any DNB00 redlinks on these new lists in those areas are caused by a mismatch of an author page redlink and the article title as created (can be a dab issue, typo, title convention point, and so on). So a bit of maintenance to do. Also I noticed that the search throws up some dab issues itself: for example Cooke, Thomas (DNB00) is the top hit, being linked to by five author pages. I can get these done out of the Fenwick handbook easily enough.

The main list should be redone over time, at least in part, to purge bluelinks as they appear, and to feed back the dab work. The main idea has always, in my mind, been to create volume ToCs from a "master list". Given a list for a volume in "messy" form, paging through a volume that currently has no ToC to create the listing is not going to be a very long task: certainly quicker than starting from scratch. This is where the caveats come in, of course. From the "messy" redlink list, you need to:

  • add bluelinks that are available on the volume ToC already;
  • add "unsigned";
  • ASCII-sort the whole list you have;
  • check with the actual volume text on ordering (not exactly ASCII) and proper dab.

There may be a few missing (e.g. errors of omission in author page lists, bluelinks that are not on the ToC). But after the pass through the volume there should be a good-enough volume ToC to post. Charles Matthews (talk) 09:22, 17 August 2011 (UTC)
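The merge-and-sort steps in the list above are mechanical, so they can be scripted. The following is a minimal Python sketch, assuming each source has already been reduced to a plain list of article titles without the "(DNB00)" suffix; the function name draft_volume_toc, the sample titles and the bullet markup it prints are hypothetical. It covers only the first three steps: the final check against the printed volume stays manual, because the DNB order is not strictly ASCII.

```python
from typing import Iterable, List


def draft_volume_toc(messy_redlinks: Iterable[str],
                     toc_bluelinks: Iterable[str],
                     unsigned: Iterable[str]) -> List[str]:
    """Merge the three title lists, drop duplicates, and ASCII-sort them.

    The result is only a draft ToC: the DNB's own ordering is not strictly
    ASCII, so it must still be checked against the volume text.
    """
    merged = set(messy_redlinks) | set(toc_bluelinks) | set(unsigned)
    # sorted() on str compares code points, which for plain ASCII titles is
    # ASCII order (so capitals sort before lower-case letters).
    return sorted(merged)


if __name__ == "__main__":
    draft = draft_volume_toc(
        messy_redlinks=["Cooke, Thomas", "Cook, Example"],  # hypothetical titles
        toc_bluelinks=["Cook, Example"],
        unsigned=["Coke, Example"],
    )
    for title in draft:
        # Assumed ToC markup: one wikitext bullet per piped link.
        print(f"* [[{title} (DNB00)|{title}]]")
```

Using a set makes the merge tolerant of the same title arriving from two sources, which is common when a redlink list overlaps an existing partial ToC.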

Stats January 2012

Posted at Wikisource:WikiProject DNB/Statistics: we start the year at 63% done (figure includes the supplements in the 100%, so it is more like two-thirds of the first edition). It ought to be the case that 2012 is the year in which we get the DNB under control.

Anyway, this is a reasonable moment to do some looking ahead and planning. I have a straightforward idea, or five-year plan, which I'll be posting and discussing mostly at the WP end of the project: take one volume completed here each month, and do adaptation and checking of that volume over at Wikipedia.

I don't at all suppose that a complete job can be done there in a month, but new tracking pages can certainly be set up and some of the more serious stuff seen to. Such a VOTM implies a few things at this end also: a focus for validation and the checking of WP links; a definitive format for the volume ToCs; and a completeness check (make sure all the biographies are present). So if there is a task list for DNB VOTM/WS, what should be added to it? Charles Matthews (talk) 20:08, 2 January 2012 (UTC)

I've been slowly working through volume 4. At my current rate of progress, there is no way it will be ready by April. Sorry. -Arch dude (talk)

Not starting at the beginning, though. Letter A has had a lot of attention. I decided to try vol. 21 as more typical. See w:Wikipedia talk:WikiProject Dictionary of National Biography#Volume of the Month for the deal in the other place. Charles Matthews (talk) 20:41, 3 January 2012 (UTC)

Wikisource:WikiProject DNB/Most wanted articles

This page is now active, after a lull (material for a project around w:Nominate reports, by User:James500, prime territory for DNB additions). If anyone would like to tidy up what is there now, that would be great: I'm doing a daily pass at present to create articles. Charles Matthews (talk) 10:41, 6 January 2012 (UTC)

Mc, Mac and messiness

I have just added a note to Talk:Dictionary of National Biography, 1885-1900/Vol 35 MacCarwell - Maltby. Basically my standard method of using lists like on Wikisource:WikiProject DNB/Messy lists/M that are ASCII-ordered to build up the volume ToCs breaks down badly: volume 35 starts

MacCarwell, David
M'Caul, Alexander
McCausland, Dominick
McCheyne, Robert Murray
MacCluer, John

showing the old librarians' convention in action. Meaning that the alphabet goes M - Mc - N and both Mac and M' are sent to the normal forms McCarwell and McCaul before ordering. (If I have that right - User:George Burgess?) Anyway a wee bit of programming would come in handy here. Charles Matthews (talk) 11:24, 18 June 2012 (UTC)

I claim no particular expertise here, but I would normally expect all "M'"s, "Mc"s and "Mac"s to be treated alike. The DNB approach appears to be to treat them all as "Mac"s rather than "Mc"s; see, for example, "Macclesfield, Earls of" in the index between McCheyne and McCluer.--George Burgess (talk) 19:56, 18 June 2012 (UTC)

Ah, right. In any case if there is a "normal form", then it becomes comprehensible to a machine. My edit to the Vol. 35 ToC is probably not useful as such, and I'll go and revert it now: it will still be in the history. I'm working on letter C, by the way, with the aim of getting A to G solid, as P to Z is solid. Then we have the "endgame" for DNB00. Sorting out the volumes affected by the Macs is a subtask (I mean getting the listings straight) that might appeal to someone methodical who wants a break from proofing, or who would like to automate. Charles Matthews (talk) 07:21, 19 June 2012 (UTC)
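George Burgess's reading gives exactly such a normal form. The Python sketch below assumes, on the evidence of the Vol. 35 opening and the index entry he cites only, that a leading "Mc" or "M'" before a capital letter is filed as "Mac" and that case is ignored when comparing; the DNB may have further exceptions this does not capture. The helper name dnb_sort_key is hypothetical.

```python
import re

# Assumed rule: "Mc" or "M'" followed by a capital letter is filed as "Mac".
_MC_PREFIX = re.compile(r"^M(?:c|')(?=[A-Z])")


def dnb_sort_key(title: str) -> str:
    """Return a sort key that files McX and M'X names as if spelled MacX."""
    normal = _MC_PREFIX.sub("Mac", title)
    # Compare case-insensitively so "MacCarwell" and "Macclesfield" interleave.
    return normal.casefold()


# The opening of Vol. 35, with the index entry George Burgess cites
# inserted where he places it (between McCheyne and McCluer/MacCluer).
VOL35 = [
    "MacCarwell, David",
    "M'Caul, Alexander",
    "McCausland, Dominick",
    "McCheyne, Robert Murray",
    "Macclesfield, Earls of",
    "MacCluer, John",
]

if __name__ == "__main__":
    print(sorted(VOL35))                    # plain ASCII order: M'Caul jumps ahead
    print(sorted(VOL35, key=dnb_sort_key))  # matches the order listed above
```

The same key could be applied to the ASCII-ordered "messy" lists before they are merged into a ToC, leaving only genuine exceptions for manual checking.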

I now believe the sorting thing can be done with find-and-replace and a template trick. So you can all relax ... Charles Matthews (talk) 05:35, 20 June 2012 (UTC)

Added three volumes of Second Supplement

To note that I have added the first two volumes of the Second Supplement (1901-1911), these being the only volumes that I can locate. I have also updated the substituted template {{DNBset}} so that it can be used for the first and second supplements, by adding the parameter yr = 01 or yr = 12. If you notice anything wrong with the template, please get back to me; similarly if you find a later volume. I would also be interested if anyone is able to find the 3rd supplement, as I believe that we can add it too. — billinghurst sDrewth 04:34, 9 September 2012 (UTC)

I believe vol. 3 of the 2nd Supplement is out there: see User:Charles Matthews/DNB referencing data#Adamant at the bottom for the key. NB that I can't read it, because in general UK readers can't get Google's versions of the DNB. But the external link tool shows that there are references to it on WP, using that form of the address. So I would expect our American participants to have access to this edition.
On the 3rd Supplement, I believe we are out of luck as far as public domain is concerned. It covers 1912 to 1921 (deaths) but was published around 1927. Charles Matthews (talk)

Pardon - I wish you guys would just come to me first re: American access to such works - especially for a project as important as DNB. At the same time, there are a few DNB volumes long marked as needing source file fixing/replacement that I'd love to get out of the way (I've swapped in a better volume 60 from one of the lists on scan quality somewhere here or on WP already, for example).

And yes, 2nd Supp., Volume 3 is available on GooBoo at least in two forms ...

... and there are probably more than just those 2 out there (as you know - the naming is frequently not spot on for every existing volume Google hosts).

Just give me a task and I'll report back to you with the available options here in the U.S. - even temp upload base files for your review if need be. I'd much rather take the time out to properly prep a file before it gets worked on rather than after ~600 pages are already in place and I have to squeeze in a missing page near the front or something. -- George Orwell III (talk) 23:35, 9 September 2012 (UTC)

The versions that we have looked at and commented upon are at Wikisource:WikiProject DNB/Progress. Until now the second supplement has been a long way from being a priority, until I needed it for an author page reference. :-) — billinghurst sDrewth 08:15, 10 September 2012 (UTC)
The list is OK I guess but most of those scans are coming up on being 5 or 6 years old from the date of conversion. A large part of those have since been re-done or superseded by later incarnations.

Anyway - volume 3 of the 2nd Supplement is also up --> Index:Dictionary of National Biography, Second Supplement, volume 3.djvu -- George Orwell III (talk) 05:45, 11 September 2012 (UTC)

I've made a working list of authors for Supplement II: Wikisource:WikiProject DNB/1912 authors. These are as found in the volumes: many are "usual suspects" from the previous volumes, and so can be blue links by correcting the name. Many should be easy identifications given the biographical clues there (FRS and so on). And some will probably prove tricky. But in any case, a start, and some of the template work can be done. Listing the articles will probably have to be done directly - the Fenwick handbook doesn't go that far.
PS: Why is John Henry Bernard under O? Because he signed John Ossory, as bishop of Ossory. Charles Matthews (talk) 09:29, 11 September 2012 (UTC)
Work on identifying these 1912 authors is now under discussion: Wikisource talk:WikiProject DNB/1912 authors. Charles Matthews (talk) 08:19, 20 January 2013 (UTC)

25,000 up

The November stats show 25,000 articles (actually a little more). By the way, the percentages on the stats page are calculated with 30,000 = 100%, which is enough to give the trend sensibly, but is only a round number. Anyway this milestone is the last before the first edition is done. Charles Matthews (talk) 21:51, 2 December 2012 (UTC)

Standing and clapping. — billinghurst sDrewth 23:15, 2 December 2012 (UTC)

New year report

Well, things are now looking in good shape. There are 26,800 articles created. The 1901 Supplement volumes are apparently complete (i.e. articles done), though I've not had time to check them through. And the major technical issues seem to have been covered. Only about 25% of author pages still need articles (that disregards DNB12, where we are still gearing up).

One point of interest is the relationship between Category:DNB No WP and the articles actually missing on enWP. The ever-helpful Magnus Manske worked out a quick way to order this category (about 11.5K articles) roughly by length. There would be various ways to go about that. This one isn't a tool as such. What Magnus does is look at the number of pages transcluded into a given biography and sort in descending order. The biographies spread over more than four pages all already had WP articles. After these were cleaned out, and I went through those spread over exactly four pages in pagespace, the results looked like w:User:Magnus Manske/dnb ws no wp. Something over 80% of the Category:DNB No WP articles with four pages transcluded did already have a matching WP article. That leaves 30-odd to do: our "longest missing articles". These are for the sister project to worry about.
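The ordering step itself is simple once the counts exist. A minimal Python sketch, assuming the number of Page: pages transcluded into each biography has already been fetched into a dictionary; how Magnus collected the counts is not described here, and the function name and sample titles are hypothetical:

```python
from typing import Dict, List, Tuple


def longest_first(page_counts: Dict[str, int],
                  min_pages: int = 1) -> List[Tuple[str, int]]:
    """Order biographies by transcluded page count, largest first.

    page_counts maps an article title to the number of Page: pages it
    transcludes, used here as a rough proxy for article length.
    """
    rows = [(title, n) for title, n in page_counts.items() if n >= min_pages]
    # Count descending; ties broken by title so the list is stable to read.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    sample = {"A (DNB00)": 4, "B (DNB00)": 6, "C (DNB00)": 4, "D (DNB00)": 1}
    for title, n in longest_first(sample, min_pages=4):
        print(n, title)
```

Page count is a coarse measure (a short article can straddle a page boundary), which is why the result is described above only as "roughly by length".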

Of course the situation is dynamic: new articles change it. But the percentage suggests that, with the DNB match tool, the category could now be trimmed down a fair bit. There are a few months of article creation still ahead here, but it doesn't seem premature to try to get a figure for the total number of DNB articles missing on enWP.

There are plenty of other maintenance tasks, of course, most of which will become easier when the articles are all here. Charles Matthews (talk) 21:28, 2 January 2013 (UTC)

Update: I have been doing some work on the "messy lists" and it has given me a number for the biographies still to do in DNB00: it's about 1800 now. Charles Matthews (talk) 10:17, 11 January 2013 (UTC)

Author pages — Contributors in later volumes

Starting to fix the author pages for the post-1900 contributors, as I fiddle through the remaining author components. I am seeking opinions on how we proceed with the {{DNB contributor}} vs. {{DNB contributor done}} components. To separate or not? Once the 1885-1900 edition is complete, is the 'done' aspect necessary, or is that more an internal maintenance marker? I will also need to look at the DNB footers. When I look at both these templates I will see if there is a neater way to have both the author page contributor marker and the article footer components. Our template skills have improved and they can probably be done a lot more neatly, perhaps with a better underlying template. Comments welcome. — billinghurst sDrewth 16:53, 12 January 2013 (UTC)

Currently, I actively use {{DNB contributor complete}} in a maintenance role, as well as flipping it over to {{DNB contributor done}} when the articles are all there. It should be the case that when the first edition DNB is all posted here, and all the author pages are checked for redlinks (which would indicate wrong titles, therefore), that the distinction between these templates will no longer be necessary. In other words {{DNB contributor done}} could then be redirected to {{DNB contributor}}. I suppose we should be thinking in terms of a complete redesign of the DNB on author pages, anyway. Charles Matthews (talk) 05:15, 13 January 2013 (UTC)

Category:Dictionary of National Biography contributor templates conflicts

As we get further into time and volumes, we are starting to run into issues with {{DNB footer initials}}. Our initial response was to convert the conflicting initials to disambiguation pages, e.g.

However, we have not been wholly successful.

It got worse with the first supplements, and it will only get worse with the second and third supplements. The solutions as I see it are:

  1. to continue to disambiguate, and with every new edition to create new disambigs and update all pages as required (a bot run will work); OR
  2. to add a volume parameter to the affected template, so that in essence {{DNB FL}} returns the right contributor for the volume. Typically we would default the names to the original 63-volume work, so there is no need to amend them now, or we could always bot run them.

Neither solution is perfect; it depends on what the community wants to see and achieve, or what it is prepared to remember to do. — billinghurst sDrewth 05:25, 24 January 2013 (UTC)

I see what you are saying. But I'd suggest we stick with option 1. Unless there is a sudden shift of 5 years in the base year for US public domain, we are only talking about DNB1912. As far as I'm aware the problem is not overwhelming. Charles Matthews (talk) 21:12, 24 January 2013 (UTC)
The original disambiguation templates looked like {{DNB EC}}. These were intended to tell the editor to replace the template with the appropriate DAB as needed. I thought that I had found all of the dabs for the original 63 volumes, but perhaps not. Use of a volume parameter with a default is probably not a good idea, because it does not warn the editor in any way if there is a mistake. However, there may be a way to pass a parameter from the article header into the contributor template. If so, we can simply alter the header template to seed the volume parameter, and then use that parameter in our ambiguous contributor templates. This is beyond my current wiki-fu, but I seem to recall that such a capability exists. -Arch dude (talk) 23:03, 26 January 2013 (UTC)
Basically, we need to add logic in the {{DNB00}} template and our other header templates (if we are brave and arrogant enough, we can add this to the {{header}} template) that says something like
set the "work" parameter to "DNB00" and set the "volume" parameter to vol
and then, to each ambiguous contributor template, add
if the "work" parameter is "DNB00" and the "volume" parameter is 50, use "Harvery Schmedlap", else use "Herkimer Smoot"
or whatever logic is needed. -Arch dude (talk) 23:22, 26 January 2013 (UTC)
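Outside wiki markup, the lookup Arch dude describes amounts to a table keyed on work and volume. The Python sketch below only models that logic and is not template code: the table layout and function name are hypothetical, and the contributor names are his placeholders. It departs slightly from the pseudocode by returning None for a work with no entry, rather than a catch-all default, in line with his concern that a silent default gives the editor no warning of a mistake.

```python
from typing import Dict, Optional, Tuple

# (work, volume) -> contributor; a volume of None means "any other volume".
InitialsTable = Dict[Tuple[str, Optional[int]], str]

# Placeholder entries for one hypothetical set of ambiguous initials.
EXAMPLE_TABLE: InitialsTable = {
    ("DNB00", 50): "Harvery Schmedlap",
    ("DNB00", None): "Herkimer Smoot",
}


def resolve_initials(table: InitialsTable, work: str,
                     volume: int) -> Optional[str]:
    """Return the contributor for this work and volume, or None if unknown."""
    if (work, volume) in table:
        return table[(work, volume)]
    # Fall back to a work-wide entry, but never across works: an unknown
    # work (e.g. a later supplement) yields None so an editor must check.
    return table.get((work, None))


if __name__ == "__main__":
    print(resolve_initials(EXAMPLE_TABLE, "DNB00", 50))
    print(resolve_initials(EXAMPLE_TABLE, "DNB00", 12))
    print(resolve_initials(EXAMPLE_TABLE, "DNB12", 1))
```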
This wasn't a criticism, more just a development, especially as we move beyond the original volumes and as our concatenated source of contributors has proven to have errors. I am not looking to hack DNB00, as it doesn't directly affect the footer initials template, and trying to get that relationship working with DNB01 and DNB12 is just asking for trouble when the fixes are simpler, especially as it won't show that way in the Page: ns, and writing the bit to pull the volume information is ... MEH! There are two easy solutions: one is the reactive "Oh, damn, that is disambiguated" approach of making the change when a mistake turns up; the other is adding the volume information to every footer template. CM has indicated the former suits better, and that is okay with me. I just need to work through the process to do it. — billinghurst sDrewth 04:20, 27 January 2013 (UTC)

Added anchor = parameter to {{DNB link}}

I have finally got fed up with not having the ability to link easily to a sub-article, and the desire/irritation/need was strong enough today to get around to adding that facility to the template. Added and documented. In short, if you want to get to the sub-article, anchor = whatever is the anchor used. — billinghurst sDrewth 12:53, 29 January 2013 (UTC)
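In plain MediaWiki terms, a section anchor is a fragment after "#" in the link target. The Python sketch below models the kind of link the new parameter makes possible; the function name, its arguments and the label handling are illustrative assumptions, not the template's actual code or output, and the titles used in the usage example are hypothetical.

```python
from typing import Optional


def dnb_link(title: str, suffix: str = "DNB00",
             anchor: Optional[str] = None,
             label: Optional[str] = None) -> str:
    """Build a wikilink to a DNB article, optionally to a sub-article anchor."""
    target = f"{title} ({suffix})"
    if anchor:
        # "#anchor" makes the browser jump to that sub-article on the page.
        target += f"#{anchor}"
    return f"[[{target}|{label or title}]]"


if __name__ == "__main__":
    print(dnb_link("Example, John"))
    print(dnb_link("Example family", anchor="Example, Richard",
                   label="Example, Richard"))
```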

Last big push on DNB00

As of right now there are 28528 DNB00 and DNB01 biographies posted. That leaves something over 500 to go: Wikisource:WikiProject DNB/Messy lists would say about 510 but that underestimates because of non-disambiguated names. In any case the first edition is 98.5% done. (The completions page is not updated - no time to check volumes!) Charles Matthews (talk) 09:25, 1 February 2013 (UTC)

Getting the place tidy for visitors. I came across a biography with {{incomplete}} today: apparently this is unique, though. According to this there are currently 18 instances of biographies carrying {{migrate to djvu}}. Charles Matthews (talk) 11:40, 3 February 2013 (UTC)

I will track down the incomplete and migration and work out what needs to be done. — billinghurst sDrewth 12:21, 3 February 2013 (UTC)
Done: "migrate to djvu", and you noted the incomplete. — billinghurst sDrewth 00:19, 4 February 2013 (UTC)

The incomplete one is Lowe, Hudson (DNB00). I guess I'll fix it later today, since it is among the batch I'm working on.

It occurs to me that remaining redlinks on volume ToCs are confusing. But they could be commented out; there are good reasons not to remove them without discussion. Charles Matthews (talk) 16:00, 3 February 2013 (UTC)

I would prefer to do only what is correct to fix them (after consideration), rather than just react to redlinks because we are reaching a milestone. So we probably should have that discussion. I really hate the old header template that we hacked beneath {{header}}; I just haven't had the priority to fix them. — billinghurst sDrewth 00:19, 4 February 2013 (UTC)

OK, I believe we are talking about material on the volume ToCs placed there in imitation of the ToCs that come on pages at the end of each DNB volume. I suppose, though I don't know that we have discussed this at length, that the volume ToC should eventually offer the reader links comparable to a hyperlinked version of those pages? Vol. 34, for example, seems to have a quite full version of the "auxiliary" information. I fixed a couple of corresponding redlinks there, which were simple changes in the link. Can't be any objection to that. In Vol. 33 as it stands the entry that is in wikitext

*Lemens, Balthazar Van :See: [[Van Lemens, Balthazar (DNB00)|Van Lemens, Balthazar]]

is just a bit puzzling because the actual article is at Van Lemens, Balthasar (DNB00) which is a different spelling. There is an actual "See" page for that one at Lemens, Balthazar Van (DNB00), and to fix the "van Lemens" link on that one I changed the piping on Page:Dictionary of National Biography volume 33.djvu/31 to the "s" spelling, as one would. But the volume ToC currently doesn't link to the "See" page, it tries to reproduce the same effect.

A full case analysis is not beyond our powers, and ideally all the redlinks go away with enough work. We haven't really decided on the status of the "See" articles in the project. For me they add value in hypertext terms, but among hypertext issues are not the highest priority (they certainly can be useful - for example in finding the surname relating to "Lord Hunsdon", Hunsdon, Lords (DNB00) came to my rescue).

So we can tackle all this (there seem to be half-a-dozen types of "auxiliary" volume ToC entries) with enough patience. My point was simply that in cases like Vol. 36, where the articles all exist, there are redlinks that are not explained as such. An outsider might be puzzled when told the volume was complete. An alternative way is not to comment out anything, but to add a template to such pages to the effect that redlinks are in process of being sorted out. I would not actually want to remove such redlinks when they are there for some reason. Charles Matthews (talk) 07:12, 4 February 2013 (UTC)

We must be clear on the distinction between the ToCs and the DNB00 index pages. We have total control over the ToCs: these do not exist in the original 63 volumes. We created them as a rough analogue of the ability to thumb through the physical volumes. We have three distinct issues to address:

  • disambiguation entries in the DNB
  • creation of index articles equivalent to the indexes of each volume
  • disambiguation entries in the Toc
For the first: if there is a "See" entry inline in the DNB, should we have an article?
For the second: should we have a faithful index article for each of the 63 volumes, and what, exactly, should it look like?
For the third: When the DNB has a "disambiguation article" what should the ToC entry look like?

As the project has progressed, I have come to the conclusion that we have not been rigorous enough. If I had to do this all over again, I would do the following:

  • Each disambiguation entry in the DNB would have a separate article
  • The ToC entry for a DAB would simply point to the DAB article
  • There would be an index article for each of the 63 volumes.
    • The format of the index article is fairly complex. It links to both article space and page space, and therefore violates the space separation.

All of these refinements are secondary to completing the basic work of transcribing the DNB. I am in awe of the work that has been done. -Arch dude (talk) 03:52, 5 February 2013 (UTC)

Comment: Some of the redlinks can become redirects; the "See" entries that are sub-articles can now be directed to the article, since I added an anchor = parameter to {{DNB lkpl}}.
Re disambiguation articles, I disagree, as we need to look at this in the context of all of enWS, not just this work. enWS has a specific approach to disambiguation and it encompasses this; AND we are not a book: we link to articles (internally, or externally from enWP, or implicitly from search engines), or people type in the search box and meet the AJAX look-ahead function. The numbers who will free-type a name with the DNB00 suffix would be absolutely minimal.
Reproducing the indexes from the work will come; they are just the lowest priority, and IF we choose to link them, there will be some complexity. They will not link to the Page: ns, as that has been a determination of the community, and I don't see the point of it for the page numbers, as the article links already do that, to the top of the article. — billinghurst sDrewth 05:55, 5 February 2013 (UTC)
General point: we are right now on the cusp between "heavy lifting" and "intense curation". Getting the last DNB articles up is a matter of days away. I have been trying to list the remaining "issues" in some sort of orderly way, and there are more than a dozen points. So we shall have to chip away at a long agenda. Perhaps a fitting way to start would be to archive this page next week, sorting out the active threads. Charles Matthews (talk) 07:38, 5 February 2013 (UTC)

The final article in the first edition went up yesterday. Charles Matthews (talk) 11:48, 11 February 2013 (UTC)

Volume 31 update

As of 10 February 2013, volume 31 has been replaced with an improved quality source file on Commons. No bulk moves or bulk deletions were needed, nor were any changes made to existing mainspace articles.

There were, however, a handful of Pages: previously marked Problematic among a dozen or so others marked unproofread (red), all most likely due to the prior thumbnail scans being blurred, cut off and the like (the content itself looks like it has been processed, though). Knocking out the 20 or so remaining red pages would bring the Index: into the rare Validation phase, for someone who likes the taste of low-hanging fruit. Another volume fix in a couple of hours... -- George Orwell III (talk) 09:18, 11 February 2013 (UTC)

Volume 40 update

As of 11 February 2013, volume 40 has been replaced with an improved quality source file on Commons. Two prior duplicate pages no longer exist and the bulk-move to correct for that has been completed. All mainspace articles in the affected Page: range have already had their pages tag-lines adjusted as well.

Dozens and dozens of Pages: previously marked Problematic, among a handful of others marked unproofread (red), all most likely due to the prior thumbnail scans being blurred, cut off and the like, still exist. The content itself looks like it has been processed from third-party sources, however, so the remaining proofreading needed should be "light".

Now that the Project is technically "Live", I'd hate to see the resulting uptick in visitors come across something as puzzling to the unfamiliar as the Problematic pages more often than they really have to, but I leave it to the members to prioritize addressing them. -- George Orwell III (talk) 23:36, 11 February 2013 (UTC)

All Articles? Great!

Charles informs us that all articles in DNB00 are now present. This is a huge milestone: congratulations all. Should I now update the boilerplate in the main article and the boilerplate in the "access to scanned articles" template? -Arch dude (talk) 23:59, 13 February 2013 (UTC)

After reading the announcement at the UK site, I decided to just make the changes. -Arch dude (talk) 01:31, 14 February 2013 (UTC)
And why not. Charles Matthews (talk) 08:19, 14 February 2013 (UTC)

A broken header

Something went wrong HERE. Could someone please fix it? --P. S. Burton (talk) 01:38, 15 February 2013 (UTC)

Le Marchant, John Gaspard (1766-1812) (DNB00)

Le Marchant, John Gaspard (1766-1812) (DNB00) is proofread, but there seem to be a lot of OCR errors in the text. Also, at least one of the page joins needs fixing. I am working on other things at the moment, so I am posting here in the hope that someone else has time to fix the errors. -- PBS (talk) 17:04, 22 February 2013 (UTC)

On the bigger problem of validating the work now posted, we basically have no slick solution: dozens of people have contributed to the project, and we AGF all round, naturally. Here's how I see it. There is the digitisation on the ODNB site, of a later edition. It can certainly be used to patch our versions. And we should be doing plenty of volume-by-volume passes through the DNB, for specific tasks such as hyperlinking. So in a piecemeal way many of the problems will get picked up. But can't we be more systematic and smarter? Well, yes, perhaps.
Here's the deal. If there were not the "spurious linebreaks" in the text we have, some semi-automation could be used to display the diff between our text and the ODNB text. For example, in this case the linebreaks had been fixed by the proofreader. Also no change had been made from the first edition (ours) to the ODNB edition (in effect 1912, but there may have been a few tweaks later). So it wasn't hard to replace the text, at the cost of redoing a little formatting.
If the linebreaks are still the "OCR legacy" type, then "Show changes" doesn't display a proper diff, which makes it hazardous to use ODNB text. It is possible to remove said linebreaks by find-and-replace, but then you lose the para breaks that are needed. So marking the paras with tokens, replacing the linebreaks (with due allowance for hyphenation), then putting back the paras by replacing the tokens, is one pass that fixes the essential problem. Then "Show changes" is OK to monitor that nothing improper to the first edition gets added in: it would be a bad idea to correct one load of issues just to introduce others (though generally in the readers' favour, which is why I have used ODNB text in proofing).
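The token pass just described might look like this as a minimal Python sketch. It assumes that paragraphs are separated by a blank line and that every other newline is an OCR-legacy break; the function name join_ocr_lines and the placeholder token are hypothetical. The hyphenation rule is naive: it rejoins any word split across a line, so a genuinely hyphenated compound that happens to break at a line end would lose its hyphen and still needs checking by eye.

```python
import re

# Hypothetical placeholder; assumed never to occur in DNB text.
PARA_TOKEN = "\u00b6PARA\u00b6"

def join_ocr_lines(text: str) -> str:
    """Remove OCR-legacy line breaks while keeping paragraph breaks."""
    # 1. Mark paragraph breaks (a blank line) with the token.
    marked = re.sub(r"\n\s*\n", PARA_TOKEN, text)
    # 2. Rejoin words hyphenated across a line break: "Lon-\ndon" -> "London".
    marked = re.sub(r"(\w)-\n(\w)", r"\1\2", marked)
    # 3. Turn every remaining single line break into a space.
    marked = re.sub(r"[ \t]*\n[ \t]*", " ", marked)
    # 4. Put the paragraph breaks back.
    return marked.replace(PARA_TOKEN, "\n\n")
```

After this pass, each paragraph is a single line, so a line-based "Show changes" diff against the ODNB text lines up paragraph by paragraph.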
To sum up: to get a more systematic validation path, we can either try to get on top of the linebreak issue; or (and I think it gets interesting here) is there a technical fix, a more intelligent way to display the diff that can cope with linebreaks that are not after "."? The latter doesn't sound too far out of reach.
Charles Matthews (talk) 14:04, 24 February 2013 (UTC)
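One possible answer to the "more intelligent diff" question, sketched with Python's standard-library difflib: split both texts on any whitespace, so that a line break counts the same as a space, and diff the resulting word lists. word_diff is a hypothetical helper, not an existing Wikisource tool.

```python
import difflib

def word_diff(ours: str, theirs: str) -> list[str]:
    """Diff two texts word by word, so line breaks are ignored."""
    a, b = ours.split(), theirs.split()  # split() treats "\n" like " "
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # op is "replace", "delete" or "insert"
            changes.append(f"{op}: {' '.join(a[i1:i2])!r} -> {' '.join(b[j1:j2])!r}")
    return changes

ours = "He was educated at\nEton and Oxford,\nand died in 1812."
theirs = "He was educated at Eton and at Oxford, and died in 1812."
print(word_diff(ours, theirs))  # ["insert: '' -> 'at'"]
```

Only genuine wording differences are reported, so an inserted word from the later edition stands out instead of being buried in line-break noise.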

DNB12 authors

To be specific, missing authors from the first volume of the 1912 supplement. There is a page up at Dictionary of National Biography, 1912 supplement/List of Writers which transcludes the pages listing authors by their abbreviations. It would be traditional to put the detailed discussion on Talk:Dictionary of National Biography, 1912 supplement/List of Writers. I have just done a pass to pick up some obvious identifications and typos for the first volume there; it leaves 59 redlinks. Some of those at least are not at all hard.

This is topical because there is some article creation now proceeding for vol. 1. NB that there is an author page template {{DNB contributor 2ndSupp}}, and if you look at backlinks to Template:DNB contributor 2ndSupp you'll find quite a number of new author page creations for DNB12 by User:Dsp13. See Dictionary of National Biography, 1912 supplement, Volume 1 for recent action on the biographies. Charles Matthews (talk) 08:40, 9 April 2013 (UTC)

Author abbreviated L. M. M.

How do we know that Miss Middleton abbreviated L. M. M. is Author:Lydia Miller Middleton? Particularly as she married Sir Middleton to get that surname. At Page:A Dictionary of Music and Musicians vol 3.djvu/10 I have a Miss Louisa M. Middleton who is referred to in this publication as the contributor to DNB. Beeswaxcandle (talk) 04:34, 23 May 2013 (UTC)

You appear to have a very good point. It is Lydia Miller Middleton in Gillian Fenwick's Contributor's Guide. But no supporting references there, I believe. Charles Matthews (talk) 08:37, 23 May 2013 (UTC)
Married 1890, so that isn't a help. I'll see what else I can dig up. — billinghurst sDrewth 14:46, 23 May 2013 (UTC)
Actually, it does help us, now that I have fully reread the statements. The first LMM in the DNB is 1889, which is a year before the marriage. So, I do find Louisa Middleton, b. c1855 Calcutta, who is listed in the 1891 census as having the occupation Literature/Author; and from the 1861 census she is the daughter of a Calcutta merchant. In school in Scotland in 1871; still looking further. — billinghurst sDrewth 15:24, 23 May 2013 (UTC)
It is worth noting that the ODNB website gives the name as "L. M. Middleton". Charles Matthews (talk) 06:06, 30 May 2013 (UTC)
I think that we should be moving these over to the alternate author as provided by BWC. Many of them are musicians, so the evidence tends to point to the alternate view. — billinghurst sDrewth 12:02, 24 November 2013 (UTC)
Agree, a good catch. Charles Matthews (talk) 08:46, 25 November 2013 (UTC)
Done, and putting a copy at Author talk:Lydia Miller Middleton -- billinghurst sDrewth 13:02, 25 November 2013 (UTC)

To note: Author:Louisa M. Middleton

DNBmatch on Labs

Magnus Manske is migrating tools to Wikimedia Labs. The DNB match tool is done:

The look is improved - note the query form at the end that includes two other works. But the big plus is that the performance is much better than we've had to put up with on the toolserver. if you have requests for other tools.

Charles Matthews (talk) 18:40, 1 June 2013 (UTC)

The maintenance tool has also been ported:

Charles Matthews (talk) 13:32, 3 June 2013 (UTC)

And a new tool ...

The stability of Labs has made possible something that previously was only a "cunning plan":

shows WS:WP, i.e. the ratio of lengths (in bytes) of DNB articles here and the linked article on WP. Once again, our thanks to User:Magnus Manske.

The top hits have ratios over 100, and there is a reason for that, namely the "linked article on WP" may currently be a redirect. I think this is useful at the moment as a maintenance feature. We should make the "wikipedia=" field point to the actual title.
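To make the arithmetic concrete, here is a minimal Python sketch of a WS:WP byte-length ratio with a redirect check. It is not the actual Labs tool's code; the function names, the threshold of 100 and the sample data are hypothetical, chosen to mirror the observation that very high ratios often mean the "wikipedia=" field points at a redirect.

```python
def ws_wp_ratio(ws_text: str, wp_text: str) -> float:
    """Ratio of article lengths in UTF-8 bytes: Wikisource over Wikipedia."""
    ws_bytes = len(ws_text.encode("utf-8"))
    wp_bytes = len(wp_text.encode("utf-8"))
    return ws_bytes / wp_bytes if wp_bytes else float("inf")

def looks_like_redirect(wp_text: str) -> bool:
    """A Wikipedia page whose wikitext starts with #REDIRECT."""
    return wp_text.lstrip().upper().startswith("#REDIRECT")

def flag_for_review(articles, threshold=100.0):
    """articles: iterable of (title, ws_text, wp_text).
    Returns (title, ratio, is_redirect) for ratios above the threshold."""
    flagged = []
    for title, ws_text, wp_text in articles:
        ratio = ws_wp_ratio(ws_text, wp_text)
        if ratio > threshold:
            flagged.append((title, round(ratio, 1), looks_like_redirect(wp_text)))
    return flagged

# Hypothetical data: a 5,000-byte DNB article linked to a 25-byte redirect.
print(flag_for_review([("Example article", "x" * 5000, "#REDIRECT [[Duns Scotus]]")]))
# [('Example article', 200.0, True)]
```

Rows flagged as redirects are exactly the ones where the "wikipedia=" field should be repointed to the actual title.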

The WS version used is cached, while the WP version is live. That means that the updated articles will stay at the top of the list until the cached version is refreshed. I'll post more about that when I know more.

As of right now, I have done the top of the list down to Scotus, i.e. the first ten. Charles Matthews (talk) 11:48, 3 June 2013 (UTC)

Update. Re the caching, a tweak has been done, and the refreshing issue has gone away. Charles Matthews (talk) 11:55, 3 June 2013 (UTC)

Further update. After some work at the coalface, the tool is now providing the intended data. See w:Wikipedia talk:WikiProject Dictionary of National Biography#Ratios tool for a run-down. As a by-product, maintenance on the "wikipedia=" field here can be done fairly painlessly. Charles Matthews (talk) 07:37, 24 June 2013 (UTC)

Matching pass

I have just completed a pass through the whole alphabet with the DNB matching tool. As a result, Category:DNB No WP now stands at just over 9K articles, i.e. under 70% of what it was when the DNB00 and DNB01 articles were completed. Matching and linking to WP is not only good in itself, it paves the way to further automation for the project as a whole.

The tool has quite a few quirks. Generally it suggests "false negatives", which is OK if one is aware: false positives are much more troublesome. Some remarks:

  • It doesn't cope with hyphens in names. Names that are hyphenated are worth checking by hand.
  • It doesn't cope with apostrophes, e.g. in O'Brien. I have just worked over the Irish names of this kind, by hand.
  • Singleton names, e.g. Osmund, are confusing to the tool.
  • It may transpose, e.g. "Lewis Thomas" when you want "Thomas Lewis".
  • The treatment of disambiguated names is rather uneven, so you may need to find the main dab page from the hit given.
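The first few quirks above can be worked around, at least in part, by normalising names before comparing them. The Python sketch below is hypothetical, not how the DNB matching tool works: it converts a DNB title ("Surname, Forenames") to natural order, strips bracketed dates and disambiguators, treats hyphens as spaces, drops apostrophes, and reports a transposed match separately so it can be checked by hand rather than accepted.

```python
import re
import unicodedata

def normalise(name: str) -> list[str]:
    """Lower-case word list, with accents, apostrophes, hyphens and (...) removed."""
    name = re.sub(r"\s*\(.*?\)", "", name)             # drop "(1766-1812)", "(priest)"
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    name = name.lower().replace("'", "").replace("\u2019", "")  # O'Brien -> obrien
    name = re.sub(r"[-,.]", " ", name)                 # Smith-Stanley -> smith stanley
    return name.split()

def dnb_to_natural_order(dnb_title: str) -> str:
    """'O'Brien, William (dates)' -> 'William O'Brien'; singletons unchanged."""
    base = re.sub(r"\s*\(.*?\)", "", dnb_title)
    surname, _, forenames = base.partition(",")
    return f"{forenames.strip()} {surname.strip()}".strip()

def compare_names(dnb_title: str, wp_title: str) -> str:
    a = normalise(dnb_to_natural_order(dnb_title))
    b = normalise(wp_title)
    if a == b:
        return "match"
    if sorted(a) == sorted(b):
        return "transposed: check by hand"
    return "no match"

print(compare_names("O'Brien, William", "William O'Brien"))  # match
print(compare_names("Lewis, Thomas", "Lewis Thomas"))        # transposed: check by hand
```

Keeping the transposed case distinct matters because, as noted above, a false positive is much more troublesome than a false negative.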

All in all, my pass will not have picked up everything. There is quite a bit more, and I'm starting a second pass now.

Category:DNB No WP has some pre-made searches and there should be more. While the performance of the tool at Labs is much better, it probably fails when asked to search more than 300 to 400 names. So initial pairs of letters are good.
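A minimal Python sketch of preparing such searches: group titles by their first two letters and split any group that exceeds a cap. The cap of 300 reflects the rough estimate above, not a documented limit of the tool, and batches_by_prefix is a hypothetical helper.

```python
from collections import defaultdict

def batches_by_prefix(titles, max_batch=300):
    """Group titles by first two letters, splitting groups larger than max_batch."""
    groups = defaultdict(list)
    for t in titles:
        groups[t[:2].capitalize()].append(t)
    batches = []
    for prefix in sorted(groups):
        names = sorted(groups[prefix])
        for i in range(0, len(names), max_batch):
            batches.append((prefix, names[i:i + max_batch]))
    return batches

for prefix, names in batches_by_prefix(["Abbott, Charles", "Aaron, John", "Abbot, George", "Bacon, Roger"]):
    print(prefix, names)
```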

NB that the Epitome lists on enWP, such as w:Wikipedia:WikiProject Missing encyclopedic articles/DNB Epitome 01, are a separate operation. Reconciling those lists fully with the "wikipedia=" field here would be a good idea, but currently would be labour-intensive. Charles Matthews (talk) 10:31, 27 June 2013 (UTC)

Tracking page on WP

More the business of the other end of this project, but the discussion at


and direct link to


may be of interest. The technology makes tracking a project of about 30K articles at least something that can be attempted. Charles Matthews (talk) 12:33, 4 July 2013 (UTC)

Fourth Supplement on Internet Archive

Here, and it says it is not in copyright. I have to assume that's a mistake. Charles Matthews (talk) 21:42, 23 August 2013 (UTC)

Tools page

There is now a project page Wikisource:WikiProject DNB/Tools about the special DNB tools. Charles Matthews (talk) 07:42, 24 August 2013 (UTC)

DNB12 remaining authors

The discussion of authors has been spread over various pages, in a scatty fashion. There are currently five hard-core redlinks, at Wikisource talk:WikiProject DNB/1912 authors#Remaining authors. Some other author pages, out of the 300-odd needed for DNB12, have been created as "details not available", though.

Disambiguation of the author initials of the DNB12 authors has been proceeding, not quite complete. Charles Matthews (talk) 07:12, 4 September 2013 (UTC)

Really just two tough ones left. Author:A. L. Armstrong is definitely a connection of the Harcourt political family, but I don't know more. Author:D. J. Owen wrote three biographies of mathematicians; there is no reason to connect those to the David John Owen of the London Port Authority, author on ports. So he may be a connection of Author:W. B. Owen who was on the staff in 1912. In which case there is a candidate who was of the same age group of graduates, I believe, but hard to say much more.

There was the Northern Ireland historian [Sir, David John OWEN (1874 Mar 8 – 1941 May 17)] who is a possibility. Re Armstrong, I can only find one biography, and that usually indicates personal knowledge, though I cannot find a relationship at this time. — billinghurst sDrewth

These are the last few discussions for around 800 authors. User:Charles Matthews/Companion theorises about one way to make a scholarly work from all the research that has been done, allowing for future changes of mind. This has been suggested before: in 2014 we should get down to "next steps", when all the DNB text has been posted. Charles Matthews (talk)

DNB12 milestone

The last of the DNB12 articles is now posted. User:Slowking4 did the bulk of the second supplement. That's all the public domain DNB articles, for now.

There is of course a large amount of checking still to do. There is a lot of work left round the edges: tables of contents, and "see" articles, in particular. There are hyperlinks to add, and the article start and page end format issues to pursue. There are still scans for DNB00 that should be replaced.

I have thoughts about author pages. I have posted them to get an outside opinion, at User talk:AdamBMorgan, to see if they fit into a sitewide style guide. Charles Matthews (talk) 10:27, 5 November 2013 (UTC)

So, with Adam's help, there is now something to look at, Author:Frank Herbert Brown in a proposed format. Note that the detailed proposal includes the idea of putting the supplements in a single alphabetised list, rather than at the end; and rationalising the contributor template family.

Please weigh in with any views. What I'm suggesting affects other works than the DNB, so if people here are happy, I'll mention it on the Scriptorium. Charles Matthews (talk) 05:51, 6 November 2013 (UTC)

Format dilemma

Not so accurate a title: "format inconsistency" would perhaps be better. I'm doing a pass, which will take a while, formatting the initial words of the articles, which allows me also to troubleshoot a few other things. Some of the supplement volumes don't bold the surnames. I assume that when I get to those, I also don't bold the surnames. Charles Matthews (talk) 10:39, 28 November 2013 (UTC)

IMNSHO just bold them. The intent of the authors hasn't changed, and it would be an oversight by the typographers. — billinghurst sDrewth 12:31, 28 November 2013 (UTC)

Fixed page width

As someone who often reads DNB pages on Wikisource, I think that the new fixed page width is less than helpful.

The format forces a line of text to a specific width and on a wide screen device it is like reading a newspaper column. It involves lots of scrolling because my web window is filled with white space which previously contained text. Conversely if I am reading it on a small screen device that is narrower than the text, instead of adjusting the text to fit the screen, it forces the reader to scroll to the width set by the text formatter.

If this format replicated the layout of the physical DNB pages then there would be some justification for it, but it does not.

As far as I can tell from this discussion page, this change in format was not discussed, so please put the format back to how it has been for many years until it is shown that there is a consensus for the change.

I am against this new fixed width format. -- PBS (talk) 13:15, 30 November 2013 (UTC)

See also Wikisource talk:WikiProject 1911 Encyclopædia Britannica#Fixed page width: it seems that this change affects more than one project, and that it was not trailed on that talk page either. For the same reasons as given here (including not replicating the original layout), I do not think the change to a fixed-width page format is an improvement for EB1911 pages. -- PBS (talk) 13:29, 30 November 2013 (UTC)
Archived Wikisource:Scriptorium/Archives/2014-01#Fixed page width -- PBS (talk) 12:21, 22 February 2014 (UTC)

Standard abbreviation for £ ?

I've been working on some articles in Wikipedia which are based on material from the DNB and have found what appear to be monetary amounts expressed as "200l". Can that be rendered as "£200", or what does it mean? Thank you. SchreiberBike (talk) 01:15, 21 December 2013 (UTC)

In books of this period sterling currency was often referenced in terms of l.s.d., from the Latin librae, solidi, denarii, i.e. pounds, shillings and pence. So, yes, you can treat 200l. as typographically equivalent to £200. Beeswaxcandle (talk) 01:28, 21 December 2013 (UTC)
Thanks. Keep up the good work. SchreiberBike (talk) 03:28, 21 December 2013 (UTC)
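For anyone converting many such amounts when reusing DNB text, a minimal Python sketch of the "200l." to "£200" rewrite. It is a hypothetical helper that handles pounds only; a mixed amount such as "5l. 10s." would only have its pounds part converted, and where "200l." ends a sentence the full stop doing double duty has to be put back by hand.

```python
import re

# Digits (optionally with thousands commas) followed by "l." = pounds in l.s.d.
POUNDS = re.compile(r"\b(\d[\d,]*)l\.")

def pounds_to_sign(text: str) -> str:
    """Rewrite amounts such as '200l.' as '£200'."""
    return POUNDS.sub(lambda m: "£" + m.group(1), text)

print(pounds_to_sign("a pension of 200l. a year"))  # a pension of £200 a year
```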

End notes

Here is a crazed idea, which might turn out not to be useless or totally barking: someone with a bot could extract the endnote sections (within [ and ]) from all the DNB articles, parse them at the semicolons, and then alphabetise the lot.

I was thinking how interesting it would be to know what the major references used in the DNB are. But equally, at this point, it occurs to me that the endnotes are a major source of remaining typos. The reason being fairly obvious: the smaller text may defeat the OCR and human proofreader alike. A typical typo is, say "Diet." for "Dict." abbreviating Dictionary, or just "Hist," for "Hist." with a punctuation error.

Some of the endnotes do have sentence structure after a period, rather than just ";" separators all through. Many, of course, will have ":" for ";" as separator as a typo, and that would show up on a listing. Some endnotes may not be properly within [ and ]. It would be worth working over these issues first, and iterate, to get a cleaner listing. (On a technical note, the wikitext uses a number of different techniques to get the small text: there may be a preliminary issue to solve here.)

So the output would be of the order of 500K entries, I guess? All that is required is a bulleted list, with the bit of endnote followed by a link to the article where it is found. Charles Matthews (talk) 09:58, 22 December 2013 (UTC)
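A minimal Python sketch of the proposed bot logic, assuming each article is available as plain text that ends with its bracketed endnote (the contributor's initials and any templates already stripped). The function names and sample titles are hypothetical; endnotes with nested brackets or missing a closing "]" would be skipped, which is itself a useful signal of the problems mentioned above.

```python
import re

# The last bracketed span, immediately before the end of the text.
ENDNOTE = re.compile(r"\[([^\[\]]+)\]\s*$")

def endnote_items(article_text: str) -> list[str]:
    """Return the ';'-separated items of the closing endnote, if any."""
    m = ENDNOTE.search(article_text.strip())
    if not m:
        return []
    return [item.strip() for item in m.group(1).split(";") if item.strip()]

def endnote_index(articles: dict[str, str]) -> list[tuple[str, str]]:
    """articles: title -> text. Returns (item, title) pairs, alphabetised."""
    rows = [(item, title) for title, text in articles.items()
            for item in endnote_items(text)]
    return sorted(rows, key=lambda r: (r[0].casefold(), r[1]))

def as_bulleted_wikitext(rows) -> str:
    """One bullet per endnote item, followed by a link to its article."""
    return "\n".join(f"* {item} [[{title}]]" for item, title in rows)

articles = {
    "Example, A. (DNB00)": "Some text. [Gent. Mag. 1751; Brit. Mus. Cat.]",
    "Example, B. (DNB00)": "More text [q. v.] here. [Diet. Nat. Biog.; Brit. Mus. Cat.]",
}
print(as_bulleted_wikitext(endnote_index(articles)))
```

In a sorted listing like this, a typo such as "Diet." for "Dict." sits next to the correct forms and is easy to spot.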

Consideration to moving all DNB works to subpages of respective Dictionary of National Biography editions

This is something that I have long been considering; however, I wanted to leave it until the work was completed. Now, with WS ↔ WD coming, it has been pushed high up my alert list (more on that below).

From early days the DNB was transcribed as pages named Name, Name (DNBnn). It was a little contentious even at the time; however, we started without scanned volumes and it was simply what eventuated. [Long story, and it is in the WS:S archives (2008?) if you really want to know what happened] What I would like us to do now is move all these biographical works to be subsidiary to their respective publications, i.e.

(Noting that I am comfortable ignoring their respective volumes)

The community has (long) been looking to keep true to the published work, with the work at the root level and any (split) components of the work as subpages. The advent of scans allowed that in an easier and more logical way, and redirects allow us to point to subpages where considered desirable. Other biographical works (DMM, DAB, IndianBio, CE, SBDEL, IrishBio) were able to be configured that way from their beginnings.

What puts this into the TO CONSIDER basket, and with some urgency, is the forthcoming inhalation of Wikisource data into Wikidata, and how and what we interlink: root pages vs. subpages, etc. At the moment, with all DNB pages sitting at root level, they will all be inhaled. That is probably not the neatest or best way to handle the DNB biographies, though it is what is wanted for the compendium works.

Mechanically, this would mean moving the works to their respective subpages and leaving redirects in place. There is a small downside with the category listings, as they will show the extended/long name (note that there may be a solution for that with mw:Extension:SubPageList, which I am hoping we can test somewhere; Bugzilla: 59762). There should not be any requirement for other changes (we may need a defsort put into the header template). Templates are fine as they are; links are okay, and all those bits should work. — billinghurst sDrewth 11:38, 7 January 2014 (UTC) @Charles Matthews, @Beeswaxcandle, @George Orwell III, @Arch dude, @Hesperian, @Mpaa, @JamAKiska:

  • I support this in principle and always have, but wouldn't be comfortable proceeding without Charles' blessing. Hesperian 12:00, 7 January 2014 (UTC)
  • So this is about having everything as subpages, rather than using suffixes? That for me is far worse for the reader. I use the DNB all the time by typing in a surname, in the search box, when I get a prompt in the form of a drop-down list. Very useful for research. So I would argue against the subpage restructuring on those grounds alone.
    • The redirect names will still exist; it is just that they do not show in the type-ahead functionality. I will ask Nik what capacity there is to have redirects appear in the type-ahead suggestions: whether it is just a config issue, or a current impossibility that would need to progress via a development request. I would argue that while you may search that way, surname first, I am not sure that the common WP researcher is aware that we have names back-to-front to how they use them there. Also you can just go the next step, hit enter, and get a list of results for your query. So I understand your issue, as I use it a lot for disambiguation. In thinking about this, if push comes to shove, I could just ask the bot operator to skip any page with a {{DNBxx}} header. I don't see that they want all the loose subpages; they definitely do not in the first phase, where they are looking at interwikis.
  • Which is to say we need to be looking at this from another angle. I'm aware of Wikidata's needs and there is a DNB (actually ODNB) related tool-based initiative. The drop-down lists are a big plus for humans: the subpage titles may be for machines. For the best of both worlds we surely need a rational policy on redirects, with what goes to what the crux. Charles Matthews (talk) 06:28, 8 January 2014 (UTC)
    We have a redirect approach: mindfully redirect as required, where a work could be at the root level and you want to point to its subpage location. Redirects are cheap to the system. The subpages approach would allow us to build a tool to more easily search within the DNB, either for authors or for content, which is a little hard at this point in time as the works are not structured to be searched as a collective. (he says without having fully explored the new search componentry). — billinghurst sDrewth 09:29, 8 January 2014 (UTC)
  • Could we just step back a bit? We can obviously have a definite policy on 'aliasing', which would be a way of talking about certain navigational options. Which would be good also to have site-wide. I have mentioned a point about what I find user-friendly. There must be other points, such as "disambiguation" and "similar pages", that relate to organising the material on the site better. User:Arch dude actually raised the suffix issue with me a while back. So I know there are others who think the same way as you about it. If there is to be such a policy, I'd like a discussion of what it is enhancing and what it could enhance. The point about redirects being a cheap way to do things is exactly my own point of view, in fact. Charles Matthews (talk) 16:36, 8 January 2014 (UTC)
We do have guidance and it is at Help:Redirects and Help:Disambiguation. Lots of general discussions about policy/guidance/thought bubbles have been had and presumably will continue to be had at WS:S. — billinghurst sDrewth 13:04, 9 January 2014 (UTC)
While I'd usually support something like this in principle as well, I'm not sure what the anticipated end-product would look like here and to what end. Is each mainspace subpage still going to be a single entry found on one or more pages as transcribed in the Page: namespace, or is a mainspace subpage going to be based on a range of pages as transcribed in the Page: namespace covering a bunch of entries (say all the "A" last names)? I hope it's not the latter, though technically that is the type of "division" marker used in the originals if I'm not mistaken. -- George Orwell III (talk) 01:05, 9 January 2014 (UTC)
My proposal is the simplest possible: just moving the existing pages to be subpages clumped under one of the three editions listed at the dot points above. Moving pages and leaving redirects, e.g. "Machin, John (d.1761) (DNB00)" moves to "Dictionary of National Biography, 1885-1900/Machin, John (d.1761)". The parent pages exist; the content pages of the volumes exist; all links would redirect. The prev/next links exist and will redirect. No retooling, no new transcribing, nada. — billinghurst sDrewth 13:04, 9 January 2014 (UTC)
  • So a bit of preliminary research suggests that the drop-down feature that mainly interests me is not particularly well documented. I know now how to turn it off, assuming my skin were Vector (which it isn't). How it handles redirects is something I'd like to clarify, to make progress here. Charles Matthews (talk) 08:46, 9 January 2014 (UTC)
    I sent an email to Nik yesterday about redirects and the type-ahead function to see what is possible, and I await a response. I mentioned that they didn't work for long title names, and were horrid for subpages. — billinghurst sDrewth 13:04, 9 January 2014 (UTC)
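The title mapping in billinghurst's proposal is mechanical enough to sketch in Python. The DNB00 parent title is taken from his example and the DNB12 one from the 1912 supplement pages mentioned earlier on this page; the DNB01 parent title is an assumption, and subpage_title is a hypothetical helper, not an existing bot.

```python
import re

# Parent page per edition suffix. DNB01's title here is an ASSUMPTION.
PARENTS = {
    "DNB00": "Dictionary of National Biography, 1885-1900",
    "DNB01": "Dictionary of National Biography, 1901 supplement",
    "DNB12": "Dictionary of National Biography, 1912 supplement",
}

# "Name (DNBnn)" with the edition suffix at the very end of the title.
SUFFIX = re.compile(r"^(?P<name>.+?) \((?P<edition>DNB\d\d)\)$")

def subpage_title(old_title: str):
    """Map 'Name (DNBnn)' to 'Parent/Name'; None if the title doesn't fit."""
    m = SUFFIX.match(old_title)
    if not m or m.group("edition") not in PARENTS:
        return None
    return f"{PARENTS[m.group('edition')]}/{m.group('name')}"

print(subpage_title("Machin, John (d.1761) (DNB00)"))
# Dictionary of National Biography, 1885-1900/Machin, John (d.1761)
```

Each (old title, new title) pair would correspond to one page move, with the old title left behind as a redirect.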

From the point of view of links from Wikipedia, it would be slightly simpler if volume numbers were not incorporated into the paths, because at the moment, with the Wikipedia templates, if volume is not set the page is still found. But there is a proposal to add a volume parameter value to all instances of the DNB template in article space on Wikipedia that link to Wikisource, so it is a small, not a large, obstacle to overcome.

As to the search-string problem, would that be simplified by setting up a namespace that mapped, say, "DNB" onto the full name? (This is, I think, something that billinghurst mooted as a possibility with the Encyclopaedia Britannica 11th edition, which currently resides at 1911 Encyclopædia Britannica.) -- PBS (talk) 12:43, 22 February 2014 (UTC)

Data item in header

I'm getting ever more interested in the potential of Wikidata, as applied here. As things stand, most articles on enWP have a matching Wikidata item. Of the DNB articles, 73% now have been matched to enWP, a figure that is rising steadily (say 4% a year) now that most of the "legacy" matching has been done, and the growth basically comes from article creation.

It would therefore, quite soon, be possible in theory to automate the process of filling in a "data item=" field in the DNB header, for about three-quarters of the DNB articles. It could be treated as a sister link, though on a different basis from those on author pages.

Here are some possible applications:

  • Matching up data items between the Author: pagespace and the DNB articles, there could be an automatic check of which author pages could have a DNB link and don't.
  • Create a list of duplications within the DNB.
  • Use the data items as part of a larger topical classification.
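The first two applications reduce to simple set operations once each page carries a data item. A minimal Python sketch, assuming the page-to-item mappings have already been fetched; the function names are hypothetical and the item IDs in the example are placeholders, not real Wikidata QIDs.

```python
from collections import defaultdict

def author_pages_missing_dnb(author_items, dnb_items, linked_author_pages):
    """author_items: author page -> item; dnb_items: DNB article -> item.
    Returns author pages whose subject has a DNB article but no DNB link yet."""
    dnb_qids = set(dnb_items.values())
    return sorted(page for page, qid in author_items.items()
                  if qid in dnb_qids and page not in linked_author_pages)

def dnb_duplicates(dnb_items):
    """Items shared by more than one DNB article (possible duplicates)."""
    by_qid = defaultdict(list)
    for article, qid in dnb_items.items():
        by_qid[qid].append(article)
    return {qid: sorted(arts) for qid, arts in by_qid.items() if len(arts) > 1}

# Placeholder data.
authors = {"Author:X": "item-A", "Author:Y": "item-B", "Author:Z": "item-C"}
dnb = {"Article 1": "item-A", "Article 2": "item-A", "Article 3": "item-B"}
print(author_pages_missing_dnb(authors, dnb, {"Author:Y"}))  # ['Author:X']
print(dnb_duplicates(dnb))  # {'item-A': ['Article 1', 'Article 2']}
```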

The last of these is an old issue for WS: how to list/categorise/portalise non-fiction texts here, by subject matter. Thinking just of biographies, there is the potential to create {{similar}} pages in an automated fashion, by doing the same data item indexation on EB1911, Catholic Encyclopedia etc.

It comes down to saying that standardising on Wikidata codes as our underlying "library classification" scheme now looks like an idea whose time has come. Charles Matthews (talk) 08:03, 9 April 2014 (UTC)

Page ends

I've reached the half-way point of my first general pass through the DNB volumes: I have just done vol. 35 of the 69. I have mainly been concerned so far with the article starts, and the links to WP. There are some other matters I feel I need to raise, about the page ends, which will have to be the main area in a second pass. I believe the main issues are:

  • Use {{hws}} and {{hwe}} to deal with word breaks at the page end;
  • Use {{smaller block/s}} and {{smaller block/e}} to deal with page ends falling across the endnotes;
  • Now use {{nop}} to force a newline when a para ends at the end of a page.

The third of these is relatively new to me. I get the feeling that the relevant behaviour of ProofReadPage has shifted around, with upgrades; but is usually then shifted back to the status quo ante? Though not in the change that made {{nop}} now part of the business?

In any case these matters should now be written up. A couple of recent diffs suggest to me that this all is not quite as well understood (and I include myself) as it could be; and the DNB project should be moving to a final "manual of style". I take it that {{smaller block}} everywhere is the choice for the manual. Charles Matthews (talk) 07:12, 17 April 2014 (UTC)

I don't want to encourage people to validate without a final decision on these matters, so I will remove the "already proofread" link on the statistics page I put there. ResScholar (talk) 10:36, 18 April 2014 (UTC)
None of them are new. HWE is more important than HWS (you can just take the hyphenated word into the footer). The blocks are important so as not to break the text of the notes; and nop is solely for the wiki environment, as it gobbles empty lines.
  • I can probably get a bot to run through the volumes looking for terminating hyphens, with a pretty good success rate, to identify the hws issue. Wikisource:WikiProject DNB/Archive 3/Terminating hyphen (pages identified: Done; corrections: Not done)
  • I can run the same to identify terminating }} and </small>, though the former will produce a number of false positives, as we know it is okay for a template to terminate at the page end if it does not continue.
  • Not much we can do, it is an eyeball test.
billinghurst sDrewth 14:10, 19 April 2014 (UTC)
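The first two bot checks amount to looking at how each page body ends. A minimal Python sketch, assuming the bot already has the page body text (header and footer excluded); page_end_flags and its messages are hypothetical. As noted, a trailing }} is often legitimate, so that case is only flagged for an eyeball check.

```python
import re

def page_end_flags(body: str) -> list[str]:
    """Flag page ends that may need {{hws}}, {{smaller block}} or a look."""
    tail = body.rstrip()
    flags = []
    if re.search(r"\w-$", tail):
        flags.append("ends with hyphen: consider {{hws}}/{{hwe}}")
    if tail.endswith("</small>"):
        flags.append("ends with </small>: consider {{smaller block/s}}/{{smaller block/e}}")
    elif tail.endswith("}}"):
        flags.append("ends with }}: check by eye (often a false positive)")
    return flags

print(page_end_flags("and was buried in the church of Stan-\n"))
# ['ends with hyphen: consider {{hws}}/{{hwe}}']
```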

That's not quite the project's practice, though. Validation, piecemeal, has been ongoing. I have never known exactly how we are going to get the whole 69 volumes validated; nor quite what the scope of "finishing" we are intending to aim at is. Seems rather better to take things in stages.

My own view: the patches of green don't always live up to the standards one can hope for; but it is going to be a long job anyway. I have not wanted the "overhang" of validation to cast too long a shadow and distract from getting things done. We have working text, and it is still quite bad in parts. My typo finder is meant to track down some of those bad parts by means of recurring OCR issues: I'm building it up as I find likely searches.
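A typo finder of this kind can be sketched as a table of regular expressions run over page texts. This Python sketch is hypothetical, not Charles's actual tool; its two patterns are the OCR confusions mentioned in the End notes section ("Diet." for "Dict.", "Hist," for "Hist."). Hits are candidates only: "Diet." can be legitimate (e.g. an assembly such as the Diet of Worms), so each needs checking by eye.

```python
import re

# Hypothetical pattern table: label -> regex for a recurring OCR confusion.
TYPO_PATTERNS = {
    "Diet. for Dict.": re.compile(r"\bDiet\."),
    "Hist, for Hist.": re.compile(r"\bHist,"),
}

def find_typos(pages: dict[str, str], context: int = 20):
    """Yield (title, label, snippet) for every candidate hit."""
    for title, text in pages.items():
        for label, pattern in TYPO_PATTERNS.items():
            for m in pattern.finditer(text):
                start = max(m.start() - context, 0)
                yield title, label, text[start:m.end() + context]

for hit in find_typos({"Example (DNB00)": "[Diet. Nat. Biog.; Hist, of England]"}):
    print(hit)
```

New searches are added simply by adding entries to the table, in line with building the finder up as likely searches turn up.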

On another front, there are things like: endashes for hyphens (not always clear in the original, but the ODNB site's transcriptions give a standard); ligatures per the original; apostrophes per the original. These do not really have to be taken care of in validation. I don't like the "gratuitous spaces" that are, in my view, artefacts of the old typesetting and convey no information, so I'd like them removed. The bolding at the article start is not yet uniform across articles, but this pass of mine is intended to include that.

There are some unresolved issues with article titles and disambiguation, also.

As is typical of the DNB, it is hard to confront all the issues at once. Thank you for your interest. I can only suggest an ongoing pattern of passes to try to get the standard up eventually. Charles Matthews (talk) 08:02, 19 April 2014 (UTC)

My major issues, apart from what you have mentioned above, are to have author links (both ways) where the subjects wrote works, and adding the links where we have [q.v.] to works. Plus I would like to identify any redlinks, and resolve them. My methodology is to just bash away on the pages as I can. Typography bothers me less. — billinghurst sDrewth 14:10, 19 April 2014 (UTC)

Apparently broken this morning are {{smaller block/s}} and {{smaller block/e}}: Elderton, William (DNB00), Eldred, John (DNB00). I can't see any syntax problem, so I am assuming this is a "transient" software issue (i.e. random breakage). Charles Matthews (talk) 08:33, 24 April 2014 (UTC)

Manual draft

I have written down the things that immediately occur to me at:

User:Charles Matthews/DNB manual draft

Retrospective legislation for what is "validated" is not the point here, in fact. We will need a second checking system, probably, e.g. hidden category. Much to do first, though, and a thorough spellchecking pass is to be considered more urgent. Charles Matthews (talk) 09:07, 19 July 2014 (UTC)

Trialling an override, need some brain cells

@Charles Matthews: On a page like Page:Dictionary of National Biography, Second Supplement, volume 2.djvu/171 the author is Henry Stephen and the footer initials are H. S. (ambiguous); however, with how we have done the template, it was showing as H. S-n. I have modified Template:DNB HS Stephen so that it has an override function, override=yes, that pushes H. S. as the text and keeps the right link. I am not certain that override is the right parameter name. Seeking feedback on the best/simplest/most obvious means to portray this. — billinghurst sDrewth 03:51, 22 July 2014 (UTC)

Resolving [q.v.] versus [q. v.]

@Charles Matthews, @Slowking4, @Mpaa, @Beeswaxcandle, @Arch dude: + others ... I am running through and fixing existing redlinks in pages, and I notice the variety of ways that we have q.v. Do you have a feel for which way we should unify these?

  1. [q.v.] no space
  2. [q. v.] normal space
  3. [q.&nbsp;v.] (a non-breaking space, to stop line-break splits)

Noting that we also have these used in Supplements where we have a clear space between the q.v. and Supple...

I will get a bot to go through and to standardise (in a slow replacement) on whatever is the ultimate decision. — billinghurst sDrewth 09:43, 14 September 2014 (UTC)

With the space, as [q. v.] for me. Abbreviates quod vide, so I feel the space is natural and better. Charles Matthews (talk) 10:35, 14 September 2014 (UTC)
yes, agree; i’m afraid i left them as they came off the OCR (variable). you could run a bot to change them, and flag them for the template:DNB lkpl insertion. Slowking4Farmbrough's revenge 11:56, 14 September 2014 (UTC)
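For the slow bot replacement, a minimal sketch of the text substitution, assuming the spaced form [q. v.] preferred above and assuming the bot works on raw page wikitext. The names QV_PATTERN, normalise_qv and needs_edit are hypothetical, not part of any existing bot framework; only Python's standard re module is used. Anything after the "v." (such as a supplement reference) is left untouched.

```python
import re

# Opening of a bracketed "quod vide" reference in the variants seen on
# transcribed pages: "[q.v.", "[q. v.", "[q.&nbsp;v." and runs of extra
# whitespace (including a line break) between the two letters.
QV_PATTERN = re.compile(r"\[q\.(?:\s|&nbsp;)*v\.")


def normalise_qv(wikitext: str) -> str:
    """Rewrite every variant to the agreed "[q. v." with a plain space."""
    return QV_PATTERN.sub("[q. v.", wikitext)


def needs_edit(wikitext: str) -> bool:
    """True only if normalising would actually change the page."""
    return normalise_qv(wikitext) != wikitext


if __name__ == "__main__":
    sample = "Smith [q.v.] and Jones [q.&nbsp;v.] and Brown [q.v. Suppl.]"
    print(normalise_qv(sample))
    # Smith [q. v.] and Jones [q. v.] and Brown [q. v. Suppl.]
```

Checking needs_edit before saving keeps the bot from making null edits on pages that already use the agreed form.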

Authors in the DNB and their author pages[edit]

I'm currently preoccupied with Wikidata, involved in a six-month project on the ODNB codes, which might be done in February, say. I thought I'd explain that more fully here later. Many of the DNB biographies are of authors, who could therefore qualify for an Author page. Do we have an idea of the number of those? Charles Matthews (talk) 16:45, 9 January 2015 (UTC)

Died dab extensions[edit]

Currently AFAICT the died dab extensions are in a format where the "d." is followed by the year with no intervening space. I think this is a mistake, because I think most people would expect a space between the dot and the year, and within the text of the volumes there is one. -- PBS (talk) 17:20, 12 January 2015 (UTC)

The general philosophy for the choice of titles was minimalism: no space is just part of that. Charles Matthews (talk) 06:40, 13 January 2015 (UTC)

Wikidata Project[edit]

There is now a sister project at d:Wikidata:WikiProject DNB.

It should help Wikidata absorb the many items for DNB entries. Jura1 (talk) 13:32, 25 March 2015 (UTC)

Many thanks! Charles Matthews (talk) 03:59, 26 March 2015 (UTC)

See articles[edit]

The "see article" type of soft DNB redirect has a Wikidata item, d:Q19648608. This is a reason to create them systematically now (anything in Wikisource mainspace can have a Wikidata item).

As we know, there are various kinds, and they need to be handled somewhat differently. d:Q19766142 is an example that points from one surname to a variant. Charles Matthews (talk) 08:10, 7 April 2015 (UTC)

DNB01 and DNB12 main subjects on Wikidata[edit]

I have finished going through the "main subject" links for the two DNB supplements, 1901 and 1912. These make up about 10% of the total DNB articles here. On this smaller scale, it is still possible to explain what possibilities are opened up by the matching.

What this means is that the "data item" link on an article here, which leads to the Wikidata metadata page for that particular article, can be followed by another link, to the Wikidata item for the person who is the subject of the biography. While the metadata page is not intended to carry a huge amount of information, the biography page well might.

For example, it may carry a link to enWP, if there is a corresponding English Wikipedia article. These links to enWP ought to be in sync with the header links here. If that is not the case, there are a number of possibilities:

  • What is in the header here is a redirect, which can and should be replaced.
  • There is nothing in the header here, but there is an enWP article that could be linked.

This is interesting to me, since finding those links is now something that can have an automated component.

  • There is a link in the header here, but no enWP link on the Wikidata item.

One should be alert to what is going on in this case. It may, for example, indicate that the enWP article has not had a Wikidata link put in yet (which is true in about 1% of cases). Just as likely, the enWP article has a valid Wikidata link running to it, but the item from which it comes needs to be merged into the item found starting here.

In other words, starting here at article A, there is the matching Wikidata item B, and a "main subject" link takes us to C. On the other hand, starting from a header link here to enWP, we get to D. We may expect C on Wikidata to link to D. It may simply not do so yet. Or item E on Wikidata links to D on enWP, where E is some other item that is a merge candidate with C.
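The A, B, C, D, E comparison can be sketched as a small decision function. This is only an illustration under stated assumptions: Item, check_article and the returned labels are hypothetical, and the data is held in plain dictionaries rather than fetched from Wikidata. It assumes B's "main subject" value (P921) and each item's English Wikipedia sitelink have already been looked up, together with a reverse index from enWP title to the item carrying that sitelink.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Toy stand-in for a Wikidata item (hypothetical, not a real API)."""
    qid: str
    main_subject: Optional[str] = None  # P921 value on a DNB article item
    enwiki: Optional[str] = None        # English Wikipedia sitelink title


def check_article(b: Item, d: Optional[str],
                  items: dict[str, Item],
                  item_for_enwiki: dict[str, str]) -> str:
    """Compare the Wikidata route (B -> C) with the header route (D -> E).

    b: the item for DNB article A here
    d: the enWP title in A's header, or None if there is no header link
    items: qid -> Item lookup
    item_for_enwiki: enWP title -> qid of the item holding that sitelink
    """
    c = items.get(b.main_subject) if b.main_subject else None
    if c is None:
        return "no main subject link yet"
    if d is None:
        # Nothing in the header here: C's sitelink, if any, is a candidate.
        if c.enwiki:
            return f"header link could be added: {c.enwiki}"
        return "nothing to compare"
    if c.enwiki == d:
        return "consistent"
    if c.enwiki is not None:
        # Both routes lead somewhere, but not to the same enWP page.
        return "header differs from C's sitelink: redirect or wrong target?"
    e = item_for_enwiki.get(d)
    if e is None:
        return "enWP article has no Wikidata item yet"
    return f"possible merge: {e} into {c.qid}"
```

For example, with B pointing at an item QC that has no sitelink, and the header title "X" carried by a separate item QE, check_article reports "possible merge: QE into QC", which is the case that calls for the further check on E described next.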

Investigating D and E in this situation may throw up interesting checks. For example, it is perfectly possible that some of our links to enWP are not to biographies. Sometimes a link is to a company, while the DNB article here is about its founder.

Therefore, in case we have A -> D -> E, it is going to be worthwhile to check first whether E is "instance of human", or of something else. I expect instances of "duo" (two people), "family", "list article", "fictional human", "fictional character", "company", and odder things such as pubs, ballads and so on that may be named after people.
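A minimal sketch of that first check on E, assuming E's statements are available as one entity object from a Wikidata wbgetentities JSON response (requested with props=claims). P31 is "instance of" and Q5 is "human"; the function names and returned labels are hypothetical. Statements of the "somevalue" or "novalue" kind carry no datavalue and are skipped.

```python
from __future__ import annotations

HUMAN = "Q5"  # Wikidata item "human"


def instance_of_ids(entity: dict) -> set[str]:
    """Collect the item ids given by the P31 ("instance of") statements.

    `entity` is assumed to be one value of the "entities" map returned by
    wbgetentities; each statement's value sits under
    mainsnak -> datavalue -> value -> id.
    """
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        datavalue = statement.get("mainsnak", {}).get("datavalue")
        if datavalue:  # absent for "somevalue"/"novalue" snaks
            ids.add(datavalue["value"]["id"])
    return ids


def triage_e(entity: dict) -> str:
    """First-pass decision on item E before considering a merge with C."""
    classes = instance_of_ids(entity)
    if HUMAN in classes:
        return "human: candidate for merging with C"
    if not classes:
        return "no instance-of statement: check by hand"
    # Duo, family, company, pub, ballad ...: the header link here may not
    # point at a biography at all.
    return "not a human: the enWP header link may not be a biography"
```

Only Q5 is tested for directly; the other classes listed above (duo, family, fictional human and so on) simply fall through to the "not a human" branch, which leaves the judgement to a person looking at the page.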

There may of course also be straight errors and omissions.

This all comes at the beginning of using Wikidata to manage search here by topic. Much checking to do, at the outset. DNB00 is on its way, but there are about 4K links to put in. This query is one way to find those, if you'd like to help.

NB: in principle there is a bot-created item to which the "main subject" link should run, if only you can find it (ODNB access really required); but the caveat is that those items only cover around 98% of the ODNB. By all means leave me a note on Wikidata about any baffling cases. This is another dimension of the project over there, and will end up with a complete old DNB -> ODNB matching that is machine-readable, an independently useful by-product.

Enough said for the moment. Charles Matthews (talk) 17:48, 15 February 2016 (UTC)