Wikisource talk:WikiProject DNB

Scanned index pages

I've proofed Page:Dictionary_of_National_Biography_volume_01.djvu/493 and Page:Dictionary_of_National_Biography_volume_01.djvu/494. Each was several hours of quite handraulic cut/paste work, given the poor OCR. Eventually, though, it came to a repetitive routine that I think should be possible to bot-assist.

1. Set up the headers and footers.
2. For each entry indexed on the page, paste one of two template forms as follows:
   2a: For a "See so-and-so" use the form: {{Template:Dotted TOC page listing||Aboyne, second Viscount (''d''.1649). See [[Gordon, James (d.1649) (DNB00)|Gordon, James]].|}}
   2b: For a regular entry use: {{Template:Dotted TOC page listing||Abraham, Robert (1778-1850)|[[Abraham, Robert (DNB00)|66]]|5}}
3. In a separate browser tab, find the target page, either by searching for "Abraham, Robert DNB00" or by clicking the link to the next entry in sequence, for instance as found in the page header at Abney, Thomas (d.1750) (DNB00), which targets e.g. Abraham, Robert (DNB00). Either way, select and copy the target page title e.g. "Abraham, Robert (DNB00)".
4. In the pasted template form, select the part identifying the target page and Ctrl-V to paste the correct title.
5. Repeat for the human-readable first template parameter, "Abraham, Robert". As appropriate repeat the copy/paste for Abraham's lifespan dates "(1773–1850)", from the target page onto the template. Mark up italics as needed for b, d, or fl.
6. Edit the page number to match the scan, as confirmed on the target page's left margin.
7. Visually check that the entered data matches the scan image.
8. Repeat 2 through 7 as required.

The page layout isn't a perfect match, but it suffices. The effort involved in doing this manually for four pages in each of sixty volumes would be substantial, but not impossible. A bot assistant could make it quite tractable. Is it worth doing? LeadSongDog (talk) 21:13, 8 February 2017 (UTC)
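As a rough illustration of the repetitive part of the routine above (steps 2 to 6), the two template forms could be generated from already-harvested data along these lines. This is only a sketch, not an actual bot: the function names see_entry and regular_entry are made up for illustration, and the meaning of the final template parameter ("5" in the example) is assumed rather than checked against the template documentation.

```python
# Sketch only: builds the two "Dotted TOC page listing" forms quoted in step 2.
# Function and parameter names are hypothetical, chosen for this illustration.

PREFIX = "{{Template:Dotted TOC page listing||"


def see_entry(heading, target_title, display):
    """Form 2a: a "See so-and-so" cross-reference, with no page number."""
    return PREFIX + heading + ". See [[" + target_title + "|" + display + "]].|}}"


def regular_entry(name, lifespan, target_title, page, last_param=5):
    """Form 2b: a regular entry linking the page number to the article.

    last_param is copied from the example in step 2b; what it controls
    is an assumption left unverified here.
    """
    return (
        PREFIX + name + " (" + lifespan + ")|[["
        + target_title + "|" + str(page) + "]]|" + str(last_param) + "}}"
    )


if __name__ == "__main__":
    print(see_entry(
        "Aboyne, second Viscount (''d''.1649)",
        "Gordon, James (d.1649) (DNB00)",
        "Gordon, James",
    ))
    print(regular_entry("Abraham, Robert", "1778-1850", "Abraham, Robert (DNB00)", 66))
```

The visual check against the scan image (step 7) would still have to be done by a person.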

Could be helpful, in checking the article creation, and giving a key to the various "see articles", which we are probably going to create some day. The volume listings haven't reached a definitive form; and these index pages complement them. Thank you for looking at this issue. Charles Matthews (talk) 05:18, 10 February 2017 (UTC)
Botting standard formatted text is pretty easy. Get the text right, and I can run a bot through. — billinghurst sDrewth 09:07, 10 February 2017 (UTC)
@Billinghurst: Thank you. Could we bot or bot-assist the harvest of page titles and lifespans? That would go a long way. A big part of the problem is that the OCR on these index pages is just terrible. The good news is that the entries are in alpha order, so about 90% of the job for one index page is just harvesting all DNB00 entries in a specific alpha range. Only the "See so-and-so" entries vary from this. LeadSongDog (talk) 16:19, 10 February 2017 (UTC)
@LeadSongDog: the first step would be to see whether our local and improved OCR function (you may need to turn the gadget on to get the button) can do a better job. We may also be able to see whether other volumes with better scans are available; each volume seems to vary in scan quality throughout. So we may be able to scrape text from another copy of the index page, paste it, and work with that. Running a bot to fix poor OCR is a variable process. — billinghurst sDrewth 00:34, 11 February 2017 (UTC)
Our OCR is worse on those pages. :-/ — billinghurst sDrewth 01:33, 11 February 2017 (UTC)
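Since the OCR route looks unpromising, the alpha-range harvest suggested above might be sketched as follows. It assumes a list of DNB00 page titles has already been obtained by some other means and sorted case-insensitively; the function name titles_in_range is hypothetical. Plain string ordering may not match the DNB's own ordering in every case, so the result would still need checking against the scan.

```python
from bisect import bisect_left, bisect_right


def titles_in_range(sorted_titles, first, last):
    """Return the titles from first to last inclusive.

    sorted_titles must already be sorted by casefolded title; how the
    list is obtained is outside the scope of this sketch.
    """
    keys = [title.casefold() for title in sorted_titles]
    lo = bisect_left(keys, first.casefold())
    hi = bisect_right(keys, last.casefold())
    return sorted_titles[lo:hi]


if __name__ == "__main__":
    titles = [
        "Abney, Thomas (d.1750) (DNB00)",
        "Abraham, Robert (DNB00)",
        "Gordon, James (d.1649) (DNB00)",
    ]
    # Entries falling between the first and last headword of one index page.
    for title in titles_in_range(
        titles, "Abney, Thomas (d.1750) (DNB00)", "Abraham, Robert (DNB00)"
    ):
        print(title)
```

The "See so-and-so" entries are not article titles, so they would not be picked up this way and would still need to be added by hand.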

Index page hovertext

Pages such as Index:Dictionary_of_National_Biography_volume_01.djvu could be significantly improved by some small changes. When one points at "200" on that page, the hovertext/tooltip pops up saying "Page:Dictionary of National Biography volume 01.djvu/214". This is useful, but it omits the key information that that page is Airay-Airay, which could be mined from the header of that page. Similarly, pointing at v.30 on the index page one sees the hovertext "Index:Dictionary_of_National_Biography_volume_30.djvu" rather than the more useful "Dictionary of National Biography, 1885-1900, Vol 30 Johnes - Kenneth". Is there any chance this could be easily fixed? LeadSongDog (talk) 22:45, 10 February 2017 (UTC)

The page links point to the underlying scan page to edit. The id is a hard page link primarily based on the <pagefile> data on the corresponding Index: ns page. To pull the text from the running header of the page would be a complex task, and in many DNB pages that information is not there (early style issue). Also it sits within a <noinclude> tag, so its availability is problematic. What were you trying to achieve through the hover?
With regard to volume information, the (linked) volume data is at the top of the page. When I hover over that I get the pop-up data that shows the vol, and through the redirect the SURNAME to SURNAME. I suppose we could consider the presentation of the surname components within the header, though I am not sure that the extra detail is always relevant to present, and may be busy/noisy on the page. — billinghurst sDrewth 00:48, 11 February 2017 (UTC)
Oh, on the _Index_ pages. We can amend the template:DNB indexes to get hover information for the volumes. That would just take some time, though could be done incrementally.

There is no way to populate the pagelist with names.

If there is a separate list then we can add those below the pagelist, though I am not certain of the value. As a sort of test, I have added the compiled list to the bottom of Index:Dictionary of National Biography volume 56.djvu. — billinghurst sDrewth 01:02, 11 February 2017 (UTC)

The issue (that I neglected to clearly specify) was that the current scheme requires the reader to go on an Easter Egg hunt to find the article they're after. A list of volume numbers or of page numbers conveys little to the user. The "through the redirect the SURNAME to SURNAME" approach to finding a volume works only after clicking to follow the link, so it may require attempts on several volumes. That is why, on the physical bound books, the spine showed not just the volume number but also the first and last SURNAME. A hovertext/tooltip/popup would allow readers to know which page they are after before opening it. While wear and tear on the spines isn't a problem, there are still places where bandwidth is scarce or expensive. Users familiar with the project's scheme may know that they can a) search for the name; b) open the article; c) find the scan page link as a page number on the left side of the displayed article; and d) follow that link to get the scan page. As something of a newbie on s:, however, I certainly took a while to figure it out. That transclusion on the v.56 index page certainly seems to me to be a step forward: a human-readable list of names that are linked to the corresponding articles. It does not, though, make it obvious what the proofing status of the target article is (as does the colour-coded numeric scanpage map). LeadSongDog (talk) 16:30, 13 February 2017 (UTC)

Maintenance of WP links

Big advance via Wikidata: Petscan queries on d:User:Charles Matthews/Queries#Petscan allow one to find rapidly the English Wikipedia articles corresponding to a DNB article here, but not yet linked. This is done separately for DNB00, DNB01 and DNB12; takes just a couple of seconds each time.

Caveat: this approach does of course depend on the data being in Wikidata. So English Wikipedia articles have to have a Wikidata item, and that item must be the one to which the data item of the DNB article points under "main subject". If an English Wikipedia article A has an item D1, while the article B here that corresponds has its data item pointing to D2, which is different, the query will only pick it up as and when D1 and D2 are merged.
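The caveat can be illustrated with a toy model. This is not the Petscan query itself, just a sketch of the matching rule it relies on, using the placeholder names A, B, D1 and D2 from the paragraph above; the function name find_enwp_matches and the dictionary layout are assumptions made for illustration.

```python
def find_enwp_matches(dnb_main_subject, enwp_item):
    """Pair DNB articles with enWP articles that share a Wikidata item.

    dnb_main_subject: DNB article title -> item its data item points to
                      under "main subject"
    enwp_item:        enWP article title -> that article's own item
    """
    enwp_by_item = {item: title for title, item in enwp_item.items()}
    return {
        dnb_title: enwp_by_item[item]
        for dnb_title, item in dnb_main_subject.items()
        if item in enwp_by_item
    }


if __name__ == "__main__":
    # Before the merge: B points to D2, A sits on D1, so nothing is found.
    print(find_enwp_matches({"B": "D2"}, {"A": "D1"}))
    # After D2 is merged into D1, B points to D1 and the pair is found.
    print(find_enwp_matches({"B": "D1"}, {"A": "D1"}))
```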

A few misidentifications of "main subject" are showing up.

All this said, these long-sought one-click queries (to which User:Jheald helped me) are going to be really helpful. Other works here can be treated the same way, if the essential infrastructure is put in place, analogous to Category:DNB No WP here, and a set of "main subject" links. The special situation is that the ODNB property on Wikidata has been maxed out.

Charles Matthews (talk) 05:42, 21 February 2017 (UTC)

Authors here in DNB lacking enWP article

This is quite a neat use of SPARQL: query here. Today it brings up 76 authors who have pages here and are DNB people, but who have no enWP article (according to Wikidata). Charles Matthews (talk) 16:12, 1 March 2017 (UTC)

Time for a FAQ?

I see much of this page has been archived. That may make sense, given that the project is still active, but discussions are no longer so frequent as in the past. Consolidation of conclusions as a FAQ would make sense. Charles Matthews (talk) 04:06, 31 March 2017 (UTC)

I am not against that; or if it is easier, I am not against bringing back pertinent conversations and marking them not to be archived. We could also pretty easily build a central ToC for the archives and paste it into a top section. Examples of ToC are WS:Scriptorium/Archives. — billinghurst sDrewth 04:31, 31 March 2017 (UTC)

So I'm now revising Wikisource:WikiProject DNB/FAQ. One format point: do we want the author templates to be on a new line? Charles Matthews (talk) 09:48, 3 April 2017 (UTC)

Do you mean the article footer initials templates? If so, I have been doing that; it just makes them easier on the eye, IMNSHO. — billinghurst sDrewth 12:28, 3 April 2017 (UTC)

Yes, I noticed that, which is why I asked. Should this be adopted for our Manual of Style? Charles Matthews (talk) 04:24, 4 April 2017 (UTC)