User talk:Charles Matthews

/Archive1 - /Archive2

Congrats on 100k

You passed the 100k edits on enWS this past month. Nice. — billinghurst sDrewth 11:02, 5 January 2013 (UTC)

So it's about 5 edits per DNB article, really. I'll start to worry when WS edits overtake WP edits, but that will be a while yet. Charles Matthews (talk) 11:06, 5 January 2013 (UTC)

James Wilson link — do you know?

Got a link to a James Wilson at Page:Dictionary of National Biography volume 52.djvu/37 however it is too early for those listed. Can you guess who? Or do I just dead link it? — billinghurst sDrewth 13:22, 13 January 2013 (UTC)

It will be w:James Wilson (anatomist) - not in DNB, or that article would have been easier to create! An interesting test case. What serves the reader's interests best? There is an argument for having a standardised page with the message "qv in DNB is misleading, but enWP does have an article". Charles Matthews (talk) 15:10, 13 January 2013 (UTC)
I did a bit of text around blind links, and made a comment at the project page. I think that we have scope to add enWP links under that design. — billinghurst sDrewth 15:30, 13 January 2013 (UTC)

retitling page

Hi Charles. I moved Seymour, Francis (1590?-1664) (DNB00) from a page title which was inexplicably mentioning 1669, and then I got over-excited and edited the template and the fromsection / tosection etc. This has messed up the way the page appears. Not sure how much it's best to change. Dsp13 (talk) 19:11, 15 January 2013 (UTC)

Done - needs the transclusion to be updated between the ## and ## markers, if you change it in the other place (which you don't have to).
By the way, while you're here, DNB00 is going well, with just about 1500 articles left to do; and DNB01 is apparently finished. So not so long until we are down to DNB12. Now for that, there is a whole bunch of new authors to identify. There is a list at Wikisource:WikiProject DNB/1912 authors. Also we are quite sophisticated compared to the old days, tagging with authority control. There is a tool for that, inevitably written by Magnus Manske. Bearing down on either or both of these tasks would be very helpful. Charles Matthews (talk) 19:56, 15 January 2013 (UTC)
I've had a go with Magnus' tool... very nice! As far as redlinks in the 1912 authors are concerned, where the author's already there under a different name, should I just add a redirect? E.g. Author:John E. Sandys to the existing Author:John Edwin Sandys?
Or alter on the list - I suppose either way is good. Charles Matthews (talk) 06:49, 19 January 2013 (UTC)
OK, great. (I wasn't sure whether the list itself was a transcription which should be preserved as surface text.) My next question is about authors for whom there are WP pages, e.g. Mrs Blanco White is w:Amber Reeves. The WS author pages have some standard templates etc., and I don't have a full grasp of the conventions. How should I set one up just to say that she's a 1912 DNB author? Do I need to add what she wrote in the DNB at that point? If I have an example to copy, then I can crack on through that list I think. Dsp13 (talk) 10:27, 19 January 2013 (UTC)
Great. I have just lashed up {{DNB contributor 2ndSupp}}. If you add that to author pages you create for the additional authors, it will do as a start. That is, tracking those pages will be easy. We are still a bit short of infrastructure for DNB12, obviously. But as it is only 5% of the size of the first edition it is not going to be a serious issue to match the work, in time. ({{DNB contributor}} has the code for adding the initials and that can be carried across when someone wants to go through systematically. I think we can cross the bridge when we come to it for listing articles: I was only able to do it for the first edition because you lent me the Fenwick book.) Charles Matthews (talk) 10:40, 19 January 2013 (UTC)
Great. I think I'll hang out on ws for a while :) Watching Rich F undergo death by 1000 cuts on WP is making me absurdly upset when I go there. Dsp13 (talk) 10:51, 19 January 2013 (UTC)
He'd be very welcome here also. Category:DNB No WP has 11,000+ entries and we really need to fill in links to existing WP articles, as he has done in the past. I hadn't got round to showing you, which is joint work of Magnus and myself, ordering that category by the number of DNB pages a biography draws on. Charles Matthews (talk) 10:57, 19 January 2013 (UTC)
Opened discussion on hard to identify 1912 authors (S to Z so far). Dsp13 (talk) 18:05, 19 January 2013 (UTC)
Got one via ODNB contributors, left a couple of other comments. Charles Matthews (talk) 18:31, 19 January 2013 (UTC)

'Zaphod Beeblebrox'-style

Apart from the first question of "why isn't he in DNB?", I have had my heads smacked together lightly by an AWB hacker, and I have been directed to the way around the 25k category limit. 29037 articles! Then I remembered that we have it embedded in the header, to automatically apply, so the list at Category:DNB No WP will show where there is no parameter, which means that it should be pretty good. I will run a check to make sure that we don't have any empty parameters which may throw the process, and fix accordingly. Not sure how we check the validity of a sister link. I will see if I can find someone to ask. — billinghurst sDrewth 01:21, 8 February 2013 (UTC)

Did some tests, the list should be complete for work to be done. — billinghurst sDrewth 01:43, 8 February 2013 (UTC)

Question on CE

Hi. A question on the approach to be followed. For articles like this: [1], Diocese and University are in the same page. On OCE the article is named after the town, with sections inside but the article on WS has been called Catholic_Encyclopedia_(1913)/Diocese_of_Sigüenza, even if University is included.

On the other hand, for Perugia, a different choice has been made on both OCE and WS, and the article has been split in two. See Catholic Encyclopedia (1913)/Archdiocese_of_Perugia and Catholic Encyclopedia (1913)/University of Perugia. To me both approaches are equally good, but I think the choice should be consistent. I started to change in one direction, but stopped once I realised both choices were in place. Any advice?--Mpaa (talk) 16:20, 28 April 2013 (UTC)

In time I think the articles should be merged so that they have the same structure as the original. In fact since we have the scan here in the Page: namespace, I don't think there is really any choice: the transcluded version should match the original version. Splitting out the universities was just someone's arbitrary idea.
There are other types of examples.
As far as titles are concerned, the Sigüenza article does start Sigüenza, Diocese of and I have been consistently naming these articles "Diocese of X". It is justified from the original, and is also more helpful to the reader, really, to know that the content is primarily ecclesiastical. So this is a convention that I think should be applied. Charles Matthews (talk) 05:03, 29 April 2013 (UTC)
OK. I will mark articles for merge or will do it directly when I come across such cases. I already split one, too bad ... --Mpaa (talk) 06:47, 29 April 2013 (UTC)
Hi. To cope with the wrong order, I tried the following. I fetched the ordered sequence of articles in OCE (with prev/next, Volume and link to the scanned page). There are (should be) 11487 articles. In WS there are 11648 articles (I think the difference being splitting choices here and there). Comparing them, 2048 do not match (after taking care of Blessed, Venerable, Saints, etc.). There are several reasons for that, like:
  1. e.g. Diocese of ..., Prefecture of ..., not yet moved on WS
  2. OCE has changed convention in some cases (e.g. Apostolic Vicariate instead of Vicariate Apostolic of ..., see Apostolic Vicariate of Kiang-nan and Catholic Encyclopedia (1913)/Vicariate Apostolic of Kiang-nan)
  3. accented characters, ligatures, etc. (by the way, what is the convention to be used? ae-like or æ-like?)
This could be a way forward in dividing articles per volume and fixing prev/next, as OCE should be reliable. Names could be tuned on both sides, articles merged, keeping the scan as reference, until convergence. What do you think? If you want to take a look, I can post the lists (maybe too long?) or send you an excel file.--Mpaa (talk) 20:55, 29 April 2013 (UTC)
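For point 3 above, one possible way to compare the two title lists regardless of accents and ligatures is to fold every title to a plain key first. This is only a Python sketch: the function names and the small ligature table are illustrative assumptions, not taken from any script mentioned in this thread, and it does not settle which spelling the page titles themselves should use.

```python
import unicodedata

# Assumed ligature table, for comparison only; extend as needed.
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def normalize_title(title: str) -> str:
    """Fold a title to a plain key: no ligatures, no accents, no case."""
    for ligature, plain in LIGATURES.items():
        title = title.replace(ligature, plain)
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and collapse runs of whitespace.
    return " ".join(stripped.casefold().split())


def unmatched(oce_titles, ws_titles):
    """Return the OCE titles that have no WS counterpart after folding."""
    ws_keys = {normalize_title(t) for t in ws_titles}
    return [t for t in oce_titles if normalize_title(t) not in ws_keys]
```

With this folding, "Lucas D'Achery" and "Lucas d'Achéry" count as the same article, so only genuine differences (splits, renamed dioceses, typos) remain in the mismatch list.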
To get serious, lists should be posted as project pages, as subpages from WS:CEU. And that initial page should be reconsidered.
One contribution to mismatches would be that, for example, a beatified person who is now canonised would appear as a Saint in the CE version, because what we have is a mirror of the version on where the editor did such things. But there are also a number of typos to find.
By the way, our version also has some missing articles. There used to be a whole block missing under letter E: I filled that in. But there are isolated instances of skipping. Charles Matthews (talk) 11:25, 1 May 2013 (UTC)
To give you a better idea of what I am pursuing, I have posted Vol.1 here. In black there are OCE data (and links to scan page), in blue links to WS. Where there is one line, a 1:1 match has been found. Where no perfect match is present, alternatives are presented, obtained with a fuzzy match. Where there is a mismatch, we can change what is wrong, WS title or OCE title in this page. E.g., if accents are OK in the following case:
  • Vol:1 Lucas D'Achery -> Lucas D'Achery [2]
  1. Lucas d'Achéry
the OCE reference could be changed as follows, and at the next run of the script a perfect match would be obtained:
  • Vol:1 Lucas D'Achery -> Lucas D'Achéry [3]
  1. Lucas d'Achéry
  1. OCE as reference (not perfect due to arbitrary splitting of articles or spelling errors) but scans are one click away
  2. changes to OCE references can be done in place and a script can be run to refresh this as needed
  3. a refresh can be done to update WS links after a page move
  4. fuzzy match algorithm can be made a bit smarter and show more accurate results
  1. OCE inaccuracies might be present
  2. not all WS links are considered, but it could be identified what is not yet matched.
Feedback appreciated, given your expertise on the matter.--Mpaa (talk) 22:18, 1 May 2013 (UTC)
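The matching described above (one line for an exact match, a short list of alternatives otherwise) can be sketched with Python's standard difflib module. This is not the script used for the posted list; the function name, the cutoff of 0.6 and the limit of three alternatives are assumptions chosen for illustration.

```python
import difflib


def match_titles(oce_titles, ws_titles, n=3, cutoff=0.6):
    """Map each OCE title to [itself] on an exact match, else to close WS titles."""
    ws_set = set(ws_titles)
    report = {}
    for title in oce_titles:
        if title in ws_set:
            # 1:1 match found, a single line in the report.
            report[title] = [title]
        else:
            # No perfect match: offer the closest WS titles, best first.
            report[title] = difflib.get_close_matches(
                title, ws_titles, n=n, cutoff=cutoff
            )
    return report
```

An empty list for a title would flag a candidate missing article, to be checked against the scan before anything is created.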

I'll put detailed feedback on that list on its talk page when I have a moment: I'm a bit busy right now. Just to make sure we are communicating well here, you do know that the CE is here in pagespace as an upload? The reason I use the OCE site is that there is an indexation of the articles, so that you can get to the scanned page from an article title usually quite quickly. Until we have the articles here in the correct order, that is a timesaver. In a systematic project it would of course make sense to link from a list to our own pagespace scan. Charles Matthews (talk) 09:59, 3 May 2013 (UTC)

Yes, I am aware of that. Problem is that there are already articles present (with unsure sorting) and Page ns is almost empty. I was just trying to use OCE article order (available now) to sort articles in WS. It might not be 100% correct but could improve current unreliable status.--Mpaa (talk) 11:27, 3 May 2013 (UTC)

It does look as if reconstructing the volume ToCs linked from Catholic Encyclopedia (1913) is going to be a major step forward. Charles Matthews (talk) 11:41, 3 May 2013 (UTC)

Rees's Cyclopaedia project suggestion

Hi Charles, to continue the correspondence we've been having on your WP talk page about putting Rees's Cyclopaedia on Wikisource, I just read your Wikisource thread - lots of very detailed information to digest. I checked my records and find there are 30,400 pages in Rees, over 39 volumes. The text is in double column with an average length of 1480 words per page, which makes it around 43 million words. The pages are un-numbered, and only the botanical articles are signed. The plates (in separate volumes) are keyed to the articles, and it would be really useful to hyperlink the plates to the text, if you see what I mean. This would be possible with Commons, of course. It will be a very, very long job indeed, and at my age I shall probably not live to see it finished. However that is no reason for not beginning!! The spade work done in organising the logistics of getting the DNB done will make the job far easier. I'll be in touch here later on. Apwoolrich2 (talk) 16:02, 11 May 2013 (UTC)

Yes, 30,000 pages is the same scale as the DNB. You mentioned that the Hathi trust scan is better. I think you should look carefully at it: with a facing image to look at, it is possible to make changes as you go, provided they are minor. Charles Matthews (talk) 16:51, 11 May 2013 (UTC)
Unless I've mis-read the Hathitrust catalogue, they only have scans of the American edition, apart from vol 39 of the British one. I detected problems in not being able to download more than one page at a time in PDF format. The HathiTrust scan shows the long ESS as a long ESS. I'll re-check this Apwoolrich2 (talk) 17:53, 11 May 2013 (UTC)
I've just copied and pasted into my text editor a sample page, and it's come over very well with all the lines in the right order. Minimal editing is needed. Some of the OCR'd texts in the IA, however, sometimes conflate the lines across the columns (first line of col. 1 followed by first line of col. 2, line 2 followed by 2, 3 followed by 3 etc). I see the Hathi Trust now has the British version as well as the American. I did not see it when I made the listing of digitised copies for the WP article last year. I do have my own copy of Rees, so can proofread from that. A tip: the British edition is dated 1819, and the American 1805 on this site. Apwoolrich2 (talk) 18:29, 11 May 2013 (UTC)
Some further thoughts. Checking through a volume on the HathiTrust site, I see that all formatting of mathematical tables is lost in the OCRd version, so these will need to be re-typeset from scratch. Many of the tables run over the entire page instead of being confined to the columnar format of the text, some orientated sideways as well. I'm sure it can be done, but it will be a fiddle. I must see what other books there might be in WS with similar tables and check out how it was done.
Tho' there are 30,000+ pages, there are far, far more articles than this, when the very short dictionary articles are taken into account. I suppose an index page will need to be created for each volume listing each article in turn. The fact that the original was not paginated has caused problems ever since the work was published. Would there be any profit in adding page numbers in the Wikisource version? I've copied off a number of the WikiProjectDNB documents, and will spend a bit of time digesting them. Also see if there is something similar for WikiprojectEB11, if there is one. Kind regards. Apwoolrich2 (talk) 09:13, 12 May 2013 (UTC)
There is a Britannica 1911 project here, but they don't transclude, I think. The tables would not be impossibly hard to format in wikitext, in place and from scratch.
What we have done for the DNB is to list the main articles in tables of contents, and make the "previous" and "next" fields skip the short articles. In other words the first aim is to have a ToC that covers the essential ground of the content. One can always come back later to interpolate the short articles. You'll find that there is just a bit of such interpolation around for the DNB.
On pagination: once there is an index page, and transclusion from pagespace, page numbers do appear automatically in the left-hand margin. A typical index page is like Index:Dictionary of National Biography volume 58.djvu. You can see there is a certain amount of control there on things: there are some Roman-numbered pages, and skipping. If you click through on page 100 there you get to page 108 of the pagespace version: as we would say, there is an offset +8. At Vanderbank, John (DNB00) you can see, though, that the page displayed in the margin is 100, corresponding to the index page.
Therefore there are two things going on. The scan has been uploaded with a straightforward numbering from 1 to 477. The index page has its own version, basically with 8 front-matter pages. This is a bit technical: the question is whether it is technical enough for Rees! I've never actually edited an index page before, but I see that it reads essentially as
Whatever the specifics are, there is clearly a good deal of control on how the marginal numbers are displayed. So it ought to be the case that you could set up the numbering in a helpful way. Charles Matthews (talk) 09:51, 12 May 2013 (UTC)
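The arithmetic behind the offset can be put in a few lines of Python. This is a simplified sketch: it assumes a single constant offset for the whole volume (8 for DNB volume 58, as in the example above) and ignores the Roman-numbered and skipped pages that a real index page also handles. The function names are made up for illustration.

```python
def scan_position(printed_page: int, offset: int) -> int:
    """Position in the uploaded scan of a page with the given printed number."""
    return printed_page + offset


def printed_number(scan_pos: int, offset: int):
    """Printed number shown in the margin, or None for front-matter pages."""
    if scan_pos <= offset:
        # Front matter: numbered separately (e.g. in Roman numerals).
        return None
    return scan_pos - offset
```

For an unpaginated work such as Rees there is no printed number to recover, so the same mechanism would simply display whatever numbering the project decides on.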
I've created a sandbox page on my WS talk page and posted in it the entire preface and the first page of the work. The latter is about half a regular page in length because of the depth of the heading. I've edited this page with all the small caps and italics of the original. The Preface does have typos in it, I know. It was not very onerous, and occupied a part of a damp Sunday afternoon. I was glad to find the WS spell checker correctly found most of the 'long esses'. Next phase to post a message on Scriptorium making a proposal? Also to have a play with a header? Apwoolrich2 (talk) 16:28, 12 May 2013 (UTC)
Procedurally I suppose you can just get on with it. Adapting {{DNB00}} to make {{Rees}} is probably less scary than it looks. NB that here on WS documentation tends to be at Template talk:X rather than Template:X/doc. User:Billinghurst is my go-to guy for templates, as for much else.
The Scriptorium - WS:S - is a good place for technical queries. The big technical issue is actually getting vol. 1 uploaded. That needs an admin: I'm one but have never in fact done a bulk upload. Others would, but a case has to be made. Charles Matthews (talk) 18:13, 12 May 2013 (UTC)

I've just posted the list of long biographical articles from Rees on the WP Rees Page. Quite fun to do, but it was a pig getting the tables to work properly. All is well now though. Can't say the same for the page name. I've written '...ON Rees...' instead of '...IN Rees...' I will try and work out how to change it. I'll be interested to see how long it takes for somebody to Wikify the names. I'm still thinking around the form of the WS Rees project. Get back to you later Apwoolrich2 (talk) 14:27, 18 May 2013 (UTC)

On my WS Sandbox page, I've posted a draft proposal about Rees for the Scriptorium. I'll be glad of any comments, please. There is a queer glitch I can't resolve. I italicised a book title, yet when saved, the title remains in Roman, and the italic is shifted a few words to the right. I wrote the original on my text editor (NoteTab), then cut'n'pasted it to the sandbox, where I added the markup. All very odd. Apwoolrich2 (talk) 19:01, 19 May 2013 (UTC)

The format issue was a line break. Fixed now. Charles Matthews (talk) 08:17, 20 May 2013 (UTC)

Hullo Charles. It's been 2 months since I posted my proposal, but apart from you there has been no response, so I am wondering if I am being a bit over-enthusiastic with the idea of getting the text of Rees on WS now. In the last two months I've been indexing the Rees biographies on Botany. I took an existing list, but found it was highly inaccurate, so have been through each page of every volume. This is the first time I have ever done it, and am amazed at the wealth of material Rees contains. The musical writings of Charles Burney are very readable and I plan to index these next, since there is a fair amount of academic interest in Burney which does not appear to have made much use of his Rees writings. They provide a wealth of info about the C18 London musical and theatrical scene. I've been looking at the scans of the Rees plates on IA, but find they are very coarsely scanned, as well as being crooked and cropped in some instances so as to be inadequate for WP use. When I've finished Burney I will scan my set of the plates at a better resolution and post those on Commons. Once they are there maybe a demand for the texts on WS might arise. I must confess to getting side-tracked looking up botanical biographies on WP of names in Rees. Also musical examples of relevance to Burney on YouTube. I'll keep in touch on progress. Kind regards. Apwoolrich2 (talk) 19:43, 20 July 2013 (UTC)

Wikisource User Group

Wikisource, the free digital library is moving towards better implementation of book management, proofreading and uploading. All language communities are very important in Wikisource. We would like to propose a Wikisource User Group, which would be a loose, volunteer organization to facilitate outreach and foster technical development, join if you feel like helping out. This would also give a better way to share and improve the tools used in the local Wikisources. You are invited to join the mailing list 'wikisource-l' (English), the IRC channel #wikisource, the facebook page or the Wikisource twitter. As a part of the Google Summer of Code 2013, there are four projects related to Wikisource. To get the best results out of these projects, we would like your comments about them. The projects are listed at Wikisource across projects. You can find the midpoint report for developmental work done during the IEG on Wikisource here.

Global message delivery, 23:20, 24 July 2013 (UTC)

Thinking ahead

Hi Charles, I'm thinking ahead to the December Proofread of the Month, which has been tagged with a games theme. I see we already have The Game of Go by Arthur Smith (1908). Is this the best we can do or is there a better PD book on Go that could be dropped in alongside a book on Chess and Association Football? I'm asking you because I've just tripped across your name on the Sensei Library and hope that you've got a better knowledge of the available literature than I do at a rusty 25kyu. Beeswaxcandle (talk) 07:47, 26 July 2013 (UTC)

Ha. Yes, I know much of the literature in English. I co-wrote "Shape Up!" with a friend and many people assume it's PD; but not so. I also wrote "Teach Yourself Go".
Here's the deal: Wikisource could add plenty of value to the Smith book, by creating diagrams for the problem sets that are given in algebraic notation. There are various kinds of diagram software. Otherwise it seems to me that it would be a reasonable POTM, because it seems well advanced and mostly needs conscientious validation. But the book itself is no longer indicated for beginners. Charles Matthews (talk) 09:27, 27 July 2013 (UTC)
I've already re-started the stalled validation and plan to get through it over the next couple of weeks. I haven't got to the problems yet, but may come back to you on the diagram software. Since I messaged you above I've found a copy of Cheshire's 1911 work on the Internet Archive. A quick scan through suggests that this is also not indicated for beginners and is probably more of historical interest. Nonetheless, is this worth consideration for a more general proofreading audience or should I just add it to my list?
What surprises me is that Go doesn't seem to have been picked up in the English craze for things Japanese of the 1870s and 1880s. The concept of Ko-Ko singing Tit-willow to Katisha over a goban is intriguing. Beeswaxcandle (talk) 10:31, 27 July 2013 (UTC)
Now that book I didn't know. The game on p. 36 seems to have been played between top pros on 9 December 1910 (Nozawa Chikucho versus Iwasa Kei). I've never met the notation system on p. 32 before. This book is certainly of historical interest, and as a self-published book must be rare. Not for beginners, as you say. The other games in the book could probably be identified, given that (I imagine) they were published in Japanese newspapers. Charles Matthews (talk) 13:13, 27 July 2013 (UTC)
The article by Tony Atkins linked from here on the BGA site suggests the posting (from 2010) is not so generally known. These books are sort of test cases for annotation, in my view. Charles Matthews (talk) 13:25, 27 July 2013 (UTC)
OK, there's now The Game of Go (annotated) with illustrations of game 3 in chapter 5, the missing joseki in chapter 6, the alternate for ending no. 2 in chapter 7, and all the problems and answers in chapter 8. Probably needs some further layout attention, but I think it's usable now. I'll have a break and focus on other projects, but do plan to come back to Cheshire later in the year. Beeswaxcandle (talk) 06:06, 21 August 2013 (UTC)

Re: Hashes and DNB transclusion

Uh, what are you talking about? Not to be difficult, but it's been a month since my last edits to Wikisource, so I found your message cryptic. If I removed hashes -- & I probably had, I simply don't remember which ones you could be referring to -- it was because of display issues in the browser I was using. (Which browser, I don't remember off the top of my head: I tend to edit from a number of different computers.) So with a little more information I could understand what I did & if there's a problem with the site or just not understanding how things are done here. -- Llywrch (talk) 18:09, 21 September 2013 (UTC)

I was referring to this diff, indeed from August. What actually happens is that <section begin=""/> and <section end=""/>, that mark out the beginning and end of sections transcluded into the freestanding article, get consolidated by software into a single section marked out by ## and ## at the start of a biography (assuming it's the DNB). So I found some biographies that weren't properly formed, and fixed them. Your edit summary suggested you weren't alert to the mechanism. Charles Matthews (talk) 19:00, 21 September 2013 (UTC)

1,000,000th Content Page

The edit that created the 1,000,000th content page, Wilkinson, George Howard (DNB12), was made by you at 09:10, 29 October 2013 (UTC). The page had 408 bytes. Congratulations! This anticipated edit was discussed on the Scriptorium at Wikisource:Scriptorium#Approaching_1_million_content_pages_at_enWS. ResScholar (talk) 04:06, 31 October 2013 (UTC)

Thanks - through the marvels of Echo I had seen that, via @Prosody:'s edit. Unexpected good news. Charles Matthews (talk) 04:25, 31 October 2013 (UTC)

Catholic Encyclopedia & DNB

You seem to be the person most involved in trying to get the above complete here, from what I have seen. What do we need done yet? To the degree that I can help, I think it would be great to get some of the old reference works completed, because I am also right now developing a rather huge list of the older reference works still counted as useful as per a 1986 book on reference works, and I think having a few such reference works completed might make it more likely that some of the others get attention as well. John Carter (talk) 19:12, 3 November 2013 (UTC)

The DNB had, as of this morning, 37 articles of the second supplement to post, and then the public domain text should be complete. In other words, it is pretty much done.
The Catholic Encyclopedia here is more complicated to discuss. It was first posted by a bot, in a most unfortunate way. The division of articles didn't exactly match the original; the ordering was somewhat arbitrary, so that you can't easily match the "volumes" with the originals; and the bot skipped in posting. Also the text was apparently scraped from the New Advent digitisation, which often omitted the endnotes (and did worse things ...)
I would guess the Catholic Encyclopedia is about 95% complete here: there was a gap of about 300 articles, mostly with initial E, but I filled that in. The other missing articles are presumably cases where the bot skipped one. They are hard to detect without going through the whole text against a scan. We have a scan uploaded here.
So, frankly, the Catholic Encyclopedia is still a mess from the point of view of completeness. Some progress has been made on author pages, which is one way to cross-check. We won't really know until the text is "migrated to djvu", as the DNB is, how much there is to do, but "plenty" covers it. Charles Matthews (talk) 19:23, 3 November 2013 (UTC)
Well at least vol. 7 of the CE here is in djvu, maybe some others as well. I've never started pages for works before, and don't think it would be a good idea for me to try one this early, but it might be a start. If you've got a link somewhere indicating what isn't done in the DNB, I can maybe at least look that over and maybe try to start some of them as well. John Carter (talk) 19:35, 3 November 2013 (UTC)

We do have all the Catholic Encyclopedia files we need: Index:Catholic Encyclopedia, volume 1.djvu and so on cover it. If you find a missing CE article, you can post text with the header at Template:CE13, which is the easy way to do it. In the fullness of time the text will go by the scan, as is done for example at Page:Catholic Encyclopedia, volume 1.djvu/25 for Catholic Encyclopedia (1913)/Aachen. If you go to edit Catholic Encyclopedia (1913)/Aachen you can see the syntax that pulls the text into the article. Yes, it's a bit complicated at first sight, and there needs to be markup on the Page: versions to make it work. Anyway that is how we do business here, by preference, these days. ProofReadPage is the name of the system, and it means proofreading is verifiable by anybody.

The DNB remaining redlinks are at Dictionary of National Biography, 1912 supplement, Volume 3. I see six have been done this afternoon, so the end is in sight. If you want to experiment on say Warner, Charles (DNB12), you need to go to Page:Dictionary of National Biography, Second Supplement, volume 3.djvu/605. The DNB12 header to use is like

{{DNB12
|article=
|previous=
|next=
|volume= 3
|contributor =
|wikipedia =
|extra_notes=
}}
<pages index="Dictionary of National Biography, Second Supplement, volume 3.djvu" from="6" to="6" fromsection="" tosection="">
</pages>

with the relevant page numbers 6.. inserted. fromsection and tosection relate to whatever you put in <section begin=""/> and <section end=""/> to mark out the start and end of the text you want in the article. (When you place <section begin=""/> and <section end=""/> correctly on the same page they resolve to a header like ## Warner, Charles ## if, as I would, the marker is Warner, Charles.)
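For anyone who would rather generate that block than type it, here is a hedged Python sketch that fills in the fields for an article sitting on a single scan page. The field names are copied from the example above; the function itself and its parameter names are hypothetical, and it leaves contributor, wikipedia and extra_notes blank to be completed by hand.

```python
def dnb12_article(article, previous, next_article, volume, scan_page, section):
    """Return wikitext for a DNB12 article transcluded from one scan page."""
    index = (
        "Dictionary of National Biography, Second Supplement, "
        f"volume {volume}.djvu"
    )
    header = (
        "{{DNB12\n"
        f"|article= {article}\n"
        f"|previous= {previous}\n"
        f"|next= {next_article}\n"
        f"|volume= {volume}\n"
        "|contributor =\n"
        "|wikipedia =\n"
        "|extra_notes=\n"
        "}}\n"
    )
    # Same page and same section label at both ends: a one-page article.
    pages = (
        f'<pages index="{index}" from="{scan_page}" to="{scan_page}" '
        f'fromsection="{section}" tosection="{section}">\n</pages>'
    )
    return header + pages
```

For Warner, Charles (DNB12) the call would be dnb12_article("Warner, Charles", "", "", 3, 605, "Warner, Charles"), with previous and next filled in from the volume's table of contents.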

To some extent this is straight in at the deep end, and may look forbidding.

You can get some idea for the Catholic Encyclopedia from another site with scans, e.g. . If you compare with Catholic Encyclopedia (1913)/Quinquagesima you will see that the next article matches, but after that the OCE site has "Quiricus and Julitta" and we apparently don't. That turns out to be because the "next" link from Catholic Encyclopedia (1913)/Agustín Quintana needs to be changed! That is about where we are, and you are of course very welcome to help in checking. Where there really is a missing article, you could use the OCE text to create an article with the CE13 header. Charles Matthews (talk) 20:23, 3 November 2013 (UTC)

Sorry if I jump in. I have aligned to my best the articles in CE. Volumes TOCs (e.g. Catholic Encyclopedia (1913)/Volume 12) should reflect OCE, and hopefully scans (except articles to be merged). Using your example, you can see that in Vol. 12 TOC the right article is there, and also Catholic_Encyclopedia_(1913)/Sts._Quiricus_and_Julitta points back to the right article. So, work in progress on this side.
IMHO, the next point would be to attack scans but the most important issue to be defined is how to tag sections. If we use CE convention (e.g. "Quintana, Augustine"), it would be cleaner but a bit more challenging to write a bot to automatically transclude links to the current title pages. Any suggestions on this?--Mpaa (talk) 21:24, 3 November 2013 (UTC)
Now prev/next for articles are aligned with ToCs for all volumes.--Mpaa (talk) 21:30, 5 November 2013 (UTC)
But that's excellent! Charles Matthews (talk) 16:50, 6 November 2013 (UTC)

I should review the situation with the Catholic Encyclopedia, then. The DNB posting has about another day in it, and then that project needs to take stock. For the CE, to adapt my past DNB method that used long strips of marked-up text, the first thing is to get the table of contents correct; then use the list of names of articles to generate marked-up text in bulk. I used {{polysect}} and some list manipulations to do about 30 pages at a time, with the page titles serving as the transclusion markers, and had a system for producing templates to minimise work. I used {{DNBset}} for the actual article creation. A serious amount of work, though, and the CE text requires more remedial work. Charles Matthews (talk) 06:40, 4 November 2013 (UTC)


You have new messages
Hello, Charles Matthews. You have new messages at AdamBMorgan's talk page.
You can remove this notice at any time by removing the {{Talkback}} or {{Tb}} template.

Tenth Anniversary Contest[edit]

Continuing from the discussion on the Wikimedia UK mailing list, I've started a draft page for this: Wikisource:Tenth Anniversary Contest. Does this look even slightly appropriate to you?

It's only partly done because I haven't had a lot of free time so far this week. I've left space for ten texts, to match ten years of Wikisource, but I've only found a few that seemed appropriate so far (and I uploaded a new one; a WWI work seemed right this close to that anniversary). It might end up being a lot less than ten. I also need to find out how WMUK see themselves being involved in this. - AdamBMorgan (talk) 22:34, 13 November 2013 (UTC)

Thanks - I've been on holiday and offline, need to catch up. Charles Matthews (talk) 08:25, 17 November 2013 (UTC)


Hi Charles, I replied to you here. Ed [talk] [en] 07:42, 29 November 2013 (UTC)

Links from CE1913 to en:WP[edit]

Hi. Do you think it would be interesting to add such links? The approach would be to look for {{Cite Catholic Encyclopedia}} (any other useful templates) on en:WP, and check that what is linked back to WS does not contain the corresponding WP link.--Mpaa (talk) 21:57, 8 January 2014 (UTC)

Need just to clarify. This would be from the text here of the CE, not just for the wikipedia= field in the header? Charles Matthews (talk) 08:28, 9 January 2014 (UTC)
I meant "just for the wikipedia= field in the header".--Mpaa (talk) 11:03, 9 January 2014 (UTC)
Right. Then there is a great tool for this type of matching. But it is not working right now. And then there is this other tool, which is matching CE pages up to Wikidata pages, whence the enWP page could typically be found. But it would be more up-to-date to work on putting "wikidata=" as a header field anyway, I guess.
So there is some overlapping work to take into account.
The things I'm talking about are
Now that doesn't run - it was ported from the toolserver by Magnus Manske, and it didn't run there either, as I recall. It is probably some relatively trivial thing in the code, I guess, given that the CE pages are subpages and the DNB pages have a suffix, and the DNB version of the tool runs very well for me. Might be worth a few minutes of your time to look into this; I can ask Magnus if this business really isn't transparent. (I tend to assume everyone here has more technical knowledge than me, and it is usually a good guess.)
The wikidata-related tool is This has a CE setting which really does work, and others are working on it, meaning you might be able to reuse.
I'm sure there is nothing wrong with your original idea. Just seemed worthwhile documenting what is out there already. Charles Matthews (talk) 19:51, 9 January 2014 (UTC)
I am not completely familiar with Wikidata, and as far as I know wikisource is not supported yet. I have no idea of what the effect of the wikidata= field in {{header}} is. I'll try to dig a bit more, otherwise I guess will stick to my original idea.--Mpaa (talk) 22:02, 9 January 2014 (UTC)
WS:S#Reminder: Wikidata coming on January 14th! Charles Matthews (talk) 22:10, 9 January 2014 (UTC)
I posted a question on Scriptorium. Let's see.--Mpaa (talk) 22:50, 9 January 2014 (UTC)
You might be interested in following up the discussion at Scriptorium, as my understanding is that the matching done by the tool you have listed above associates different 'entities'.--Mpaa (talk) 08:50, 10 January 2014 (UTC)
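
For the record, the original idea in this thread (find enWP articles that cite a CE article, then check whether the header here already carries a wikipedia= value) could be sketched roughly as below. This is only an illustration under stated assumptions: it works on wikitext already fetched into dictionaries rather than calling any API, it assumes the enWP citation names the Wikisource article in a `wstitle` parameter, and the function names are hypothetical.

```python
import re

# Assumption: the enWP citation looks like {{Cite Catholic Encyclopedia|wstitle=...}}.
CITE_RE = re.compile(
    r"\{\{\s*Cite Catholic Encyclopedia\s*\|([^{}]*)\}\}", re.IGNORECASE
)
# Matches a "wikipedia = value" field in a header template on Wikisource.
HEADER_WP_RE = re.compile(r"\|\s*wikipedia\s*=\s*([^|\n}]*)")


def cited_ce_titles(wp_wikitext):
    """Return the CE article titles cited in one enWP article's wikitext."""
    titles = []
    for match in CITE_RE.finditer(wp_wikitext):
        for part in match.group(1).split("|"):
            name, sep, value = part.partition("=")
            if sep and name.strip() == "wstitle" and value.strip():
                titles.append(value.strip())
    return titles


def missing_wp_links(wp_pages, ws_pages):
    """Suggest (CE title, enWP title) pairs where the header has no wikipedia= value.

    wp_pages: {enWP article title: wikitext}
    ws_pages: {CE article title: Wikisource wikitext}
    """
    suggestions = []
    for wp_title, text in wp_pages.items():
        for ce_title in cited_ce_titles(text):
            ws_text = ws_pages.get(ce_title)
            if ws_text is None:
                continue  # cited article not found here; needs a human look
            field = HEADER_WP_RE.search(ws_text)
            if field is None or not field.group(1).strip():
                suggestions.append((ce_title, wp_title))
    return sorted(suggestions)
```

The output is a list of suggestions only; each pair would still need checking by hand, since a citation on enWP does not guarantee that the two pages are about the same subject.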

wikimania panel?[edit]

a bird was suggesting that User:Moondyne was interested in WikiSource activities at wikimania? how about a WS panel, about the DNB success story. a reception would be nice to recruit, although there aren’t many pubs near the barbican, ? i defer to your local knowledge. Slowking4Farmbrough's revenge 18:50, 20 January 2014 (UTC)

The sort of panel that would interest me would be around "Digitization and reference material". Charles Matthews (talk) 19:39, 20 January 2014 (UTC)
hate to keep harping, but i see the wikisource meetup social, but the panel, did it get not accepted? [4] could it be a workshop? WMUK has not been very transparent. i guess in a month things will have settled. Slowking4Farmbrough's revenge 02:35, 7 June 2014 (UTC)
I've been away, and missed an "availability check". But I have just heard that the panel has the green light. I'll post more to the Scriptorium: the panel is 6031 on Charles Matthews (talk) 15:45, 12 June 2014 (UTC)


To mention, nothing more, that I have uploaded nine vols of Thomson's A biographical dictionary of eminent Scotsmen. — billinghurst sDrewth 09:59, 27 January 2014 (UTC)

Thanks. I was getting a bit puzzled, given that w:Thomas Napier Thomson mentions another number of volumes. Page:A biographical dictionary of eminent Scotsmen, vol 2.djvu/287 shows a volume start. So I suspect that those volumes were bound as nine, rather than being nine originally. Charles Matthews (talk) 14:22, 27 January 2014 (UTC)
They are page numbered as 3, though bound as 9, in nine divisions (title pages for each) corresponding to the binding. As it is the 1857 publication date, presumably it was a reprint in a library form of the 1851, maybe it is the second edition of 1851, all a bit hard to tell and maybe I need to do some research on it. All that said, it was the best scan available, the pages seemed to align and be present (though it was mental arithmetic late at night). — billinghurst sDrewth 05:40, 28 January 2014 (UTC)

Automated import of openly licensed scholarly articles[edit]

Hello Charles Matthews,

We are putting together a proposal about the automated import of openly licensed scholarly articles, and since you are an active Wikisourceror, we'd appreciate your comments on the Scriptorium. For convenience, I'm copying our proposal here:

The idea of systematically importing openly licensed scholarly articles into Wikisource has popped up from time to time. For instance, it formed the core of WikiProject Academic Papers and is mentioned in the Wikisource vision. However, the Wikiproject relied on human power, never reached its full potential, and eventually became inactive. The vision has yet to materialise.
We plan to bridge the gap through automation. We are a subset of WikiProject Open Access (user:Daniel Mietchen, user:Maximilanklein, user:MattSenate), and we have funding from the Open Society Foundations via Wikimedia Deutschland to demo suitable workflows at Wikimania (see project page).
Specifically, we plan to import Open Access journal articles into Wikisource when they are cited on Wikipedia. The import would be performed by a group of bots intended to make reference handling more interoperable across Wikimedia sites. Their main tasks are:
  • (on Wikipedia) signalling which references are openly licensed, and link them to the full text on Wikisource, the media on Commons and the metadata on Wikidata;
  • (on Commons) importing images and other media associated with the source article;
  • (on Wikisource) importing the full text of the source article and embedding the media in there;
  • (on Wikidata) handling the metadata associated with the source article, and signalling that the full text is on Wikisource and the media on Commons.
These Open Access imports on Wikisource will be linked to and from other Wikimedia sister sites. Our first priority though will be linking from English Wikipedia, focusing on the most cited Open Access papers, and the top-100 medical articles.
In order to move forward with this, we need
  • General community approval
  • Community feedback on workflows and scrutiny on our test imports in specific.
  • Bot permission. For more technical information read our bot spec on Github.

Maximilianklein (talk) 18:27, 20 June 2014 (UTC)

Template:Collective still needed?[edit]

Is this template still needed? Unused, and we seem to have stopped development. — billinghurst sDrewth 02:16, 22 July 2014 (UTC)

Seems not, and could be reinvented at need. Charles Matthews (talk) 06:30, 22 July 2014 (UTC)
For the record though: Template talk:Polysect. I used {{polysect}} all the time in DNB work, but only in preview. So there was no reason to save it to a page. Charles Matthews (talk) 06:36, 22 July 2014 (UTC)
Back. It could do with some simple {{documentation}}. — billinghurst sDrewth 15:08, 22 July 2014 (UTC)
Mmm. It is part of my method for doing long "strips" of text to paste into pagespace. And I could document that: would be relevant to EB1911 and the Catholic Encyclopedia. I did explain that to Adam B. at a meetup once, over several minutes and much handwaving. The trick is to start with a list of titles/markup choices, use the template, scrape the preview text, and then paste the real text into the gaps between begin and end. Not very intuitive. Charles Matthews (talk) 16:23, 22 July 2014 (UTC)
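
A rough sketch of the "strip" step described above, for anyone who wants to script it rather than scrape a preview. It does not reproduce what {{polysect}} actually emits; it assumes labelled-section tags of the form `<section begin="..." />` and `<section end="..." />`, and `build_strip` is a hypothetical helper name.

```python
def build_strip(titles):
    """Return marked-up text with an empty begin/end pair for each title.

    The real article text is then pasted into the gap between each pair.
    """
    blocks = []
    for title in titles:
        blocks.append(
            f'<section begin="{title}" />\n\n<section end="{title}" />'
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Titles taken from the table of contents, in page order.
    print(build_strip(["Quinquagesima", "Quintana, Augustine"]))
```

Using the article title as the section label keeps the later transclusion step mechanical, which is the point of starting from a correct list of titles.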

Need edit help[edit]

Hi Charles, Would you have the time to look at [5] I have section breaks I can't seem to fix. --Daytrivia (talk) 01:07, 7 August 2014 (UTC)

If it's not working now (I have moved the reference footer up into the text) it looks like an artefact created by the footnote. Charles Matthews (talk) 04:21, 7 August 2014 (UTC)
I had to wait until the caching allowed me to see the new effect, but it does seem fixed now. Charles Matthews (talk) 05:43, 8 August 2014 (UTC)
Thanks a million Charles. Daytrivia (talk) 08:57, 8 August 2014 (UTC)

Edition problem?[edit]

Here the text diverges from the image in CÆSAR, Sir THOMAS (1561–1610):

  1. ", and M.P. for Appleby in 1601" added to the text
  2. "His career at the bar was undistinguished" rather than "wholly undistinguished"
  3. "cursitor baron" rather than "puisne or cursitor baron"
  4. "next month" rather than "ensuing month"

I am assuming that these are due to replacing the image with a better one, but with different text!

Rich Farmbrough, 03:10 24 August 2014 (GMT)

@Rich Farmbrough: Looks as though the scan was replaced c:File:Dictionary of National Biography volume 08.djvu in April 2014, and from today's access I cannot see the older djvu version to know about the scan at that page (it may have been a dud image scan). We should proofread to the scan, so feel free to make the appropriate changes, and any addendum will be added. You can always make a comment in the notes section to the transcluded work of the additions in a later edition, presuming that is what we are actually seeing, though history unknown. — billinghurst sDrewth 00:35, 25 August 2014 (UTC)

Thanks for the corrections. The issue is caused by my use of the DNB text from the ODNB site, which is a later edition. Here they've added in the MP information, and shortened other pieces of the text to fit it in: better for the reader, but worse for the WS "norm" of faithfulness to the scan and the first edition. There will be other examples: my plan in particular is to find examples in DNB00 where later references have been added by searching for 1901, 1902 etc. In any case I tried in proofing the text to pick up on these changes; but I was not completely successful, and if you find more of the same you can just correct them.

I don't believe the replacements of djvu have caused a change away from the first edition, but it did happen once, I think, and was fixed. Charles Matthews (talk) 04:45, 26 August 2014 (UTC)
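
The search for 1901, 1902 etc. mentioned above could be sketched like this. It is a minimal illustration, assuming the article texts have already been collected into a dictionary; the function names and the default cutoff of 1900 (the last year of the original DNB series) are choices made for the example. A year in the text is only a hint that a later reference may have been added, so every hit still needs checking against the scan.

```python
import re

# Four-digit numbers beginning with 1, standing alone (e.g. not part of "11901").
YEAR_RE = re.compile(r"\b(1[0-9]{3})\b")


def later_years(text, cutoff=1900):
    """Return the sorted distinct years in text that fall after the cutoff."""
    return sorted({int(y) for y in YEAR_RE.findall(text) if int(y) > cutoff})


def flag_articles(articles, cutoff=1900):
    """Map article title -> later years found, for articles with any hit.

    articles: {article title: article text}
    """
    flagged = {}
    for title, text in articles.items():
        years = later_years(text, cutoff)
        if years:
            flagged[title] = years
    return flagged
```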

UK Wikisource training[edit]

Hi - it would be great to hear a bit more about Wikisource and what UK training sessions might involve. I see from your Wikipedia userpage that you're in Cambridge - I don't suppose you're coming to EduWiki this year in Edinburgh? If not, I am coming down to London in the first week of November for what might be a couple of nights. It would make sense to work in a trip to Cambridge while I'm down south to pick your brain on either the 4th or 5th of November if you happen to be available. I'm doing the Train the Trainers session after EduWiki, and I think I'll focus on using that to work through some ideas for a Wikisource training session. It would be great to be able to speak to you about it and get your input. ACrockford (talk) 11:23, 10 October 2014 (UTC)

Not at EduWiki. I think it is in the nature of an unsolved problem how to do a basic Wikisource training session, so, yes, some discussion would be good. Charles Matthews (talk) 04:38, 11 October 2014 (UTC)
Would you be free on either 4th or 5th November? If you prefer to arrange something by e-mail let me know ACrockford (talk) 10:28, 15 October 2014 (UTC)
I've just seen that there might be a Wikidata meetup in London, evening of 5 November. So I might well be at that. Not confirmed yet, though. Charles Matthews (talk) 14:11, 15 October 2014 (UTC)
I hadn't seen that, but if so it would be good to attend. Let me know if you'll be there or whether 4th/5th would work. Happy to come through to Cambridge! ACrockford (talk) 12:14, 20 October 2014 (UTC)
OK, you can mail me from the sidebar a bit nearer the time, and we'll firm something up. Charles Matthews (talk) 16:31, 20 October 2014 (UTC)
So the Wiki Wednesday meetup is set for 6 to 8 pm at Development House, Wednesday 5 November. I'm intending to be there, and so if you are that would do for a date. Charles Matthews (talk) 11:25, 22 October 2014 (UTC)
Yes, I was just looking at that. I'll be there for sure - I will probably see if I can hotdesk at WMUK HQ that day anyway and have some meetings with other WMUK staff, would you possibly be able to come a bit before the meetup? If not that's fine too ACrockford (talk) 12:41, 22 October 2014 (UTC)
Probably, for 5 pm anyway. Charles Matthews (talk) 13:21, 22 October 2014 (UTC)
That would be perfect then - shall we tentatively set that in place? ACrockford (talk) 12:06, 23 October 2014 (UTC)

Conversation at WD to note[edit]

Just wanted to point to you d:Property_talk:P972#Question_on_usage which is probably relevant to note the proposed alternative and probably what we will use to cite DNB to people. I also see that the contributors to DNB will be listed to each volume of the DNB to which they contributed by d:Property talk:P767. Something that we will need to get to when time permits. — billinghurst sDrewth 09:36, 22 October 2014 (UTC)