Darwinism (Wallace)

Hi. I saw your work on Darwinism (Wallace). At a quick look, pages look ready to be proofread. If this is the case, pls mark them as Proofread, so someone can validate them. Good job. --Mpaa (talk) 18:23, 22 June 2012 (UTC)

The first two chapters can be proofed. I'm working on the images for ch 3 but you can do all the other pages. They're red because I've only done the rough work, I haven't actually checked the text yet tho it's from Gutenberg and should be pretty good. Saw your work on the Darwin/Wallace paper. Well done. Chris55 (talk) 19:27, 22 June 2012 (UTC)

Re your image uploads

If you can populate the file pages with {{information}} and add licence tags to them, we will get them transferred to Commons via the bots (tagging with {{move to Commons}} works here). Thanks. — billinghurst sDrewth 03:03, 23 June 2012 (UTC)

See Help:Copyright tags; these should pretty well align with Commons:Copyright tags. I think that you will find that we predominantly use {{pd/1923|year of death}} for our works. — billinghurst sDrewth 12:55, 25 June 2012 (UTC)



Regarding your analysis of wikistats on your user page, I think there is something you've overlooked: wikistats only considers mainspace edits. This is a good decision for the Wikipedias, because it counts actual contributions to the encyclopedia and not navel-gazing meta stuff. But for the Wikisources it is problematic, because all of the core editors have embraced use of the ProofReadPage extension, and over 90% of their edits are in the Page namespace. You yourself are making about 1000 edits per month, only about 20 of which are in mainspace. At present I am listed on wikistats as inactive, because I managed to go three months without editing the mainspace at all! Anyhow, I think if you were to have a fresh look at the numbers with this fact in mind, you might see things differently. For example, I do not accept the equivalence between "core set of editors" and "very active librarians" (as measured in mainspace edits alone), and am unconcerned by the latter having shrunk from 20 to 7.

Hesperian 00:04, 6 July 2012 (UTC)

Reform Month

I saw your post on Dominic's talk page. Adam has started this page and is trying to flesh it out before bringing it to the main page, I think. Digipoke (talk) 17:34, 7 July 2012 (UTC)

"Engine" is doomed to fail at this point in time[edit]

Gday Chris. Nice idea, though unfortunately it isn't going to work. Due to a failure in how MediaWiki is indexed, the subpages are indexed without being transcluded, hence the subpage search fails. We've been there and fallen for that ourselves. There will be a discussion in the archives of Scriptorium and there is a bugzilla in place; however, there is basically nobody working on the search engine. :-( — billinghurst sDrewth 15:57, 11 July 2012 (UTC)

I don't understand you. In the cases where I've used engine it works perfectly well. I didn't start using it without testing! Chris55 (talk) 16:05, 11 July 2012 (UTC)
ps. I can't find any discussion in the Scriptorium unless it's the request about search engines I made which no one has responded to. In any case the scriptorium search uses the engine template so on your theory that won't work either. Maybe you're getting confused with the "incategory:" search which I tried out and found to be wanting. Chris55 (talk) 16:32, 11 July 2012 (UTC)
hmm, having looked a bit further, I see what you mean. I suppose it demonstrates how little real content is transcluded on this wiki. The searches work perfectly well for titles of articles, which is the first requirement for encyclopedias but not for the contents of those articles if they are transcluded. It's an interesting implementation issue: I doubt that every page is transcluded every time you look at it which means the database probably contains many of the full pages somewhere, but they aren't being indexed properly. This doesn't affect Wikipedia which doesn't use the different name spaces in the same way and that's the high volume site. I don't have access on WS to most of the system (:-) but I suppose I could download a copy of mediawiki and have a play. Presumably this has been taken up with the maintainers at a high level already? Of course, if you hadn't goaded me into upping my edit count in the article space, I wouldn't have used it as much as I have. :-| Chris55 (talk) 17:43, 11 July 2012 (UTC)
Eventually found the discussion I presume you were referring to at Wikisource:Scriptorium/Archives/2010-06. The unfortunate thing is that no action was taken even tho the end of the discussion seemed more upbeat. In particular no one noted it in the template documentation - which I have now. I'm inclined to leave the boxes as "better than nothing" while we explore improvements. Is that sensible? Chris55 (talk) 19:24, 11 July 2012 (UTC)
It is listed at oldwikisource:Wikisource:Wishlist and entered into bugzilla, which includes the discussion. There comes a time when we stop beating our heads against the keyboard over the lack of hacking that gets done for WS issues, and we go back to transcription. — billinghurst sDrewth 08:19, 14 July 2012 (UTC)
I'll bet. I really liked your March contribution on bug 18861 and was also glad to see that my feeling above about a fix is along the same lines as the approach orenbochman is following. But it's hard to see how far up the priority tree our problem is. Chris55 (talk) 08:57, 14 July 2012 (UTC)

Help:Adding texts

I've copied the contents of Help:Adding texts to the relevant Beginner's guide... pages, so if you still want to copy over it, go ahead. Sorry about the delay in getting that done; real life keeps insisting on taking up my time. - AdamBMorgan (talk) 20:37, 11 July 2012 (UTC)

Adam, I was happy to see your additions to Adding texts which seems to deal with most of what I was commenting on. But I'm not sure what you mean by copying it somewhere else. The Introduction to editing Wikisource I wrote still refers people to it. What needs to be sorted is the status of the 'quick guides'. Should these become simply part of the general help system leaving your beginner's guide as the main introduction or should we have 2 sections: beginners and quick guides and then various reference sections? Then there's my introduction to editing—some bits of which could be merged into your beginner's guide. But that discussion should be on Help talk:Contents. Chris55 (talk) 14:43, 12 July 2012 (UTC)

No need to speedy


You might want to undo all those recent speedy requests... the bot needs to be run to complete the soft-redirect cycle for Feb. 2012. See the CAT at the bottom of those pages. -- George Orwell III (talk) 15:01, 14 July 2012 (UTC)

Hm, sorry. I'll stop doing it!! Chris55 (talk) 15:05, 14 July 2012 (UTC)
No worries and a hearty thanks is in order - seems like nothing soft-redirected in 2012 has run all the way to the end where those pages normally get deleted. -- George Orwell III (talk) 15:13, 14 July 2012 (UTC)
Normally three months. We may need to push GZ to see if he can get TalBot back and functioning. Special:Contributions/TalBot -- billinghurst sDrewth

Header template

It was a typo! You gave me a scare, there. No worries. --Eliyak T·C 16:20, 15 July 2012 (UTC)

You are now an administrator


You are now an administrator. Congratulations and enjoy.

Hesperian 00:45, 24 July 2012 (UTC)

If you have language skills other than English, can you please update Wikisource:Administrators#Current administrators to reflect that. Cheers, Hesperian 00:53, 24 July 2012 (UTC)
  • First I want to be clear that I do not see anything of substance wrong with your deletions. However, it has always seemed wiser to me not to personally delete items that I nominated myself. I don't think there is a rule against choosing to do otherwise, but since you are so newly an admin, I just wanted to make sure you have at least thought about this issue and are making the choice purposefully. Most of the time it really doesn't matter, but it has the potential to make a hard case turn uglier than it might otherwise. Just some unsolicited advice.--BirgitteSB 02:50, 1 August 2012 (UTC)
Thanks for the advice - I thought the protocol might be the other way round. (And I'm not expecting everything to be written down!) Chris55 (talk) 08:52, 1 August 2012 (UTC)


Hi, there's no need to mark the advertisement pages as "not-proofread". They're not transcluded as they are not part of the works and are perfectly OK as "non-existent". Cheers, Beeswaxcandle (talk) 09:38, 26 July 2012 (UTC)

Yes, I agree, but no harm done! I was just trying to work out which index files should be marked as "Done". Quite a few, actually. Chris55 (talk) 10:30, 26 July 2012 (UTC)
Yep, no harm—I just didn't want you to waste your valuable wiki-time on a minor thing. I'm pleased to see that Hesperian has updated his Indices list (which I had forgotten about as it's been a while). I usually have a rush of enthusiasm at this point and start moving "Proofread" to "Validate" or up to "Done". More than happy to have someone else doing it as well. Because Hesperian's done it as a sortable table, I usually sort on either the "non-existent" or "not proofread" columns and go for the low-hanging fruit. Cheers, Beeswaxcandle (talk) 07:15, 27 July 2012 (UTC)

Community portal

The following was posted to Erasmo's talk page and brought here by me -wmm2.


Erasmo, you've suggested I help on changes to main page & community portal and I'm happy to. Can I explain what I think is needed?

Basically we need to set up a proofreading programme and for that we need lists of "to be proofread" and "to be validated" titles to parallel the list of recently completed titles, which Adam has just improved a lot. It's quite a difficult task to pick realistic targets from the 4,000 odd incomplete index files which is why I posted on MOTM about categorising index files. I've cut the list down to more like 1,000 but it's still far too many. But the failure of anyone to respond makes me wonder if I'm totally out of step. Am I? Chris55 (talk) 09:06, 2 August 2012 (UTC)

Chris55, I think your ideas are good but I don't always see them. It was luck that I saw the above. I ask that you please post in Scriptorium when possible, because I do read that area on occasion. I do not think you are "totally out of step." I think you are very good at organizing. The failure of "anyone to respond" is perhaps due to several issues, including people not seeing statements like the above, but also because people are busy with their own projects, whatever they may be. People have to at least be alerted to specifics, and Scriptorium is where I look for that sort of thing. I would have posted the above and this reply on Scriptorium, but your statements and question were more specific to Erasmo and I did not feel right placing this on Scriptorium. I am willing to "respond" on whatever the situation may be, as long as I think I can. I saw a list of things you created on a sub-page of your talk page, and those things showed me areas where I certainly would respond regarding works that need proofreading and validating, including volumes of works yet untouched. If you have already stated these things on Scriptorium, I have not seen them, because I have been editing three works on Turkey and Armenia that someone asked for but has done nothing with since. I was also validating other works, including one on Texas. Please create your lists of done, proofread but needing validation, validation completed, &c., and post to Scriptorium saying where these lists are. Make sure they can be edited (i.e. so volumes can be inserted on the proper list). Kindest regards, Maury (William Maury Morris II (talk) 17:08, 2 August 2012 (UTC))

Thanks William. The reason I haven't posted to the scriptorium yet is that the ideas are very much developing. I'm getting there and will soon post I hope. Chris55 (talk) 18:20, 2 August 2012 (UTC)

From my talk page:


I've just been through WikiProject and was surprised not to see a Southern/Confederate project, particularly since there's a huge set of Confederate Veteran works as well as the SHS. Would you like to start one or have I missed it? Chris55 (talk) 12:22, 3 August 2012 (UTC)

Chris, I am in haste at this point and do not know *exactly* what you are asking me. The Southern Historical Society Papers consist of 52 volumes in total. AdamBMorgan and I have placed many of those on WS. Both he and I have edited many pages. Volume 1 was completed but not all verified, and Adam submitted that a few months ago under new texts. For the Confederate Veteran volumes I have focused mainly upon volume 3, which is Virginia, my home state. I would certainly complete all of vol. 3 first, partly because I know that history and the places and the names of the people who are Virginians. I used to belong to the Sons of Confederate Veterans and the Sons of the American Revolution, and thus, in part, my interest in those subjects about Mr. Lincoln's War of Invasion upon the South. Adam and I had been working hard on the SHSP (SHS) volumes together, but as time passed we ended up doing other works. I hope that you don't expect me, alone, to transcribe all of those volumes. I did volumes 1, 2 and 3. Volume 3 needs to be validated by someone else and she'll be ready. I did all of those volumes back in 1994 and had them placed on a searchable CD-ROM by Guild Press of Indiana for sale. I only did plain ASCII. Others at Guild Press did the rest of that work. Point is, I don't like to do something that big all over again. William Maury Morris II (talk) 20:31, 3 August 2012 (UTC)
I believe he is asking: Do you want to create Wikisource:WikiProject Confederate States of America (or something with a similar title). It would be a page for people interested in the subject to gather and coordinate their work. A similar project is Wikisource:WikiProject Popular Science Monthly. - AdamBMorgan (talk) 23:03, 3 August 2012 (UTC)

Confederate Volumes on WikiSource

  1. Confederate Military History (CMH): 12 vols. online out of 12 total
  2. Confederate Veteran (CV): 31 vols. online out of 40 total
  3. Southern Historical Society Papers (SHS, SHSP): 41 vols. online out of 52 total

Chris55 (talk) 09:55, 4 August 2012 (UTC)

William Maury Morris II

William, Adam has stated what I was getting at. There are about 40,000 pages of proofreading in what you've listed up there. That's an awful lot for one person. I can see that other people have done some of the proofreading, but my impression is you've done most of it. The WikiProject pages are a way of gathering people around particular interest areas: identifying those with an interest, setting priorities, mutual encouragement, whatever you want or need. Afaics it's an important factor in keeping things going in these very large projects. All you need to do is to create a page such as Adam has suggested. Have a look at some others on the WikiProject page. Chris55 (talk) 09:55, 4 August 2012 (UTC)

Proofreading program

Chris, I see that you are making a lot of edits to your proofreading program, so I let you work quietly. Inform me when there is something I can do. For now please give me your feedback about the main page draft I prepared. Thanks in advance.--Erasmo Barresi (talk) 10:17, 4 August 2012 (UTC)


Thanks very much for pointing this out - all tips most welcome! HeartofaDog (talk) 10:15, 5 August 2012 (UTC)

Reaching out

Hey Chris55, just wanted to send a thank you for making edits at Index:An Essay on the Principle of Population (1798).djvu. Whenever I find myself uploading and proofreading books here, I hardly ever see anyone get involved on the same scan. For what it's worth, I'm also a fan of Wittgenstein, Russell and the other philosophers you have listed on your user page. Best, Blurpeace 11:38, 5 August 2012 (UTC)

(in reference to [1])
I've no problem with that. I'll watch the page for changes made and continue editing in a little while. Blurpeace 11:53, 5 August 2012 (UTC)
OK, it's done the preface and chapter 1 and is chuntering on, so you can start. One thing to watch out for is that it often gets the page split wrong by 1 or 2 words. I corrected the chapter headings but not the others. There are a couple of large footnotes missing. Happy proofing! Chris55 (talk) 11:59, 5 August 2012 (UTC)
I notice you've rendered the first couple of pages of the preface using the old convention of f's for intermediate s's. I believe it's Wikisource policy to render these in present day orthography, though I'd have to look up chapter and verse. The match and split version certainly uses the modern convention. Ok? Chris55 (talk) 12:06, 5 August 2012 (UTC)
I was under the impression that there wasn't much consensus over the use of long s's, per Wikisource:Style guide#Formatting. I prefer remaining faithful to the text but could see an argument simply for readability's sake. What do you think? Also, if it isn't too much trouble, can we keep the whole conversation here? I have your page watchlisted, and I don't like jumping between conversations. Best, Blurpeace 12:15, 5 August 2012 (UTC)
Fine, I'll answer here. I agree that the issue isn't clear. The style guide refers to "Special characters such as accents and ligatures" but not to this precise situation. Personally I don't see any merit in using f's. I can't see any precedents. The first folio of Shakespeare (e.g. Macbeth (First Folio)) doesn't have them but there's no facsimile. Can you see any? Chris55 (talk) 12:36, 5 August 2012 (UTC)
I've been looking more closely at the print version of Malthus. The s's aren't f's. They're close but different. It's not surprising the OCR gets it wrong because the s only has a stroke on one side and the f on both. So I don't think we ought to worry. I know it all derives from the German but I can't at this moment remember the history. Chris55 (talk) 12:53, 5 August 2012 (UTC)
Well, you can see I'm not a classicist, but I've been catching up on {{long s}} and see that's what you've been working on. The last thing I want to do is to discourage you from proofreading, but you can see that for the unaware reader there is little difference discernible. One possibility would be just to use them for the preface, though it would be a bit odd. What do you want to do? Chris55 (talk) 14:35, 5 August 2012 (UTC)
Well, the way I see it, down the line, Wikisource is bound to host different copies of this title, much like a well-funded library would. I would prefer that we use the long s to remain faithful to the 1798 original and, if readers are looking for more readable, contemporary typography, we can host recent copies (up to 1923, of course) too. Blurpeace 23:16, 5 August 2012 (UTC)
Just slipping my oar in here. Our "policy" is that we don't have one on Template:Long-s vs s, other than that it should be consistent throughout a work. It's up to the main editor of a work. The best way to communicate the decision for a particular work is on the Index Talk page. Beeswaxcandle (talk) 21:12, 5 August 2012 (UTC)
Well, in response to both most recent posts, it seems like it's possible to have one's cake and eat it, though I'll only be able to check that when a whole chapter is transcluded to the mainspace. I've put the css code in my common.css so that I can see modern typography and as long as it's ok to put a note in the header to help others who want to do the same, I'm happy for that arrangement. I'm not yet sure the css works equally if one uses the unicode directly rather than the template, but I assume it does (and it would be nice if the epub download can use it too). It would be pretty easy to write a script to convert the whole text to long s, though I don't yet have bot rights. But if Blurpeace wants to convert it all manually (or maybe you're using a script too) then that's ok, as long as you don't stop halfway. I'll try and keep ahead with the first proof but can't promise. Chris55 (talk) 10:51, 6 August 2012 (UTC)


Hi Chris, thanks for the welcome. If I have doubts about anything, I'll remember that.

Regards, --Distriker (talk) 21:34, 7 August 2012 (UTC)

One question: I just saw that the title of the Declaration I published says "of the Peoples" when it should be "for the Peoples"; it was an oversight on my part. How do I change a title?
Regards, Distriker (talk) 11:16, 8 August 2012 (UTC)
At the top next to "View history" there's a pulldown which should include "Move". If it doesn't (there may be a delay as you're newly registered), let me know and I'll do it. Chris55 (talk) 11:54, 8 August 2012 (UTC)
Done! Thank you very much!
Regards, Distriker (talk) 01:07, 9 August 2012 (UTC)

To Chris55 From Maury65,


Chris, I think we need to get some conversation and attention going on the talk page. William Maury Morris II (talk) 04:18, 12 August 2012 (UTC)

Fire away, I'm watching the page. You may notice I added an item about cleanup teams because my original tasks were mainly technical but what we need to establish is a collaboration in which people are watching progress. Similar tasks are required. Chris55 (talk) 09:18, 12 August 2012 (UTC)
Chris, are you watching closely? I have already fired away on the area's talk page. The fire should have been noticeable. If we don't build on what little fire there is, it will burn out. Toss some gasoline on it. Kind regards, William Maury Morris II (talk) 03:05, 13 August 2012 (UTC)
You're right. Not sure why it was still red-lined when I last looked, but I've responded now. Chris55 (talk) 10:10, 13 August 2012 (UTC)


Hey Chris,

I noticed that you marked a validated page "no text," one which is an advertisement. While these typically hold no weight when proofing the work, I prefer that they still go under proofreading. For example, a work which, I guess is (ugh) dear to me... would be History of West Hoboken N.J., as it is literally a snapshot of my neighborhood from a century ago. With this being said, even the advertisements are of great interest to me. In that regard, I suppose any advertisement could be interesting to someone. Following that, they should be proofread and validated too, eventually, or if need be.

If you look at the status of an index, it states "work proper," which should address the confusion between the "whole publication" and the "proper work". So, you were right to change the index status to proofread, as the "work proper" has been proofread, but I prefer if you leave the advertisements to be proofread themselves.

Any questions, let me know :) - Theornamentalist (talk) 20:25, 13 August 2012 (UTC)

They can be transcluded; take a look at Amazing_Stories/Volume_01/Number_01, which uses the {{advertisements}} template. In my aforementioned work, I created an entire page simply listing the adverts.
I think there has been a discussion on this, which led to changing the index status from something like "All pages have been proofread" to "All pages of the work proper have been proofread." This is to distinguish between the scenarios. - Theornamentalist (talk) 20:57, 13 August 2012 (UTC)

Pages with text marked as "without text"

Hi, just noticed that you've created a few pages and marked them as being without text, when they do have text on them. Even though that text will not be transcluded to the main work, it's wrong to say that they have no text. Beeswaxcandle (talk) 22:53, 16 August 2012 (UTC)

Chris, if it helps, I was thinking of updating my index progress script to exclude pages that are in Category:Not transcluded. Hesperian 00:56, 17 August 2012 (UTC)

I have responded to your request for comment on WS:S. Until resolution is reached, please stop marking these pages in this way. If the consensus goes the other way, there's going to be a lot of work to undo what you are doing. Beeswaxcandle (talk) 22:50, 17 August 2012 (UTC)

Index:SRF Articles of Incorporation 1935.pdf and Index: US District Court Jury Verdict SRF v Ananda 2002

Hello Chris, I need help with how to complete these two uploads. I am new to Wikisource and am a bit confused. Plus I am ready to upload another one. Can you help me with this? Thank you Red Rose 13 (talk) 16:08, 19 August 2012 (UTC)

Hi Chris left you a message at Red Rose 13 (talk) 20:04, 19 August 2012 (UTC)

Thank you for helping me with editing - I added all the underlines and found a couple of more things to fix. How is it looking now? Red Rose 13 (talk) 22:17, 19 August 2012 (UTC)

Hi Chris, we are almost done. I still need you to edit this article - - you proofread the first page but we still have 6 pages to go. Are you still able to help? Thanks so much. Red Rose 13 (talk) 21:00, 21 August 2012 (UTC)

A start on Index categorising

Hi, I've gone through the A's and B's in Category:Index Proofread and created User:Beeswaxcandle/Sandbox2. Is this along the lines of what you're thinking?

Doing this has a dual purpose for me, in that Validation month is coming up and we try to cover a range of works. So, having some idea of what's there is useful. Beeswaxcandle (talk) 01:18, 26 August 2012 (UTC)

Thanks for that. Yes and no. My interest is really in the category: Not proofread and most of them don't have a main page which is categorised. I notice that some of these don't yet have a main page (e.g. the first one on Nabathean agriculture) but most have and are already categorised, though that can be improved. For those that don't yet have a main page I still think that there is a point in adding a category to the index page, but that the category should be transferred to the main page when it is created.
If the categories are already there, then presumably you should be able to create a complete list from the database. I can't yet because my application for a toolserver account is still pending. Chris55 (talk) 09:47, 26 August 2012 (UTC)

Community portal and proofreading games

Hi Chris, I started a draft to revise the community portal's design and content. Since you have worked a lot to improve the proofreading rate, you probably know what it should include for this purpose. I also signed up for WikiProject Proofreading and I got the idea of the "Wikisource Proofreading Games," a competition between language communities to see which one makes more proofreadings in a certain number of days. I'd like to have your helpful opinion on both things, of course.--Erasmo Barresi (talk) 09:37, 28 August 2012 (UTC)

Chris, we could change these two messages (1, 2) to explain to registered users that they can proofread or validate the page, and eventually how they can. Does this sound good?--Erasmo Barresi (talk) 14:37, 30 August 2012 (UTC)
Do you mean them to say something like "You could proofread this page"? It's certainly possible, though I think you might end up annoying people who are already proofreading and I wonder if one wants to catch them earlier, e.g. when they find a half-finished work. Nor is there room to say much more at that point without cluttering everything. Chris55 (talk) 14:21, 31 August 2012 (UTC)


Chris55, how about "You can proofread this page" as opposed to "You could proofread this page"? Happy September 1, 2012! —William Maury Morris II Talk 08:37, 1 September 2012 (UTC)

You're right, it may be annoying.--Erasmo Barresi (talk) 19:21, 31 August 2012 (UTC)
But a newbie who reads "This page has been proofread" may not know that the page still needs to be validated. So adding this information ("This page has been proofread and needs to be validated") will probably be useful. Shall we propose this change on the Scriptorium?--Erasmo Barresi (talk) 08:10, 1 September 2012 (UTC)
Erasmo, your idea seems good to me. You may also want to explain the color codes of no text, proofread, &c. Happy September 1, 2012! —William Maury Morris II Talk 08:37, 1 September 2012 (UTC)
What about "This page has been proofread once"?

WMM2, we don't need to explain the colours as the message in each case already links to Help:Page Status. Beeswaxcandle (talk) 08:51, 1 September 2012 (UTC)

Yes, I think that's good, Erasmo, and first message could read "This page needs to be proofread". Possibly "but" is better than "and". Whether one can link "proofread" or "validated" to the appropriate help page I don't know - it might be better than appearing to shout. Chris55 (talk) 10:39, 1 September 2012 (UTC)

I've just opened a discussion on the Scriptorium with these changes. I made two edits by mistake because I wanted to post another change to be made, but then I considered writing it on your talk page to not be confusing... but I forgot to delete the text and I saved... I'm sorry. The other thing to be done is changing the links, pointing to Help:Page status instead of Help:Page Status as the second is a redirect, in all messages (0, 1, 2, 3, 4) and also in the index template. Please do this.--Erasmo Barresi (talk) 08:30, 3 September 2012 (UTC)

Rivers of Great Britain


Your recent upload for Index:Historicalaccou01priegoog.djvu is peppered with dozens of pages like This and like This where the left hand side is clipped beyond the normal margin well into the text. If you like, I can clean the original PDF from GoogleBooks of all watermarks, etc. so you can upload it to IA for OCR & DjVu conversion for one more try at producing a workable source for us. If not, the way the current DjVu stands is more a candidate for deletion rather than for proofreading in my view. -- George Orwell III (talk) 22:08, 30 August 2012 (UTC)

Yes, George, it's a disaster. I'm a bit ahead of you but have encountered a snag. I tried converting the Google version (pdf 33.3MB) to djvu using pdf2djvu and it came out at 75.6MB! Not sure what is wrong. I've got djvulibre on my machine and maybe another module will work better. What's your experience?
Without getting too far down into the weeds... first, the IA-hosted "original" PDF is rarely the PDF as it exists on GoogleBooks, so unless you are using the URL for the PDF on the front details page to download the GoogleBooks PDF, you're working at a disadvantage. The IA PDF is compressed and flattened on top of the compression already used by Google, to be backwards compatible with versions of Acrobat as old as 5.0.
I've found that downloading the PDF from GoogleBooks fresh and decompressing it completely works best. Images are a separate matter and should be done afterwards. Other than that, I use PDF-to-DjVu conversion when needed instead of uploading a PDF as well, but I run it from the command line with adjusted settings instead of the GUI. Testing conversion of 5 pages at a time helps determine the best setting. This is a waste of time when the PDF does not have its own text layer, and most GoogleBooks PDFs don't have one when downloaded, in case you haven't come to realize as much on your own.
Bottom line -- take the temp pdf I'm uploading as I type, File:Xriver.pdf, with all the Adobe work done already, and upload to IA for OCR and DjVu conversion. Then take the best images from the existing IA repository or from the new one as needed. -- George Orwell III (talk) 10:56, 31 August 2012 (UTC)
  • Yes, I did download the original Google pdf as you describe. It's slightly bigger - the Stanford library copy rather than the Bodleian version you seem to have loaded: 33 to 23MB probably represents the quality difference.
I've been trying to compile gsdjvu, the djvulibre utility which should be a much more effective way to convert pdf->djvu than the one you describe. Unfortunately they couldn't agree on licensing with AT&T, so there's no distributed binary, and I always seem to have problems with Ubuntu's libraries, so the first 2 attempts failed. I'll try another way.
I don't see a difference in quality but that can be my old ass system.
Good luck with that, but IA's OCR routine is superior in that it provides section, column and paragraph coordinate mapping, while the other [free] routines out there only handle words and lines. Just extract a text layer from an IA djvu vs one from Any2DjVu and you'll easily see the difference. I'm not saying the Proofreading extension is smart enough to utilize one over the other right now, but there is always the hope that the extension will one day, in the not-so-distant future, work as the DjVu file spec. says it should (making proofreading dramatically easier even for the seasoned contributor in the process). -- George Orwell III (talk) 16:13, 31 August 2012 (UTC)
PS. I have also since found an HTML version of this book so I don't really need the OCR. I'm intending to get the wikipedia UK waterways wikiproject involved in the proofreading as it'll be an excellent resource for a large number of articles there. Chris55 (talk) 09:51, 31 August 2012 (UTC)
I frown upon such practice. Please explore this elsewhere if you must. -- George Orwell III (talk) 10:56, 31 August 2012 (UTC)
It comes off the website of an old stalwart of the English canals, long since retired, so I think he would be pleased for his work to go further, though it needs further proofreading. I don't think such things should be done without acknowledgement, nor do I see any point in redoing Gutenberg's efforts (except to tidy what we have). I'm also assuming it's not the recruiting of extra proofreaders that you disapprove of. Chris55 (talk) 14:07, 31 August 2012 (UTC)
My problem with the practice has little to do with sources or methods. Much as you had to wait for the kinks in EPUB to be worked out before you found en.WS worthwhile, the same goes for the current use of embedded text layers. Even a bad layer in place today makes it possible to replace the text layer once a work is validated via PR in the future. In other words, when you skip providing a layer today, even a bad one, all you are doing is making more work for someone if and when that day arrives. -- George Orwell III (talk) 16:05, 31 August 2012 (UTC)
I'm not intending to leave it without a text layer. Obviously it can't use match and split directly, but I was hoping to do something equivalent: the page numbers are actually marked, though how reliable I don't yet know. (I've just got an account on the toolserver.) Chris55 (talk) 17:33, 31 August 2012 (UTC)
Don't follow you: embedded text layers can only be applied to source files. A match-and-split would never update the source file with a text layer. In fact, no manual editing at the Page: creation/editing stage of the PR process would affect the embedded text layer of a source file in the slightest. Why are you so reluctant to use IA for something other than pilfering their source files? If they will take a source PDF, OCR it and then create a DjVu from it, what exactly is the problem? Bandwidth? -- George Orwell III (talk) 18:30, 31 August 2012 (UTC)
What I intend is to fill the page space. Sorry if my language wasn't clear. Still battling with gsdjvu. Chris55 (talk) 10:45, 1 September 2012 (UTC)
I'm not sure if everyone is using the same terminology. The Page namespace and the DjVu text layer are separate. When a page in the Page namespace is created, it extracts some information from the DjVu text layer, but that's all the interaction that occurs between the two. Any edits made on Wikisource remain on the page in the Page namespace, and the original, unaltered text layer is still part of the DjVu file on Commons. In the future this might change. An example of a future use for a DjVu text layer is in collaborations with libraries. It has been mentioned that they would be more interested in our project if they could get human-proofed text back in a usable form. The DjVu text layer contains additional information about the position of the text, which is ignored when we extract it. Re-applying our proofread and validated text to the DjVu, removing all of the OCR errors but keeping the extra information in the DjVu, is a gain for libraries. This cannot be done at the moment, but people are at least thinking about it. However, if the DjVu has no text layer, this is impossible. It may just mean downloading, processing and re-uploading the DjVu in the future but, as George Orwell III says, that is making more work for someone in the future. - AdamBMorgan (talk) 21:43, 1 September 2012 (UTC)
I agree. My point, however, was keeping a focused eye on the eventual arrival of that day rather than on the current state of affairs. Things like match-and-split or scripted imports of plain text hosted elsewhere into the Page: namespace, over a file that has no text layer whatsoever to begin with, are a cheat as far as I'm concerned, and a practice that should be grandfathered out of standing policy. All I want is as much uniformity as possible at this stage, and that means every work begins with an embedded text layer of some sort in every source file uploaded. Beyond that basic rule, I don't care one bit what hook or crook one uses to create and proofread content. The day will come when the possibilities of a two-way text layer arrive, and works by editors who took shortcuts before then will be identified as sub-par and fall by the wayside, just as the hundreds and hundreds of copy-and-paste imports created in the mainspace before the PR extension arrived are quietly regarded today.

As far as the development of that two-way text layer goes, I've seen it done locally to test files at least a dozen times by now. Impressive would be an understatement. And, like I said before, for this to work to its full potential it all hinges on the more detailed coordinate mapping of text generated during OCR by ABBYY FineReader(?) (Internet Archive) being present instead of the more watered-down Any2DjVu type. But complex or simple, it all starts with a text layer Of Some Kind & Quality being present to begin with. -- George Orwell III (talk) 22:35, 1 September 2012 (UTC)

OK, I do understand the terminology, but am just exploring the new tools. I did eventually manage to get djvudigital to work (package gsdjvu) and uploaded George's Xriver.pdf as Index:Rivers, Canals, Railways of Great Britain.djvu (I didn't rename it the first time). It reduced it slightly, to under 20MB, once I worked out how to use the flags. I then learned that djvulibre doesn't include OCR software, which is why it was important to use IA, and I'm now uploading it there and will post an update to Commons when it arrives. I can see that despite using other tools to populate the page space it's nice having the OCR there, though I can't think of it as a cheat. I'm sceptical of George's 2-way text layer: ever tried converting non-tagged PostScript back into text, George, a much easier task? I'm afraid human proofreaders will be around for some time to come. Chris55 (talk) 15:01, 3 September 2012 (UTC)
Why would you bother trying to convert non-tagged PostScript back into text? You're just proving my point: trying to build upon a foundation that lacks the basics (i.e. even a bad text layer) only serves to limit the possibilities, not expand them. And 2-way extraction/insertion does indeed work even with DjVuLibre, using its XML features, a well-defined custom .dtd file and some scripting beyond my skill set. Our .php file concerning these features still reflects the software versions in which the XML portion was still buggy (long since fixed), and because of that it was never fully developed. We got 1-way plain-text dumps instead. -- George Orwell III (talk) 15:41, 3 September 2012 (UTC)
George, I haven't come across "2 way text layers" and don't know what they are. I used that analogy picking up on "more detailed coordinate mapping" but if you can point me at an explanation we might have a more fruitful discussion. I've been faced with trying to turn non-tagged pdfs (postscript not image) into text in the past, which is why I know how tedious it is (in general). Chris55 (talk) 20:50, 3 September 2012 (UTC)

"Two-way text layers" is probably not the best term to describe what amounts to being able to both embed and extract a text layer, plus edit it in either state, but it's the best I can come up with. Sorry for the confusion.

If you accept the idea that a DjVu file is, at its core, primarily nothing more than an image of a scan or of some other formatted file, then you are well on your way to understanding the premise. If you have seen an image where hyperlinks have been embedded under a set region or area of some graphic or photograph, then you understand the premise behind coordinate mapping. For example, a picture is 300px by 150px; the embedded URL starts at x-axis coordinate 50px and ends at 120px (a total of 70px long), and vertically the link runs near the top of the picture from y = 130px to 140px (a total of 10px high). Now swap the photograph for an image of scanned text, swap the embedded hyperlink's URL for the visible text on the page, and you have the same basics of the hidden text layer used both in the thumbnail view for Page: editing and in the text dumped from the text layer to the edit box.
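The hyperlink analogy above boils down to rectangles with something attached to them. Below is a minimal Python sketch of that idea, assuming a simple (x0, y0, x1, y1) pixel rectangle; `Zone` and `payload_at` are hypothetical names for illustration, not DjVu's actual format or any DjVuLibre API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    """Hypothetical: a rectangle on a page image (pixels) with a URL or a word mapped onto it."""
    x0: int
    y0: int
    x1: int
    y1: int
    payload: str

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0

    def contains(self, x: int, y: int) -> bool:
        # Half-open bounds, so adjacent zones never both claim the same pixel.
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def payload_at(zones: list[Zone], x: int, y: int) -> Optional[str]:
    """Return whatever is mapped under point (x, y), or None if nothing is."""
    for zone in zones:
        if zone.contains(x, y):
            return zone.payload
    return None


# The 300px x 150px picture: a link from x = 50 to 120 and y = 130 to 140.
link = Zone(50, 130, 120, 140, "https://example.org/")
print(link.width, link.height)  # 70 10

# Swap the URL for words and the same structure acts as a hidden text layer.
line = [Zone(50, 130, 85, 140, "Rivers,"), Zone(90, 130, 140, 140, "Canals")]
print(payload_at(line, 100, 135))  # Canals
```

Real OCR layers nest such zones (page, column, region, paragraph, line, word), which is the richer mapping attributed above to IA's OCR; this sketch keeps a single flat level.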

The above coordinate mapping is frequently accomplished using XML. To make the XML useful not only for conversions to or from other file types/formats but also for uniformity within a specific file type (à la DjVu), all you need is a corresponding .dtd file for that file type.

Now look at your DjVuLibre package of executables: a basic DjVu .dtd file is in one of the folders, and the djvutoxml and djvuxmlparser executables should be in the main folder. The only reason the standalone DjVu-to-XML tool doesn't work today is that the folder path for the needed .dtd file is wrong. When the DjVu extension was first developed, it didn't work because... well, DjVuLibre itself didn't fully work, so the entire XML portion was basically abandoned back then. I suspect nobody bothers to continue development because the erroneous file path in the current DjVuLibre release makes it seem like the XML stuff was never fixed. Well, it works now if you build the right folder path and copy the .dtd files to it.

To wrap up - we could be going from OCR'd text layer in a DjVu --> to an XML file of that text layer --> proofreading that XML using basic Wikicoding of HTML in the Page namespace --> re-inserting the validated text via the edited XML --> source file will always have the validated text plus page & font parameters to reproduce it faithfully over and over again.
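The round trip described above can be sketched in Python with the standard library's ElementTree. The XML shape here is a hypothetical, simplified stand-in (the real DjVuXML produced by djvutoxml is richer), and `extract_words` and `reinsert_words` are illustrative names, not existing tools. The point is only that validated words can replace OCR words while each word's coordinates stay put.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified hidden-text XML for one scanned line.
# coords = "x0,y0,x1,y1" in page pixels (an assumed convention for this sketch).
OCR_XML = """<PAGE>
  <LINE>
    <WORD coords="10,20,60,40">Tbe</WORD>
    <WORD coords="70,20,130,40">canal</WORD>
    <WORD coords="140,20,200,40">wa5</WORD>
    <WORD coords="210,20,260,40">closed.</WORD>
  </LINE>
</PAGE>"""


def extract_words(root: ET.Element) -> list[str]:
    """The plain-text dump: words only, coordinates dropped."""
    return [word.text or "" for word in root.iter("WORD")]


def reinsert_words(root: ET.Element, proofread: list[str]) -> None:
    """Write validated words back, leaving each coords attribute untouched."""
    words = list(root.iter("WORD"))
    if len(words) != len(proofread):
        # A real tool would need to align words that OCR split or merged.
        raise ValueError("proofread text must have the same number of words")
    for element, text in zip(words, proofread):
        element.text = text


root = ET.fromstring(OCR_XML)
print(" ".join(extract_words(root)))         # Tbe canal wa5 closed.
reinsert_words(root, "The canal was closed.".split())
print(" ".join(extract_words(root)))         # The canal was closed.
print(root.find("LINE/WORD").get("coords"))  # 10,20,60,40
```

The one-word-per-zone assumption is the weak spot: proofreading often merges or splits OCR tokens, so a practical version would need an alignment step before writing anything back.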

Instead we have OCR'd text layer in a DjVu --> plain-text dump of the text layer to the Page: namespace --> left to fend for ourselves when it comes to page and font sizes. Which is fine for the time being, but you know as well as anyone that the process from start to end is too labor-intensive and too complex for the average Joe to become a truly regular contributor. Just imagine being able to spell-check the entire document (XML) before presenting the text layer for page-by-page review, or having line breaks, bulleted lists, paragraphs and tables come up as such automatically! -- George Orwell III (talk) 22:38, 3 September 2012 (UTC)

Well, I don't think my comparison with untagged PostScript was too far from the mark. There, too, you have individual lines tagged with coordinates, size, font info etc. And the net effect is that if you don't have the right tools, the whole thing is far more complex to deal with than even plain text. XML only helps if there is a sensible structure to begin with (contrast MS Word). The most common issue, paragraph structure, can be dealt with relatively easily at present using a few regular expressions: these just need to be packaged up better (editor buttons) so people can use them without having to go into the details. We could do a lot by improving the editor: centering, font sizes, WYSIWYG tables, even a bit more WYSIWYG in general. Most of the JavaScript is out there already.
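As an illustration of the "few regular expressions" for paragraph structure, here is a minimal Python sketch. It is not an existing Wikisource gadget or editor button; `rejoin_paragraphs` is a hypothetical name, and it assumes blank lines separate paragraphs in the OCR dump.

```python
import re


def rejoin_paragraphs(ocr_text: str) -> str:
    """Turn hard-wrapped OCR lines into one line per paragraph.

    Assumes blank lines separate paragraphs, as in a typical OCR dump.
    """
    paragraphs = re.split(r"\n\s*\n", ocr_text.strip())
    joined = []
    for para in paragraphs:
        # Re-attach a word hyphenated across a line break: "naviga-\ntion" -> "navigation".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Any remaining line break inside the paragraph becomes a single space.
        para = re.sub(r"[ \t]*\n[ \t]*", " ", para)
        joined.append(para)
    return "\n\n".join(joined)


sample = "The towpath was\nin poor repair.\n\nIts naviga-\ntion was difficult."
print(rejoin_paragraphs(sample))
```

The hyphen rule is deliberately naive: a genuine compound that happens to break at a line end (say "canal-\nside") would also be merged, which is one reason a human proofreader still has to check the result.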
You haven't grasped the point of how a well-designed custom .dtd file, used in conjunction with the structure of an OCR-generated text layer from ABBYY FineReader(?), can provide a sensible and uniform structure as a foundation, even with available freeware like DjVuLibre.

Regardless, under every Summary Info section for a file on Commons is a link to edit that file using external software. When I see somebody actually offering something other than what I've already seen done, albeit locally, that allows this 2-way insertion/extraction, I'll shut up. Until then, sorry.

But it seems that your goal is to put back the results of the proofreading process into the same djvu file. It's a neat idea, but I'd need some convincing it's useful. Do we need to carry around the original, warts and all? Currently there's a big storage overhead, though that gets less important over time. Chris55 (talk) 13:28, 4 September 2012 (UTC)
It's only useful in getting real libraries to open their vaults to scanning in return for text layers, with no costs involved. IA is nice but not very top-end. The good stuff has yet to be freely put online despite all of Google's preaching to the contrary over the years. -- George Orwell III (talk) 15:11, 4 September 2012 (UTC)

Premature deletion of files

Hi, I've just seen your bot request, but couldn't remember the WS:COPYVIO discussion being closed by an uninvolved admin. When I checked, it was still open. This means that deletion of the files was premature. I agree that they have to go, but we do have to wait for the discussion to close. Also, I suggest the best sequence for deletion is the Page namespace, followed by the Index namespace and then the files. Cheers, Beeswaxcandle (talk) 09:19, 29 September 2012 (UTC)

Oops. I only saw this in the middle of the deletion process requested at Bot requests, so some pages (Page:Million …) have gone. I have now stopped until this thread is finalized. I hope it is not an issue. A discussion directly on the bot page would have prevented me from proceeding. Bye --Mpaa (talk) 12:39, 29 September 2012 (UTC)
Sorry, all. I thought copyvio deletions were more urgent than that. Unfortunately I couldn't see any guidelines. Chris55 (talk) 22:41, 30 September 2012 (UTC)

Help page index

Chris, I know you've worked on help pages a lot, so I'm asking whether you like this draft I designed for Help:Contents. The index should include all help pages except subpages. The concept is probably good, and you may have some ideas to improve the graphics, too. Please let me know.--Erasmo Barresi (talk) 19:02, 12 October 2012 (UTC)

Hi Erasmo, thanks for consulting.
I certainly like the look of the page, but I think it may be a little premature, in the following sense. To make a hierarchical arrangement work you must be able to make the right choices and at the moment it doesn't always work. e.g. if you select "Reading" you don't find the "quick guide" on reading which is actually the main help file. Also it leaves out Adam's set of "beginner's" files altogether.
Personally, I think the "quick guide" concept is out of date and should be scrapped. It's not complete or anything like an "overview", just a few random topics. Adam has made the "Proofread" page into an overview and indeed made it a meronym for the whole creation process. I assume that he is using the "proofreadpage extension" as the basis for this, but I would actually see proofreading as just one stage in the production process.
Are you happy if I hack your page around to see if we can get a more sensible arrangement?
And incidentally, are you ok with my highlights section on the main page mockup? Chris55 (talk) 09:37, 13 October 2012 (UTC)
Hack the page around freely. The list of quick guides was already existing, but for an index it's not the best.
I'm okay with your highlights: I labeled the lines alternately and I like the selection.--Erasmo Barresi (talk) 11:11, 13 October 2012 (UTC)
Sorry to bother you, Chris... have you forgotten?--Erasmo Barresi (talk) 14:03, 26 October 2012 (UTC)
Sorry, been busy with other things. I'll try and look at it later. Chris55 (talk) 10:09, 27 October 2012 (UTC)
Re-arranged according to your comments.--Erasmo Barresi (talk) 14:33, 29 October 2012 (UTC)

Chris, I did not know of Help:Reading offline when I proposed and built Wikisource:eBook. There is a discussion about the relationship of the two at Wikisource talk:EBook, would you like to comment? JeepdaySock (AKA, Jeepday) 22:37, 7 December 2012 (UTC)

Project Longshot


I validated the pages. Thanks, Yann (talk) 19:59, 17 November 2013 (UTC)


Hello Chris55. I updated your common.js page to the latest version of TemplateScript. This is just to enable automatic updates, so you shouldn't see much difference. If you notice any problems or have questions, let me know! :) —Pathoschild 02:38, 24 August 2014 (UTC)

