User talk:Mpaa

(Archives index, Last archive)

Welcome

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:

You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current wikisource projects.

You can put a brief description of your interests on your user page and contributions to another Wikimedia project, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 12:00, 7 April 2011 (UTC)

A quick mass subst

The template concerned should have been subst'ed (note to self: this needs documenting). ShakespeareFan00 (talk) 11:37, 28 November 2015 (UTC)

Done.— Mpaa (talk) 17:21, 28 November 2015 (UTC)

Re: Wikisource:Scriptorium#Checker


Are you still interested in this topic? I ask as an interested party myself; and of course the Scriptorium entry is shortly destined for archive oblivion. I did initiate some experiments (of a variant upon your scheme) with the assistance of Ineuw but held off in light of George Orwell III revealing this is but the resurrection of an older abandoned scheme. Even if this devolves into a project of interest to only a few individuals I am happy to advertise the result; but then again if your advice is to "let it die" I'll just keep it to myself.

Please pardon the name-dropping above but it does serve to set context (especially after Wikisource-bot wields its specialised memory-axe!) In short, what do you advise?

AuFCL (talk) 21:42, 17 January 2016 (UTC)

Yes, I am still interested. IMO it should be a gadget that, upon a preview action, highlights the "usual suspects" and, on a save action, asks for confirmation if there are "suspects". Even better if the "suspects" could be selected/loaded per user (per work would also be nice).
As I see it, in a project like PSM or POTM this would be of great help. I scanned PSM and found a lot of undetected minor mistakes that could have been intercepted by this.
For expert proofreaders with their own set of tools this might be superfluous; for beginners it might be a help.— Mpaa (talk) 20:38, 18 January 2016 (UTC)
Thank you for replying. Well I have an 80% solution (and yes I am well aware that completing the final 20% of any project takes 200% of the time or thereabouts!)
Before proceeding I hope I am repeating the obvious when I caution you against executing un-trusted code with bureaucrat privileges. I have done my best to write safe code but lay no claim to being a security expert.
So in case you take the prudent path, but still want to see what this thing actually does, installing User:AuFCL/common.js/typoscan.js and then opening Page:Folk-lore - A Quarterly Review. Volume 10, 1899.djvu/219 yields File:St George screen capture.png.
Now for the hurdles/drawbacks etc.:
  1. It performs this mark-up at all times except for an embarrassingly long and apparently ever-growing number of exception cases, instead of only when "Save" is selected upon "Preview". Implementing this would solve a lot of problems but introduces new ones when false positives are flagged and/or the user wants to force a save in any case.
  2. I don't know the jQuery library well and as a consequence I suspect I have reinvented a lot of wheels a purist might improve mightily upon.
  3. Although I have attempted to make this as gadget-ready as I am able to, it is clearly not there yet and needs to be formally (somehow?) split three ways:
    1. Stable code
    2. Shared scan-set
    3. User-customisable scan set
  4. The actual highlighting code might be done better using an entirely different technique.
  5. Some means of gracefully handling "false positives" needs to be added (entirely missing here.)
  6. So far as possible the temptation to turn this into a spell-checker needs to be resisted as that is a function better handled by other utilities.
In short this is only really a proof of concept, and I am not pretending it is by any means ready for prime-time. At best it is a stop-gap until another alternative steps forward. AuFCL (talk) 03:00, 19 January 2016 (UTC)
Hi. At least someone gave it a try ... :-) I hope that this could be useful and someone will "productify" it. I will try it and give you feedback. Right now I actually do most of my checks using pywikibot. An idea could be to advertise it through POTM.
As I said, I do not have enough skills in this area to attempt to carry on the development, otherwise I would be glad to help you.— Mpaa (talk) 21:16, 19 January 2016 (UTC)
You got it in one. I lack the skills to do much better than this but at least it is as you say "a try" and I sincerely hope it might prompt somebody with better skills than my own to improve/supplant it. And at least doing this much has made me better appreciate features that a fuller solution might involve. AuFCL (talk) 06:01, 20 January 2016 (UTC)
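The split discussed above (stable code, a shared scan-set, a user-customisable scan-set, and a confirmation on save when "suspects" remain) could be sketched roughly as follows. This is not AuFCL's typoscan.js; it is a minimal Python illustration, and every name and pattern in it (`SHARED_SUSPECTS`, `merge_scan_sets`, `find_suspects`, `needs_confirmation`, the "tbe" pattern) is a hypothetical assumption.

```python
import re

# Hypothetical shared scan-set: label -> regular expression.
# These patterns are examples only, not the ones used by the real script.
SHARED_SUSPECTS = {
    # A single "{" that is not part of "{{" (template) or "{|" (table).
    "lone curly brace": r"(?<!\{)\{(?![{|])",
    "caret": r"\^",
}


def merge_scan_sets(shared, user):
    """Combine the shared scan-set with a user's own; user entries win on clashes."""
    merged = dict(shared)
    merged.update(user)
    return merged


def find_suspects(text, scan_set):
    """Return (position, label, matched text) for every suspect, in page order."""
    hits = []
    for label, pattern in scan_set.items():
        for match in re.finditer(pattern, text):
            hits.append((match.start(), label, match.group(0)))
    return sorted(hits)


def needs_confirmation(text, scan_set):
    """True if a save should ask the user to confirm because suspects remain."""
    return bool(find_suspects(text, scan_set))


# Hypothetical per-user addition: an OCR slip of "tbe" for "the".
user_set = {"tbe for the": r"\btbe\b"}
scan_set = merge_scan_sets(SHARED_SUSPECTS, user_set)
for position, label, found in find_suspects("See {note) in tbe text, 10^2.", scan_set):
    print(position, label, found)
```

Keeping the scan-sets as plain data separate from the scanning functions is one way to let the stable code change independently of the patterns. Handling of false positives is left out here, as it is in the proof of concept.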

Typo corrections

Hi and thanks for all your corrections as you hover over me like the angel of typos. I recommend that you install AuFCL's script because it makes noticing typos much easier. This is my version, and it is installed as a subfolder of common.js for the time being, until it's debugged. It is called by the following code placed in the common.js:


My version distinguishes [] square braces from {} curly braces by different color highlights. The procedure highlights typos in both the page namespace and the main namespace, where I use it. In the main ns, it's easier to scroll through an article. When a typo is found, I open the page where it's also highlighted. The script highlights most, if not all typos, but spell check is another matter. The most common spelling errors caused by poor scanning are already flagged by the scripts. There are other cases where the scan alters the meaning of a word but still leaves a valid English word; these are problematic but few and far between.

Finally, I found that my most commonly overlooked errors are the curly brace "{" instead of a parenthesis "(", followed by the caret "^". Volumes 1 to 5 were riddled with typos and errors, but this diminished gradually to almost none. — Ineuw talk 03:27, 1 February 2016 (UTC)

Thanks. I prefer to use pywikibot directly. It also opens a page in the browser in case of need. I am scanning PSM against the most common errors. E.g. one of the reasons why "{" instead of a parenthesis "(" is diminishing might be that I scanned against them :-) ? (see I noticed that the typos are very "volume-dependent". Now I am addressing ^.--Mpaa (talk) 19:38, 1 February 2016 (UTC)
WOW! At least you can say that I am consistent in my errors and their oversight. — Ineuw talk 18:26, 2 February 2016 (UTC)

Typos of PSM

Greetings and salutations, or in another word, Hi.

Thanks for checking and correcting typos in my favourite project. Would you consider checking some articles randomly in volume 44, and letting me know your findings? I ask because it would help me to further analyze AuFCL's script, and ask him to make modifications if need be. I recently proofread and scanned V44 for typos, before listing it for validation.

Once AuFCL's script was running, I undertook to check every volume (currently checking volume 16), from the main namespace because it's easy to scroll through the articles. The results (to me) were interesting because it made me realize how many typos the proofreaders (meaning me), and the validators overlooked. So far I completed typo check of volumes 1 to 15, but spell check is another matter. Did fix a number of spelling errors, seen while fixing typos, but I am sure that there are many more. — Ineuw talk 20:57, 14 February 2016 (UTC)

I wouldn't mind helping. If you have a list of the most common errors, or regexes to detect them, send it to me. It would be very quick for me to scan against it.— Mpaa (talk) 21:14, 18 February 2016 (UTC)
My sincere apologies for the confusion and misunderstanding I caused. I am doing fine with AuFCL's tool and don't need help. Would never ask anyone to undertake such a task, since I feel that it's my "responsibility." I meant to ask whether you would be inclined to randomly check PSM V44 with your pywiki tools to see if I missed anything, and whether there is room for improvement. — Ineuw talk 03:42, 19 February 2016 (UTC)
Without knowing what to look for, pywiki tools are lost ... they need hints or clues.— Mpaa (talk) 17:47, 19 February 2016 (UTC)

{{...}} and ellipsis

Hi. I see, after doing some replacements, that your bot was putting in Template:... rather than …. I am not sure whether there was a request for that, as I wasn't aware that it was our practice to use a stylised look-alike rather than the respective character. — billinghurst sDrewth 04:41, 15 February 2016 (UTC)

I didn't start to use it, but after it became the de-facto standard for this work, I took care to align everything with it. See Index_talk:Hunger_(Hamsun).djvu.— Mpaa (talk) 18:50, 15 February 2016 (UTC)
I can align to … if it is preferred.— Mpaa (talk) 20:07, 17 February 2016 (UTC)

Author:Thomas Adams

will be likely to stand for w:Thomas Adam and therefore should lose some "s"? -- Gymel (talk) 15:19, 8 March 2016 (UTC)

Good catch! There are a few "typos" in the printed book itself, one of which is this, as well as alphabetic order of names. This definitely should be fixed. (Missed the fact that there wasn't an "s". Sorry.) Humbug26 (talk) 16:38, 8 March 2016 (UTC)

Need help in Bengali Wikisource

Hi, I need help for Bengali Wikisource. We had one Index file with proofreading completed but with 2 missing pages (a 200-page book). Now we have found a scan of the book with all pages and uploaded it to Commons. The missing pages were 12 and 13, so the total is now 202 pages. Now I need to shift/move 182 pages to their proper places (like 200-->202, 199-->201, 198-->200). Is there any pywiki script/tool for this? I ask because I have seen from the history that you have done similar work here. Thanks in advance for help. Jayantanth (talk) 17:07, 8 March 2016 (UTC)

Moving the pages in Page ns is not an issue. I need the Index filename and the definition of what needs to move where. Better to use page numbers referring to the DjVu file, not the book page numbering. So something like: Dnnn-Dmmm -> Dnnn+2-Dmmm+2. The tricky part is if you have already transcluded the work; in that case I need to revamp an old script of mine.— Mpaa (talk) 19:42, 8 March 2016 (UTC)
Thanks Mpaa, for the reply. I shall share all the problematic files. I just wanted to know how you do that. Is there any AWB custom module or script? If so, could you please share it? Jayantanth (talk) 16:33, 9 March 2016 (UTC)
I use pywikibot scripts.— Mpaa (talk) 18:31, 9 March 2016 (UTC)
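The Dnnn-Dmmm -> Dnnn+2-Dmmm+2 shift described above can be planned before any page is touched. The sketch below is not Mpaa's script and performs no wiki edits; it only computes the list of moves, and the function names and the index filename "Example.djvu" are hypothetical. The one point it illustrates is ordering: when shifting upward, the highest page moves first (200-->202, then 199-->201, ...), so that no move lands on a page that still has to be moved.

```python
def plan_shift(first, last, offset):
    """Return (source, target) DjVu page numbers in a collision-free order.

    For a positive offset the highest page is moved first; for a negative
    offset the lowest page is moved first.
    """
    if first > last:
        raise ValueError("first must not be greater than last")
    if offset == 0:
        return []
    numbers = range(first, last + 1)
    if offset > 0:
        numbers = reversed(numbers)
    return [(n, n + offset) for n in numbers]


def page_title(index_name, number):
    """Build a Page: namespace title from an index filename and a DjVu page number."""
    return f"Page:{index_name}/{number}"


def plan_titles(index_name, first, last, offset):
    """Same plan as plan_shift, expressed as (old title, new title) pairs."""
    return [
        (page_title(index_name, src), page_title(index_name, dst))
        for src, dst in plan_shift(first, last, offset)
    ]


# Hypothetical index filename; the range is taken from the example in the request.
for old, new in plan_titles("Example.djvu", 198, 200, 2):
    print(old, "->", new)
```

Each pair would then be handed to whatever tool performs the actual move. As noted above, works that are already transcluded need their page references updated as well, which this sketch does not attempt.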

Bot is breaking index pages

Oh dear. Your bot is breaking pages. Hesperian 02:05, 24 March 2016 (UTC)

Any page where the table of contents field begins with a table is now broken, because the brace that initiates a table must be at the start of the line. Hesperian 02:11, 24 March 2016 (UTC)
Fixed them (hopefully). I think they were in the order of 15-20. Thanks for spotting that. Let me know in case I did not find all of them— Mpaa (talk) 19:14, 24 March 2016 (UTC)
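The constraint Hesperian describes, that the brace opening a table must be at the start of a line, can be guarded against when a bot prepends markup to a field. The following is only a sketch under that assumption; the thread does not say what the bot actually inserted, and the function name and the "{{some tag}}" placeholder are hypothetical.

```python
def prepend_to_field(prefix, field_value):
    """Prepend markup to a field without pushing a leading table off line start.

    If the field begins with "{|", a newline is inserted after the prefix so
    that the table's opening brace stays at the start of a line.
    """
    if not prefix:
        return field_value
    body = field_value.lstrip()
    if body.startswith("{|") and not prefix.endswith("\n"):
        return prefix + "\n" + body
    return prefix + field_value


# Hypothetical tag and field content, for illustration only.
print(prepend_to_field("{{some tag}}", "{|\n|cell\n|}"))
```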
Hi. This may or may not relate to "breaking" but I have these three pages which show up on the index page as not proofread. Would you please tell me how to correct them (teach me how to fish)?
Page:Popular Science Monthly Volume 45.djvu/800
Page:Popular Science Monthly Volume 45.djvu/823
Page:Popular Science Monthly Volume 45.djvu/827
Thanks. — Ineuw talk 08:41, 25 March 2016 (UTC)
They are fine for me. Maybe usual caching ... (purge, null edit, etc. already tried I guess ...?).— Mpaa (talk) 16:58, 25 March 2016 (UTC)
Did it all; of course by this hour it's OK. Have you purged it today? Or perhaps MediaWiki purged the cache? — Ineuw talk 00:35, 26 March 2016 (UTC)
@Ineuw: Might I venture a theory? By modifying this it is possible you loaded up the job queue with requests (i.e. for each and every of the roughly 10,000 pages affected) which had to be processed before your purge was able to be acted upon? AuFCL (talk) 02:11, 26 March 2016 (UTC)
Moved topic to my talk page. — Ineuw talk 03:56, 26 March 2016 (UTC)

Maybe I'm misunderstanding this, but does this look right to you? Dubliners was transcluded, I think. Outlier59 (talk) 23:44, 25 March 2016 (UTC)

Page:1917 Dubliners by James Joyce.djvu/5 was not marked with Category: Not transcluded when the bot made the edit, AuFCL marked it after the bot edit.— Mpaa (talk) 07:58, 26 March 2016 (UTC)
Pardon if I only confused matters further. I was unsure whether this was the correct thing to do; let alone retrospectively after MpaaBot had made its pass. The checker display does not appear to change, so I am not sure how useful an activity this is at this point in time in any case. AuFCL (talk) 08:07, 26 March 2016 (UTC)
You made the right tagging. The idea is to check that everything that needs to be transcluded is transcluded, and what is not, is done on purpose (this is expressed by assigning the page to 'Not transcluded').— Mpaa (talk) 08:10, 26 March 2016 (UTC)
Actually, even if it were, I am breaking down the work in small steps, and for now I only accept 'Without text' pages as acceptable Not transcluded pages. These cases will be handled later on. Feel free to mark it as transcluded=yes.— Mpaa (talk) 07:58, 26 March 2016 (UTC)
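The check described above (everything that needs to be transcluded is transcluded, and anything that is not is left out on purpose) amounts to a set difference. The sketch below is not MpaaBot's code: it assumes the three page collections have already been gathered by some other means, and the function and parameter names are hypothetical.

```python
def transclusion_status(index_pages, transcluded, exempt):
    """Return ("yes" or "no", pages still unaccounted for).

    index_pages: every page belonging to the index.
    transcluded: pages found transcluded somewhere.
    exempt:      pages deliberately left out (for example 'Without text',
                 or tagged 'Not transcluded').
    """
    unaccounted = sorted(set(index_pages) - set(transcluded) - set(exempt))
    status = "yes" if not unaccounted else "no"
    return status, unaccounted


# Made-up page numbers: page 5 is neither transcluded nor exempt.
status, unaccounted = transclusion_status(range(1, 8), {2, 3, 4, 6, 7}, {1})
print(status, unaccounted)
```

A single unaccounted page is enough to give "no", which is why tagging a page after the bot's pass does not change the result until the check runs again.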

Filling Pages with OCR text

Hi. You have a script that might be able to help with something I posted on the help section of Scriptorium.

Much appreciated if you took a look. ShakespeareFan00 (talk) 17:38, 21 April 2016 (UTC)

I am afraid my script cannot help. It just fetches the text 'as is' from the pdf file.— Mpaa (talk) 18:30, 21 April 2016 (UTC)
OK, do you know of a different script that may help? I've not got very far so I don't mind losing the few odd pages I've done if you are able to extract direct from the PDF. ShakespeareFan00 (talk) 18:32, 21 April 2016 (UTC)
No, what I can do is just save the page with the same text you find when clicking on the redlink.— Mpaa (talk) 20:29, 21 April 2016 (UTC)

Index transcluded tags

Hi, I noticed that the MpaaBot inserted the tag {{index transcluded|transcluded=no}} HERE and maybe on others. FYI, all PSM indexes between volumes 1 to 87 have been transcluded to the main namespace. Just thought to bring it to your attention. — Ineuw talk 05:40, 4 May 2016 (UTC)

@Ineuw: I asked for this to be done as I am slowly working through the validated and proofread works to check the transclusion status. For all sorts of works there have been errors and omissions in transclusions, and PSM has been better though not perfect in that regard. It is a slower maintenance task as we identify complete works that have not been transcluded, or missing pages, or purposefully marking pages that will not be transcluded. — billinghurst sDrewth 07:11, 4 May 2016 (UTC)
And technically, that is true after the checks that were done by the recent run as MpaaBot. This page Page:Popular Science Monthly Volume 40.djvu/742 is not transcluded (or marked as Not Transcluded), so the tag is correct.— Mpaa (talk) 18:12, 4 May 2016 (UTC)
Knowledge liberates, so thanks for the explanation. Should have known that there is a reason. :-) — Ineuw talk 06:31, 5 May 2016 (UTC)
@Mpaa: Found where two images were supposed to be, and inserted them in their respective main namespace articles. When the tagging of PSM is completed, please let me know and I will deal with them.
Up to vol 87 is OK. The only one missing is Index:Popular Science Monthly Volume 26.djvu, due to the TOC, which I didn't know how to handle.— Mpaa (talk) 18:29, 5 May 2016 (UTC)
Sorry, but I don't understand the TOC problem. Every Volume is laid out the same. However, they are inverse transclusions — that is, they are defined in the Main namespace and then referred to by the Index pages. I also compared Index 26 to other Index pages and they are laid out identically. — Ineuw talk 21:52, 5 May 2016 (UTC)
See at the end of the index.— Mpaa (talk) 06:06, 6 May 2016 (UTC)
I marked them as without text, hoping this satisfies you. If I was wrong, so be it, and anyone who wishes to proofread them, they are welcome to do so.

At the beginning of this undertaking, I searched for TOC's, and found some volumes which had a page or two, but most none, so I designed my own TOC's. I wasn't going to duplicate them. Currently, I have another stored Vol 26 .JP2 copy downloaded from IA, which has no advertisements and no TOC's. About the advertisements, in later volumes one can find hundreds of duplicated ads. Those that were possible to clean up I cleaned and uploaded them. — Ineuw talk 19:55, 6 May 2016 (UTC)