User talk:Mpaa

From Wikisource

Welcome

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current wikisource projects.

You can put a brief description of your interests on your user page and contributions to another Wikimedia project, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 12:00, 7 April 2011 (UTC)

Typo corrections

Hi and thanks for all your corrections as you hover over me like the angel of typos. I recommend that you install AuFCL's script because it makes noticing typos much easier. This is my version, and it is installed as a subfolder of common.js for the time being, until it's debugged. It is called by the following code placed in the common.js:


My version distinguishes [] square brackets from {} curly braces by different color highlights. The procedure highlights typos in both the Page namespace and the main namespace, where I use it. In the main ns, it's easier to scroll through an article. When a typo is found, I open the page, where it's also highlighted. The script highlights most, if not all, typos, but spell check is another matter. The most common spelling errors caused by poor scanning are already flagged by the scripts. But there are other words where the scan alters the meaning yet still produces a valid English word; these are problematic but few and far between.

Finally, I found that my most commonly overlooked errors are the curly brace "{" instead of a parenthesis "(", followed by the caret "^". Volumes 1 to 5 were riddled with typos and errors, but this diminished gradually to almost none. — Ineuw talk 03:27, 1 February 2016 (UTC)

Thanks. I prefer to use pywikibot directly. It also opens a page in the browser in case of need. I am scanning PSM against the most common errors. E.g. one of the reasons why "{" instead of a parenthesis "(" is diminishing might be that I scanned against them :-) ? I noticed that the typos are very "volume-dependent". Now I am addressing ^.--Mpaa (talk) 19:38, 1 February 2016 (UTC)
WOW! At least you can say that I am consistent in my errors and their oversight. — Ineuw talk 18:26, 2 February 2016 (UTC)
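
A scan of the kind described above could be sketched roughly as follows. This is only an illustration using Python's standard `re` module, not the pywikibot run or AuFCL's script; the pattern set and the name `find_suspects` are assumptions made up for this example, and the curly-brace pattern deliberately skips `{{` templates and `{|` table openers.

```python
import re

# Hypothetical patterns for the two OCR slips named in this thread:
# a curly brace where a parenthesis was meant, and a stray caret.
SUSPECT_PATTERNS = {
    # A single "{" (not part of "{{" or "{|") that is closed by ")".
    "curly brace for parenthesis": re.compile(r"(?<!\{)\{(?![{|])[^{}|\n]*\)"),
    "stray caret": re.compile(r"\^"),
}


def find_suspects(text):
    """Return (line_number, label, matched_text) for every suspicious match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPECT_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits
```

The result is only a list of candidates for a human to review; a valid but wrong English word would still pass unnoticed.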

Typos of PSM

Greetings and salutations, or in another word, Hi.

Thanks for checking and correcting typos in my favourite project. Would you consider checking some articles randomly in volume 44 and letting me know your findings? I ask because it would help me to further analyze AuFCL's script, and ask him to make modifications if need be. I recently proofread and scanned V44 for typos, before listing it for validation.

Once AuFCL's script was running, I undertook to check every volume (currently checking volume 16) from the main namespace, because it's easy to scroll through the articles. The results (to me) were interesting because they made me realize how many typos the proofreaders (meaning me) and the validators overlooked. So far I have completed the typo check of volumes 1 to 15, but spell check is another matter. I did fix a number of spelling errors seen while fixing typos, but I am sure that there are many more. — Ineuw talk 20:57, 14 February 2016 (UTC)

I wouldn't mind helping. If you have a list of the most common errors, or regexes to detect them, send it to me. It would be very quick for me to scan against it.— Mpaa (talk) 21:14, 18 February 2016 (UTC)
My sincere apologies for the confusion and misunderstanding I caused. I am doing fine with AuFCL's tool and don't need help. I would never ask anyone to undertake such a task, since I feel that it's my "responsibility." I meant to ask whether you would be inclined to randomly check PSM V44 with your pywiki tools to see if I missed anything, and whether there is room for improvement. — Ineuw talk 03:42, 19 February 2016 (UTC)
Without knowing what to look for, pywiki tools are lost ... they need hints or clues.— Mpaa (talk) 17:47, 19 February 2016 (UTC)

{{...}} and ellipsis

Hi. I see, after doing some replacements, that your bot was putting in Template:... rather than … I am not sure whether there was a request for that, as I wasn't aware that it was our practice to use a stylised look-alike rather than the respective character. — billinghurst sDrewth 04:41, 15 February 2016 (UTC)

I didn't start to use it, but after it became the de-facto standard for this work, I took care to align everything with it. See Index_talk:Hunger_(Hamsun).djvu.— Mpaa (talk) 18:50, 15 February 2016 (UTC)
I can align to … if it is preferred.— Mpaa (talk) 20:07, 17 February 2016 (UTC)

Author:Thomas Adams

will be likely to stand for w:Thomas Adam and therefore should lose an "s"? -- Gymel (talk) 15:19, 8 March 2016 (UTC)

Good catch! There are a few "typos" in the printed book itself, one of which is this, as well as alphabetic order of names. This definitely should be fixed. (Missed the fact that there wasn't an "s". Sorry.) Humbug26 (talk) 16:38, 8 March 2016 (UTC)

Need help in Bengali Wikisource

Hi, I need help for Bengali Wikisource. We had one fully proofread Index file with 2 missing pages (a 200-page book). Now we have found a scan of the book with all pages and uploaded it to Commons. The missing pages were 12 and 13, so the total is now 202 pages. Now I need to shift/move 182 pages to their proper positions (like 200-->202, 199-->201, 198-->200). Is there any pywiki script/tool for this? From the history I have seen that you have done similar work here. Thanks in advance for your help. Jayantanth (talk) 17:07, 8 March 2016 (UTC)

Moving the pages in Page ns is not an issue. I need the Index filename and the definition of what needs to move where. Better to use page numbers that refer to the DjVu file, not the book page numbering. So something like: Dnnn-Dmmm -> Dnnn+2-Dmmm+2. The tricky part is if you have already transcluded the work; I would need to revamp an old script of mine for that.— Mpaa (talk) 19:42, 8 March 2016 (UTC)
Thanks, Mpaa, for the reply. I shall share all the problematic files. I just wanted to know how you do that. Is there an AWB custom module or script? If so, could you please share it? Jayantanth (talk) 16:33, 9 March 2016 (UTC)
I use pywikibot scripts.— Mpaa (talk) 18:31, 9 March 2016 (UTC)
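
The renumbering rule discussed above (Dnnn-Dmmm -> Dnnn+2-Dmmm+2) could be planned along these lines. This is a plain-Python sketch, not the pywikibot script actually used: `plan_page_shift` and the file name `Example.djvu` are hypothetical, and the function only computes the list of moves without touching any wiki. The one design point it shows is ordering: when shifting upwards, the highest page must move first so that no move lands on a title that is still occupied.

```python
def plan_page_shift(index_name, first, last, offset):
    """Return (old_title, new_title) pairs for scan pages first..last.

    Page numbers are positions in the scan file, not printed page numbers.
    Moves are ordered so that a target title is always free when its turn
    comes: highest page first for a positive offset, lowest first otherwise.
    """
    if first > last:
        raise ValueError("first must not exceed last")
    pages = list(range(first, last + 1))
    if offset > 0:
        pages.reverse()
    return [
        (f"Page:{index_name}/{n}", f"Page:{index_name}/{n + offset}")
        for n in pages
    ]
```

For example, `plan_page_shift("Example.djvu", 198, 200, 2)` lists the move of page 200 to 202 first, then 199 to 201, then 198 to 200. Transclusions in the main namespace that use the old page ranges would still need to be updated separately.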

Bot is breaking index pages

Oh dear. Your bot is breaking pages. Hesperian 02:05, 24 March 2016 (UTC)

Any page where the table of contents field begins with a table is now broken, because the brace that initiates a table must be at the start of the line. Hesperian 02:11, 24 March 2016 (UTC)
Fixed them (hopefully). I think they were in the order of 15-20. Thanks for spotting that. Let me know in case I did not find all of them— Mpaa (talk) 19:14, 24 March 2016 (UTC)
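
One way a bot could avoid the problem Hesperian describes is sketched below. It is an assumption about how such a guard might look, not the fix that was actually applied: `format_field` is a hypothetical helper, and the field name `Remarks` in the usage note is used only as an illustration. The idea is that when a template parameter value opens with the table marker `{|`, the value is written on its own line so the brace stays at the start of a line.

```python
def format_field(name, value):
    """Render one '|name=value' template parameter.

    If the value opens with wikitable markup '{|', a line break is placed
    after '=' so that the table opener begins a line of its own.
    """
    value = value.strip()
    if value.startswith("{|"):
        return f"|{name}=\n{value}"
    return f"|{name}={value}"
```

With this helper, `format_field("Remarks", "{| ...")` yields `|Remarks=` followed by the table on the next line, while ordinary text values stay on the same line as the parameter name.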
Hi. This may or may not relate to "breaking", but I have these three pages which show up on the index page as not proofread. Would you please tell me how to correct them (teach me how to fish)?
Page:Popular Science Monthly Volume 45.djvu/800
Page:Popular Science Monthly Volume 45.djvu/823
Page:Popular Science Monthly Volume 45.djvu/827
Thanks. — Ineuw talk 08:41, 25 March 2016 (UTC)
They are fine for me. Maybe usual caching ... (purge, null edit, etc. already tried I guess ...?).— Mpaa (talk) 16:58, 25 March 2016 (UTC)
Did it all; of course, by this hour it's OK. Have you purged it today? Or perhaps MediaWiki purged the cache? — Ineuw talk 00:35, 26 March 2016 (UTC)
@Ineuw: Might I venture a theory? By modifying this it is possible you loaded up the job queue with requests (i.e. for each and every of the roughly 10,000 pages affected) which had to be processed before your purge was able to be acted upon? AuFCL (talk) 02:11, 26 March 2016 (UTC)
Moved topic to my talk page. — Ineuw talk 03:56, 26 March 2016 (UTC)

Maybe I'm misunderstanding this, but does this look right to you? Dubliners was transcluded, I think. Outlier59 (talk) 23:44, 25 March 2016 (UTC)

Page:1917 Dubliners by James Joyce.djvu/5 was not marked with Category: Not transcluded when the bot made the edit, AuFCL marked it after the bot edit.— Mpaa (talk) 07:58, 26 March 2016 (UTC)
Pardon if I only confused matters further. I was unsure whether this was the correct thing to do; let alone retrospectively after MpaaBot had made its pass. The checker display does not appear to change, so I am not sure how useful an activity this is at this point in time in any case. AuFCL (talk) 08:07, 26 March 2016 (UTC)
You made the right tagging. The idea is to check that everything that needs to be transcluded is transcluded, and what is not, is done on purpose (this is expressed by assigning the page to 'Not transcluded').— Mpaa (talk) 08:10, 26 March 2016 (UTC)
Actually, even if it were, I am breaking down the work into small steps, and for now I only accept 'Without text' pages as acceptable Not transcluded pages. These cases will be handled later on. Feel free to mark it as transcluded=yes.— Mpaa (talk) 07:58, 26 March 2016 (UTC)
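
The check described in this thread amounts to a set difference, which could be sketched as below. This is an illustration of the idea only, not MpaaBot's code: `index_is_transcluded` and its three arguments are hypothetical, and how the sets of transcluded and exempt pages are gathered from the wiki is left out.

```python
def index_is_transcluded(all_pages, transcluded, exempt):
    """Return (ok, missing) for one index.

    all_pages   -- every page number in the index
    transcluded -- pages already transcluded to the main namespace
    exempt      -- pages deliberately left out (e.g. 'Without text' pages
                   or pages tagged as Not transcluded)
    An index counts as transcluded only when nothing is left over.
    """
    missing = sorted(set(all_pages) - set(transcluded) - set(exempt))
    return (not missing, missing)
```

A single unaccounted page is enough to return `False`, which matches the behaviour discussed here: one page that is neither transcluded nor tagged keeps the whole index at transcluded=no.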

Filling Pages with OCR text

Hi. You have a script that might be able to help with something I posted on the help section of Scriptorium.

Much appreciated if you took a look. ShakespeareFan00 (talk) 17:38, 21 April 2016 (UTC)

I am afraid my script cannot help. It just fetches the text 'as is' from the pdf file.— Mpaa (talk) 18:30, 21 April 2016 (UTC)
OK, do you know of a different script that may help? I've not got very far, so I don't mind losing the few odd pages I've done if you are able to extract directly from the PDF. ShakespeareFan00 (talk) 18:32, 21 April 2016 (UTC)
No, what I can do is just save the page with the same text you find when clicking on the redlink.— Mpaa (talk) 20:29, 21 April 2016 (UTC)

Index transcluded tags

Hi, I noticed that the MpaaBot inserted the tag {{index transcluded|transcluded=no}} HERE and maybe on others. FYI, all PSM indexes between volumes 1 to 87 have been transcluded to the main namespace. Just thought to bring to your attention. — Ineuw talk 05:40, 4 May 2016 (UTC)

@Ineuw: I asked for this to be done as I am slowly working through the validated and proofread works to check the transclusion status. For all sorts of works there have been errors and omissions in transclusions, and PSM has been better though not perfect in that regard. It is a slower maintenance task as we identify complete works that have not been transcluded, or missing pages, or purposefully marking pages that will not be transcluded. — billinghurst sDrewth 07:11, 4 May 2016 (UTC)
And technically, that is true after the checks that were done by the recent run as MpaaBot. This page Page:Popular Science Monthly Volume 40.djvu/742 is not transcluded (or marked as Not transcluded), so the tag is correct.— Mpaa (talk) 18:12, 4 May 2016 (UTC)
Knowledge liberates, so thanks for the explanation. Should have known that there is a reason. :-) — Ineuw talk 06:31, 5 May 2016 (UTC)
@Mpaa: Found where two images were supposed to be, and inserted them in their respective main namespace articles. When the tagging of PSM is completed, please let me know and I will deal with them.
Up to vol 87 is OK. The only one missing is Index:Popular Science Monthly Volume 26.djvu, due to the TOC, which I didn't know how to handle.— Mpaa (talk) 18:29, 5 May 2016 (UTC)
Sorry, but I don't understand the TOC problem. Every volume is laid out the same. However, they are inverse transclusions: they are defined in the Main namespace and then referred to by the Index pages. I also compared Index 26 to other Index pages and they are laid out identically. — Ineuw talk 21:52, 5 May 2016 (UTC)
See at the end of the index.— Mpaa (talk) 06:06, 6 May 2016 (UTC)
I marked them as without text, hoping this satisfies you. If I was wrong, so be it, and anyone who wishes to proofread them, they are welcome to do so.

At the beginning of this undertaking, I searched for TOCs and found some volumes which had a page or two, but most had none, so I designed my own TOCs. I wasn't going to duplicate them. Currently, I have another stored Vol 26 .JP2 copy downloaded from IA, which has no advertisements and no TOCs. About the advertisements: in later volumes one can find hundreds of duplicated ads. Those that were possible to clean up I cleaned and uploaded. — Ineuw talk 19:55, 6 May 2016 (UTC)

Is this still needed: Popular Science Monthly/Volume 26/Advertisements? — Mpaa (talk) 10:49, 7 May 2016 (UTC)

Mass populate

I note pp. 315-334 of Index:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf seem to be the same index as pp. 210-299 of Index:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf

I.e.: Page:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf/315 is identical to (except page numbers)

Page:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf/212 and so on from there...

Any chance of a semi-automated populate from the former to the latter? Thanks ShakespeareFan00 (talk) 00:09, 17 May 2016 (UTC)

I guess you have already done it. Or I am lost ...— Mpaa (talk) 19:21, 18 May 2016 (UTC)
Yes, already done. ShakespeareFan00 (talk) 18:09, 19 May 2016 (UTC)
However, I did have another page population issue,

namely User:ShakespeareFan00/Sandbox/TSGRD2016... I'd already manually put some of the pages into the relevant positions, but it looks like it could be done by some automated process. 18:09, 19 May 2016 (UTC)