User talk:Mpaa

From Wikisource

Welcome

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current wikisource projects.

You can put a brief description of your interests on your user page and contributions to another Wikimedia project, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 12:00, 7 April 2011 (UTC)

Need help in Bengali Wikisource

Hi, I need help with Bengali Wikisource. We had one fully proofread Index file with 2 missing pages (a 200-page book). We have now found a scan of the book with all pages and uploaded it to Commons. The missing pages were 12 and 13, so the total is now 202 pages. Now I need to shift/move 182 pages to their proper positions (e.g. 200-->202, 199-->201, 198-->200). Is there any pywiki script/tool for this? From the history I have seen that you have done similar work here. Thanks in advance for your help. Jayantanth (talk) 17:07, 8 March 2016 (UTC)

Moving the pages in Page ns is not an issue. I need the Index filename and the definition of what needs to move where. Better to use page numbers referring to the DjVu file, not the book page numbering. So something like: Dnnn-Dmmm -> Dnnn+2-Dmmm+2. The tricky part is if you have already transcluded the work; I need to revamp an old script of mine for that.— Mpaa (talk) 19:42, 8 March 2016 (UTC)
Thanks, Mpaa, for the reply. I shall share all the problematic files. I just wanted to know how you do that. Is there an AWB custom module or script? If so, could you please share it? Jayantanth (talk) 16:33, 9 March 2016 (UTC)
I use pywikibot scripts.— Mpaa (talk) 18:31, 9 March 2016 (UTC)
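
The shift described above (Dnnn-Dmmm -> Dnnn+2-Dmmm+2) can be sketched as a move plan. This is only an illustration, not the script actually used: the function name plan_page_moves and the file name Example.djvu are hypothetical, and the actual moves would still have to be carried out by a bot. The one point it shows is ordering: when shifting upwards, the highest page must move first so that each target title is still free.

```python
def plan_page_moves(index_name, first, last, offset):
    """Return (old_title, new_title) pairs for shifting scan pages.

    index_name, first, last and offset are illustrative parameters:
    pages first..last (DjVu positions, not printed page numbers) are
    shifted by offset positions.
    """
    if first > last:
        raise ValueError("first must not be greater than last")
    numbers = list(range(first, last + 1))
    # Shifting up: start from the highest page so the target is never occupied.
    # Shifting down: start from the lowest page for the same reason.
    if offset > 0:
        numbers.reverse()
    return [
        (f"Page:{index_name}/{n}", f"Page:{index_name}/{n + offset}")
        for n in numbers
    ]


if __name__ == "__main__":
    for old, new in plan_page_moves("Example.djvu", 12, 14, 2):
        print(old, "->", new)
```

Printing the plan before running any moves makes it easy to check the mapping against the new scan by hand.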

Bot is breaking index pages

Oh dear. Your bot is breaking pages. Hesperian 02:05, 24 March 2016 (UTC)

Any page where the table of contents field begins with a table is now broken, because the brace that initiates a table must be at the start of the line. Hesperian 02:11, 24 March 2016 (UTC)
Fixed them (hopefully). I think they were in the order of 15-20. Thanks for spotting that. Let me know in case I did not find all of them— Mpaa (talk) 19:14, 24 March 2016 (UTC)
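
A minimal sketch of the kind of guard that avoids the breakage described above, assuming the bot rebuilds each template field as "|name=value". The function name format_field and the field name toc are hypothetical and are not taken from MpaaBot. It only handles the case reported here: a value that begins with the table opener {| is pushed onto its own line.

```python
def format_field(name, value):
    """Serialise one template field, keeping a leading table at line start.

    A wiki table is only recognised when its opening brace "{|" is the
    first thing on a line, so such values are placed on a new line
    after "|name=" instead of directly behind it.
    """
    value = value.strip()
    if value.startswith("{|"):
        return f"|{name}=\n{value}"
    return f"|{name}={value}"


if __name__ == "__main__":
    print(format_field("toc", "{|\n|Chapter 1\n|}"))
```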
Hi. This may or may not relate to "breaking", but I have these three pages which show up on the index page as not proofread. Would you please tell me how to correct them (teach me how to fish)?
Page:Popular Science Monthly Volume 45.djvu/800
Page:Popular Science Monthly Volume 45.djvu/823
Page:Popular Science Monthly Volume 45.djvu/827
Thanks. — Ineuw talk 08:41, 25 March 2016 (UTC)
They are fine for me. Maybe usual caching ... (purge, null edit, etc. already tried I guess ...?).— Mpaa (talk) 16:58, 25 March 2016 (UTC)
Did it all; of course, by this hour it's OK. Have you purged it today? Or perhaps MediaWiki purged the cache? — Ineuw talk 00:35, 26 March 2016 (UTC)
@Ineuw: Might I venture a theory? By modifying this it is possible you loaded up the job queue with requests (i.e. one for each and every one of the roughly 10,000 pages affected) which had to be processed before your purge was able to be acted upon? AuFCL (talk) 02:11, 26 March 2016 (UTC)
Moved topic to my talk page. — Ineuw talk 03:56, 26 March 2016 (UTC)

Maybe I'm misunderstanding this, but does this look right to you? Dubliners was transcluded, I think. Outlier59 (talk) 23:44, 25 March 2016 (UTC)

Page:1917 Dubliners by James Joyce.djvu/5 was not marked with Category: Not transcluded when the bot made the edit, AuFCL marked it after the bot edit.— Mpaa (talk) 07:58, 26 March 2016 (UTC)
Pardon if I only confused matters further. I was unsure whether this was the correct thing to do; let alone retrospectively after MpaaBot had made its pass. The checker display does not appear to change, so I am not sure how useful an activity this is at this point in time in any case. AuFCL (talk) 08:07, 26 March 2016 (UTC)
You made the right tagging. The idea is to check that everything that needs to be transcluded is transcluded, and what is not, is done on purpose (this is expressed by assigning the page to 'Not transcluded').— Mpaa (talk) 08:10, 26 March 2016 (UTC)
Actually, even if it were, I am breaking down the work in small steps, and for now I only accept 'Without text' pages as acceptable Not transcluded pages. These cases will be handled later on. Feel free to mark it as transcluded=yes.— Mpaa (talk) 07:58, 26 March 2016 (UTC)

Filling Pages with OCR text

Hi. You have a script that might be able to help with something I posted in the help section of the Scriptorium.

Much appreciated if you took a look. ShakespeareFan00 (talk) 17:38, 21 April 2016 (UTC)

I am afraid my script cannot help. It just fetches the text 'as is' from the PDF file.— Mpaa (talk) 18:30, 21 April 2016 (UTC)
OK, do you know of a different script that may help? I've not got very far, so I don't mind losing the few odd pages I've done if you are able to extract directly from the PDF. ShakespeareFan00 (talk) 18:32, 21 April 2016 (UTC)
No, what I can do is just save the page with the same text you find when clicking on the redlink.— Mpaa (talk) 20:29, 21 April 2016 (UTC)

Index transcluded tags

Hi, I noticed that the MpaaBot inserted the tag {{index transcluded|transcluded=no}} HERE and maybe on others. FYI, all PSM indexes between volumes 1 to 87 have been transcluded to the main namespace. Just thought to bring to your attention. — Ineuw talk 05:40, 4 May 2016 (UTC)

@Ineuw: I asked for this to be done as I am slowly working through the validated and proofread works to check the transclusion status. For all sorts of works there have been errors and omissions in transclusions, and PSM has been better though not perfect in that regard. It is a slower maintenance task as we identify complete works that have not been transcluded, or missing pages, or purposefully marking pages that will not be transcluded. — billinghurst sDrewth 07:11, 4 May 2016 (UTC)
And technically, that is true after the checks that were done by the recent run as MpaaBot. This page Page:Popular Science Monthly Volume 40.djvu/742 is not transcluded (or marked as Not transcluded), so the tag is correct.— Mpaa (talk) 18:12, 4 May 2016 (UTC)
Knowledge liberates, so thanks for the explanation. Should have known that there is a reason. :-) — Ineuw talk 06:31, 5 May 2016 (UTC)
@Mpaa: Found where two images were supposed to be, and inserted them in their respective main namespace articles. When the tagging of PSM is completed, please let me know and I will deal with them.
Up to vol 87 is OK. The only one missing is Index:Popular Science Monthly Volume 26.djvu, due to the TOC, which I didn't know how to handle.— Mpaa (talk) 18:29, 5 May 2016 (UTC)
Sorry, but I don't understand the TOC problem. Every volume is laid out the same. However, they are inverse transclusions; that is, they are defined in the Main namespace and then referred to by the Index pages. I also compared Index 26 to other Index pages and they are laid out identically. — Ineuw talk 21:52, 5 May 2016 (UTC)
See at the end of the index.— Mpaa (talk) 06:06, 6 May 2016 (UTC)
I marked them as without text, hoping this satisfies you. If I was wrong, so be it, and anyone who wishes to proofread them is welcome to do so.

At the beginning of this undertaking, I searched for TOC's and found some volumes which had a page or two, but most had none, so I designed my own TOC's. I wasn't going to duplicate them. Currently, I have another stored Vol 26 .JP2 copy downloaded from IA, which has no advertisements and no TOC's. About the advertisements: in later volumes one can find hundreds of duplicated ads. Those that were possible to clean up I cleaned and uploaded. — Ineuw talk 19:55, 6 May 2016 (UTC)

Is this still needed: Popular Science Monthly/Volume 26/Advertisements? — Mpaa (talk) 10:49, 7 May 2016 (UTC)

Mass populate

I note that pp. 315-334 of Index:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf seem to be the same index as pp. 210-299 of Index:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf

I.e.: Page:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf/315 is identical (except for page numbers) to

Page:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf/212 and so on from there...

Any chance of a semi-automated populate from the former to the latter? Thanks ShakespeareFan00 (talk) 00:09, 17 May 2016 (UTC)

I guess you have already done it. Or I am lost ...— Mpaa (talk) 19:21, 18 May 2016 (UTC)
Yes, already done. ShakespeareFan00 (talk) 18:09, 19 May 2016 (UTC)
However, I did have another page population issue, namely User:ShakespeareFan00/Sandbox/TSGRD2016... I'd already manually put some of the pages into the relevant positions, but it looks like it could be done by some automated process. 18:09, 19 May 2016 (UTC)

Pre-editing and the preparation of PSM pages


Please proofread ten or so pages of PSM, which you edited/prepared for proofreading. Please select pages that contain hyphenated words and references, and then tell me if it was worth doing what you did. — Ineuw talk 20:21, 27 July 2016 (UTC)

Sorry, I lost you ... Did not even get if I did something good or bad ... I guess bad ... If you give some example, that might help, so I can take a look and possibly learn. And it would be fair to weigh some possible unlucky cases vs. the positive ones. I am pretty sure the benefits will win.— Mpaa (talk) 21:12, 28 July 2016 (UTC)
I much appreciate your efforts to help with the PSM pages by cleaning the � characters, by adding the page headers, and by indicating {{smallrefs}} in the footer. However, merging hyphenated words at the end of a row forces me to proofread according to your method, which is different from the methods I developed by proofreading thousands of pages, and it slows me down.
Merging hyphenated words is incorrect because it shifts the text of the following line and I use the original to locate words in the text, akin to an X - Y coordinate.
I leave the merging of hyphenated words to line wrapping after proofreading. Line wrapping by my AutoHotkey macro, or by pathoschild's proofreading script, identifies end-of-line hyphenation. Since some words must remain hyphenated, I add a hyphen to the beginning of the second segment before line wrapping.
I don't bother with the reference tags until I proofread the page sequentially line by line, and this includes the references at the end of the page. Only then are the tags applied, and moved to where they belong.
The placement of <ref>Footnotes</ref> in the text is another time-consuming and confusing issue. I have to erase the word "Footnote" and restore the * because it's easier to notice in the text. Your system identifies only the first footnote. If there is more than one, then the remainder need to be tagged, which means that two different methods of identifying footnotes are used. — Ineuw talk 21:06, 29 July 2016 (UTC)
Then you should have said "Please proofread ten or so pages of PSM, according to my current method, and ...". How am I supposed to know what you have or will have in mind as a way of working to proofread, now or in the future? Some of those volumes were done some time ago. I will stop doing that, but I guess there are not many volumes left untouched.— Mpaa (talk) 22:24, 29 July 2016 (UTC)
I can fix the <ref>Footnotes</ref>. Just state what you want instead.— Mpaa (talk) 23:00, 29 July 2016 (UTC)
Don't waste your time, it's not worth correcting it. I wrote this in case you were planning to continue cleaning. Didn't check how far you got with the cleanup. As for proofreading some of your corrected pages, you can still do it for your future reference, as to what not to do. — Ineuw talk 04:09, 30 July 2016 (UTC)
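
For reference, the line-wrapping convention described above can be sketched as follows. This is only an approximation written for illustration: it is not Ineuw's AutoHotkey macro or pathoschild's script, and the function name join_hyphenated_lines is hypothetical. It assumes a line ending in a hyphen is merged with the next line, unless the next line begins with a hyphen, in which case one hyphen is kept.

```python
def join_hyphenated_lines(lines):
    """Unwrap proofread lines into one paragraph.

    A trailing hyphen marks a word split across lines and is dropped
    when the lines are joined. If the following line starts with a
    hyphen (the marker added by hand for words that must stay
    hyphenated), a single hyphen is kept instead.
    """
    result = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines in this simple sketch
        if not result:
            result = line
        elif result.endswith("-"):
            if line.startswith("-"):
                result += line[1:]            # keep exactly one hyphen
            else:
                result = result[:-1] + line   # merge the split word
        else:
            result += " " + line
    return result


if __name__ == "__main__":
    print(join_hyphenated_lines(["The implemen-", "tation is well-", "-known today."]))
```

Running the wrap only after proofreading, as described in the thread, keeps each line of text aligned with the scan while it is being checked.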