User talk:Mpaa

From Wikisource

(Archives index, Last archive)

Welcome

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current Wikisource projects.

You can put a brief description of your interests on your user page and contributions to another Wikimedia project, such as Wikipedia and Commons.

Have questions? Then please ask them at either


I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 12:00, 7 April 2011 (UTC)


Filling Pages with OCR text

Hi. You have a script that might be able to help with something I posted on the help section of Scriptorium.

Much appreciated if you took a look. ShakespeareFan00 (talk) 17:38, 21 April 2016 (UTC)

I am afraid my script cannot help. It just fetches the text 'as is' from the pdf file.— Mpaa (talk) 18:30, 21 April 2016 (UTC)
OK, do you know of a different script that may help? I've not got very far, so I don't mind losing the few odd pages I've done if you are able to extract direct from the PDF. ShakespeareFan00 (talk) 18:32, 21 April 2016 (UTC)
No, what I can do is just save the page with the same text you find when clicking on the redlink.— Mpaa (talk) 20:29, 21 April 2016 (UTC)

Index transcluded tags

Hi, I noticed that MpaaBot inserted the tag {{index transcluded|transcluded=no}} HERE and maybe on others. FYI, all PSM indexes between volumes 1 to 87 have been transcluded to the main namespace. Just thought I would bring it to your attention. — Ineuw talk 05:40, 4 May 2016 (UTC)

@Ineuw: I asked for this to be done as I am slowly working through the validated and proofread works to check the transclusion status. For all sorts of works there have been errors and omissions in transclusions, and PSM has been better though not perfect in that regard. It is a slower maintenance task as we identify complete works that have not been transcluded, or missing pages, or purposefully marking pages that will not be transcluded. — billinghurst sDrewth 07:11, 4 May 2016 (UTC)
And technically, that is true after the checks that were done by the recent run as MpaaBot. This page Page:Popular Science Monthly Volume 40.djvu/742 is not transcluded (or marked as Not Transcluded), so the tag is correct.— Mpaa (talk) 18:12, 4 May 2016 (UTC)
Knowledge liberates, so thanks for the explanation. Should have known that there is a reason. :-) — Ineuw talk 06:31, 5 May 2016 (UTC)
@Mpaa: Found where two images were supposed to be, and inserted them in their respective main namespace articles. When the tagging of PSM is completed, please let me know and I will deal with them.
Up to vol 87 is OK. The only one missing is Index:Popular Science Monthly Volume 26.djvu, due to the TOC, which I didn't know how to handle.— Mpaa (talk) 18:29, 5 May 2016 (UTC)
Sorry, but I don't understand the TOC problem. Every Volume is laid out the same. However, they are inverse transclusions; that is, they are defined in the Main namespace and then referred to by the Index pages. I also compared Index 26 to other Index pages and they are laid out identically. — Ineuw talk 21:52, 5 May 2016 (UTC)
See at the end of the index.— Mpaa (talk) 06:06, 6 May 2016 (UTC)
I marked them as without text, hoping this satisfies you. If I was wrong, so be it, and anyone who wishes to proofread them is welcome to do so.

At the beginning of this undertaking, I searched for TOCs, and found some volumes which had a page or two, but most had none, so I designed my own TOCs. I wasn't going to duplicate them. Currently, I have another stored Vol 26 .JP2 copy downloaded from IA, which has no advertisements and no TOCs. About the advertisements, in later volumes one can find hundreds of duplicated ads. Those that were possible to clean up I cleaned and uploaded. — Ineuw talk 19:55, 6 May 2016 (UTC)

Is this still needed: Popular Science Monthly/Volume 26/Advertisements? — Mpaa (talk) 10:49, 7 May 2016 (UTC)

Mass populate

I note pp. 315-334 of Index:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf seem to be the same index as pp. 210-299 of Index:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf

I.e.: Page:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf/315 is identical to (except page numbers)

Page:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf/212 and so on from there...

Any chance of a semi-automated populate from the former to the latter? Thanks ShakespeareFan00 (talk) 00:09, 17 May 2016 (UTC)

I guess you have already done it. Or I am lost ...— Mpaa (talk) 19:21, 18 May 2016 (UTC)
Yes, already done. ShakespeareFan00 (talk) 18:09, 19 May 2016 (UTC)
However, I did have another page population issue, namely User:ShakespeareFan00/Sandbox/TSGRD2016... I'd already manually put some of the pages into the relevant positions, but it looks like it could be done by some automated process. 18:09, 19 May 2016 (UTC)

Pre-editing and the preparation of PSM pages

Hi,

Please proofread ten or so pages of PSM, which you edited/prepared for proofreading. Please select pages that contain hyphenated words and references, and then tell me if it was worth doing what you did. — Ineuw talk 20:21, 27 July 2016 (UTC)

Sorry, I lost you ... Did not even get if I did something good or bad ... I guess bad ... If you give some example, that might help, so I can take a look and possibly learn. And it would be fair to weigh some possible unlucky cases vs. the positive ones. I am pretty sure the benefits will win.— Mpaa (talk) 21:12, 28 July 2016 (UTC)
Much appreciate your efforts to help with the PSM pages by cleaning the � characters, by adding the page headers, and indicating {{smallrefs}} in the footer. However, merging hyphenated words at the end of a row is forcing me to proofread according to your method, which is different from the methods developed by proofreading thousands of pages, and it slows me down.
Merging hyphenated words is incorrect because it shifts the text of the following line and I use the original to locate words in the text, akin to an X - Y coordinate.
I leave merging of hyphenated words to line wrapping after proofreading. Line wrapping by my AutoHotkey macro, or pathoschild's proofreading script, identifies end of line hyphenation. Since some words must remain hyphenated, I add a hyphen to the beginning of the second segment before line wrapping.
I don't bother with the reference tags until I proofread the page sequentially line by line, and this includes the references at the end of the page. Only then are the tags applied, and moved to where they belong.
The placement of <ref>Footnotes</ref> in the text is another time-consuming and confusing issue. I have to erase the word "Footnote" and restore the * because it's easier to notice in the text. Your system identifies only the first footnote. If there is more than one, then the remainder needs to be tagged, which means that two different methods of identifying footnotes are used. — Ineuw talk 21:06, 29 July 2016 (UTC)
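The line-wrapping convention described above can be illustrated with a minimal Python sketch. It is not the AutoHotkey macro or pathoschild's script; the function name `unwrap_lines` and the exact rules are assumptions made for illustration: a line ending in a hyphen is merged with the next line, unless the next line starts with a hyphen, which is read as the marker that the hyphen must stay.

```python
def unwrap_lines(text):
    """Join hard-wrapped lines into a single paragraph.

    Assumed convention (hypothetical, modelled on the comment above):
    - "proof-" + "reading"  -> "proofreading" (end-of-line hyphen dropped)
    - "well-"  + "-known"   -> "well-known"   (leading hyphen on the next
      line marks a hyphen that must be kept)
    - any other pair of lines is joined with a single space
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ""
    out = lines[0]
    for nxt in lines[1:]:
        if out.endswith("-") and nxt.startswith("-"):
            # Marker found: keep one hyphen, drop the duplicate.
            out = out + nxt[1:]
        elif out.endswith("-"):
            # Ordinary end-of-line hyphenation: merge the two halves.
            out = out[:-1] + nxt
        else:
            out = out + " " + nxt
    return out
```

Because the merge happens only at wrap time, the lines stay aligned with the scan while proofreading, which is the point made above.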
Then you should have said "Please proofread ten or so pages of PSM, according to my current method, and ...". How am I supposed to know what you have or will have in mind as way of working to proofread now or in the future? Some of those volumes were done some time ago. I will stop doing that, but I guess there are not many volumes left untouched.— Mpaa (talk) 22:24, 29 July 2016 (UTC)
I can fix the <ref>Footnotes</ref>. Just state what you want instead.— Mpaa (talk) 23:00, 29 July 2016 (UTC)
Don't waste your time, it's not worth correcting it. I wrote this in case you were planning to continue cleaning. Didn't check how far you got with the cleanup. As for proofreading some of your corrected pages, you can still do it for your future reference, as to what not to do. — Ineuw talk 04:09, 30 July 2016 (UTC)
Unfortunately, your script made other unacceptable errors. It deleted commas "," and periods "." when they were before a "quotation mark". It also split end of line words which ended with "!", "?" and "'s" and forced these to the following row. I would like you to consider resetting the pages to the Thomasbot original. There are just way too many unnecessary errors to correct, and they are often missed. Just compare the originals to not-proofread pages. — Ineuw talk 21:42, 31 July 2016 (UTC)
Can you give some examples? So I can see what happened. I can restore it, though it will take a while, and I strongly discourage it, as one needs to weigh the pros and cons. How many errors out of how many fixes?— Mpaa (talk) 21:57, 31 July 2016 (UTC)
Never mind, it does not really matter. Previous text should be there for Vol. 53. Let me know in case you also want the others back.— Mpaa (talk) 22:51, 31 July 2016 (UTC)
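One rough way to answer "how many errors out of how many fixes" is to compare punctuation counts between the original text and the cleaned text of a page. The following is only a sketch under stated assumptions: the helper name `lost_punctuation` and the set of characters checked are hypothetical, and hyphens are left out on purpose because merging them was intentional.

```python
from collections import Counter

# Characters whose disappearance would count as an error (assumed set).
PUNCT = ',.!?;:"\''


def lost_punctuation(original, cleaned):
    """Return punctuation marks that occur less often in `cleaned`
    than in `original`, mapped to how many occurrences were lost."""
    before = Counter(ch for ch in original if ch in PUNCT)
    after = Counter(ch for ch in cleaned if ch in PUNCT)
    # Counter subtraction keeps only positive differences.
    return dict(before - after)
```

An empty result does not prove a page is clean (a mark could be moved rather than deleted), but a non-empty one flags pages worth comparing by hand.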

Rolland Life of Tolstoy ellipses

Please pardon my tardy reaction.

Regarding this edit should I avoid use of &hellip; altogether in this work, or is the criticism limited to that one page? (I notice the style of this publication is to use either three or four full-stops in a series and perhaps leaving them as separate characters is the safest choice in any case.)

I am currently aware of four other pages I have (possibly?) incorrectly changed and would like to know your view before changing them all back to separate dots or otherwise proceeding with the other pages. AuFCL (talk) 23:49, 3 August 2016 (UTC)

Hi. Yes, the style is three or four dots as suspension, but do not bother. I will make an alignment pass at the end of the work. It is very fast for me.— Mpaa (talk) 06:08, 4 August 2016 (UTC)
I just wanted to make sure I did not make the situation worse. (I was mainly working through reducing Category:Empty ref tag and proof/validating more as a side-task, so had not picked up on the "house style.") AuFCL (talk) 06:30, 4 August 2016 (UTC)
Thank you for your indulgence. I have finished fooling around with Rolland's Tolstoy per my original intent, and await your review and corrections. (This came across far more cynically than I had intended at the point of composition. No offence intended; I plead tiredness!) AuFCL (talk) 11:13, 4 August 2016 (UTC)
No problems, hope the missing ref tags will disappear quickly.— Mpaa (talk) 19:15, 4 August 2016 (UTC)
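An alignment pass of the kind mentioned above could look roughly like the following Python sketch. It is not Mpaa's actual script; the function name `expand_ellipses` is hypothetical, and it assumes the only forms to normalise are the `&hellip;` entity and the single ellipsis character, both expanded to three separate full stops.

```python
def expand_ellipses(text):
    """Replace the &hellip; entity and the single-character ellipsis
    with three separate full stops (the assumed house style)."""
    return text.replace("&hellip;", "...").replace("\u2026", "...")
```

A four-dot suspension written as an ellipsis followed by a full stop comes out as four separate dots, which matches the three-or-four-dot style described in this thread.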

Follow-up on Fill pages with OCR from PDF

Hi Mpaa, maybe you remember the mentioned discussion. As a follow-up question, I want to ask you whether and how it is possible to create pages by providing only the pure text; the header and footer should be generated automatically, as is done when a user creates a page manually, e.g. [1]. Btw, how does wikisource.py generate headers and footers? Thank you, --Aschroet (talk) 08:42, 18 August 2016 (UTC)

If you set header and footer via Index page, preload should load them.— Mpaa (talk) 18:24, 24 August 2016 (UTC)