Pre-editing and the preparation of PSM pages


Please proofread ten or so pages of PSM, which you edited/prepared for proofreading. Please select pages that contain hyphenated words and references, and then tell me if it was worth doing what you did. — Ineuw talk 20:21, 27 July 2016 (UTC)

Sorry, I lost you ... Did not even get if I did something good or bad ... I guess bad ... If you give some example, that might help, so I can take a look and possibly learn. And it would be fair to weight some possible unlucky cases vs. the positive ones. I am pretty sure the benefits will win.— Mpaa (talk) 21:12, 28 July 2016 (UTC)
I much appreciate your efforts to help with the PSM pages by cleaning the � characters, adding the page headers, and indicating {{smallrefs}} in the footer. However, merging hyphenated words at the end of a row forces me to proofread according to your method, which differs from the methods developed by proofreading thousands of pages, and it slows me down.
Merging hyphenated words is incorrect because it shifts the text of the following line and I use the original to locate words in the text, akin to an X - Y coordinate.
I leave merging of hyphenated words to line wrapping after proofreading. Line wrapping, whether by my AutoHotkey macro or Pathoschild's proofreading script, identifies end-of-line hyphenation. Since some words must remain hyphenated, I add a hyphen to the beginning of the second segment before line wrapping.
I don't bother with the reference tags until I proofread the page sequentially line by line, and this includes the references at the end of the page. Only then are the tags applied, and moved to where they belong.
The placement of <ref>Footnotes</ref> in the text is another time-consuming and confusing issue. I have to erase the word "Footnote" and restore the * because it's easier to notice in the text. Your system identifies only the first footnote. If there is more than one, the remainder need to be tagged, which means that two different methods of identifying footnotes are used. — Ineuw talk 21:06, 29 July 2016 (UTC)
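The line-wrapping step described above (merge end-of-line hyphens only after proofreading, and keep a hyphen where the proofreader has put one at the start of the following line) might be sketched roughly as follows. This is a Python illustration, not the AutoHotkey macro or Pathoschild's script; the function name `unwrap_lines` and the exact handling of the marker hyphen are assumptions based on the description.

```python
def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into one paragraph, resolving end-of-line hyphens.

    Assumed convention (hypothetical, taken from the description above):
    a line ending in "-" is merged with the next line without the hyphen,
    unless the next line itself starts with "-", which marks a real hyphen.
    """
    out = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored in this single-paragraph sketch
        if out.endswith("-"):
            if line.startswith("-"):
                # The proofreader marked the hyphen as real: keep exactly one.
                out += line[1:]
            else:
                # Typesetting hyphen: drop it and merge the two halves.
                out = out[:-1] + line
        elif out:
            out += " " + line
        else:
            out = line
    return out


if __name__ == "__main__":
    sample = "the proof-\nreading of well-\n-known texts"
    print(unwrap_lines(sample))
```

Because the merge happens only at wrap time, the page text keeps the same line breaks as the scan while it is being proofread.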
Then you should have said "Please proofread ten or so pages of PSM, according to my current method, and ...". How am I supposed to know what you have or will have in mind as a way of working to proofread now or in the future? Some of those volumes were done some time ago. I will stop doing that, but I guess there are not many volumes left untouched.— Mpaa (talk) 22:24, 29 July 2016 (UTC)
I can fix the <ref>Footnotes</ref>. Just state what you want instead.— Mpaa (talk) 23:00, 29 July 2016 (UTC)
Don't waste your time, it's not worth correcting it. I wrote this in case you were planning to continue cleaning. Didn't check how far you got with the cleanup. As for proofreading some of your corrected pages, you can still do it for your future reference, as to what not to do. — Ineuw talk 04:09, 30 July 2016 (UTC)
Unfortunately, your script made other unacceptable errors. It deleted commas "," and periods "." when they came before a quotation mark. It also split end-of-line words which ended with "!", "?" and "'s" and forced these to the following row. I wish you would consider resetting the pages to the Thomasbot original. There are just way too many unnecessary errors to correct, and they are often missed. Just compare the originals to not-proofread pages. — Ineuw talk 21:42, 31 July 2016 (UTC)
Can you give some examples? So I can see what happened. I can restore it, it will take a while, even if I strongly discourage it as one needs to weigh pros and cons. How many errors out of how many fixes?— Mpaa (talk) 21:57, 31 July 2016 (UTC)
Never mind, it does not really matter. Previous text should be there for Vol. 53. Let me know in case you also want the others back.— Mpaa (talk) 22:51, 31 July 2016 (UTC)
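One way to put a rough number on the "how many errors" question, at least for the lost commas and periods before quotation marks, would be to compare the original and the cleaned text of a page. The sketch below is only an illustration: the function name `lost_punctuation` is hypothetical, and it counts occurrences without locating them.

```python
import re

# A comma or period directly followed by a straight or curly closing quote.
PUNCT_BEFORE_QUOTE = re.compile(r'[.,]["\u201d]')


def lost_punctuation(original: str, cleaned: str) -> int:
    """Return how many comma/period-before-quote pairs vanished in `cleaned`."""
    before = len(PUNCT_BEFORE_QUOTE.findall(original))
    after = len(PUNCT_BEFORE_QUOTE.findall(cleaned))
    return max(before - after, 0)


if __name__ == "__main__":
    original = 'He said, "Yes," and left."'
    cleaned = 'He said, "Yes" and left"'
    print(lost_punctuation(original, cleaned))
```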

Rolland Life of Tolstoy ellipses

Please pardon my tardy reaction.

Regarding this edit, should I avoid use of &hellip; altogether in this work, or is the criticism limited to that one page? (I notice the style of this publication is to use either three or four full stops in a series, and perhaps leaving them as separate characters is the safest choice in any case.)

I am currently aware of four other pages I have (possibly?) incorrectly changed and would like to know your view before changing them all back to separate dots or otherwise proceeding with the other pages. AuFCL (talk) 23:49, 3 August 2016 (UTC)

Hi. Yes, the style is three or four dots as suspension, but do not bother. I will make an alignment pass at the end of the work. It is very fast for me.— Mpaa (talk) 06:08, 4 August 2016 (UTC)
I just wanted to make sure I did not make the situation worse. (I was mainly working through reducing Category:Empty ref tag and proof/validating more as a side-task, so had not picked up on the "house style.") AuFCL (talk) 06:30, 4 August 2016 (UTC)
Thank you for your indulgence. I have finished fooling around with Rolland'Tolstoy per my original intent, and await your review and corrections. (This came across far more cynically than I had intended at the point of composition. No offence intended—I plead tiredness!) AuFCL (talk) 11:13, 4 August 2016 (UTC)
No problems, hope the missing ref tags will disappear quickly.— Mpaa (talk) 19:15, 4 August 2016 (UTC)
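An alignment pass of the kind mentioned above could be as simple as replacing the &hellip; entity and the single-character ellipsis with separate full stops. The sketch below assumes the house style uses unspaced dots, which the discussion does not state; `expand_ellipsis` is a hypothetical name, not part of any existing bot.

```python
import re

# Matches the HTML entity or the single-character ellipsis (U+2026).
ELLIPSIS = re.compile(r"&hellip;|\u2026")


def expand_ellipsis(text: str) -> str:
    """Replace each ellipsis entity/character with three separate full stops."""
    return ELLIPSIS.sub("...", text)


if __name__ == "__main__":
    print(expand_ellipsis("Wait&hellip; and then\u2026."))
```

A following full stop is left alone, so a four-dot sequence in the source stays four dots.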

Follow-up on Fill pages with OCR from PDF

Hi Mpaa, maybe you remember the mentioned discussion. As a follow-up question, I want to ask you if or how it is possible to create pages by providing only the pure text; header and footer should be automatically generated as is done when a user manually creates a page, e.g. [1]. Btw, how does generate headers and footers? Thank you, --Aschroet (talk) 08:42, 18 August 2016 (UTC)

If you set header and footer via Index page, preload should load them.— Mpaa (talk) 18:24, 24 August 2016 (UTC)
Hi Mpaa, I know that your script does load this header and footer. But when I generate the OCR by an external tool I usually use to create the pages. However, for this script I need to add header and footer on my own; preload is not supported there. Do you have any idea how I could combine the text preloading and the retrieval of the text from a file? Thank you, --Aschroet (talk) 17:49, 20 October 2016 (UTC)
The script adds the headers/footers as per the fields on the Index:. So if you cannot steal the code from there, you could just run a bot through with that script to create the pages, then run your bot through that applies the OCR text. (maybe?) will overwrite the whole page. I think the best way would be to modify pagefromfile.py to fetch the page from the wiki first and replace only the body taken from Very hard to have this accepted as a global patch. If you want, I can make a custom version for you that will work only in the Page namespace. Or, if you describe how you 'generate the OCR by an external tools', maybe there is a better way.— Mpaa (talk) 19:00, 21 October 2016 (UTC)
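The "replace only the body" idea might look roughly like the sketch below. It assumes the page already exists with the usual Page-namespace layout, a leading `<noinclude>` header block, then the body, then a trailing `<noinclude>` footer block. The function name `replace_body` and the sample templates are illustrative only; this is not a patch to pagefromfile.py, and fetching and saving the page are left out.

```python
import re

# Assumed layout: <noinclude>header</noinclude>BODY<noinclude>footer</noinclude>
PAGE_RE = re.compile(
    r"\A(?P<header><noinclude>.*?</noinclude>)"
    r"(?P<body>.*)"
    r"(?P<footer><noinclude>.*?</noinclude>)\Z",
    re.DOTALL,
)


def replace_body(wikitext: str, new_body: str) -> str:
    """Return `wikitext` with only the body swapped; header and footer are kept."""
    match = PAGE_RE.match(wikitext)
    if match is None:
        raise ValueError("page does not have the expected header/body/footer layout")
    return match.group("header") + new_body + match.group("footer")


if __name__ == "__main__":
    existing = (
        '<noinclude><pagequality level="1" user="Bot" />{{rh|1|TITLE|}}</noinclude>'
        "old text"
        "<noinclude>{{smallrefs}}</noinclude>"
    )
    print(replace_body(existing, "new OCR text"))
```

Raising an error when the layout is not found avoids silently overwriting a page that was created without the preloaded header and footer.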

question about "p.m." and "p. m."

Thank you for Mpaabot. I do not understand this edit: [2]. The source says "8 p.m." and "10 p.m." and uses small caps. Is "8 p. m." better, or more standard? It takes up more space and varies from the source. Stakes are small here and I don't really care, but if the extra space is considered better then perhaps this task and the reasoning should be listed at User:MpaaBot. Yours in perfectionist precision, -- econterms (talk) 21:16, 11 October 2016 (UTC)

I can't remember. Best guess I can do right now is that there were several 'flavours' across different volumes and I picked one.— Mpaa (talk) 21:25, 11 October 2016 (UTC)
E.g. here [3].— Mpaa (talk) 21:31, 11 October 2016 (UTC)