User talk:Mpaa/Archives/2017

Warning: Please do not post any new comments on this page.
This is a discussion archive; the comments it contains may have been posted before or after the date it was first created.
See current discussion or the archives index.

Follow-up on Fill pages with OCR from PDF

Hi Mpaa, maybe you remember the discussion mentioned above. As a follow-up, I want to ask whether and how it is possible to create pages by providing only the pure text; the header and footer should be generated automatically, as happens when a user creates a page manually, e.g. [1]. By the way, how does wikisource.py generate headers and footers? Thank you, --Aschroet (talk) 08:42, 18 August 2016 (UTC)

If you set header and footer via Index page, preload should load them.— Mpaa (talk) 18:24, 24 August 2016 (UTC)
Hi Mpaa, I know that your script wikisourcetext.py loads this header and footer. But when I generate the OCR with an external tool, I usually use pagefromfile.py to create the pages. For that script, however, I need to add the header and footer myself; preload is not supported there. Do you have any idea how I could combine the text preloading with the retrieval of the text from a file? Thank you, --Aschroet (talk) 17:49, 20 October 2016 (UTC)
The djvutext.py script adds the headers/footers as per the fields on the Index:. So if you cannot steal the code from there, you could just run a bot through with that script to create the pages, then run your bot through that applies the OCR text. (maybe?)
pagefromfile.py will overwrite the whole page. I think the best way would be to modify pagefromfile.py to fetch the page from the wiki first and replace only the body with the text taken from the file. It would be very hard to have this accepted as a global patch. If you want, I can make a custom version for you that will work only in the Page namespace. Or, if you describe how you generate the OCR with external tools, maybe there is a better way.— Mpaa (talk) 19:00, 21 October 2016 (UTC)
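The "replace only the body" idea can be sketched in plain Python. This is only an illustration, not the custom script offered above: it assumes the usual Page namespace layout, where the header and footer each sit in a noinclude block around the body, and the function name replace_body is hypothetical. Fetching and saving the page through pywikibot is left out.

```python
OPEN_TAG = "<noinclude>"
CLOSE_TAG = "</noinclude>"


def replace_body(page_text: str, new_body: str) -> str:
    """Return page_text with only the body between header and footer replaced.

    Assumes the layout <noinclude>header</noinclude>body<noinclude>footer</noinclude>.
    """
    header_end = page_text.find(CLOSE_TAG)    # end of the header block
    footer_start = page_text.rfind(OPEN_TAG)  # start of the footer block
    if (
        not page_text.startswith(OPEN_TAG)
        or header_end == -1
        or footer_start <= header_end
    ):
        raise ValueError("page does not have a header and a footer block")
    header_end += len(CLOSE_TAG)
    # Keep header and footer untouched; swap in the new body text.
    return page_text[:header_end] + new_body + page_text[footer_start:]
```

A pagefromfile-style bot could call such a function on the existing page text before saving, so that the header and footer preloaded from the Index page survive the update.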

Hi Mpaa, I have successfully used wikisourcetext.py several times, for example for de:Index:Sylvicultura oeconomica.pdf. However, I now have a case where I had to add some pages to the underlying PDF of an index. This shifted many already created pages, so that the OCR no longer matches. To fix it, I tried to run the script again with -force against those pages. Interestingly, the script reports that the pages were written, but they were not. Any idea why this happens? If you want to try it, please do so only for the page mentioned in the command:

python pwb.py scripts/wikisourcetext.py -summary:Seitenerstellung -index:Index:Sylvicultura_oeconomica.pdf -pages:434 -always -force -pt:1

Thank you, --Aschroet (talk) 20:14, 5 February 2017 (UTC)

Hi. The options are misleading. The script only works if the pages do not exist yet. I think there is no way of getting the file text once the page has been created. You have two options: 1. move the pages to compensate for the shift with movepages.py (the easiest way is to provide the 'from' and 'to' pages via the -pairsfile option), or 2. delete the pages and recreate them from scratch. If you submit a bug at https://phabricator.wikimedia.org/, it will be easier to get it fixed. Thanks for reporting this.— Mpaa (talk) 18:23, 6 February 2017 (UTC)
Thanks, task T157535. --Aschroet (talk) 09:07, 8 February 2017 (UTC)
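For option 1, the 'from'/'to' pairs can be generated rather than typed by hand. The sketch below is a hedged illustration: it assumes the pairs file lists each pair as two wikilinks, [[from]] [[to]], one pair per line (check the help text of movepages.py before relying on this), and the function name, page range and offset are hypothetical. When pages shift upwards, the highest page is listed first so that no move targets a title that is still occupied.

```python
def shift_pairs(prefix: str, first: int, last: int, offset: int) -> list[str]:
    """Build 'from to' wikilink pairs that shift pages first..last by offset.

    prefix is the page title without the page number, e.g. 'Page:Example.pdf'.
    """
    if offset == 0:
        raise ValueError("offset must be non-zero")
    pages = range(first, last + 1)
    # Shifting up: move the highest page first. Shifting down: lowest first.
    ordered = reversed(pages) if offset > 0 else pages
    return [f"[[{prefix}/{n}]] [[{prefix}/{n + offset}]]" for n in ordered]


if __name__ == "__main__":
    # Hypothetical example: two pages were inserted before page 10.
    print("\n".join(shift_pairs("Page:Example.pdf", 10, 12, 2)))
```

The printed lines can be saved to a text file and passed to movepages.py via -pairsfile.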

mass

What are the units of mass? Spectacular Gaets (talk) 18:39, 15 February 2017 (UTC)

kg.— Mpaa (talk) 20:28, 15 February 2017 (UTC)

File:X.jpeg

still needed? — billinghurst sDrewth 13:35, 1 April 2017 (UTC)

djvu google cover pages

Where are we with getting the google cover page removals undertaken at Commons? Was I meant to be doing anything in this space? — billinghurst sDrewth 02:09, 25 April 2017 (UTC)

No, thanks. It is just me being busy in RL, and I have put this on hold for a while.— Mpaa (talk) 17:46, 25 April 2017 (UTC)

Special:AbuseFilter/30

Hi Mpaa. Does Special:AbuseFilter/30 still need to be enabled? If not, it should probably be disabled to save on resources and save time (not that this one will impact either significantly). Thanks, Sam Walton (talk) 23:06, 4 May 2017 (UTC)

Disabled. I used it to carry out some tests in pywikibot.— Mpaa (talk) 17:26, 5 May 2017 (UTC)

Test entry

Another test.— Mpaa (talk) 17:25, 13 July 2017 (UTC)

This is for test purposes. unsigned comment by SomeUser (talk) 09:20, Jun 17, 2013 (UTC-02:00).

Fractions

Some tests:- User:ShakespeareFan00/Sandbox/Fractions

Want to attempt a re-write of frac into something using the Unicode points, collapsing the common ones into a single character point? ShakespeareFan00 (talk) 10:23, 6 August 2017 (UTC)

I can follow this: http://unicodefractions.com/. Note that there is a choice of whether to put a thin space in such cases: 4⁴⁄₆ vs. 4 ⁴⁄₆. So what should be done here?
If you want this done on your work, please file a bot request, so that if someone has comments, they can be discussed there.— Mpaa (talk) 14:01, 6 August 2017 (UTC)
The function is ready if you need it.— Mpaa (talk) 21:42, 8 August 2017 (UTC)
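The conversion discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the function mentioned above: the name unicode_fraction and its parameters are hypothetical. Common fractions collapse into their single precomposed character; all others are built from superscript digits, the fraction slash (U+2044) and subscript digits, with an optional thin space (U+2009) after the whole number.

```python
SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
FRACTION_SLASH = "\u2044"
THIN_SPACE = "\u2009"

# Fractions that have a single precomposed Unicode character.
PRECOMPOSED = {
    (1, 2): "½", (1, 3): "⅓", (2, 3): "⅔", (1, 4): "¼", (3, 4): "¾",
    (1, 5): "⅕", (2, 5): "⅖", (3, 5): "⅗", (4, 5): "⅘", (1, 6): "⅙",
    (5, 6): "⅚", (1, 7): "⅐", (1, 8): "⅛", (3, 8): "⅜", (5, 8): "⅝",
    (7, 8): "⅞", (1, 9): "⅑", (1, 10): "⅒",
}


def unicode_fraction(numerator, denominator, whole=None, thin_space=False):
    """Render a fraction, optionally preceded by a whole number, in Unicode."""
    fraction = PRECOMPOSED.get((numerator, denominator))
    if fraction is None:
        # No precomposed character: compose from super- and subscript digits.
        fraction = (
            str(numerator).translate(SUPERSCRIPT)
            + FRACTION_SLASH
            + str(denominator).translate(SUBSCRIPT)
        )
    if whole is None:
        return fraction
    separator = THIN_SPACE if thin_space else ""
    return f"{whole}{separator}{fraction}"
```

Whether the thin space is inserted is left as a parameter, since the thread does not settle that choice.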

Bot request

Hi Mpaa,

As discussed last month on Billinghurst's talk page, I've now got an Excel sheet which has enabled me to do 9 pages of the List of Private Acts and Resolution of the Fifty-Eighth Congress quite quickly and easily, with manual copy paste from Excel to Word, addition of some fixed stuff in Word and then copy/pasting into Wikisource. The Header and Footer I have been doing manually, but I have now put a small roman numerals tracker in the Excel spreadsheet to help a Bot with that. The List was both the best source for the data to populate the actual Private Acts in part 2, but also represented a pretty good trial run, I thought...?

My spreadsheet has the text for the start of each page body, the text for either left page number headers or right page number headers, and the text for footers each in a single cell. The text for each line of data is contained in a single cell which includes wiki mark-up and page number hotlink. Some computation will be necessary to take the page number into the header, and to select those rows which all belong on the same page, but all the data is there for a fully automated creation of pages including djvu filenames.

Would your Bot be able to take my Excel spreadsheet and spread its data through the remaining ~50 pages of the List, do you think? If so, how do I get it to you?

Thanks

CharlesSpencer (talk) 14:26, 14 August 2017 (UTC)

I will have to prepare a script for that. Drop me a mail so you can send me the Excel file. Still not 100% clear what is needed, but hopefully looking at the file, it will be clearer.— Mpaa (talk) 18:46, 14 August 2017 (UTC)
Done. Please do not update status of Page:United States Statutes at Large Volume 33 Part 1.djvu/33 and /34.
We are looking into a bug (https://phabricator.wikimedia.org/T173385) and those two pages are under analysis. Bye— Mpaa (talk) 21:00, 15 August 2017 (UTC)
Mpaa thank you very much indeed for your fantastic work - you've saved me days of work! CharlesSpencer (talk) 09:09, 16 August 2017 (UTC)
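The computation described in the request (selecting the rows that belong to the same page and carrying the page number into the header) could look roughly like the sketch below. It is not the script that was actually run. It assumes the spreadsheet rows have already been exported as (page number, line text) tuples, that page numbers are printed as lowercase roman numerals, and that even pages take the left-hand header and odd pages the right-hand one; all names are hypothetical.

```python
from itertools import groupby

ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
    (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(number: int) -> str:
    """Convert a positive integer to a lowercase roman numeral."""
    if number <= 0:
        raise ValueError("roman numerals need a positive integer")
    result = []
    for value, symbol in ROMAN:
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)


def build_pages(rows):
    """Group (page_number, line_text) rows into one entry per page."""
    pages = {}
    # Stable sort by page number keeps the line order within each page.
    ordered = sorted(rows, key=lambda row: row[0])
    for page_number, group in groupby(ordered, key=lambda row: row[0]):
        pages[page_number] = {
            "number": to_roman(page_number),
            # Assumption: even pages carry the left header, odd the right.
            "side": "left" if page_number % 2 == 0 else "right",
            "body": "\n".join(text for _, text in group),
        }
    return pages
```

Each resulting entry holds what is needed to assemble the header and body of one page before it is saved.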

Page substitution request

Hello Mpaa. I was wondering if this torn page with missing text could be substituted with this same version of the page so that the work of Sartor Resartus &c. can be marked as proofread? Thanks, Londonjackbooks (talk) 11:04, 15 October 2017 (UTC)

Done.— Mpaa (talk) 15:46, 15 October 2017 (UTC)
Thank you very much! Londonjackbooks (talk) 19:08, 15 October 2017 (UTC)

My cleanup contribution

Greetings and salutations, it has been ages since I contacted you, which you should consider to be a good thing. :-)

I am cleaning up some PSM-related info and came across Wikisource:WikiProject Popular Science Monthly/Statistics. The last update was sometime in 2014, and I feel that if this cannot be updated automatically, then we should delete it. I consider maintaining such a table to be a lot of unnecessary work. I also believe that stats for PSM can be filtered out of the statistical pages available for Wikisource at wmflabs, since I see a lot there dealing with various statistics. If this is possible, then we should just add a link to it. Your thoughts? — Ineuw talk 19:57, 6 November 2017 (UTC)

Glory of Women

The page Glory of Women is in disarray; judging by its history, you may have made a typo while editing it. --CopperKettle (talk) 18:23, 26 December 2017 (UTC)

Thanks— Mpaa (talk) 22:09, 26 December 2017 (UTC)