User talk:Mpaa

From Wikisource

Welcome

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current wikisource projects.

You can put a brief description of your interests on your user page, along with your contributions to other Wikimedia projects, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 12:00, 7 April 2011 (UTC)

Have used pywikibot to upload v. 36

Hi Mpaa, thanks for pointing me at the upload script and the pywikibot ecosystem. After a bit of fumbling (including about 30 reverts -- power tools can be dangerous! :) I've managed to upload Volume 36 of SHSP and I'm pretty happy with the results. Two questions. First, I've produced a "clean" version of the underlying djvu file (no wiki-tags except for cross-page hyphenation); do you think it's worthwhile to upload the file to Commons, or do you think the cross-page hwe/hws tags are too much formatting as well? If they are, I can write a script to strip them, but I am unsure what to change them to: an unhyphenated word at the bottom of the first page, an unhyphenated word at the top of the next page, or back to how it was. I am happy to follow whatever direction you give on this. Second, should I create a separate bot account for running this script and register it, or is that unnecessary? Thanks in advance for the advice, and, at the risk of repeating myself, I really appreciate the technical help you've provided. Dictioneer (talk) 13:19, 29 May 2015 (UTC)
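For reference, if stripping ever did prove necessary, a minimal sketch is below. It assumes the common two-parameter forms {{hws|first-half|whole-word}} and {{hwe|second-half|whole-word}}, and takes the "unhyphenated word at the bottom of the first page" option; the function name is made up for illustration.

```python
import re

def strip_hyphen_templates(text):
    """Replace cross-page hyphenation templates with plain text:
    {{hws|half|whole}} at a page bottom becomes the whole word,
    and the matching {{hwe|half|whole}} at the next page top is
    dropped. Assumes the two-parameter template forms."""
    # {{hws|con|conquest}} -> conquest
    text = re.sub(r"\{\{hws\|[^|{}]*\|([^|{}]*)\}\}", r"\1", text)
    # {{hwe|quest|conquest}} (plus trailing space) -> removed
    text = re.sub(r"\{\{hwe\|[^|{}]*\|[^|{}]*\}\}\s*", "", text)
    return text
```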

I would not touch the djvu text layer. If you extract the xml structure, each word has coordinates, etc., so I do not know how much value there is in adding just the text. Regarding the bot, you need to ask the community to grant you a bot flag for that task, and usually you create a separate account for it; see Wikisource:Scriptorium#BOT_approval_requests. There must be a policy somewhere too; it should be easy to find, but right now I do not have time. Bye— Mpaa (talk) 15:17, 29 May 2015 (UTC)

Strange thing

Hi Mpaa,

I don't really know my way around Wikisource. I made a small correction at Page:The Spell of the Yukon and Other Verses.djvu/60, changing "one" to "none" (there was none could place the stranger's face), which should have increased the size by one byte, but for some reason it went down by 113 bytes or something like that. Obviously something going on I don't understand. Would appreciate it if you'd take a look. --Trovatore (talk) 06:08, 16 June 2015 (UTC)

Hi. I do not know, some internal MW magic ... I wouldn't bother, your change looks fine anyhow.— Mpaa (talk) 07:36, 16 June 2015 (UTC)

A wikisource database question please

Post was moved to User talk:Ineuw#A wikisource database question please. — unsigned comment by Ineuw (talk).

Apologies for stealing your topic, Mpaa! Please follow on as I expect your input/experience is going to be essential! AuFCL (talk) 03:31, 5 July 2015 (UTC)


Attempted a validation of this, but I'd appreciate a second view: whilst I've been very cautious, I'd like to be sure I've caught everything. Going to give it a second pass in any event. ShakespeareFan00 (talk) 19:51, 27 July 2015 (UTC)

Help with data extraction

Hi. I am asking you for help extracting some text data from the Wikisource text databases, because so far I haven't been successful in achieving this goal.

The data I need is from this page to this page: the first 105 characters of every paragraph. These may or may not already contain a wiki link and an anchor, but the necessary part of the text is the word "Page nnn" followed by the reference number. From these I convert and create the links and generate the anchors in the text pages.

Reference anchor:
{{fs90/s}}{{anchor|463-1}}[[Page:The Conquest of Mexico Volume 1.djvu/53#53-1|Page 9 (<sup>1</sup>)]].—

Main text source:
{{anchor|53-1}}[[Page:The Conquest of Mexico Volume 1.djvu/463#463-1|<sup>1</sup>]]

Not to confuse you: both ends are, or will be, anchored and linked for convenience. — Ineuw talk 19:25, 25 August 2015 (UTC)
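As a sketch of how this extraction could be done offline: the function name and the exact pattern below are assumptions based on the examples above, and a real run against the page dumps would likely need a looser paragraph split.

```python
import re

def extract_refs(text, width=105):
    """From each wikitext paragraph, keep the first `width`
    characters and record any 'Page nnn' reference found there.
    Returns (page_number, paragraph_head) pairs."""
    refs = []
    for para in re.split(r"\n\s*\n", text):
        head = para.strip()[:width]
        m = re.search(r"Page (\d+)", head)
        if m:
            refs.append((int(m.group(1)), head))
    return refs
```

Note that "Page:" in a wiki link (no space before the number) does not match the pattern, so only the visible "Page nnn" references are collected.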

Done: User:Ineuw/Sandbox. Hope I got you right.— Mpaa (talk) 21:44, 25 August 2015 (UTC)
Perfect, many many thanks.— Ineuw talk 22:36, 25 August 2015 (UTC)
@Mpaa: Could you extract a new data set with the same parameters? Using the previous list I found a number of duplications, which I have corrected. Thanks in advance, and don't forget to bill me.— Ineuw talk 09:06, 29 August 2015 (UTC)
Done: User:Ineuw/Sandbox.— Mpaa (talk) 10:08, 29 August 2015 (UTC)
Much thanks :-) — Ineuw talk 18:22, 29 August 2015 (UTC)

Volume 2

@Mpaa: Could you please extract the data like above but for volume 2 and place it again in User:Ineuw/Sandbox? The page range is BEGINNING HERE and ENDING HERE. I am sure that after I made corrections I will come back for another data extraction. Thanks in advance.— Ineuw talk 03:50, 5 September 2015 (UTC)

done.— Mpaa (talk) 07:16, 5 September 2015 (UTC)
Thanks. — Ineuw talk 19:45, 5 September 2015 (UTC)

Categories by gender

Hi, there! There is a small problem in the current version of the author template with fetching gender data from Wikidata. The problem arises when the Wikidata gender value is set to "unknown value" (see for example Author:C. E. Brewster). Cheers, Captain Nemo (talk) 01:10, 1 September 2015 (UTC).

Tried to fix it. I added Category:Author pages with unknown gender in Wikidata.— Mpaa (talk) 20:45, 1 September 2015 (UTC)

Publication year of Shakespeare's sonnets


Back in 2012 you added |year=1598 to all of Shakespeare's sonnets (see eg. Sonnet 4 (Shakespeare)). As far as I know the sonnets were all first published in 1609, except 138 and 144, which had previously appeared, probably through piracy, in The Passionate Pilgrim in 1599. Is there any particular reason you have it down as 1598 here? --Xover (talk) 19:51, 10 September 2015 (UTC)

I just moved the year from being an explicit category to a parameter of the header template (which does automatic categorisation by year), see so I guess you need to find out who put Category:1598 works on the page.— Mpaa (talk) 18:35, 11 September 2015 (UTC)
Thanks. --Xover (talk) 15:51, 12 September 2015 (UTC)

Updated scripts

Hi Mpaa. I edited your common.js, Regexp toolbar.js, and works.js to update you to the latest version of TemplateScript. You were using a much older version called regex menu framework, so you should notice a lot of improvements. A few of the big changes:

  feature              regex menu framework   TemplateScript
  regex editor         ✓                      an improved regex editor which can save your patterns for later use
  compatibility        unknown                ✓ compatible with all skins and modern browsers
  custom scripts       limited                ✓ much better framework for writing scripts
  supported views      edit                   ✓ add templates and scripts for any view (edit, block, protect, etc.)
  keyboard shortcuts   (none)                 ✓ add keyboard shortcuts for your templates and scripts

I also updated deprecated functions. Let me know if anything breaks. :) —Pathoschild 05:02, 12 September 2015 (UTC)

Hi. Thanks a lot!— Mpaa (talk) 08:00, 12 September 2015 (UTC)

use of

Hi. Are you using to populate pages? I tried

 python -lang:en -family:wikisource -djvu:A_cyclopedia_of_American_medical_biography_vol._1.djvu -index:A_cyclopedia_of_American_medical_biography_vol._1.djvu

then it spat out a string of error messages. Before I start on a fuller diagnosis, I am wondering whether you have used the tool, or whether it is compat mode and I am bashing my head against the desk. Thanks. — billinghurst sDrewth 15:23, 23 September 2015 (UTC)

There is a bug. Will look into it.— Mpaa (talk) 17:01, 23 September 2015 (UTC)
Actually there were a couple. I submitted a patch. Hopefully it will be merged soon (it is difficult to forecast approval time). It is a good thing that someone is starting to use these scripts: they have very few users and are either new or ported to core, so new bugs might pop up. @John Vandenberg: might help with the approval.— Mpaa (talk) 18:00, 23 September 2015 (UTC)
Merged. Hope it is OK now.— Mpaa (talk) 19:23, 23 September 2015 (UTC)
Thanks. I will have a go later. I sit in Freenode's IRC #pywikibot so am always happy to prod them for coding checks; I just wasn't comfortable doing so for a lesser-used and specific script.

My plan is to get some of these biographical compilation works pushed through, as that makes them findable in search and may therefore get some people chipping away at them. [@Charles Matthews: FYI.] I am also thinking that we can set up some mini/sub-projects that may be self-sustaining if we get the parent guidance and components right, leveraging what we learnt from the DNB. Once that is going, I then want to look at some of those publications, like Gentleman's Magazine, etc., which are otherwise referenced in these biographical works. I am not looking to use it on chaptered works, of either fiction or non-fiction. — billinghurst sDrewth 05:35, 24 September 2015 (UTC)

Potentially crossing the line with this question.

Hi again,

Not that your answer will prevent my support for you getting the 'crat bit either way -- nor does it preclude the possibility that I'm being outlandishly cautious in my own little world of concerns -- but I'd feel better if I knew for sure; so here goes...

Are you a resident of Australia?

I know how that may read, but the ONLY reason I ask is that I'm fairly sure the other 'higher-than-sysop' bit holders are residents of that great nation, and I'm a bit concerned that one good natural disaster there, coupled with some local problem taking place at the same time here, could present response issues given the right timing.

I do think you're well suited for the bit no matter how you answer (if at all; a simple 'y/n' will do) and plan to support you, given it is better to have someone competent in place in spite of the slim likelihood of any such confluence of events taking place. Tenfold apologies if you feel this kind of question is beyond anything you should ever need to answer, never mind how the seeming poor taste of my asking it in the first place may appear to some. All I'm really trying to establish is whether there is any chance there may be a "gap" in observation and coverage going forward, since WS is a 24-hour, 7-days-a-week endeavor, and that is mainly why I ask.

Sincerely. -- George Orwell III (talk) 20:59, 23 September 2015 (UTC)

No, in the "Old World" (EU). But there might be a "gap" all the same as I cannot commit to a daily supervision.— Mpaa (talk) 21:14, 23 September 2015 (UTC)
GOIII, to note that we have one Oz CU and one Oz 'crat, and we are as close together as the UK is to Russia; and I would think that we would be on very different information pipelines. The other CU and the other 'crat are in the US, and I have no idea of their location relative to each other. — billinghurst sDrewth 06:03, 24 September 2015 (UTC)

Do you have a bot to auto add unproofed text from OCR?

I was wondering if it was possible to look into mass-adding text for Index:Ruffhead - The Statutes at Large - vol 3.djvu, so that BD2412 and others can run automated cleanup scripts. ShakespeareFan00 (talk) 23:48, 23 September 2015 (UTC)

I am experimenting with Wikisource-bot, as a proof of concept at this stage, targeting a few biographical works that users can: 1) find in a search, 2) fix when they wish for a biographical reference at WP, or 3) use as a cross-reference for a work here. As such, people can dip in and out of them as their time permits, and transclude them in small parts, and I think that is justifiable. Our previous issue with chaptered works being just added and forgotten about was that they were considered somewhere between valueless and innocuous.

I sense an eagerness for its use, and I would think that the general discussion would need to be raised again as to what and where the community thinks it would be of value to apply non-proofread text, and how people would be encouraged to proofread it; in short, that it is being curated and transcribed, not set and left. There is also the benefit of applying the text layer in that form to weigh against the issue of having it placed but not progressing in proofread status. So let me see if I can get the tool working, and approved by the community for my plans; then we can look at other uses and tasks with solutions. — billinghurst sDrewth 05:57, 24 September 2015 (UTC)

Fwiw, I have the following workflow with pywikibot:
  • load non-existing pages with pywikibot: the 'preload' functionality will fetch the text layer from the djvu file for me.
  • save all pages in a file that can be used by
  • do the typical clean-up work offline (running headers, typos, blanks around punctuation, etc.), using text editors, offline scripts, or whatever is best in that case
  • once the result is good enough, I bulk-upload it
  • pywikibot would benefit if it could read/write files containing pages (there is ongoing work on this).
Another option is to work directly online, interposing a clean-up function between fetching the (not yet existing) page and saving it.
Mpaa (talk) 18:04, 24 September 2015 (UTC)
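To illustrate the offline clean-up step in the workflow above, two typical rules might look like the sketch below. This is illustrative only: real clean-up for a given work involves many more, work-specific passes.

```python
import re

def cleanup(text):
    """Two illustrative offline OCR clean-up rules:
    - join words hyphenated across a line break
    - remove stray blanks before punctuation"""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)
    return text
```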

Bot substitution &amp; cleanup

Meant to thank you again for the bot work on War... No small favor. I am flying through the pages now, hoping I don't make too many errors as a result! Londonjackbooks (talk) 19:41, 28 September 2015 (UTC)

Happy to be of help. Should you need help in the future, just ask. In cases like this, a small effort can simplify things a lot.— Mpaa (talk) 20:31, 28 September 2015 (UTC)

Crap!; or, What one says when pages are missing from one's book

Thought I'd ask here first; pages 278 & 279 are missing from War. I have images of the missing pages at the ready from another online version (same edition, different printing). The reason that pagination appeared to be squared away is because pages 296 & 297 repeat themselves after p. 297. All is well with pages after that. Do you know how this can be fixed? Sorry, and Thanks! Londonjackbooks (talk) 19:35, 29 September 2015 (UTC)

Yes, it can be fixed. Are you familiar with manipulating djvu files? You should remove the two duplicate images and insert the two new ones in the proper place; then we should shift the pages correspondingly. If you can refer to djvu page numbers, it is clearer. If you are not familiar with the process, upload the two images here on WS and we will sort it out somehow.— Mpaa (talk) 20:31, 29 September 2015 (UTC)
I wish I knew how. But DJVU pages 312 & 313 should be removed (they are the pages which repeat), and pages shifted from DJVU pg 294. The pages to insert will fill DJVU pgs 294 & 295. I have uploaded the images. They are located in Category: User images, and are the only images listed. Let me know if I can do anything else, and thanks! Londonjackbooks (talk) 20:50, 29 September 2015 (UTC)
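For the record, with DjVuLibre installed the page surgery just described could be scripted roughly as below, driving the `djvm` command-line tool from Python. The replacement filenames are hypothetical, and the commands are printed as a dry run until `dry_run=False` is passed.

```python
import subprocess

def fix_pages(djvu, dry_run=True):
    """Delete the duplicate djvu pages 312-313 and insert the two
    replacement single-page djvu files at positions 294-295.
    Returns the list of commands; runs them only when dry_run is
    False (requires DjVuLibre's `djvm`)."""
    cmds = [
        # Delete the later duplicate first, so the earlier page
        # number is still valid after the first deletion.
        ["djvm", "-d", djvu, "313"],
        ["djvm", "-d", djvu, "312"],
        # Insert the replacement scans where the pages are missing.
        ["djvm", "-i", djvu, "page294.djvu", "294"],
        ["djvm", "-i", djvu, "page295.djvu", "295"],
    ]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```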
Should be OK now. Make a null edit (top right menu) if the images are not aligned yet.— Mpaa (talk) 22:10, 29 September 2015 (UTC)
Great! Thank you so much for the fix... Glad it was repairable. Londonjackbooks (talk) 22:29, 29 September 2015 (UTC)



You are now a bureaucrat. Enjoy the crushing pressure. :-D

Hesperian 02:28, 1 October 2015 (UTC)

Congratulations - good job! BD2412 T 03:39, 1 October 2015 (UTC)
Congratulations, excellent choice. . . . . and as they say in Budapest, today Rome, tomorrow the world. — Ineuw talk 22:36, 3 October 2015 (UTC)

Purging Index: ns

I have started Wikisource-bot on the task of purging all the Index: ns (started ~1200 GMT). That should clear up the issue of page editing not updating Special:IndexPage, and where you had identified that some indices were missing their class. — billinghurst sDrewth 12:10, 2 October 2015 (UTC)


Hi, I’ve been busy proofreading Nietzsche the thinker and wondered if you wanted some feedback on your bots’ work? Cheers, Zoeannl (talk) 01:35, 6 October 2015 (UTC)

Yes, that is appreciated, thanks. I actually cleaned up the text offline and just used the bot for fast page uploading.— Mpaa (talk) 21:21, 6 October 2015 (UTC)