User talk:Mpaa


(Archives index, Last archive)

Welcome

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current Wikisource projects.

You can put a brief description of your interests on your user page and contributions to another Wikimedia project, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 12:00, 7 April 2011 (UTC)

What's the procedure to move pages?

Page:Popular Science Monthly Volume 32.djvu/271 should be djvu 153 and Page:Popular Science Monthly Volume 32.djvu/272 should be djvu 154. My question is: should I upload the correct copy to Commons to replace the current one, and then ask you to make the page adjustments? I am unclear about the process, except that I don't want to lose any proofread pages.— Ineuw talk 19:00, 21 March 2015 (UTC)

Step 1: update Commons
Step 2: Pages /271 and /272 will have to be moved temporarily to the end, to leave space to move range (/153 ... /270) to range (/155 ... /272). Then the two images can be moved to the freed space at /153-/154.
Step 3: Then, all indexing in <pages ... > in Main need to be updated accordingly.
This can be done, it should not be a big issue. It is the standard bulk move procedure.
To be noted: all the conventions you are using for sections, images and so on will no longer be valid (but they will still work). E.g. "File:PSM V32 D170 American dredge.jpg" will now be on page D172. Same for section tagging, etc. It will take some additional processing to re-align them (plus moving images on Commons, etc.). I never thought about it in detail, but it should be solvable.
Hope it is clear.--Mpaa (talk) 19:27, 21 March 2015 (UTC)
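The move order in Step 2 can be sketched in a few lines of Python. This is only an illustration of the ordering, not the bot that performs the moves: the function name `plan_moves` and the parking slot 1001 are assumptions made for the example, and no wiki API is called.

```python
def plan_moves(first, last, offset, misplaced, temp_start):
    """Return an ordered list of (source, target) page moves.

    first, last  -- range of pages that must shift upward
    offset       -- how far the range shifts (here +2)
    misplaced    -- pages that belong at the start of the range
    temp_start   -- hypothetical free slot past the end of the book
    """
    moves = []

    # 1. Park the misplaced pages out of the way.
    parked = []
    for i, page in enumerate(misplaced):
        moves.append((page, temp_start + i))
        parked.append(temp_start + i)

    # 2. Shift the range upward, highest page first,
    #    so every target slot is already empty when it is used.
    for page in range(last, first - 1, -1):
        moves.append((page, page + offset))

    # 3. Drop the parked pages into the slots freed at the start.
    for i, page in enumerate(parked):
        moves.append((page, first + i))

    return moves


moves = plan_moves(first=153, last=270, offset=2,
                   misplaced=[271, 272], temp_start=1001)

for source, target in moves[:4]:
    print(f"move /{source} -> /{target}")
```

Working from the highest page down is what keeps each move from landing on an existing page; shifting upward from /153 would collide immediately with /155.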
Eminently clear. Realigning the section tags is no problem because I know where they are. I will also modify my database accordingly and regenerate the pages with the sections. As for the images, it is not important for now. Thank you.— Ineuw talk 01:21, 22 March 2015 (UTC)
I checked IA and bookmarked the 2 existing copies. One is our installed copy, and the second (which I have) is the accurate copy. Unfortunately, the difference between the two is not just the misplacement of two pages. There are blank pages, the title pages are in a different order from the start, and the page count is different because of additional blanks and advertisements. Although the printed page numbers are identical, it would be very difficult to reassemble. I decided to stick to the existing installed copy. — Ineuw talk 04:35, 22 March 2015 (UTC)
good copy from Biodiversity
our bad copy from University of Toronto
As you prefer.--Mpaa (talk) 21:57, 22 March 2015 (UTC)
It's not a preference, just the realization that I would have to redo the complete volume: the TOC, the Index, etc. However, I am considering moving and renaming this copy and the Index to Volume 32 Old.djvu, both here and on Commons (if possible), installing the good copy as Index:Popular Science Monthly Volume 32.djvu, and then copying the proofread pages from one to the other. I just don't want to overlook anything. Your comments are appreciated.— Ineuw talk 19:37, 23 March 2015 (UTC)
I do not think it will make a big difference in terms of effort. If you are not going to change the internal references (anchors, sections, etc.), you might as well move the whole page. If you want to update the references, you might as well do it on the moved pages. Moreover, if you copy instead, all the history will be lost. I think if you have a mapping old->new page, it would be possible to update every instance of sections, TOC, Index etc., as you have kept a consistent notation (Dxx for anchors, Bxx, Exx for sections, etc.). If the pages to move are a bit tangled, the only thing to worry about is a good strategy to move pages back and forth.--Mpaa (talk) 21:23, 23 March 2015 (UTC)
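As a rough sketch of the old->new mapping idea, the snippet below rewrites Dxx numbers in a piece of text from a lookup table. The mapping used here (a plain +2 for pages 153 to 270) is only the simple case discussed earlier in this thread and is an assumption for the example; the helper names `build_mapping` and `remap_d_numbers` are hypothetical.

```python
import re


def build_mapping(first, last, offset):
    """Map each old page number in [first, last] to its new number."""
    return {old: old + offset for old in range(first, last + 1)}


def remap_d_numbers(text, mapping):
    """Replace every Dxx token whose number appears in the mapping.

    A single regex pass is used so that a number is never remapped twice
    (e.g. D153 -> D155 -> D157).
    """
    def swap(match):
        old = int(match.group(1))
        return "D" + str(mapping.get(old, old))

    return re.sub(r"\bD(\d+)\b", swap, text)


mapping = build_mapping(first=153, last=270, offset=2)
print(remap_d_numbers("File:PSM V32 D170 American dredge.jpg", mapping))
```

The same lookup table could in principle drive the updates to sections and index entries, provided each notation is matched by its own pattern.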

@Mpaa: My apologies for making you crazy. I finally figured out what has to be done to replace this book. To make the good copy I have match on page 1 and on page count, all I had to do was insert two blank pages so that Page 1 = Djvu 11, and then delete from the WS copy eighteen pages of advertisements at the end, so that both have a page count of 900. Now, to quote from above:

Step 1: update Commons. Done
Step 2: Pages /271 and /272 will have to be moved temporarily to the end, to leave space to move range (/153 ... /270) to range (/155 ... /272). Then the two images can be moved to the freed space at /153-/154.

Now, all I will have to do is correct the links related to pages between 153 and 272. The rest will align perfectly. Thanks in advance. — Ineuw talk 19:51, 25 March 2015 (UTC)

@Ineuw:, something wrong in Step 2. I see (blk+image) in djvu/155-156 and not in djvu/153-154. Is this supposed to be like this?--Mpaa (talk) 21:04, 25 March 2015 (UTC)
There are also other blk-image couples at djvu/301-302 and djvu/471-472 that need to be considered. And in this copy, the current image Page:Popular Science Monthly Volume 32.djvu/851 and its blk look missing. Please look into it and re-specify the different (djvu/begin, djvu/end, +offset) moves.--Mpaa (talk) 21:41, 25 March 2015 (UTC)
In general, check image presence position in the new copy, as I think some are either moved or missing.--Mpaa (talk) 21:49, 25 March 2015 (UTC)
You are right. I must figure out how to correct the damned thing. Page:Popular Science Monthly Volume 32.djvu/851 seems to be missing from the new volume. Please let me study again what went wrong. — Ineuw talk 04:54, 26 March 2015 (UTC)
D850 (Blank page) and D851 (Image of David Ames Welles) should be D737+D738, before the first article of the month (April 1888). Currently they are at D741+D742.— Ineuw talk 05:05, 26 March 2015 (UTC)
This is the only way I can see it done. Pages D271 and D272 are to be moved to the end, and then everything is to be moved 2 pages up beginning with 155. This will open up 155 and 156 for the blank D271 and the image D272. I will then check everything until I find the next page move. Can we do it one move at a time over the next few days? As I mentioned earlier, I have no idea what is out of order. — Ineuw talk 08:27, 29 March 2015 (UTC)

@Mpaa: Unfortunately, can't oblige you with a list of From D# to D# until we fix the first problem - which is to insert two pages D155+D156 so that D271+D272 can be moved to D155+D156 where it belongs and the gap of D271+D272 is closed up. The overall problem is as follows:

  • A "standard" volume of PSM has 6 paired pages without page numbers. Volume 32 has 7 pairs. They are D271+D272, D301+D302, D412+D413, D447+D448, D471+D472, D595+D596, D741+D742. The six portraits ALWAYS appear as the last two pages of a monthly publication. It's the shift of 2 to 4 to 2 pages that I can't figure out. If you can, fine, but I can't.
  • I inserted 2 empty pages at the beginning so that both volumes should begin at D11. The images not only shifted by two pages, but they are inserted altogether in the wrong places and unless I go step by step, I can't see my way through as to where the additional pages go.
  • At this point, I don't care about incorrect main namespace links, section tags and anchors. I must correct my database, from which these are generated from one central entry. The system generates the table of contents, the main namespace links and the sections to the pages, and the Index with 400+ entries and their anchors. — Ineuw talk 07:00, 30 March 2015 (UTC)
If you want I can move them, but I really can't see how it will help you in your research. Even if you will see the text aligned with the scans up to D272, how will that help you with the following pages? Nothing will change from D273 upwards. I'll try to compare the 2 djvus to see if I can shed some light.--Mpaa (talk) 17:44, 30 March 2015 (UTC)
The +4 in some ranges after D469 is the combination of the misplaced portrait-blk pair, (when moved back -> contr. +2) and the insertion of D471+D472, (another +2 contr.).
If you download the 2 versions of the djvus and compare, the range old->new is relatively easy to spot. Regarding the 7 vs. 6 pairs, maybe D471+D472 was at the end of the volume as a larger folded page (see the marks on the page) and has been inserted in this position in this version of Vol. 32.--Mpaa (talk) 19:23, 30 March 2015 (UTC)
That was my assumption, but when I get to this at 10pm, I no longer trust myself. Therefore, I leave it in your capable hands.— Ineuw talk 20:47, 30 March 2015 (UTC)
What do you think of the position of these D471+D472? Shall they stay where they are in the new vol? Even if they break the 6-pair scheme?--Mpaa (talk) 11:53, 31 March 2015 (UTC)
D471+D472 should appear as they are in the new upload (not at the end). I mentioned the six pairs of portraits only as an example that they are the basic six pairs which are placed at the end of the monthly editions (up to volume 54). When I did the original organization, these were the ones that I found immediately, because of the blank image protector page. There are many other images in the volumes without page numbers, but not always accompanied by a blank page.— Ineuw talk 18:54, 31 March 2015 (UTC)
Page move should be OK now. If you want to save some work and you're not in a hurry, it will not take long to adapt a bot to replace sections, links, etc all over. Otherwise, proceed as you feel. I am sending you the old->new pages. Bye--Mpaa (talk) 22:28, 31 March 2015 (UTC)
Mpaa, Much thanks and my gratitude. I will do the section tags & anchors because I always check the main namespace pages. — Ineuw talk 22:47, 31 March 2015 (UTC)

A problem cropped up

Everything up to and including matches perfectly. From here on I am lost because the uploaded original has the correct page contents including D597 This is a duplicate OF THIS PAGE and the real D597 seems to be missing. I can resolve it by uploading the same copy again where the page is in the right place. I have been following the DjVu book pagination with this installed volume. — Ineuw talk 05:21, 1 April 2015 (UTC)

No clue. But the djvu file has the correct page. Let's wait and see if it is a cache issue.--Mpaa (talk) 06:43, 1 April 2015 (UTC)
I am sure it is a cache issue, these two links show two different pages, one wrong and one correct.
Bye--Mpaa (talk) 09:46, 1 April 2015 (UTC)
The second image differs in one minute detail, "1023px", and that "Bye" sounds so ominous. — Ineuw talk 18:01, 1 April 2015 (UTC)
"1023px" has probably forced a new thumbnail, and the correct page has been used, so I think we just need to wait for the "1024px" one to be refreshed. Sincerely--Mpaa (talk) 19:08, 1 April 2015 (UTC)
Thanks. There is loads of info on Commons about refreshing djvu uploads such as this, but none of it worked. I just mentioned it to you because I was curious, but I will wait until it happens. Ineuw talk 20:31, 1 April 2015 (UTC)

Categorizing by gender

If we're going to classify by gender, then we need to classify all authors, not just authors in one gender, as if the other gender was the default state.--Prosfilaes (talk) 00:49, 2 April 2015 (UTC)

Better still, don't create these categories. n.b. Your bot is not approved for categorizing Author pages this way. --EncycloPetey (talk) 02:32, 2 April 2015 (UTC)

(Sigh!) At just which point does somebody point out to the above pair of (functional) morons the fact that Mpaa was merely being a "nice guy" respondent to this request? Hang your heads for raising this matter elsewhere than where the discussion belongs. For shame!
Now which one of you is going to be grown-up about this matter and restart the discussion appropriately, preferably even involving the original requester, user:Nonexyst? unsigned comment by (talk).
I leave the discussion of this matter to @Nonexyst: in another place. I just helped with his request. Also, nobody objected when I said I would take care of it. If it has to be undone, so be it.--Mpaa (talk) 06:55, 2 April 2015 (UTC)

bot: "(align formatting)"

Mpaa, in using the bot, are all of those edits showing (align formatting) mistakes I made in validating? If so, then something is wrong, because they looked correct to me and I double-check before saving. —Maury (talk) 19:52, 6 May 2015 (UTC)

Hi. No, no problems, I wanted to align the left side note. It is just that it is easier to process all pages in the same way than to select only some pages.— Mpaa (talk) 19:57, 6 May 2015 (UTC)

Thanks for responding and further info on bot request

What a small wiki world! You may not recall, but you assisted me with my first two texts last fall when I first joined the WS project. Thanks again for that. For this topic I will respond more formally on the bot page (where I hope you'll notice my astute use of the emdash  –  :), but I just wanted to say hi and fill you in on the background to this request.

First, the OCR button: with all due respect, user Beeswaxcandle responded to my problems with the OCR button by explaining that it activates an OCR routine that rescans the image and does not re-emit the underlying text in the DJVU file. I note that this sometimes corresponds to my experience of the button, but for me the button's behavior, and even whether it appears on my toolbar, is unpredictable, even with the appropriate widgets preference set, so I am unable to use it productively.

Try to set it up. It is all you need.— Mpaa (talk) 11:24, 15 May 2015 (UTC)

Second, the reason for tagging the underlying text, then uploading: I have reluctantly reached the conclusion that there are several types of correction best done on the whole volume because of the context needed to be able to make good decisions about running headers, titles, cross-page hyphenation, front v. back matter, etc., so this is my experiment along those lines. Again, Beeswaxcandle and I discussed this on my talk page if you're curious about the history.

Corrections on the whole volume are one thing; embedding wiki-syntax in the djvu file is another. I would discourage that, as if someone needs the Djvu for other purposes, the text layer will be full of useless stuff.— Mpaa (talk) 11:24, 15 May 2015 (UTC)

LBNL, I chose the Southern Historical Society Papers project because it appeared to have stalled but has a user (Maury) who was interested in making further progress on the series and has several volumes I am interested in seeing completed. For the number of pages involved, a mass reload (assuming my script on the djvu text works sufficiently well) would be far more efficient and make it possible to finish the SHS project this year.

Probably more than you wanted to know, but I thought extra detail might be helpful since you had previously assisted me. Feel free to reply on the bot page or my talk page, and thanks again, Dictioneer (talk) 00:17, 15 May 2015 (UTC)

Thanks for your quick response, both here and on the bot page. Here is my experience with the OCR button: I bring up a page in edit mode, click on "Proofread tools", and there is no OCR box. I click on "Preferences" at the top, then on "Gadgets." On the Gadgets page, in the second section, "Editing tools for Page: namespace", I click the checkbox " OCR: Enable OCR button Button in Page: namespace." and click on "Save" at the bottom of the page. I go back and reclick through from the Index space to the specific page. The OCR button doesn't appear. I go back to preferences/gadgets and in the section "Development (in beta)" I click on "Add a toolbox link to reload the current page with Resource Loader in debug mode." Lather, rinse, repeat, back to the specific page in question. Still no OCR button. I click on the "Debug" button in the left pane under "Tools." The OCR button usually appears, though not always. Sometimes I have to click on the EDIT link again, in which case the OCR button always appears. If I click on the OCR button, it reloads the text, but the text doesn't contain my most recent changes.
I should note I have tried this with Firefox on Ubuntu, Mac, and Windows, with Chrome on Ubuntu and Mac, and with Internet Explorer on Windows. If you have a fix for this, or debugging advice, or can point me to a resource that will help me sort it out, that would be great. I've also copied in common.js and common.css text from user Beeswaxcandle so that my setup was as close to his/hers as possible. If you think it would help to copy in your script source, I'm happy to try that.
In my experience, the only way that reliably reloads my updated text and formatting from the djvu source file is going to the testpage Beeswaxcandle has deleted for me. There, the correct text shows up whether I have an OCR button or not, and the OCR button (if I press it) reloads the updated text correctly in that circumstance. I have not saved this page since it is the only one I have that reliably works. This is the reason for my request for a mass delete of non-proofread pages previously uploaded by LA2-bot.
Thanks for any help you can provide or for pointing me to an appropriate resource to get this debugged. At the moment, the only workaround that gives the desired result is the one proposed by Beeswaxcandle. I am open to alternatives, but would need a link to the relevant bot, upload script or other documentation that would get me started. Also, once the text has been populated, I would be fine to upload a version of the djvu file which has all wiki-formatting removed, it's just difficult at the moment to see how to get to that point. Dictioneer (talk) 17:28, 15 May 2015 (UTC)
Try to copy my User:Mpaa/vector.js and keep your preferences as simple as possible (in editing, select Show edit toolbar and Enable enhanced editing toolbar).— Mpaa (talk) 18:47, 15 May 2015 (UTC)
No luck, I'm afraid. Here's what I did: A) went into preferences and hit the reset button, then verified the Show Edit & Enable Enhanced settings. B) created a blank vector.js page and copied over your source. C) exited the browser, restarted and logged in. No OCR button. Realized that I'd reset the OCR gadgets preference to off in my general reset, went to gadgets and re-enabled OCR. Went back to edit a page. Still no OCR. D) Went back to Gadgets and re-enabled the Debug setting under "Development". I am now back to the original behavior, which is that if I click Edit, then Debug, I usually get the OCR button to show up. E) However, when I click the OCR button it will reload the text, but does not reload the most current version of the text. F) I also went back and copied in your vector.css, common.js, and common.css just for thoroughness' sake. Same result.
I think it might be illuminating to separate this problem into its less important and more important parts: the unpredictable appearance and disappearance of the button (annoying but less important), and what actually happens on a page when you press OCR. Let's assume that the OCR button appears reliably for you. How does it behave when you use it on these text pages? Here is the experiment to try: A) go to Index:Southern_Historical_Society_Papers_volume_35.djvu and click on Index page 19/file page 33, aka to see what happens. I get a page that warns me this page has been deleted by Beeswax and I should think seriously before recreating it. B) In the text itself you should see a "noinclude" running header and a hwe tag for possessor: this reflects the current updated djvu file. C) Now cancel out and go to page 20/34 (i.e., the next page), which still exists. You will see it displayed without any running header tag. D) Press OCR, and the page is refreshed with a running header but without the noinclude tags. This text is from an old version of the file, not the most current one from commons. E) Therefore, P. 19 (which Beeswax deleted) is correctly updated, p. 20 (not deleted) is not.
Did you get the same result? There are other differences on the page I could detail, but I assume one missing change is enough for now. Thanks for taking the trouble to help me figure out what's going wrong. Dictioneer (talk) 14:45, 16 May 2015 (UTC)
This is the link that is called by the OCR button [1], and then this is parsed. If you copied my settings, OCR should appear under "Proofread Tools".— Mpaa (talk) 19:43, 16 May 2015 (UTC)
Another issue is that OCR considers all the text as part of the body, so I guess it will not include Template:Rh... in the header.— Mpaa (talk) 19:56, 16 May 2015 (UTC)
Unfortunately, OCR still only occasionally appears but generally doesn't. The noinclude tag was included in a revision at Beeswax's suggestion; apparently he and Maury have a hot-key that activates a .js routine that will take the running header and put it in the header box. In any case, other changes in the text underlying the djvu file also do not appear, not just the running-header noinclude tag.
I can provide other details of what's not appearing if that's your preference, but personally I would suggest that you proceed with the deletion algorithm you propose on the bot page and that we resume trying to chase down the source of this problem at some point in the future. The problem seems important to me since users who edit text and re-upload djvu files will have some of their changes appear and some not for no apparent reason. However, this is clearly a difficult and intermittent problem, so a brief break from chasing it may be good for all involved. Let me know if there's any help I can provide on the revised deletion/reloading based on LA2-bot being the most recent updater of the page. Dictioneer (talk) 20:59, 16 May 2015 (UTC)
I'll try to upload the latest text-layer, I need to customise a script first. But trust me, updating the text layer in the djvu file is a viable option if and only if a new OCR-process will be reapplied to the file. All the rest can be done working directly on text files, without bothering to change the djvu. Also, dividing header, body and footer is not trivial. If you want to apply this approach in the future, try to stick to this file format to divide the different pages: Note that this is not done to handle the Proofread Page format, so if you want to define headers and footers, try to mark them in the text with a convention that makes it easy to recognise them (e.g. @@HEADER_START@@all the header text@@HEADER_END@@ or similar). It will make the rest of the process easier. There is WIP on the bot side to handle this in an easy way in the future.— Mpaa (talk) 21:12, 16 May 2015 (UTC)
One more thing. The advantage is that once you have the file done, you can apply all the text improvements you want with a text editor, working off-line and using search and replace patterns per file instead of per page, and upload only the final result of all your improvements.— Mpaa (talk) 21:16, 16 May 2015 (UTC)
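A minimal sketch of how such header markers could be picked apart offline, assuming the symmetric pair @@HEADER_START@@ / @@HEADER_END@@ and at most one header per page. The function name `split_header` and the sample page text are made up for the example; this is not the bot-side code mentioned above.

```python
import re

# Assumed marker convention: @@HEADER_START@@ ... @@HEADER_END@@
HEADER_RE = re.compile(r"@@HEADER_START@@(.*?)@@HEADER_END@@", re.DOTALL)


def split_header(page_text):
    """Return (header, body) for one page of marked-up text.

    If no markers are found, the header is empty and the page
    text is returned unchanged as the body.
    """
    match = HEADER_RE.search(page_text)
    if match is None:
        return "", page_text
    header = match.group(1).strip()
    # Remove the marked span and keep everything else as the body.
    body = (page_text[:match.start()] + page_text[match.end():]).strip()
    return header, body


sample = "@@HEADER_START@@12 Southern Historical Society Papers.@@HEADER_END@@\nBody text."
header, body = split_header(sample)
print(repr(header))
print(repr(body))
```

Because the markers are plain text, they survive search-and-replace passes in an ordinary editor and can be stripped in one step before upload.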
BTW, your syntax for {{hwe}} is wrong. See Page:Southern_Historical_Society_Papers_volume_35.djvu/86. {{hwe|con|fidence.}} should be {{hwe|fidence.|confidence.}}
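For a script that has already produced the wrong form, a small offline fix-up along these lines might help. It assumes the faulty tags always look like {{hwe|first half|second half}} as in the example above; the function name `fix_hwe` is hypothetical, and tags whose second parameter already ends with the first are left untouched.

```python
import re

# Matches {{hwe|a|b}} with two simple (non-nested) parameters.
HWE_RE = re.compile(r"\{\{hwe\|([^|{}]*)\|([^|{}]*)\}\}")


def fix_hwe(text):
    """Rewrite {{hwe|con|fidence.}} as {{hwe|fidence.|confidence.}}."""
    def repl(match):
        first, second = match.group(1), match.group(2)
        # Already in the form {{hwe|end of word|whole word}}: leave it alone.
        if second.endswith(first):
            return match.group(0)
        return "{{hwe|" + second + "|" + first + second + "}}"

    return HWE_RE.sub(repl, text)


print(fix_hwe("{{hwe|con|fidence.}}"))
```

The endswith check is only a heuristic, so the output is worth spot-checking on a few pages before a bulk run.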
Good catch, I've updated my script accordingly. I may start a new topic with questions about the mediawiki link above, but you've given me a huge amount of help already, so I'll let you get back to your own texts for awhile before I bug you again. Thanks so much, and if there's anything I can do for you in return, just let me know. Dictioneer (talk) 22:06, 17 May 2015 (UTC)

Have used pywikibot to upload v. 36

Hi Mpaa, thanks for pointing me at the upload script and the pywikibot ecosystem. After a bit of fumbling (including about 30 reverts -- power tools can be dangerous!:) I've managed to upload Volume 36 of SHSP and I'm pretty happy with the results. Two questions: first, I've produced a "clean" version of the underlying djvu file (no wiki-tags except for cross-page hyphenation), do you think it's worthwhile to upload the file to commons, or do you think the x-page hwe/hws tags are too much formatting as well? If it's still too much formatting, I can write a script to strip them, but am unsure what to change them to: an unhyphenated word at the bottom of the first page, an unhyphenated word at the top of the next page, or go back to how it was. I am happy to follow whatever direction you give on this. Second, should I create a separate bot account for running this script and register it, or is that unnecessary? Thanks in advance for this advice and at the risk of repeating myself, I really appreciate the technical help you've provided. Dictioneer (talk) 13:19, 29 May 2015 (UTC)

I would not touch the djvu text layer. If you extract the xml structure, each word has coordinates, etc., so I do not know how much value it has just to add the text. Regarding the bot, you need to ask the community to grant you a bot flag for that task, and usually create an account for that; see Wikisource:Scriptorium#BOT_approval_requests. And there must be a policy somewhere, it should be easy to find, but right now I do not have time. Bye— Mpaa (talk) 15:17, 29 May 2015 (UTC)

Strange thing

Hi Mpaa,

I don't really know my way around Wikisource. I made a small correction at Page:The Spell of the Yukon and Other Verses.djvu/60, changing "one" to "none" (there was none could place the stranger's face), which should have increased the size by one byte, but for some reason it went down by 113 bytes or something like that. Obviously something going on I don't understand. Would appreciate it if you'd take a look. --Trovatore (talk) 06:08, 16 June 2015 (UTC)

Hi. I do not know, some internal MW magic ... I wouldn't bother, your change looks fine anyhow.— Mpaa (talk) 07:36, 16 June 2015 (UTC)

A wikisource database question please

Post was moved to User talk:Ineuw#A wikisource database question please. unsigned comment by Ineuw (talk).

Apologies for stealing your topic, Mpaa! Please follow on as I expect your input/experience is going to be essential! AuFCL (talk) 03:31, 5 July 2015 (UTC)


Attempted a validation of this, but I'd appreciate a second view: whilst I've been very cautious, I'd like to be sure I've caught everything. Going to give it a second pass in any event. ShakespeareFan00 (talk) 19:51, 27 July 2015 (UTC)