User talk:Mpaa


Welcome

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current wikisource projects.

You can put a brief description of your interests on your user page and contributions to another Wikimedia project, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 12:00, 7 April 2011 (UTC)

Mining data from the quarry

Hi, and thanks for the link to the data query page and the link to the database schemas. I managed to modify and run my query (it was written for MySQL and the current database is MariaDB), but I got some unrelated categories and garbage, which means that the category links are incorrectly defined in the SQL statement. I'll keep at it, which also means I have to delve into the SQL of MariaDB, which is a bit different from MySQL. I must assume that the switch to the new DBMS was because Oracle owns MySQL. — Ineuw talk 00:07, 7 January 2015 (UTC)

Welcome. I am no expert on databases, but I use MySQL Workbench to connect to the DB (in case this is useful to you).--Mpaa (talk) 19:28, 7 January 2015 (UTC)
Yes, it is useful and I will reinstall it. I used it a long time ago (6 years), but that is not the problem... I removed the relational constraints to the categorylinks and ran a simple SELECT query to see if it can select just the titles. There are nearly 6000 titles in the PSM main namespace and it selected only about 2,200. When you have the chance, could you run this simple query for PSM article titles in the main namespace?
page_title LIKE ('Popular_Science_Monthly%')
AND page_namespace = 0

Ineuw talk 19:57, 7 January 2015 (UTC)

Same here, weird ...:
MariaDB [enwikisource_p]> SELECT COUNT(*) FROM page WHERE page_title LIKE ('Popular_Science_Monthly/%s') AND page_namespace = 0;
+----------+
| COUNT(*) |
+----------+
|     2206 |
+----------+
1 row in set (0.01 sec)
OK, I got it. Something must be wrong in the LIKE pattern. %s must discard something (apostrophes? Unicode? whatever...?)
MariaDB [enwikisource_p]> SELECT COUNT(*) FROM page WHERE page_title LIKE ('Popular_Science_Monthly%') AND page_namespace = 0;
+----------+
| COUNT(*) |
+----------+
|     8933 |
+----------+
1 row in set (0.01 sec)

--Mpaa (talk) 21:08, 7 January 2015 (UTC)
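For what it's worth, the gap between 2206 and 8933 is most likely not about apostrophes or Unicode: in a LIKE pattern, % matches any run of characters and the other characters are literal, so 'Popular_Science_Monthly/%s' only matches titles that end in "s". Also, _ is a single-character wildcard unless escaped. The sketch below reproduces this with Python's built-in sqlite3 module standing in for MariaDB; the titles are made up, and SQLite's ASCII case-insensitive LIKE may differ from how the replica compares titles.

```python
import sqlite3

# Made-up titles standing in for rows of the page table.
TITLES = [
    "Popular_Science_Monthly/Volume_1/May_1872/Science_and_Immortality",
    "Popular_Science_Monthly/Volume_1/May_1872/Notes",
    "Popular_Science_Monthly/Volume_1/May_1872/Literary_Notices",
    "Popular-Science-Monthly_Index",  # hypothetical title with hyphens
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_title TEXT, page_namespace INTEGER)")
conn.executemany("INSERT INTO page VALUES (?, 0)", [(t,) for t in TITLES])


def count(pattern):
    """Count main-namespace titles matching a LIKE pattern (backslash escapes)."""
    sql = ("SELECT COUNT(*) FROM page "
           "WHERE page_title LIKE ? ESCAPE '\\' AND page_namespace = 0")
    return conn.execute(sql, (pattern,)).fetchone()[0]


print(count("Popular_Science_Monthly/%s"))   # 2: only titles ending in "s"
print(count("Popular_Science_Monthly%"))     # 4: "_" also matches "-"
print(count(r"Popular\_Science\_Monthly%"))  # 3: escaped "_" is literal
```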

Thanks. I will figure out what's going on from the 2nd recordset — Ineuw talk 21:12, 7 January 2015 (UTC)

Hi again. Using the 2nd SQL statement, I extracted all the titles and everything matches my Access database (8016). The additional entries are all redirects pointing to Obituaries and Articles, but without the {{ROOTPAGENAME}} of PSM.
The issue is checking the link to the categories by testing some records of the "categorylinks" layout, because my original copy of the SQL statement from 2 years ago mentions a field name link which no longer exists, or rather has been renamed. It is just a matter of some detective work.
The lazy way is to print the field lists of each table in question from MariaDB, extract a few complete records from each, recreate the structures in MSAccess and see what I get. I have used MSAccess as a query design tool (very sophisticated, the best I've ever come across) and as a graphical front end to connect to MySQL. The two basic SQL differences that I remember are that MSAccess uses '*' instead of '%' to indicate everything, and that 'constant strings' in a MySQL statement can only be enclosed in single quotes, while MSAccess accepts both single and double quotes. P.S.: I always wondered who is/was Maria. — Ineuw talk 04:12, 11 January 2015 (UTC)
I think the issue is in the search pattern. This should do the trick.
    enwikisource_p.categorylinks ON enwikisource_p.page.page_id = enwikisource_p.categorylinks.cl_from
WHERE enwikisource_p.page.page_title REGEXP 'Popular_Science_Monthly/Volume_.*'
        AND enwikisource_p.page.page_namespace = 0;
Bye--Mpaa (talk) 09:40, 11 January 2015 (UTC)
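A note on the REGEXP version, again as a sketch rather than the replica itself: unlike LIKE, REGEXP treats _ as a literal character and matches anywhere in the title unless the pattern is anchored with ^, so the trailing .* adds nothing. SQLite has no built-in REGEXP, but it evaluates X REGEXP Y by calling a user function regexp(Y, X), registered here with Python's re.search. All titles, including the "Index_to_..." one, are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite rewrites "X REGEXP Y" as regexp(Y, X); re.search gives unanchored matching.
conn.create_function("regexp", 2, lambda pattern, text: re.search(pattern, text) is not None)
conn.execute("CREATE TABLE page (page_title TEXT, page_namespace INTEGER)")
conn.executemany("INSERT INTO page VALUES (?, ?)", [
    ("Popular_Science_Monthly/Volume_1/May_1872/Notes", 0),
    ("Popular_Science_Monthly/Volume_32/April_1888/Notes", 0),
    ("Index_to_Popular_Science_Monthly/Volume_1", 0),  # hypothetical title
    ("Popular_Science_Monthly", 0),                    # no /Volume_ part
    ("Popular_Science_Monthly/Volume_1/Notes", 1),     # not main namespace
])


def titles(pattern):
    """Main-namespace titles matching a regular expression, sorted."""
    sql = ("SELECT page_title FROM page "
           "WHERE page_title REGEXP ? AND page_namespace = 0 ORDER BY page_title")
    return [title for (title,) in conn.execute(sql, (pattern,))]


unanchored = titles("Popular_Science_Monthly/Volume_.*")
anchored = titles("^Popular_Science_Monthly/Volume_")
print(len(unanchored), len(anchored))  # 3 2
```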
Thanks. I created and executed a similar statement successfully which resulted in an accurate list of the titles, but no categories. After studying the schema, I concluded that the categories table is missing from the SQL. The original of this query created a temporary table with the article titles and the link # and then linked this to the categories.
USE enwikisource_p;
SELECT page.page_title, categorylinks.cl_to
 FROM categorylinks INNER JOIN page
  ON page.page_id = categorylinks.cl_from WHERE categorylinks.cl_to LIKE
  ('Popular_Science_Monthly_Volume%') AND page.page_namespace = 0;

Ineuw talk 20:17, 11 January 2015 (UTC)

Finally got it

I extracted the structure of each table, then extracted a couple of hundred records from each table and figured out what is happening and how the info is stored. Below is the correct SQL statement; it yielded some 27,000 records. :-) Thanks again for your guidance. BTW, SQL is easier than it looks; only the table JOINs are a bit tricky.

USE enwikisource_p;
SELECT page.page_title, categorylinks.cl_to
 FROM categorylinks INNER JOIN page
  ON page.page_id = categorylinks.cl_from WHERE page.page_title LIKE
  ('Popular_Science_Monthly_Volume%') AND page.page_namespace = 0;

Ineuw talk 16:43, 21 January 2015 (UTC)
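Why some 27,000 rows for roughly 8,000 titles: an INNER JOIN with categorylinks returns one row per (page, category) pair, so a page in three categories appears three times. Incidentally, the _ before "Volume" in the LIKE pattern matches any single character, including the "/" in titles such as "Popular_Science_Monthly/Volume_1/...". A minimal sqlite3 sketch with made-up rows; the column names follow the query above, everything else is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT, page_namespace INTEGER);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    -- Made-up rows: page 1 is in three categories, page 3 is outside namespace 0.
    INSERT INTO page VALUES
        (1, 'Popular_Science_Monthly/Volume_1/May_1872/Article_A', 0),
        (2, 'Popular_Science_Monthly/Volume_1/May_1872/Article_B', 0),
        (3, 'Popular_Science_Monthly/Volume_1/May_1872/Article_B', 1);
    INSERT INTO categorylinks VALUES
        (1, 'Astronomy'), (1, 'Geology'), (1, 'PSM_Volume_1'),
        (2, 'PSM_Volume_1'), (3, 'Talk_pages');
""")

QUERY = """
    SELECT page.page_title, categorylinks.cl_to
    FROM categorylinks INNER JOIN page
        ON page.page_id = categorylinks.cl_from
    WHERE page.page_title LIKE ('Popular_Science_Monthly_Volume%')
        AND page.page_namespace = 0
"""
rows = conn.execute(QUERY).fetchall()
distinct_titles = {title for title, _ in rows}
print(len(rows), len(distinct_titles))  # 4 rows, 2 titles
```

Wrapping the select list in COUNT(DISTINCT page.page_id) would count titles instead of title/category pairs.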

Good. I think the JOIN statement is in this case symmetric, so it should be equivalent to the above. I also got 27000+ pages.
The syntax for asymmetric JOINs is where I am a bit weak, but if you really want to get comfortable, use the MSAccess query designer to create asymmetric JOINs. It's the best visual designer I've seen anywhere, and its SQL is easy to convert to MariaDB. — Ineuw talk 05:45, 22 January 2015 (UTC)
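On symmetric versus asymmetric joins: an INNER JOIN keeps only matching pairs, so swapping the two tables gives the same rows, while a LEFT JOIN keeps every row of the left-hand table and fills the missing side with NULL. A small sqlite3 sketch with made-up pages; filtering on cl_from IS NULL is one common way to list uncategorised pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    INSERT INTO page VALUES (1, 'Categorised_article'), (2, 'Uncategorised_article');
    INSERT INTO categorylinks VALUES (1, 'Geology');
""")


def rows(sql):
    """Run a query and return its rows in a stable order (None-safe)."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


# INNER JOIN: table order does not change the result.
inner_ab = rows("SELECT page_title, cl_to FROM page INNER JOIN categorylinks ON page_id = cl_from")
inner_ba = rows("SELECT page_title, cl_to FROM categorylinks INNER JOIN page ON page_id = cl_from")

# LEFT JOIN: every page is kept; pages without categories get cl_to = NULL (None).
left = rows("SELECT page_title, cl_to FROM page LEFT JOIN categorylinks ON page_id = cl_from")
uncategorised = rows(
    "SELECT page_title FROM page LEFT JOIN categorylinks ON page_id = cl_from "
    "WHERE cl_from IS NULL"
)
print(inner_ab == inner_ba, left, uncategorised)
```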

{{nop}} vs <nowiki />'s


What is the reason to use {{nop}} rather than <nowiki /> at page breaks? To me, they look the same in HTML and in EPUB, but the latter doesn't clutter the plain text with any HTML tags, and it's used on French Wikisource.

And what is this? Is this inclusion of a couple of pages really needed? To me, it looks like trash.

Best regards, Nonexyst (talk) 21:29, 13 February 2015 (UTC)

That is the recommended way to handle page breaks here, see Help:Formatting_conventions. If you think <nowiki /> should be the preferred way, I encourage you to propose it at Wikisource:Scriptorium. As for the page above, that is a mistake; I will fix it. Mistakes happen, but calling it trash sounds a bit harsh. I was just trying to be helpful; in the future I'll stay away ...--Mpaa (talk) 21:51, 13 February 2015 (UTC)
Well, maybe I’ll propose it there. Sorry if it sounds harsh; I’m not a native English speaker, so it can cause some misunderstanding. Nonexyst (talk) 22:09, 13 February 2015 (UTC)

PSM Obituary Notes

Ciao. There are anchors placed in the Obituary Notes section of this page. Can you recall where they are anchored to? Ineuw (talk) 02:11, 25 February 2015 (UTC)

By context may I hazard a guess: Popular Science Monthly/Volume 38/December 1890/The Identity of Light and Electricity 07:14, 25 February 2015 (UTC)
Not really, Mpaa started to organize something and the topic ended up as a major discussion in the Scriptorium about creating a separate section for obituaries. My only contribution was that obits in PSM appear everywhere (any section), otherwise I wasn't involved and don't know what happened to it. Ineuw (talk) 09:24, 25 February 2015 (UTC)
In this page: Author:Heinrich Hertz: Obituary in "Obituary Notes", in Popular Science Monthly Volume 44, April 1894--Mpaa (talk) 18:09, 25 February 2015 (UTC)
Thanks, Mpaa (I should have clicked on the link). But there are two problems: the other end of the anchor is directed back to the Obituary section but not to the paragraph, and the index of this volume also contains an obit section for which the anchor is generated automatically. I guess my only option is to use two anchors. Ineuw (talk) 05:27, 26 February 2015 (UTC)
Found it, Popular Science Monthly/Volume 44/April 1894/Obituary: Heinrich Rudolf Hertz. There was some debate about this. Proceed as you think is better, I will not oppose if you decide to change the approach.--Mpaa (talk) 07:43, 26 February 2015 (UTC)
Thanks. I will figure something out when I do the indexes with obituaries. Ineuw (talk) 18:03, 26 February 2015 (UTC)

A definition

There is no more humbling experience than revisiting one's old proofreading. Ineuw (talk) 05:26, 2 March 2015 (UTC)

 :-)--Mpaa (talk) 12:08, 2 March 2015 (UTC)
By the way, the pages I worked on yesterday in Volume 35 were for demo purposes for Zoeannl. Ineuw (talk) 18:10, 2 March 2015 (UTC)

POM accountancy?

Hi Mpaa.

I took the liberty of copy/mutilating one of your queries to produce a more succinct summary of edit activity. In case it is of any use to you, here it is, plus the current output. (Of course this ought to work as well, but I am unclear whether it forces a query re-execution when viewed by a (different) user. Interpolated semi-snarky note: I have since discovered it does! But only if you are logged in to Quarry; if you are logged out, it shows the result set of the prior run. Tool still in development mentality?)

SELECT r.rev_user_text, p1.page_title,
    COUNT(p2.page_id) AS 'Edit Count'
FROM enwikisource_p.page p1
        INNER JOIN
    enwikisource_p.pagelinks pl ON p1.page_id = pl.pl_from
        INNER JOIN enwikisource_p.page p2 ON p2.page_title = pl.pl_title
        INNER JOIN
    enwikisource_p.revision r ON p2.page_id = r.rev_page
WHERE
    p1.page_title IN ('A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu')
        AND p1.page_namespace = 106 /* N.B. INDEX namespace! */
        AND (r.rev_timestamp > '20150201000000'
        AND r.rev_timestamp < DATE_FORMAT(NOW(), '%Y%m%d%H%i%s'))
GROUP BY p1.page_title, r.rev_user_text;
| rev_user_text | page_title | Edit Count |
|  | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| Acalycine | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| Aphillipsmusique | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 64 |
| BD2412 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 2 |
| Beleg Tâl | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| Billinghurst | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| CMRBuck | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 2 |
| Curly Turkey | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 11 |
| Dick Bos | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 42 |
| Duckias | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 15 |
| Einstein95 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 4 |
| EncycloPetey | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 16 |
| Hazmat2 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 39 |
| Hesperian | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| HueSatLum | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| Jimregan | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 4 |
| John Carter | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 5 |
| Keith Edkins | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 6 |
| Legofan94 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 22 |
| Meggos703 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| Michael Barera | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 13 |
| Moondyne | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 44 |
| Nonexyst | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 2 |
| Pathore | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 8 |
| Pixelwarrior | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 10 |
| Prtksxna | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 24 |
| Siddhant | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 9 |
| Slowking4 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 10 |
| Spiros790 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 44 |
| TroyGab | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 3 |
| William Maury Morris II | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 5 |
| Wurstmaster | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1 |
| Zyephyrus | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 3 |

Now of course (unless I have made a mistake) this output ought to be an accurate summary of your own version of the query. This is only an ideas investigation for pulling out the detail I thought might be most relevant, along with using the p1.page_title IN clause (strictly not necessary—so far—for this month) as an indicator of how future multi-work month edit summaries might be more easily prepared.

Cheers, AuFCL (talk) 05:32, 6 March 2015 (UTC)
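Two remarks on the query, shown with a toy sqlite3 copy of the four tables it touches (all rows made up). First, the join chain Index page, pagelinks, linked pages, revision yields one row per revision of each page the Index links to, which is then counted per user. Second, MariaDB's DATE_FORMAT needs %-prefixed specifiers ('%Y%m%d%H%i%s'); written as the literal 'YYmmddHHiiss' it is just a string, and since every digit sorts before 'Y', that upper bound would always be true and filter nothing. Python's strftime equivalent is '%Y%m%d%H%M%S'.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT, page_namespace INTEGER);
    CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                           rev_user_text TEXT, rev_timestamp TEXT);
    -- Made-up data: one Index page (ns 106) linking to two Page: pages (ns 104).
    INSERT INTO page VALUES (1, 'Example.djvu', 106), (2, 'Example.djvu/1', 104),
                            (3, 'Example.djvu/2', 104), (4, 'Unrelated', 0);
    INSERT INTO pagelinks VALUES (1, 104, 'Example.djvu/1'), (1, 104, 'Example.djvu/2');
    INSERT INTO revision VALUES
        (10, 2, 'Alice', '20150205120000'), (11, 2, 'Bob', '20150206120000'),
        (12, 3, 'Alice', '20150207120000'),
        (13, 3, 'Alice', '20150105120000'),   -- before the window
        (14, 4, 'Carol', '20150210000000');   -- page not linked from the Index
""")

QUERY = """
    SELECT r.rev_user_text, p1.page_title, COUNT(p2.page_id) AS edit_count
    FROM page p1
        INNER JOIN pagelinks pl ON p1.page_id = pl.pl_from
        INNER JOIN page p2 ON p2.page_title = pl.pl_title
        INNER JOIN revision r ON p2.page_id = r.rev_page
    WHERE p1.page_title IN ('Example.djvu') AND p1.page_namespace = 106
        AND r.rev_timestamp > ? AND r.rev_timestamp < ?
    GROUP BY p1.page_title, r.rev_user_text
    ORDER BY r.rev_user_text
"""
# Same layout as MariaDB's DATE_FORMAT(NOW(), '%Y%m%d%H%i%s').
upper = datetime.now().strftime("%Y%m%d%H%M%S")
result = conn.execute(QUERY, ("20150201000000", upper)).fetchall()
print(result)  # [('Alice', 'Example.djvu', 2), ('Bob', 'Example.djvu', 1)]

# Without % specifiers the bound is the literal text, which every digit string precedes.
print(all(ts < "YYmmddHHiiss" for ts in ("20150201000000", upper)))  # True
```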

Hi. No problem. Mine was just a first attempt to see if I could make life easier for BWC in POTM handling.--Mpaa (talk) 10:25, 6 March 2015 (UTC)
Good to hear I am not totally wasting your time!

I evolved the query a little further (Quarry 2472) to produce a further breakdown of the "interesting" namespaces the edits affected (currently Page:, Index: and Main/transclusion). As a fail-safe I included a count of "other" edits not classified, and am gratified/a little surprised that Nonexyst's edit of Wikisource talk:Proofread of the Month was included in the count.

Is this useful and/or per your expectations? Does it reveal any shortcomings with which I ought to be concerned? Right at this point I consider this is about the sensible point to stop fiddling but of course your requirements may vary considerably from my expectations (and of course thank you for getting the basic "associated page" logic core of this thing sorted out in the first place!)

Regards, AuFCL (talk) 23:14, 6 March 2015 (UTC)
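One way to get such a per-namespace breakdown in a single pass is conditional aggregation: a SUM over a CASE expression for each namespace of interest, plus a catch-all column. This is a hypothetical sketch, not a copy of Quarry 2472: the edits table stands in for the joined revision/page rows, and the namespace numbers (0 Main, 102 Author, 104 Page, 106 Index) are the ones mentioned in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (rev_user_text TEXT, page_namespace INTEGER)")
# Made-up edits; in the real query these rows come from joining revision and page.
conn.executemany("INSERT INTO edits VALUES (?, ?)", [
    ("Alice", 104), ("Alice", 104), ("Alice", 0),
    ("Bob", 106), ("Bob", 102),  # 102 (Author) falls into "other"
])

QUERY = """
    SELECT rev_user_text,
        SUM(CASE WHEN page_namespace = 104 THEN 1 ELSE 0 END) AS page_edits,
        SUM(CASE WHEN page_namespace = 106 THEN 1 ELSE 0 END) AS index_edits,
        SUM(CASE WHEN page_namespace = 0 THEN 1 ELSE 0 END) AS main_edits,
        SUM(CASE WHEN page_namespace NOT IN (0, 104, 106) THEN 1 ELSE 0 END) AS other_edits
    FROM edits
    GROUP BY rev_user_text
    ORDER BY rev_user_text
"""
breakdown = conn.execute(QUERY).fetchall()
for row in breakdown:
    print(row)
# ('Alice', 2, 0, 1, 0)
# ('Bob', 0, 1, 0, 1)
```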

Looks good to me. @Beeswaxcandle:, please fill in if you have more needs. I think Nonexyst's edit is actually Author:Robert_Mallet; the other page link you stated does not 'leave' the index page [1].--Mpaa (talk) 19:14, 7 March 2015 (UTC)
Mea culpa. You are absolutely right regarding the Robert_Mallet edit. In any case it still demonstrates the point that this SQL-based approach perhaps "reaches" just a little farther than naïve expectation might otherwise suggest.

I am inclined to add a further AND edp.page_namespace = 104 /* N.B. PAGE: namespace! */ restriction to the WHERE clause (so that only actual Page:-space edits are counted), but feel the payback is not worth the load of running another test.

Besides I have (if you will but take my word for it!) trialled this precise approach elsewhere. AuFCL (talk) 23:25, 7 March 2015 (UTC)

As a compleat aside, did you mean something like this instead of the Special:WhatLeavesHere link above, as the latter now seems to lead me to a No such special page warning (even though I have the Gadget selected et al.) Is this worth chasing up, or perhaps is an already-known issue? AuFCL (talk) 23:45, 7 March 2015 (UTC)

No, I actually meant that link, and for me it works and shows some links. It was to prove that it could not be Wikisource talk:Proofread of the Month, which is in namespace=5, as there are no links leaving the Index page towards ns=5, while there are towards ns=102 (Author). Strange that it does not work for you; try 'What leaves here' from the Index page (if you feel like chasing this ...). Otherwise I am satisfied with your query.--Mpaa (talk) 00:20, 8 March 2015 (UTC)
Thanks for double-checking. (And yes, I realised I had settled on the wrong page. Not quite sure in hindsight why I picked that one. Alzheimer's setting in?)

Regarding this whole side issue (sigh) please forget it. Turned out to be a problem of my own making: the so-called RequestPolicy extension to Firefox installed here was blocking page links. (Thanks for forcing me to look into this properly. Security is a pain; the lack of it is also a pain. If this is telling us something fundamental regarding computers, networking and masochism, I really wish I didn't have to think about it.)

Regards, AuFCL (talk) 02:01, 8 March 2015 (UTC)

What's the procedure to move pages?

Page:Popular Science Monthly Volume 32.djvu/271 should be djvu 153 and Page:Popular Science Monthly Volume 32.djvu/272 should be djvu 154. My question is: should I upload the correct copy to Commons to replace the current one, and ask you to make the page adjustments? I am unclear about the process, except that I don't want to lose any proofread pages.— Ineuw talk 19:00, 21 March 2015 (UTC)

Step 1: update Commons
Step 2: Pages /271 and /272 will have to be moved temporarily to the end, to leave space to move range (/153 ... /270) to range (/155 ... /272). Then the two images can be moved to the freed space at /153-/154.
Step 3: Then all the indexing in <pages ... > in Main needs to be updated accordingly.
This can be done; it should not be a big issue. It is the standard bulk move procedure.
To be noted: all the conventions you are using for sections, images and so on will no longer be valid (but they will still work). E.g. "File:PSM V32 D170 American dredge.jpg" will now be on page D172. Same for section tagging, etc. It will take some additional processing to re-align them (plus moving images at Commons, etc.). I never thought about it in detail, but it should be solvable.
Hope it is clear.--Mpaa (talk) 19:27, 21 March 2015 (UTC)
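The order of moves in Step 2 matters, because a page cannot be moved onto a title that still exists. A minimal sketch that only plans the moves and simulates them on a dictionary (no wiki access); plan_moves and apply_moves are hypothetical helpers, and end is assumed to be the current last page number, so the parked pages land past the end of the book.

```python
def plan_moves(first, last, shift, end):
    """Return (source, target) moves that put pages last+1..last+shift in front
    of the block first..last, shifting that block up by `shift`."""
    if end < last + shift:
        raise ValueError("end must be at or after the last page being relocated")
    # 1. Park the pages to be relocated just past the current last page.
    park = [(last + 1 + i, end + 1 + i) for i in range(shift)]
    # 2. Shift the block upwards, highest page first, so every target is already free.
    block = [(n, n + shift) for n in range(last, first - 1, -1)]
    # 3. Move the parked pages into the gap that opened at first..first+shift-1.
    unpark = [(end + 1 + i, first + i) for i in range(shift)]
    return park + block + unpark


def apply_moves(pages, moves):
    """Simulate the moves on a {page_number: content} dict, refusing overwrites."""
    pages = dict(pages)
    for source, target in moves:
        if target in pages:
            raise ValueError(f"moving /{source} to /{target} would overwrite a page")
        pages[target] = pages.pop(source)
    return pages


# Toy book of 10 pages: bring /7 and /8 in front of /3../6.
book = {n: f"scan {n}" for n in range(1, 11)}
moved = apply_moves(book, plan_moves(first=3, last=6, shift=2, end=10))
print([moved[n] for n in range(1, 11)])

# Volume 32 case from Step 2, with a hypothetical last page of 900.
print(len(plan_moves(153, 270, 2, end=900)))  # 122 moves
```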
Eminently clear. Realigning the section tags is no problem because I know where they are. I will also modify my database accordingly and regenerate the pages with the sections. As for the images, it is not important for now. Thank you.— Ineuw talk 01:21, 22 March 2015 (UTC)
I checked IA and bookmarked the 2 existing copies. One is our installed copy, and the second (which I have) is the accurate copy. Unfortunately, the difference between the two is not just the misplacement of two pages. There are blank pages, the title pages are in a different order from the start, and the page count differs because of additional blanks and advertisements. Although the printed page numbers are identical, it would be very difficult to reassemble. I decided to stick with the existing installed copy. — Ineuw talk 04:35, 22 March 2015 (UTC)
good copy from Biodiversity
our bad copy from University of Toronto
As you prefer.--Mpaa (talk) 21:57, 22 March 2015 (UTC)
It's not a preference, just the realization that I would have to redo the complete volume: the TOC, the Index, etc. However, I am considering moving and renaming this copy and the Index to Volume 32 Old.djvu, both here and on Commons (if possible), installing the good copy as Index:Popular Science Monthly Volume 32.djvu, and then copying the proofread pages from one to the other. I just don't want to overlook anything. Your comments are appreciated.— Ineuw talk 19:37, 23 March 2015 (UTC)
I do not think it will make a big difference in terms of effort. If you are not going to change the internal references (anchors, sections, etc.), you might as well move the whole page. If you want to update the references, you might as well do it on the moved pages. Moreover, all the history would be lost. I think if you have an old->new page mapping, it would be possible to update every instance of sections, TOC, Index etc., as you have kept a consistent notation (Dxx for anchors, Bxx, Exx for sections, etc.). If the pages to move are a bit tangled, the only thing to worry about is a good strategy for moving pages back and forth.--Mpaa (talk) 21:23, 23 March 2015 (UTC)
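As a rough illustration of the mapping idea, the sketch below rewrites the from=/to= page numbers inside <pages ... /> tags using an old->new dictionary. It is hypothetical: it only handles from and to (not fromsection/tosection, and not the Dxx anchors, whose exact markup is not shown here; "B1" is a made-up section name), and the sample mapping is the /153-/272 move from Step 2.

```python
import re

PAGES_TAG = re.compile(r"<pages\b[^>]*>")
# from=/to= attributes, with optional matching quotes; fromsection=/tosection= are not matched.
PAGE_ATTR = re.compile(r"""\b(from|to)=(["']?)(\d+)\2""")


def remap_pages_tags(wikitext, mapping):
    """Replace old DjVu page numbers in <pages> tags with mapping[old]."""
    def fix_attr(m):
        old = int(m.group(3))
        return f"{m.group(1)}={m.group(2)}{mapping.get(old, old)}{m.group(2)}"

    return PAGES_TAG.sub(lambda tag: PAGE_ATTR.sub(fix_attr, tag.group(0)), wikitext)


# Step 2 mapping: /153../270 move up by two, /271 and /272 land on /153 and /154.
mapping = {n: n + 2 for n in range(153, 271)}
mapping.update({271: 153, 272: 154})

text = (
    '<pages index="Popular Science Monthly Volume 32.djvu" from=160 to="170" fromsection=B1 />\n'
    '<pages index="Popular Science Monthly Volume 32.djvu" from=271 to=272 />'
)
print(remap_pages_tags(text, mapping))
```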

@Mpaa: My apologies for driving you crazy. I finally figured out what has to be done to replace this book. To make the good copy I have match page 1 and the page count, all I had to do was insert two blank pages to bring Page 1 = DjVu 11, and then delete eighteen pages of advertisements at the end of the WS copy, so that both have 900 pages. Now, to quote from above:

Step 1: update Commons. Done
Step 2: Pages /271 and /272 will have to be moved temporarily to the end, to leave space to move range (/153 ... /270) to range (/155 ... /272). Then the two images can be moved to the freed space at /153-/154.

Now, all I will have to do is correct the links related to pages between 153 and 272. The rest will align perfectly. Thanks in advance. — Ineuw talk 19:51, 25 March 2015 (UTC)

@Ineuw:, something wrong in Step 2. I see (blk+image) in djvu/155-156 and not in djvu/153-154. Is this supposed to be like this?--Mpaa (talk) 21:04, 25 March 2015 (UTC)
There are also other blk-image couples at djvu/301-302 and djvu/471-472 that need to be considered. And in this copy, the current image Page:Popular Science Monthly Volume 32.djvu/851 and its blank page look missing. Please look into it and re-specify the different (djvu/begin, djvu/end, +offset) moves.--Mpaa (talk) 21:41, 25 March 2015 (UTC)
In general, check image presence position in the new copy, as I think some are either moved or missing.--Mpaa (talk) 21:49, 25 March 2015 (UTC)
You are right. I must figure out how to correct the damned thing. Page:Popular Science Monthly Volume 32.djvu/851 seems to be missing from the new volume. Let me study again what went wrong. — Ineuw talk 04:54, 26 March 2015 (UTC)
D850 (blank page) and D851 (image of David Ames Wells) should be D737+D738, before the first article of the month (April 1888). Currently they are at D741+D742.— Ineuw talk 05:05, 26 March 2015 (UTC)
This is the only way I can see it done. Pages D271 and D272 are to be moved to the end, and then everything is to be moved 2 pages up, beginning with 155. This will open up 155 and 156 for the blank D271 and the image D272. I will then check everything until I find the next page move. Can we do it one move at a time over the next few days? As I mentioned earlier, I have no idea what is out of order. — Ineuw talk 08:27, 29 March 2015 (UTC)

@Mpaa: Unfortunately, I can't oblige you with a list of From D# to D# until we fix the first problem, which is to insert two pages at D155+D156 so that D271+D272 can be moved to D155+D156, where they belong, and the gap at D271+D272 is closed up. The overall problem is as follows:

  • A "standard" volume of PSM has 6 paired pages without page numbers. Volume 32 has 7 pairs: D271+D272, D301+D302, D412+D413, D447+D448, D471+D472, D595+D596, D741+D742. The six portraits ALWAYS appear as the last two pages of a monthly issue. What I can't figure out is the shift from 2 to 4 and back to 2 pages. If you can, fine, but I can't.
  • I inserted 2 empty pages at the beginning so that both volumes begin at D11. The images not only shifted by two pages, but they are inserted in the wrong places altogether, and unless I go step by step, I can't see where the additional pages go.
  • At this point, I don't care about incorrect main namespace links, section tags and anchors. I must correct my database, from which these are all generated from one central entry. The system generates the table of contents, the main namespace links, the sections on the pages, and the Index with 400+ entries and their anchors. — Ineuw talk 07:00, 30 March 2015 (UTC)
If you want I can move them, but I really can't see how it will help you in your research. Even if you will see the text aligned with the scans up to D272, how will that help you with the following pages? Nothing will change from D273 upwards. I'll try to compare the 2 djvus to see if I can shed some light.--Mpaa (talk) 17:44, 30 March 2015 (UTC)
The +4 in some ranges after D469 is the combination of the misplaced portrait/blank pair (which contributes +2 when moved back) and the insertion of D471+D472 (another +2).
If you download the 2 versions of the djvus and compare them, the old->new ranges are relatively easy to spot. Regarding the 7 vs. 6 pairs, maybe D471+D472 was at the end of the volume as a larger folded page (see the marks on the page) and has been inserted at this position in this version of Vol. 32.--Mpaa (talk) 19:23, 30 March 2015 (UTC)
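Since the moves were requested as (djvu/begin, djvu/end, +offset) triples, here is a sketch of turning such triples into an old->new mapping, with two sanity checks: no old page listed twice, and no two old pages sent to the same new number. The function name is hypothetical, and only the first correction (/153-/270 up by 2, /271-/272 down to /153-/154) is taken from this thread; a +4 range like the ones described above is simply a triple whose offset already includes both +2 contributions.

```python
def build_mapping(ranges):
    """Turn (begin, end, offset) triples into an {old_page: new_page} dict."""
    mapping = {}
    for begin, end, offset in ranges:
        for old in range(begin, end + 1):
            if old in mapping:
                raise ValueError(f"page {old} appears in more than one range")
            mapping[old] = old + offset
    new_pages = list(mapping.values())
    if len(new_pages) != len(set(new_pages)):
        raise ValueError("two old pages map to the same new page")
    return mapping


first_fix = build_mapping([(153, 270, +2), (271, 272, -118)])
print(first_fix[153], first_fix[270], first_fix[271], first_fix[272])  # 155 272 153 154
```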
That was my assumption, but when I get to this at 10pm, I no longer trust myself. Therefore, I leave it in your capable hands.— Ineuw talk 20:47, 30 March 2015 (UTC)
What do you think of the position of these D471+D472? Shall they stay where they are in the new vol? Even if they break the 6-pair scheme?--Mpaa (talk) 11:53, 31 March 2015 (UTC)
D471+D472 should appear as they are in the new upload (not at the end). I mentioned the six pairs of portraits only as an example that they are the basic six pairs which are placed at the end of the monthly editions (up to volume 54). When I did the original organization, these were the ones that I found immediately, because of the blank image protector page. There are many other images in the volumes without page numbers, but not always accompanied by a blank page.— Ineuw talk 18:54, 31 March 2015 (UTC)
The page moves should be OK now. If you want to save some work and you're not in a hurry, it will not take long to adapt a bot to replace sections, links, etc. all over. Otherwise, proceed as you see fit. I am sending you the old->new pages. Bye--Mpaa (talk) 22:28, 31 March 2015 (UTC)
Mpaa, Much thanks and my gratitude. I will do the section tags & anchors because I always check the main namespace pages. — Ineuw talk 22:47, 31 March 2015 (UTC)

A problem cropped up

Everything up to and including matches perfectly. From here on I am lost, because the uploaded original has the correct page contents, including D597. This is a duplicate OF THIS PAGE, and the real D597 seems to be missing. I can resolve it by uploading the same copy again, where the page is in the right place. I have been following the DjVu book pagination with this installed volume. — Ineuw talk 05:21, 1 April 2015 (UTC)

No clue. But the djvu file has the correct page. Let's wait and see if it is a cache issue.--Mpaa (talk) 06:43, 1 April 2015 (UTC)
I am sure it is a cache issue, these two links show two different pages, one wrong and one correct.
Bye--Mpaa (talk) 09:46, 1 April 2015 (UTC)
The second image differs in one minute detail, "1023px", and that "Bye" sounds so ominous. — Ineuw talk 18:01, 1 April 2015 (UTC)
"1023px" has probably forced a new thumbnail, and the correct page has been used, so I think we just need to wait for the "1024px" one to be refreshed. Sincerely--Mpaa (talk) 19:08, 1 April 2015 (UTC)
Thanks. There is loads of info on Commons about refreshing DjVu uploads such as this, but none of it worked. I just mentioned it to you because I was curious; I will wait until it happens. Ineuw talk 20:31, 1 April 2015 (UTC)

PSM page cleaning

I don't know if the title list I emailed you was what you wanted. I also prepared a sample page for when you clean a two-column page. My changes are in red, halfway down the page. The idea is that you either leave the empty row between the end of column 1 and the beginning of column 2, or merge the last line of the left column with the first line of the right column. This helps me a lot in finding problematic text.— Ineuw talk 03:47, 23 March 2015 (UTC)

Hi. I do not go into that level of details when cleaning the text. I usually address items that can be globally cleaned over the set of pages.--Mpaa (talk) 09:26, 23 March 2015 (UTC)
My apologies, I misstated my request. Would you mind leaving the empty row between the two columns when you remove the �? — Ineuw talk 19:27, 23 March 2015 (UTC)
As I said, I do it in one shot: I dump all the pages into one file and 'replace all' in one command (I guess it could be hundreds of replacements).--Mpaa (talk) 21:23, 23 March 2015 (UTC)
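The one-shot approach can be sketched as: concatenate all pages with a delimiter, run each replacement once over the whole dump, then split it back into pages. It also shows why a page-specific request (keep an empty row at the column break) does not fit: a global pattern has no notion of where a column ends. The delimiter format and helper names are assumptions, and a pattern that matches across a delimiter would corrupt the split.

```python
import re

# Hypothetical delimiter; it must never occur in the page text itself.
DELIMITER = "\n=====PAGE {}=====\n"
DELIMITER_RE = re.compile(r"\n=====PAGE (\d+)=====\n")


def dump(pages):
    """Join {page_number: text} into one string, in page order."""
    return "".join(DELIMITER.format(n) + text for n, text in sorted(pages.items()))


def load(dumped):
    """Split a dump back into {page_number: text}."""
    parts = DELIMITER_RE.split(dumped)
    # parts == ["", "1", text1, "2", text2, ...]
    return {int(n): text for n, text in zip(parts[1::2], parts[2::2])}


def replace_all(pages, replacements):
    """Apply each (regex, replacement) once to the whole dump."""
    dumped = dump(pages)
    for pattern, repl in replacements:
        dumped = re.sub(pattern, repl, dumped)
    return load(dumped)


pages = {1: "Popular\ufffd Science", 2: "Monthly \ufffd\n\nVolume 32"}
cleaned = replace_all(pages, [("\ufffd", "")])
print(cleaned)
```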
Got it. No problem. — Ineuw talk 01:14, 24 March 2015 (UTC)

Categorizing by gender

If we're going to classify by gender, then we need to classify all authors, not just authors in one gender, as if the other gender was the default state.--Prosfilaes (talk) 00:49, 2 April 2015 (UTC)

Better still, don't create these categories. n.b. Your bot is not approved for categorizing Author pages this way. --EncycloPetey (talk) 02:32, 2 April 2015 (UTC)

(Sigh!) At just which point does somebody point out to the above pair of (functional) morons the fact Mpaa was merely being a "nice guy" respondent to this request? Hang your heads for raising this matter elsewhere than where the discussion belongs. For shame!
Now which one of you is going to be grown-up about this matter and restart the discussion appropriately, preferably even involving the original requester, user:Nonexyst? [unsigned comment by (talk)]
I leave the discussion of this matter to @Nonexyst: in another place. I just helped with his request. Also, nobody objected when I said I would take care of it. If it has to be undone, so be it.--Mpaa (talk) 06:55, 2 April 2015 (UTC)

bot: "(align formatting)"

Mpaa, in using the bot are all of those edits showing (align formatting) mistakes I made in validating? If so then something is wrong because they looked correct to me and I double-check before saving. —Maury (talk) 19:52, 6 May 2015 (UTC)

Hi. No, no problems; I wanted to align the left side note. It is just that it is easier to process all pages in the same way than to select only some pages.— Mpaa (talk) 19:57, 6 May 2015 (UTC)

Thanks for responding and further info on bot request

What a small wiki world! You may not recall, but you assisted me with my first two texts last fall when I first joined the WS project. Thanks again for that. For this topic I will respond more formally on the bot page (where I hope you'll notice my astute use of the emdash  –  :), but I just wanted to say hi and fill you in on the background to this request.

First, the OCR button: with all due respect, user Beeswaxcandle responded to my problems with the OCR button by explaining that it activates an OCR routine that rescans the image and does not re-emit the underlying text in the DJVU file. I note that this sometimes corresponds to my experience of the button, but for me the button's behavior, and even whether it appears on my toolbar, is unpredictable, even with the appropriate widgets preference set, so I am unable to use it productively.

Try to set it up. It is all you need.— Mpaa (talk) 11:24, 15 May 2015 (UTC)

Second, the reason for tagging the underlying text, then uploading: I have reluctantly reached the conclusion that there are several types of correction best done on the whole volume because of the context needed to be able to make good decisions about running headers, titles, cross-page hyphenation, front v. back matter, etc., so this is my experiment along those lines. Again, Beeswaxcandle and I discussed this on my talk page if you're curious about the history.

Corrections on the whole volume are one thing; embedding wiki syntax in the DjVu file is another. I would discourage that, because if someone needs the DjVu for other purposes, the text layer will be full of useless stuff.— Mpaa (talk) 11:24, 15 May 2015 (UTC)

LBNL, I chose the Southern Historical Society Papers project because it appeared to have stalled but has a user (Maury) who was interested in making further progress on the series and has several volumes I am interested in seeing completed. For the number of pages involved, a mass reload (assuming my script on the djvu text works sufficiently well) would be far more efficient and make it possible to finish the SHS project this year.

Probably more than you wanted to know, but I thought extra detail might be helpful since you had previously assisted me. Feel free to reply on the bot page or my talk page, and thanks again, Dictioneer (talk) 00:17, 15 May 2015 (UTC)

Thanks for your quick response, both here and on the bot page. Here is my experience with the OCR button: I bring up a page in edit mode, click on "Proofread tools", and there is no OCR box. I click on "Preferences" at the top, then on "Gadgets." On the Gadgets page, in the second section, "Editing tools for Page: namespace", I click the checkbox "OCR: Enable OCR button in Page: namespace." and click on "Save" at the bottom of the page. I go back and reclick through from the Index space to the specific page. The OCR button doesn't appear. I go back to preferences/gadgets and in the section "Development (in beta)" I click on "Add a toolbox link to reload the current page with Resource Loader in debug mode." Lather, rinse, repeat, back to the specific page in question. Still no OCR button. I click on the "Debug" button in the left pane under "Tools." The OCR button usually appears, though not always. Sometimes I have to click on the EDIT link again, in which case the OCR button always appears. If I click on the OCR button, it reloads the text, but the text doesn't contain my most recent changes.
I should note that I have tried this with Firefox on Ubuntu, Mac, and Windows, with Chrome on Ubuntu and Mac, and with Internet Explorer on Windows. If you have a fix for this, or debugging advice, or can point me to a resource that will help me sort it out, that would be great. I've also copied in the common.js and common.css text from user Beeswaxcandle so that my setup would be as close to his/hers as possible. If you think it would help to copy in your script source, I'm happy to try that.
In my experience, the only thing that reliably reloads my updated text and formatting from the djvu source file is going to the test page that Beeswaxcandle deleted for me. There, the correct text shows up whether or not I have an OCR button, and the OCR button (if I press it) also reloads the updated text correctly. I have not saved this page, since it is the only one I have that reliably works. This is the reason for my request for a mass deletion of the non-proofread pages previously uploaded by LA2-bot.
Thanks for any help you can provide, or for pointing me to an appropriate resource to get this debugged. At the moment, the only workaround that gives the desired result is the one proposed by Beeswaxcandle. I am open to alternatives, but would need a link to the relevant bot, upload script, or other documentation to get me started. Also, once the text has been populated, I would be happy to upload a version of the djvu file with all wiki formatting removed; it's just difficult at the moment to see how to get to that point. Dictioneer (talk) 17:28, 15 May 2015 (UTC)
Try copying my User:Mpaa/vector.js and keep your preferences as simple as possible (under Editing, select "Show edit toolbar" and "Enable enhanced editing toolbar"). — Mpaa (talk) 18:47, 15 May 2015 (UTC)
No luck, I'm afraid. Here's what I did: A) went into Preferences and hit the reset button, then verified the "Show edit toolbar" and "Enable enhanced editing toolbar" settings. B) created a blank vector.js page and copied over your source. C) exited the browser, restarted and logged in. No OCR button. I realized that my general reset had turned the OCR gadget preference off, so I went to Gadgets and re-enabled OCR. Went back to edit a page. Still no OCR. D) went back to Gadgets and re-enabled the Debug setting under "Development". I am now back to the original behavior: if I click Edit, then Debug, I usually get the OCR button to show up. E) However, when I click the OCR button it reloads the text, but not the most current version of it. F) For thoroughness's sake, I also copied in your vector.css, common.js, and common.css. Same result.
I think it might be illuminating to separate this problem into its less important and more important parts: the unpredictable appearance and disappearance of the button (annoying but less important), and what actually happens on a page when you press OCR. Let's assume that the OCR button appears reliably for you. How does it behave when you use it on these text pages? Here is the experiment to try: A) go to Index:Southern_Historical_Society_Papers_volume_35.djvu and click on Index page 19/file page 33, aka to see what happens. I get a page that warns me this page has been deleted by Beeswax and that I should think seriously before recreating it. B) In the text itself you should see a "noinclude" running header and a hwe tag for possessor: this reflects the current, updated djvu file. C) Now cancel out and go to page 20/34 (i.e., the next page), which still exists. You will see it displayed without any running-header tag. D) Press OCR, and the page is refreshed with a running header but without the noinclude tags. This text comes from an old version of the file, not the most current one on Commons. E) Therefore p. 19 (which Beeswax deleted) is correctly updated, and p. 20 (not deleted) is not.
Did you get the same result? There are other differences on the page I could detail, but I assume one missing change is enough for now. Thanks for taking the trouble to help me figure out what's going wrong. Dictioneer (talk) 14:45, 16 May 2015 (UTC)
This is the link that the OCR button calls [2]; its output is then parsed. If you copied my settings, OCR should appear under "Proofread tools". — Mpaa (talk) 19:43, 16 May 2015 (UTC)
Another issue is that OCR treats all the text as part of the body, so I expect it will not put Template:Rh... in the header. — Mpaa (talk) 19:56, 16 May 2015 (UTC)
Unfortunately, the OCR button still appears only occasionally. The noinclude tag was included in a revision at Beeswax's suggestion; apparently he and Maury have a hot-key that activates a .js routine that takes the running header and puts it in the header box. In any case, other changes in the text underlying the djvu file also do not appear, not just the running-header noinclude tag.
I can provide other details of what's not appearing if that's your preference, but personally I would suggest that you proceed with the deletion algorithm you propose on the bot page and that we resume trying to chase down the source of this problem at some point in the future. The problem seems important to me since users who edit text and re-upload djvu files will have some of their changes appear and some not for no apparent reason. However, this is clearly a difficult and intermittent problem, so a brief break from chasing it may be good for all involved. Let me know if there's any help I can provide on the revised deletion/reloading based on LA2-bot being the most recent updater of the page. Dictioneer (talk) 20:59, 16 May 2015 (UTC)
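The revised selection rule described above (reload only pages whose most recent editor is still LA2-bot, i.e. pages nobody has touched since the bot upload) amounts to a simple filter. This is a sketch under stated assumptions: the PageRecord type and its fields are hypothetical, and fetching each page's last editor (from the page history via the MediaWiki API or a bot framework) is not shown.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical summary of one Page: namespace page and its latest editor."""
    title: str
    last_editor: str


def pages_to_reload(records: list[PageRecord], bot: str = "LA2-bot") -> list[str]:
    """Return titles whose most recent edit is still the bot's own upload."""
    # Any page edited by a person since the upload is left alone.
    return [r.title for r in records if r.last_editor == bot]
```

For example, given records for Page:Example.djvu/1 (last edited by LA2-bot) and Page:Example.djvu/2 (last edited by a user), only the first title would be returned.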
I'll try to upload the latest text layer; I need to customise a script first. But trust me, updating the text layer in the djvu file is a viable option if and only if a new OCR process is going to be applied to the file. Everything else can be done by working directly on text files, without changing the djvu. Dividing header, body and footer is also not trivial. If you want to apply this approach in the future, try to stick to this file format to divide the different pages. Note that this format was not designed to handle the Proofread Page format, so if you want to define headers and footers, mark them in the text with a convention that makes them easy to recognise (e.g. @@HEADER_START@@all the header text@@HEADER_END@@ or similar). It will make the rest of the process easier. There is work in progress on the bot side to handle this easily in the future. — Mpaa (talk) 21:12, 16 May 2015 (UTC)
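With header text marked as suggested above, splitting a page into header, body and footer becomes a mechanical step. A minimal sketch, assuming the @@HEADER_START@@/@@HEADER_END@@ markers from the example and a hypothetical @@FOOTER_START@@/@@FOOTER_END@@ pair by analogy; it works on one page's text and does not implement the page-separator file format mentioned above.

```python
import re

# Marker convention from the suggestion above; the FOOTER pair is an assumed analogue.
HEADER_RE = re.compile(r"@@HEADER_START@@(.*?)@@HEADER_END@@", re.DOTALL)
FOOTER_RE = re.compile(r"@@FOOTER_START@@(.*?)@@FOOTER_END@@", re.DOTALL)


def split_page(text: str) -> tuple[str, str, str]:
    """Split one page's text into (header, body, footer).

    A missing marker pair yields an empty header or footer; only the first
    occurrence of each pair is used.
    """
    header_match = HEADER_RE.search(text)
    footer_match = FOOTER_RE.search(text)
    header = header_match.group(1).strip() if header_match else ""
    footer = footer_match.group(1).strip() if footer_match else ""
    # Whatever remains once the marked spans are cut out is the body.
    body = FOOTER_RE.sub("", HEADER_RE.sub("", text, count=1), count=1).strip()
    return header, body, footer
```

For instance, a page starting with @@HEADER_START@@{{rh|20|RUNNING HEAD|}}@@HEADER_END@@ would give the {{rh}} call as the header, which could then go into the header box while the rest goes into the body.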
One more thing. The advantage is that once the file is ready, you can apply all the text improvements you want with a text editor, working offline and using search-and-replace patterns per file instead of per page, and then upload only the final result of all your improvements. — Mpaa (talk) 21:16, 16 May 2015 (UTC)
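Applying search-and-replace patterns once per file, rather than once per page, could look like the sketch below. The two rules are hypothetical examples of OCR clean-up, not rules anyone in this thread has proposed; the point is only that every pattern runs over the whole volume's text in one pass before anything is uploaded.

```python
import re
from pathlib import Path

# Hypothetical correction rules, applied in order to the whole volume's text.
RULES: list[tuple[str, str]] = [
    (r"\btbe\b", "the"),   # a common OCR misreading of "the"
    (r"[ \t]{2,}", " "),   # collapse runs of spaces/tabs into one space
]


def apply_rules(text: str, rules: list[tuple[str, str]] = RULES) -> str:
    """Apply each (pattern, replacement) pair to the full text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def fix_file(src: Path, dst: Path) -> None:
    """Read the whole text file, apply every rule, and write the result."""
    dst.write_text(apply_rules(src.read_text(encoding="utf-8")), encoding="utf-8")
```

Keeping the rules in one list makes it easy to rerun the whole clean-up after adding a new pattern, which is the offline workflow described above.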
BTW, your syntax for {{hwe}} is wrong. See Page:Southern_Historical_Society_Papers_volume_35.djvu/86. {{hwe|con|fidence.}} should be {{hwe|fidence.|confidence.}}
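The correction above ({{hwe|con|fidence.}} becomes {{hwe|fidence.|confidence.}}, i.e. the fragment on this page first, then the whole word) can be applied mechanically to existing text. A minimal sketch: the regex only handles two-argument hwe calls without nested templates, and the test for "already correct" (the second argument ends with the first) is an assumed heuristic, not part of the template's documentation.

```python
import re

# Two-argument {{hwe|...|...}} calls; nested templates are not handled.
HWE_RE = re.compile(r"\{\{hwe\|([^|{}]*)\|([^|{}]*)\}\}")


def fix_hwe(text: str) -> str:
    """Rewrite {{hwe|prefix|suffix}} as {{hwe|suffix|prefix+suffix}}."""

    def repl(m: re.Match) -> str:
        first, second = m.group(1), m.group(2)
        if second.endswith(first):
            # Already {{hwe|fragment|whole word}}; leave it unchanged.
            return m.group(0)
        return "{{hwe|%s|%s%s}}" % (second, first, second)

    return HWE_RE.sub(repl, text)
```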
Good catch; I've updated my script accordingly. I may start a new topic with questions about the MediaWiki link above, but you've given me a huge amount of help already, so I'll let you get back to your own texts for a while before I bug you again. Thanks so much, and if there's anything I can do for you in return, just let me know. Dictioneer (talk) 22:06, 17 May 2015 (UTC)