User talk:Mpaa/Archives/2015

Warning Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

New Proposal Notification - Replacement of common main-space header template

Announcing the listing of a new formal proposal recently added to the Scriptorium community-discussion page, Proposals section, titled:

Switch header template foundation from table-based to division-based

The proposal entails the replacement of the current Header template familiar to most with a structurally redesigned new Header template. Replacement is a needed first step in a series of steps required to properly address the long-standing deficiencies behind several issues, as well as to enhance our mobile device presence.

There should be no significant operational or visual differences between the existing and proposed Header templates under normal usage (i.e. Desktop view). The change is entirely structural -- moving away from the existing all-Table HTML make-up to an all-Div[ision]-based one.

Please examine the testcases where the current template is compared to the proposed replacement. Don't forget to also check Mobile Mode from the testcases page -- which is where the differences between current header template & proposed header template will be hard to miss.

For those who are concerned over the possible impact replacement might have on specific works, you can test the replacement on your own by entering edit mode, substituting the header tag {{header with {{header/sandbox and then previewing the work with the change in place. Saving the page with the change in place should not be needed, but if you opt to save the page instead of just previewing it, please remember to revert the change soon after you're done inspecting the results.

Your questions or comments are welcomed. At the same time I personally urge participants to support this proposed change. -- George Orwell III (talk) 02:04, 13 January 2015 (UTC)

Mining data from the quarry

Hi, and thanks for the link to the data query page and the link to the database schemas. Managed to modify and run my query (it was written for MySQL and the current database is MariaDB), but I got some unrelated categories and garbage, which means that the category links are incorrectly defined in the SQL statement. I'll keep at it, which also means I have to delve into the SQL of MariaDB, which is a bit different than MySQL. Must assume that the switch to the new DBMS was because Oracle owns MySQL. — Ineuw talk 00:07, 7 January 2015 (UTC)

Welcome. I am no expert on databases but I use MySQL workbench to connect to dB (if this can be useful to you).--Mpaa (talk) 19:28, 7 January 2015 (UTC)
Yes, it is useful and I will reinstall it. I used to use that a long time ago (6 years), but that is not the problem. I removed the relational constraints to the categorylinks, and ran a simple "SELECT" query to see if it can select just the titles. There are nearly 6000 titles in the PSM main namespace and it selected only about 2,200. When you have the chance, could you run this simple query for PSM article titles for the main namespace?
SELECT enwikisource_p.page.page_title 
FROM enwikisource_p.page
WHERE enwikisource_p.page.page_title 
Like ('Popular_Science_Monthly%')
AND enwikisource_p.page.page_namespace = 0

Ineuw talk 19:57, 7 January 2015 (UTC)

Same here, weird ...:
MariaDB [enwikisource_p]> SELECT COUNT(*) FROM enwikisource_p.page WHERE enwikisource_p.page.page_title  Like ('Popular_Science_Monthly/%s') AND enwikisource_p.page.page_namespace = 0;
+----------+
| COUNT(*) |
+----------+
|     2206 |
+----------+
1 row in set (0.01 sec)
OK, I got it. There must be something wrong in the LIKE specs. %s must discard something (apostrophes? unicode? whatever ...?)
MariaDB [enwikisource_p]> SELECT COUNT(*)  FROM enwikisource_p.page WHERE enwikisource_p.page.page_title  Like ('Popular_Science_Monthly%') AND enwikisource_p.page.page_namespace = 0;
+----------+
| COUNT(*) |
+----------+
|     8933 |
+----------+
1 row in set (0.01 sec)

--Mpaa (talk) 21:08, 7 January 2015 (UTC)
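
Editorial note on the two counts above: in SQL `LIKE`, `%` matches any run of characters and `_` matches exactly one character, so the pattern `'Popular_Science_Monthly/%s'` only matches titles that contain the slash and end in a literal `s`, which would explain the smaller count. The sketch below reproduces the effect with Python's built-in `sqlite3` module rather than MariaDB. The table is a cut-down stand-in for `page`, and the sample titles are invented for illustration.

```python
import sqlite3

# Hypothetical sample titles, invented for illustration only.
SAMPLE_TITLES = [
    "Popular_Science_Monthly/Volume_1/May_1872/Notes",
    "Popular_Science_Monthly/Volume_1/May_1872/Miscellany",
    "Popular_Science_Monthly/Volume_2/Index",
    "Popular_Science_Monthly",
]


def count_like(titles, pattern):
    """Count main-namespace titles matching a LIKE pattern in a throwaway table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_title TEXT, page_namespace INTEGER)")
    conn.executemany("INSERT INTO page VALUES (?, 0)", [(t,) for t in titles])
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM page "
        "WHERE page_title LIKE ? AND page_namespace = 0",
        (pattern,),
    ).fetchone()
    conn.close()
    return count


# '%s' is not a placeholder here: it means "anything, then a final s".
with_s = count_like(SAMPLE_TITLES, "Popular_Science_Monthly/%s")
# A bare trailing '%' matches every title that starts with the prefix.
without_s = count_like(SAMPLE_TITLES, "Popular_Science_Monthly%")
print(with_s, without_s)
```

Only the sample title ending in "s" survives the first pattern, while the second pattern keeps all four. Note that SQLite's `LIKE` ignores ASCII case by default, whereas the behaviour on MariaDB depends on the column type and collation.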

Thanks. I will figure out what's going on from the 2nd recordset — Ineuw talk 21:12, 7 January 2015 (UTC)

Hi again, using the 2nd SQL statement, I extracted all the titles and everything matches up to my Access database (8016). The additional entries are all redirects pointing to Obituaries and Articles - but without the {{ROOTPAGENAME}} of PSM.
The issue is to check the link to the categories by testing some records of the "categorylinks" layout, because my original copy of the SQL statement from 2 years ago mentions a fieldname link which no longer exists, or rather it has been renamed. It is just a matter of some detective work.
The lazy way is to print the field lists of each table in question from MariaDB, extract a few complete records from each, recreate the structures in MSAccess and see what I get. I have used MSAccess as a query design tool (very sophisticated, the best I've ever come across) and as a graphical front end to connect to MySQL. The two basic SQL differences that I remember are that MSAccess uses '*' instead of '%' to indicate everything, and 'constant strings' in a MySQL statement can only be enclosed with single quotes, while MSAccess accepts both single and double quotes. P.S.: I always wondered who is/was Maria. — Ineuw talk 04:12, 11 January 2015 (UTC)
I think the issue is in the search pattern. This should do the trick.
SELECT 
    enwikisource_p.page.page_title,
    enwikisource_p.categorylinks.cl_to
FROM
    enwikisource_p.page
        JOIN
    enwikisource_p.categorylinks ON enwikisource_p.page.page_id = enwikisource_p.categorylinks.cl_from
WHERE
    enwikisource_p.page.page_title REGEXP 'Popular_Science_Monthly/Volume_.*'
        AND enwikisource_p.page.page_namespace = 0;
Bye--Mpaa (talk) 09:40, 11 January 2015 (UTC)
Thanks. I created and executed a similar statement successfully which resulted in an accurate list of the titles, but no categories. After studying the schema, I concluded that the categories table is missing from the SQL. The original of this query created a temporary table with the article titles and the link # and then linked this to the categories.
USE enwikisource_p;
SELECT enwikisource_p.page.page_title, categorylinks.cl_to 
 FROM categorylinks INNER JOIN enwikisource_p.page 
  ON enwikisource_p.page.page_id = categorylinks.cl_from WHERE categorylinks.cl_to LIKE 
  ('Popular_Science_Monthly_Volume%') AND page.page_namespace = 0;

Ineuw talk 20:17, 11 January 2015 (UTC)

Finally got it

Extracted the structure of each table, then extracted a couple of hundred records from each table and figured out what is happening and how the info is stored. Below is the correct SQL statement - it yielded some 27,000 records. :-). Thanks again for your guidance. BTW, SQL is easier than it looks, only the table JOINs are a bit tricky.

USE enwikisource_p;
SELECT page.page_title, categorylinks.cl_to 
 FROM categorylinks INNER JOIN page 
  ON page.page_id = categorylinks.cl_from WHERE page.page_title LIKE 
  ('Popular_Science_Monthly_Volume%') AND page.page_namespace = 0;

Ineuw talk 16:43, 21 January 2015 (UTC)

Good. I think the JOIN statement is in this case symmetric, so it should be equivalent to the above. I also got 27000+ pages.
The syntax for asymmetric JOIN is where I am a bit weak but if you really want to get comfortable, use the MSAccess query designer to create asymmetric JOIN. It's the best visual designer I've seen anywhere, and the SQL is easy to convert to MariaDB. — Ineuw talk 05:45, 22 January 2015 (UTC)
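
Editorial note on symmetric versus asymmetric joins: an `INNER JOIN` returns the same rows whichever table is named first, while a `LEFT JOIN` also keeps rows from the left table that have no partner. The following is a minimal sketch using Python's built-in `sqlite3` module, with tiny stand-in tables that borrow the column names from the queries above. The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    -- Invented rows: Article_C has no category at all.
    INSERT INTO page VALUES (1, 'Article_A'), (2, 'Article_B'), (3, 'Article_C');
    INSERT INTO categorylinks VALUES
        (1, 'Category_X'), (1, 'Category_Y'), (2, 'Category_X');
    """
)

SELECT = "SELECT page.page_title, categorylinks.cl_to FROM "
ON = " ON page.page_id = categorylinks.cl_from"

# Symmetric: swapping the two tables of an INNER JOIN gives the same rows.
inner_ab = set(conn.execute(SELECT + "page INNER JOIN categorylinks" + ON))
inner_ba = set(conn.execute(SELECT + "categorylinks INNER JOIN page" + ON))

# Asymmetric: LEFT JOIN keeps every page, filling missing categories with NULL.
left_rows = set(conn.execute(SELECT + "page LEFT JOIN categorylinks" + ON))
conn.close()

print(inner_ab == inner_ba)
print(left_rows - inner_ab)
```

The difference between the two result sets is the uncategorised page paired with `None`, which is the row an inner join silently drops.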

{{nop}} vs <nowiki />'s

Hello!

What is the reason to use {{nop}} rather than <nowiki /> in page breaks? To me, they look the same in HTML and in EPUB, but the latter doesn’t clutter the plain text with any HTML tags, and it’s used on French Wikisource.

And what is this? Is this inclusion of a couple of pages really needed? For me, it looks like a trash.

Best regards, Nonexyst (talk) 21:29, 13 February 2015 (UTC)

That is the recommended way here to break pages, see Help:Formatting_conventions. If you think yours should be the preferred way, I encourage you to post it here: Wikisource:Scriptorium. As far as the page above, that is a mistake, will fix it. A mistake might happen; calling it trash sounds a bit harsh. Was just trying to be helpful, in the future I'll stay away ...--Mpaa (talk) 21:51, 13 February 2015 (UTC)
Well, maybe, I’ll propose it there. Sorry if it sounds harsh, I’m not a native English speaker, so it can cause some misunderstanding. Nonexyst (talk) 22:09, 13 February 2015 (UTC)

PSM Obituary Notes

Ciao. There are anchors placed in the Obituary Notes section of this page. Can you recall where they are anchored to? Ineuw (talk) 02:11, 25 February 2015 (UTC)

By context may I hazard a guess: Popular Science Monthly/Volume 38/December 1890/The Identity of Light and Electricity 124.183.124.235 07:14, 25 February 2015 (UTC)
Not really, Mpaa started to organize something and the topic ended up as a major discussion in the Scriptorium about creating a separate section for obituaries. My only contribution was that obits in PSM appear everywhere (any section), otherwise I wasn't involved and don't know what happened to it. Ineuw (talk) 09:24, 25 February 2015 (UTC)
In this page: Author:Heinrich Hertz: Obituary in "Obituary Notes" in Popular Science Monthly, 44 (April 1894)--Mpaa (talk) 18:09, 25 February 2015 (UTC)
Thanks Mpaa, (I should have clicked on the link). But, there are two problems: The other end of the anchor is directed back to the Obituary section but not to the paragraph, and the index of this volume also contains an obit section for which the anchor is generated automatically. I guess my only option is to use two anchors. Ineuw (talk) 05:27, 26 February 2015 (UTC)
Found it, Popular Science Monthly/Volume 44/April 1894/Obituary: Heinrich Rudolf Hertz. There was some debate about this. Proceed as you think is better, I will not oppose if you decide to change the approach.--Mpaa (talk) 07:43, 26 February 2015 (UTC)
Thanks. I will figure something out when I do the indexes with obituaries. Ineuw (talk) 18:03, 26 February 2015 (UTC)

A definition

There is no greater humbling experience than that of revisiting one's old proofreading. Ineuw (talk) 05:26, 2 March 2015 (UTC)

:-)--Mpaa (talk) 12:08, 2 March 2015 (UTC)
By the way, the pages I worked on yesterday in Volume 35 were for demo purposes for Zoeannl. Ineuw (talk) 18:10, 2 March 2015 (UTC)

POM accountancy?

Hi Mpaa.

I took the liberty of copy/mutilating one of your queries to produce a more succinct summary of edit activity. In case it is of any use to you, here it is + current output. (Of course this ought to work as well, but I am unclear as to whether that forces a query re-execution when viewed by a (different) user. (Interpolated semi-snarky note: I have since discovered it does! But only if you are logged in to Quarry. If you are logged out it shows the prior run result set. Tool still in development mentality?))

SELECT 
    r.rev_user_text,
    p1.page_title,
    count(p2.page_id) as 'Edit Count'
FROM
    enwikisource_p.page p1
        INNER JOIN
    enwikisource_p.pagelinks pl ON p1.page_id = pl.pl_from
        INNER JOIN
    enwikisource_p.page p2 ON p2.page_title = pl_title
        INNER JOIN
    enwikisource_p.revision r ON p2.page_id = r.rev_page
WHERE
    p1.page_title in ('A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu')
        AND p1.page_namespace = 106 /* N.B. INDEX namespace! */
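        AND (r.rev_timestamp > '20150201000000'
        /* DATE_FORMAT specifiers need a leading %; without it the format string is returned as a literal */
        AND r.rev_timestamp < DATE_FORMAT(NOW(), '%Y%m%d%H%i%s'))
GROUP BY p1.page_title, r.rev_user_text;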
Output
rev_user_text | page_title | Edit Count
73.15.161.127 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
Acalycine | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
Aphillipsmusique | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 64
BD2412 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 2
Beleg Tâl | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
Billinghurst | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
CMRBuck | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 2
Curly Turkey | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 11
Dick Bos | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 42
Duckias | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 15
Einstein95 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 4
EncycloPetey | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 16
Hazmat2 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 39
Hesperian | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
HueSatLum | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
Jimregan | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 4
John Carter | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 5
Keith Edkins | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 6
Legofan94 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 22
Meggos703 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
Michael Barera | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 13
Moondyne | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 44
Nonexyst | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 2
Pathore | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 8
Pixelwarrior | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 10
Prtksxna | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 24
Siddhant | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 9
Slowking4 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 10
Spiros790 | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 44
TroyGab | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 3
William Maury Morris II | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 5
Wurstmaster | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 1
Zyephyrus | A_Portrait_of_the_Artist_as_a_Young_Man_(Huebsch_1916).djvu | 3

Now of course (unless I have made a mistake) this output ought to be an accurate summary of your own version of the query. This is only an ideas investigation for pulling out the detail I thought might be most relevant, along with using the p1.page_title IN clause (strictly not necessary—so far—for this month) as an indicator of how future multi-work month edit summaries might be more easily prepared.

Cheers, AuFCL (talk) 05:32, 6 March 2015 (UTC)
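
Editorial note on the timestamp bounds in the query above: the revision timestamps are 14-digit strings in `YYYYMMDDHHMMSS` order, so plain string comparison sorts them chronologically. In MariaDB the matching `DATE_FORMAT` specifiers are `%Y%m%d%H%i%s`. The sketch below builds and compares such bounds in Python. The helper names are hypothetical and belong to neither Quarry nor MediaWiki.

```python
from datetime import datetime, timezone


def mw_timestamp(moment):
    """Format an aware datetime as a 14-digit UTC string, e.g. 20150201000000."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def in_window(rev_timestamp, start, end):
    """Fixed-width digit strings compare in chronological order."""
    return start < rev_timestamp < end


# Lower bound taken from the query; upper bound built from a sample moment.
start = "20150201000000"
end = mw_timestamp(datetime(2015, 3, 6, 5, 32, tzinfo=timezone.utc))

print(end)
print(in_window("20150215120000", start, end))
print(in_window("20150131235959", start, end))
```

Because the comparison is purely textual, an upper bound that is not a digit string (for example a format string returned unexpanded) still compares as greater than every real timestamp, so such a query can appear to work while effectively having no upper limit.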

Hi. No problem. Mine was just a first attempt to see if I could make life easier for BWC in POTM handling.--Mpaa (talk) 10:25, 6 March 2015 (UTC)
Good to hear I am not totally wasting your time!

I evolved the query a little further (Quarry 2472) to produce a further breakdown of "interesting" name spaces the edits affected (currently: Page:, Index: and Main/transclusion). As a fail-safe I included a count of "other" edits not classified and am gratified/a little surprised that Nonexyst's edit of Wikisource talk:Proofread of the Month was included in the count.

Is this useful and/or per your expectations? Does it reveal any shortcomings with which I ought to be concerned? Right at this point I consider this is about the sensible point to stop fiddling but of course your requirements may vary considerably from my expectations (and of course thank you for getting the basic "associated page" logic core of this thing sorted out in the first place!)

Regards, AuFCL (talk) 23:14, 6 March 2015 (UTC)
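
Editorial note: the kind of per-namespace breakdown described above can be sketched in Python. This is not the Quarry 2472 query itself, which is not reproduced on this page. The namespace numbers are the ones quoted elsewhere in this thread (0 for main, 104 for Page:, 106 for Index:), and the sample edits and user names are invented.

```python
from collections import Counter

# Namespace numbers as quoted elsewhere on this page; anything else is "other".
NAMESPACE_LABELS = {0: "Main", 104: "Page", 106: "Index"}


def breakdown(edits):
    """Count edits per (user, namespace label); edits are (user, namespace) pairs."""
    counts = Counter()
    for user, namespace in edits:
        counts[(user, NAMESPACE_LABELS.get(namespace, "other"))] += 1
    return counts


# Invented sample: two Page: edits, one Index: edit, one Author: (102) edit.
sample = [("UserA", 104), ("UserA", 104), ("UserB", 106), ("UserB", 102)]
result = breakdown(sample)
print(result)
```

Keeping an explicit "other" bucket, as the comment above does, makes edits that fall outside the expected namespaces visible instead of silently dropping them.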

Looks good to me. @Beeswaxcandle:, please fill in if you have more needs. I think Nonexyst's edit is actually Author:Robert_Mallet, the other page link you stated does not 'leave' the index page [1].--Mpaa (talk) 19:14, 7 March 2015 (UTC)
Mea culpa. You are absolutely right regarding the Robert_Mallet edit. In any case it still demonstrates the point that this SQL-based approach perhaps "reaches" just a little farther than naïve expectation might otherwise suggest.

I am inclined to add a further AND edp.page_namespace = 104 /* N.B. PAGE: namespace! */ restriction to the WHERE clause (so that only actual Page:-space edits are counted), but feel the payback is not worth the load of running another test.

Besides I have (if you will but take my word for it!) trialled this precise approach elsewhere. AuFCL (talk) 23:25, 7 March 2015 (UTC)

┌─────────────┘
As a compleat aside, did you mean something like this instead of the Special:WhatLeavesHere link above, as the latter now seems to lead me to a No such special page warning (even though I have the Gadget selected et al.) Is this worth chasing up, or perhaps is an already-known issue? AuFCL (talk) 23:45, 7 March 2015 (UTC)

No, I actually meant that link and for me it works and shows some links. It was to prove that it could not be Wikisource talk:Proofread of the Month, that is in namespace=5, as there are no links leaving the Index page towards ns=5, while there are towards ns=102 (Author), see https://en.wikisource.org/wiki/Special:WhatLeavesHere?target=Index%3AGreat+Neapolitan+Earthquake+of+1857.djvu&namespace=102. Strange it does not work for you, try 'What leaves here' from the Index Page (if you feel like chasing this ...). Otherwise I am satisfied with your query.--Mpaa (talk) 00:20, 8 March 2015 (UTC)
Thanks for double-checking. (And yes, I realised I had settled on the wrong page. Not quite sure in hindsight why I picked that one. Alzheimers setting in?)

Regarding this whole side issue (sigh) please forget it. Turned out to be a problem of my own making: the so-called RequestPolicy extension to Firefox installed here was blocking wmflabs.org page links. (Thanks for forcing me to look into this properly. Security is a pain; the lack of it is also a pain. If this is telling us something fundamental regarding computers, networking and masochism, I really wish I didn't have to think about it.)

Regards, AuFCL (talk) 02:01, 8 March 2015 (UTC)

PSM page cleaning

I don't know if the title list I emailed you was what you wanted. I also prepared a sample page for when you clean a two column page. My changes are in red half way down the page. The idea is that either you leave the empty row between the end of Column 1 and the beginning of column 2, or merge the last line of the left column with the first line of the right column. This helps me a lot to find a problematic text.— Ineuw talk 03:47, 23 March 2015 (UTC)

Hi. I do not go into that level of detail when cleaning the text. I usually address items that can be globally cleaned over the set of pages.--Mpaa (talk) 09:26, 23 March 2015 (UTC)
My apologies, misstated my request. Would you mind leaving the empty row between the two columns when you remove the �? — Ineuw talk 19:27, 23 March 2015 (UTC)
As I said, I do it in one shot, as I dump all the pages in one file and 'replace all' in one command (I guess it could be hundreds of replacements to do).--Mpaa (talk) 21:23, 23 March 2015 (UTC)
Got it. No problem. — Ineuw talk 01:14, 24 March 2015 (UTC)

What's the procedure to move pages?

Page:Popular Science Monthly Volume 32.djvu/271 should be djvu 153 and Page:Popular Science Monthly Volume 32.djvu/272 should be djvu 154. My question is - should I upload the correct copy to the Commons to replace the current one, and ask you to make the page adjustments? I am unclear about the process except that I don't want to lose any proofread pages.— Ineuw talk 19:00, 21 March 2015 (UTC)

Step 1: update Commons

Step 2: Pages /271 and /272 will have to be moved temporarily to the end, to leave space to move range (/153 ... /270) to range (/155 ... /272). Then the two images can be moved to the freed space for /153-/154.

Step 3: Then, all indexing in <pages ... > in Main needs to be updated accordingly.

This can be done, it should not be a big issue. It is the standard bulk move procedure.
To be noted: all conventions you are using for sections, images and so on will no longer be valid (but they will still work). E.g. "File:PSM V32 D170 American dredge.jpg" will now be in page D172. Same for section tagging, etc. It will take some additional processing to re-align them again (plus moving images at Commons, etc.). Never thought about it in detail, but it should be solvable.
Hope it is clear.--Mpaa (talk) 19:27, 21 March 2015 (UTC)
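
Editorial note: the three-step bulk move described above can be sketched as an ordered list of (source, target) page moves. This is only an illustration in Python, not the bot code actually used. The function names and the temporary slot number 1001 are hypothetical, and the 900-page volume in the usage lines is a stand-in.

```python
def plan_moves(misplaced, destination, temp_start):
    """Plan moves that bring consecutive misplaced pages forward to `destination`.

    misplaced   -- consecutive page numbers currently too late in the file
    destination -- first page number they should end up at
    temp_start  -- first unused page number, used as a temporary parking slot
    """
    width = len(misplaced)
    temps = [temp_start + i for i in range(width)]
    # Step A: park the misplaced pages out of the way.
    moves = list(zip(misplaced, temps))
    # Step B: shift the pages in between upwards, highest first,
    # so that every target slot is already free when it is used.
    for page in range(misplaced[0] - 1, destination - 1, -1):
        moves.append((page, page + width))
    # Step C: bring the parked pages into the freed slots.
    moves += [(temp, destination + i) for i, temp in enumerate(temps)]
    return moves


def apply_moves(pages, moves):
    """Apply moves to a {page_number: content} dict, refusing to overwrite."""
    pages = dict(pages)
    for source, target in moves:
        if target in pages:
            raise ValueError(f"target {target} is occupied")
        pages[target] = pages.pop(source)
    return pages


# Stand-in volume of 900 pages; /271 and /272 belong at /153 and /154.
volume = {n: f"old {n}" for n in range(1, 901)}
moved = apply_moves(volume, plan_moves([271, 272], 153, 1001))
print(moved[153], moved[155], moved[272])
```

Working through the intervening range from the highest page downwards is the design choice that matters here: moving upwards from /153 first would try to write onto pages that have not been vacated yet.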
Eminently clear. Realigning the section tags is no problem because I know where they are. I will also modify my database accordingly and regenerate the pages with the sections. As for the images, it is not important for now. Thank you.— Ineuw talk 01:21, 22 March 2015 (UTC)
I checked IA and bookmarked the 2 existing copies. One is our installed copy, and the second (which I have) is the accurate copy. Unfortunately, the difference between the two is not just the misplacement of two pages. There are blank pages and the title pages are in a different order from the start, and the page count is different because of additional blanks and advertisements. Although the printed page numbers are identical, it would be very difficult to reassemble. I decided to stick to the existing installed copy. — Ineuw talk 04:35, 22 March 2015 (UTC)
good copy from biodiversity
our bad copy from university of toronto
As you prefer.--Mpaa (talk) 21:57, 22 March 2015 (UTC)
It's not a preference, just the realization that I would have to redo the complete volume: the TOC, the Index, etc. However, I am considering moving and renaming this copy & the Index to Volume 32 Old.djvu, both here and on the Commons (if possible), installing the good copy as Index:Popular Science Monthly Volume 32.djvu, and then copying the proofread pages from one to the other. I just don't want to overlook anything. Your comments are appreciated.— Ineuw talk 19:37, 23 March 2015 (UTC)
I do not think it will make a big difference in terms of effort. If you are not going to change the internal references (anchors, sections, etc.), you might as well move the whole page. If you want to update the references, you might as well do it on the moved pages. Moreover, with copying all the history will be lost. I think if you have a mapping old->new page, it would be possible to update every instance of sections, TOC, Index etc., as you have kept a consistent notation (Dxx for anchors, Bxx, Exx for sections, etc.). If the pages to move are a bit tangled, the only thing to worry about is a good strategy to move pages back and forth.--Mpaa (talk) 21:23, 23 March 2015 (UTC)
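
Editorial note: updating references from an old->new page mapping, as suggested above, could look roughly like the following Python sketch. It only handles the `Dnnn` notation mentioned in this thread. The function name is hypothetical, and the mapping (pages 153 to 270 shifted up by two) is an assumed example taken from the move discussed earlier, not the mapping that was finally applied.

```python
import re


def remap_d_numbers(text, old_to_new):
    """Rewrite every Dnnn token whose number appears in the old->new mapping."""

    def swap(match):
        old = int(match.group(1))
        if old not in old_to_new:
            return match.group(0)  # leave unmapped tokens untouched
        return "D" + str(old_to_new[old])

    # One pass over the text, so a renamed token is never renamed twice.
    return re.sub(r"\bD(\d+)\b", swap, text)


# Assumed example mapping: pages 153..270 move up by two.
mapping = {old: old + 2 for old in range(153, 271)}
renamed = remap_d_numbers("File:PSM V32 D170 American dredge.jpg", mapping)
print(renamed)
```

Doing the substitution in a single pass matters when old and new ranges overlap, since a chain of separate search-and-replace runs could turn D153 into D155 and then into D157.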

┌─────────────────────────────────┘
@Mpaa: My apologies for making you crazy. I finally figured out what has to be done to replace this book. To make the good copy I have match page 1 and the page count, all I had to do was insert two blank pages to bring Page 1 = Djvu 11, and then delete from the WS copy eighteen pages of advertisements at the end, so that the page count of both is 900. Now, to quote from above:

Step 1: update Commons Done
Step 2: Pages /271 and /272 will have to be moved temporarily to the end, to leave space to move range (/153 ... /270) to range (/155 ... /272). Then the two images can be moved to the freed space for /153-/154.

Now, all I will have to do is correct the links related to pages between 153 and 272. The rest will align perfectly. Thanks in advance. — Ineuw talk 19:51, 25 March 2015 (UTC)

@Ineuw:, something wrong in Step 2. I see (blk+image) in djvu/155-156 and not in djvu/153-154. Is this supposed to be like this?--Mpaa (talk) 21:04, 25 March 2015 (UTC)
There are also other blk-image couples at djvu/301-302 and djvu/471-472 that need to be considered. And in this copy, current image Page:Popular Science Monthly Volume 32.djvu/851 and its blk look to be missing. Please look into it and re-specify the different (djvu/begin, djvu/end, +offset) moves.--Mpaa (talk) 21:41, 25 March 2015 (UTC)
In general, check image presence position in the new copy, as I think some are either moved or missing.--Mpaa (talk) 21:49, 25 March 2015 (UTC)
You are right. I must figure out how to correct the damned thing. Page:Popular Science Monthly Volume 32.djvu/851 seems to be missing from the new volume. Please let me study again what went wrong. — Ineuw talk 04:54, 26 March 2015 (UTC)
D850 (Blank page) and the D851 (Image of David Ames Welles) should be D737+D738 before the first article of the month (April 1888) Currently they are at D741+D742.— Ineuw talk 05:05, 26 March 2015 (UTC)
This is the only way I can see it done. Pages D271 and D272 are to be moved to the end and then everything is to be moved 2 pages up beginning with 155. This will open up 155 and 156 for the blank D271 and the image D272. I will then check everything until I find the next page move. Can we do it one move at a time over the next few days? As I mentioned earlier, I have no idea what is out of order. — Ineuw talk 08:27, 29 March 2015 (UTC)

┌─────────────────────────────────┘
@Mpaa: Unfortunately, can't oblige you with a list of From D# to D# until we fix the first problem - which is to insert two pages D155+D156 so that D271+D272 can be moved to D155+D156 where it belongs and the gap of D271+D272 is closed up. The overall problem is as follows:

  • A "standard" volume of PSM has 6 paired pages without page numbers. Volume 32 has 7 pairs. They are D271+D272, D301+D302, D412+D413, D447+D448, D471+D472, D595+D596, D741+D742. The six portraits ALWAYS appear as the last two pages of a monthly publication. It's the shift of 2 to 4 to 2 pages that I can't figure out. If you can, fine, but I can't.
  • I inserted 2 empty pages at the beginning so that both volumes should begin at D11. The images not only shifted by two pages, but they are inserted altogether in the wrong places and unless I go step by step, I can't see my way through as to where the additional pages go.
  • At this point, I don't care about incorrect main namespace links, section tags and anchors. I must correct my database, from which these are generated from one central entry. The system generates the table of contents, the main namespace links and the sections to the pages and the Index with 400+ entries and their anchors. — Ineuw talk 07:00, 30 March 2015 (UTC)
If you want I can move them, but I really can't see how it will help you in your research. Even if you will see the text aligned with the scans up to D272, how will that help you with the following pages? Nothing will change from D273 upwards. I'll try to compare the 2 djvus to see if I can shed some light.--Mpaa (talk) 17:44, 30 March 2015 (UTC)
The +4 in some ranges after D469 is the combination of the misplaced portrait-blk pair, (when moved back -> contr. +2) and the insertion of D471+D472, (another +2 contr.).

If you download the 2 versions of djvus and compare, range old->new is relatively easy to spot. Regarding the 7 vs. 6 pairs , maybe D471+D472 was at the end of the volume as a larger folded page (see the marks on the page) and has been inserted in this position in this version of Vol. 32.--Mpaa (talk) 19:23, 30 March 2015 (UTC)
That was my assumption but when I get to this at 10pm at night, I no longer trust myself. Therefore, I leave it in your capable hands .— Ineuw talk 20:47, 30 March 2015 (UTC)
What do you think of the position of these D471+D472? Shall they stay where they are in the new vol? Even if they break the 6-pair scheme?--Mpaa (talk) 11:53, 31 March 2015 (UTC)
D471+D472 should appear as they are in the new upload (not at the end). I mentioned the six pairs of portraits only as an example that they are the basic six pairs which are placed at the end of the monthly editions (up to volume 54). When I did the original organization, these were the ones that I found immediately, because of the blank image protector page. There are many other images in the volumes without page numbers, but not always accompanied by a blank page.— Ineuw talk 18:54, 31 March 2015 (UTC)
Page move should be OK now. If you want to save some work and you're not in a hurry, it will not take long to adapt a bot to replace sections, links, etc all over. Otherwise, proceed as you feel. I am sending you the old->new pages. Bye--Mpaa (talk) 22:28, 31 March 2015 (UTC)
Mpaa, Much thanks and my gratitude. I will do the section tags & anchors because I always check the main namespace pages. — Ineuw talk 22:47, 31 March 2015 (UTC)

A problem cropped up

Everything up to and including matches perfectly. From here on I am lost because the uploaded original has the correct page contents including D597 This is a duplicate OF THIS PAGE and the real D597 seems to be missing. I can resolve it by uploading the same copy again where the page is in the right place. I have been following the DjVu book pagination with this installed volume. — Ineuw talk 05:21, 1 April 2015 (UTC)

No clue. But the djvu file has the correct page. Let's wait and see if it is a cache issue.--Mpaa (talk) 06:43, 1 April 2015 (UTC)
I am sure it is a cache issue, these two links show two different pages, one wrong and one correct.
Bye--Mpaa (talk) 09:46, 1 April 2015 (UTC)
The second image differs in one minute detail, "1023px", and that "Bye" sounds so ominous. — Ineuw talk 18:01, 1 April 2015 (UTC)
"1023px" has probably forced a new thumbnail, and the correct page has been used, so I think we just need to wait for the "1024px" one to be refreshed. Sincerely --Mpaa (talk) 19:08, 1 April 2015 (UTC)
Thanks. There is loads of info on the Commons about refreshing djvu uploads such as this, but none of it worked. I just mention it to you because I was curious, but will wait until it happens. Ineuw talk 20:31, 1 April 2015 (UTC)

Categorizing by gender

If we're going to classify by gender, then we need to classify all authors, not just authors in one gender, as if the other gender was the default state.--Prosfilaes (talk) 00:49, 2 April 2015 (UTC)

Better still, don't create these categories. n.b. Your bot is not approved for categorizing Author pages this way. --EncycloPetey (talk) 02:32, 2 April 2015 (UTC)

(Sigh!) At just which point does somebody point out to the above pair of (functional) morons the fact Mpaa was merely being a "nice guy" respondent to this request? Hang your heads for raising this matter elsewhere than where the discussion belongs. For shame!
Now which one of you is going to be grown-up about this matter and restart the discussion appropriately; preferably even involving the original requester, user:Nonexyst? (unsigned comment by 121.216.176.77 (talk))
I leave the discussion of this matter to @Nonexyst: in another place. I just helped with his request. Also, nobody objected when I said I would take care of it. If it has to be undone, so be it.--Mpaa (talk) 06:55, 2 April 2015 (UTC)

bot: "(align formatting)"

Mpaa, in using the bot are all of those edits showing (align formatting) mistakes I made in validating? If so then something is wrong because they looked correct to me and I double-check before saving. —Maury (talk) 19:52, 6 May 2015 (UTC)

Hi. No, no problems, I wanted to align the left side note. It is just that it is easier to process all pages in the same way than to select only some pages.— Mpaa (talk) 19:57, 6 May 2015 (UTC)

Thanks for responding and further info on bot request

What a small wiki world! You may not recall, but you assisted me with my first two texts last fall when I first joined the WS project. Thanks again for that. For this topic I will respond more formally on the bot page (where I hope you'll notice my astute use of the emdash  –  :), but I just wanted to say hi and fill you in on the background to this request.

First, the OCR button: with all due respect, user Beeswaxcandle responded to my problems with the OCR button by explaining that it activates an OCR routine that rescans the image and does not re-emit the underlying text in the DJVU file. I note that this sometimes corresponds to my experience of the button, but for me the button's behavior and even whether it appears on my toolbar is unpredictable, even with the appropriate widgets preference set, so I am unable to use it productively.

Try to set it up. It is all you need.— Mpaa (talk) 11:24, 15 May 2015 (UTC)

Second, the reason for tagging the underlying text, then uploading: I have reluctantly reached the conclusion that there are several types of correction best done on the whole volume because of the context needed to be able to make good decisions about running headers, titles, cross-page hyphenation, front v. back matter, etc., so this is my experiment along those lines. Again, Beeswaxcandle and I discussed this on my talk page if you're curious about the history.

Corrections on the whole volume are one thing; embedding wiki-syntax in the djvu file is another. I would discourage the latter, because if someone needs the DjVu for other purposes, the text layer will be full of useless stuff.--Mpaa (talk) 11:24, 15 May 2015 (UTC)

LBNL, I chose the Southern Historical Society Papers project because it appeared to have stalled but has a user (Maury) who was interested in making further progress on the series and has several volumes I am interested in seeing completed. For the number of pages involved, a mass reload (assuming my script on the djvu text works sufficiently well) would be far more efficient and make it possible to finish the SHS project this year.

Probably more than you wanted to know, but I thought extra detail might be helpful since you had previously assisted me. Feel free to reply on the bot page or my talk page, and thanks again, Dictioneer (talk) 00:17, 15 May 2015 (UTC)

Thanks for your quick response, both here and on the bot page. Here is my experience with the OCR button: I bring up a page in edit mode, click on "Proofread tools", and there is no OCR box. I click on "Preferences" at the top, then on "Gadgets." On the Gadgets page, in the second section, "Editing tools for Page: namespace", I click the checkbox "OCR: Enable OCR button in Page: namespace." and click on "Save" at the bottom of the page. I go back and reclick through from the Index space to the specific page. The OCR button doesn't appear. I go back to preferences/gadgets and in the section "Development (in beta)" I click on "Add a toolbox link to reload the current page with Resource Loader in debug mode." Lather, rinse, repeat, back to the specific page in question. Still no OCR button. I click on the "Debug" button in the left pane under "Tools." The OCR button usually appears, though not always. Sometimes I have to click on the EDIT link again, in which case the OCR button always appears. If I click on the OCR button, it reloads the text, but the text doesn't contain my most recent changes.
I should note I have tried this with Firefox on Ubuntu, Mac, and Windows, with Chrome on Ubuntu and Mac, and with Internet Explorer on Windows. If you have a fix for this, or debugging advice, or can point me to a resource that will help me sort it out, that would be great. I've also copied in common.js and common.css text from user Beeswaxcandle so that my setup was as close to his/hers as possible. If you think it would help to copy in your script source, I'm happy to try that.
In my experience, the only way that reliably reloads my updated text and formatting from the djvu source file is going to the testpage Beeswaxcandle has deleted for me. There, the correct text shows up whether I have an OCR button or not, and the OCR button (if I press it) reloads the updated text correctly in that circumstance. I have not saved this page since it is the only one I have that reliably works. This is the reason for my request for a mass delete of non-proofread pages previously uploaded by LA2-bot.
Thanks for any help you can provide or for pointing me to an appropriate resource to get this debugged. At the moment, the only workaround that gives the desired result is the one proposed by Beeswaxcandle. I am open to alternatives, but would need a link to the relevant bot, upload script or other documentation that would get me started. Also, once the text has been populated, I would be fine to upload a version of the djvu file which has all wiki-formatting removed, it's just difficult at the moment to see how to get to that point. Dictioneer (talk) 17:28, 15 May 2015 (UTC)
Try to copy my User:Mpaa/vector.js and keep your preferences as simple as possible (in editing, select Show edit toolbar and Enable enhanced editing toolbar).--Mpaa (talk) 18:47, 15 May 2015 (UTC)
No luck, I'm afraid. Here's what I did: A) went into preferences and hit the reset button, then verified the Show Edit & Enable Enhanced settings. B) created a blank vector.js page and copied over your source. C) exited the browser, restarted and logged in. No OCR button. Realized that I'd reset the OCR gadgets preference to off in my general reset, went to gadgets and re-enabled OCR. Went back to edit a page. Still no OCR. D) Went back to Gadgets and re-enabled the Debug setting under "Development". I am now back to the original behavior, which is if I click Edit, then Debug, I usually get the OCR button to show up. E) However, when I click the OCR button it will reload the text, but does not reload the most current version of the text. F) I also went back and copied in your vector.css, common.js, and common.css just for thoroughness' sake. Same result.
I think it might be illuminating to separate this problem into its less important and more important parts: the unpredictable appearance and disappearance of the button (annoying but less important), and what actually happens on a page when you press OCR. Let's assume that the OCR button appears reliably for you. How does it behave when you use it on these text pages? Here is the experiment to try: A) go to Index:Southern_Historical_Society_Papers_volume_35.djvu and click on Index page 19/file page 33, aka https://en.wikisource.org/w/index.php?title=Page:Southern_Historical_Society_Papers_volume_35.djvu/33&action=edit&redlink=1 to see what happens. I get a page that warns me this page has been deleted by Beeswax and I should think seriously before recreating it. B) In the text itself you should see a "noinclude" running header and a hwe tag for possessor: this reflects the current updated djvu file. C) Now cancel out and go to page 20/34 (i.e., the next page), which still exists. You will see it displayed without any running header tag. D) Press OCR, and the page is refreshed with a running header but without the noinclude tags. This text is from an old version of the file, not the most current one from commons. E) Therefore, P. 19 (which Beeswax deleted) is correctly updated, p. 20 (not deleted) is not.
Did you get the same result? There are other differences on the page I could detail, but I assume one missing change is enough for now. Thanks for taking the trouble to help me figure out what's going wrong. Dictioneer (talk) 14:45, 16 May 2015 (UTC)
This is the link that is called by the OCR button [2], and then this is parsed. If you copied my settings, OCR should appear under "Proofread Tools".--Mpaa (talk) 19:43, 16 May 2015 (UTC)
Another issue is that OCR considers all the text as part of the body, so I guess it will not include {{rh...}} in the header.— Mpaa (talk) 19:56, 16 May 2015 (UTC)
Unfortunately, OCR still only occasionally appears but generally doesn't. The noinclude tag was included in a revision at Beeswax's suggestion; apparently he and Maury have a hot-key that activates a .js routine that will take the running-header and put it in the header box. In any case, other changes in the text underlying the djvu file also do not appear, not just the running-header noinclude tag.
I can provide other details of what's not appearing if that's your preference, but personally I would suggest that you proceed with the deletion algorithm you propose on the bot page and that we resume trying to chase down the source of this problem at some point in the future. The problem seems important to me since users who edit text and re-upload djvu files will have some of their changes appear and some not for no apparent reason. However, this is clearly a difficult and intermittent problem, so a brief break from chasing it may be good for all involved. Let me know if there's any help I can provide on the revised deletion/reloading based on LA2-bot being the most recent updater of the page. Dictioneer (talk) 20:59, 16 May 2015 (UTC)
I'll try to upload the latest text-layer, I need to customise a script first. But trust me, updating the text layer in the djvu file is a viable option if and only if a new OCR process is reapplied to the file. All the rest can be done working directly on text files, without bothering to change the djvu. And also dividing header, body and footer is not trivial. If you want to apply this approach in the future, try to stick to this file format to divide the different pages: https://www.mediawiki.org/wiki/Manual:Pywikibot/pagefromfile.py. Note that this is not designed to handle the Proofread Page format, so if you want to define headers and footers, try to mark them in the text with a convention that makes it easy to recognise them (e.g. @@HEADER_START@@all the header text@@HEADER_END@@ or similar). It will make the rest of the process easier. There is WIP on the bot side to handle this in an easy way in the future.--Mpaa (talk) 21:12, 16 May 2015 (UTC)
One more thing. The advantage is that once you have the file done, you can apply all the text improvements you want with a text editor, working off-line and using search and replace patterns per file instead of per page. And upload only the final step of all your improvements.--Mpaa (talk) 21:16, 16 May 2015 (UTC)
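A minimal sketch of how the marker convention suggested above could be consumed offline, assuming the markers are exactly @@HEADER_START@@ / @@HEADER_END@@ and a matching, hypothetical @@FOOTER_START@@ / @@FOOTER_END@@ pair. The function name split_page is illustrative only and is not part of pywikibot.

```python
import re

# Hypothetical marker names, following the convention suggested in the thread.
HEADER_RE = re.compile(r"@@HEADER_START@@(.*?)@@HEADER_END@@", re.DOTALL)
FOOTER_RE = re.compile(r"@@FOOTER_START@@(.*?)@@FOOTER_END@@", re.DOTALL)


def split_page(text):
    """Return (header, body, footer) for one page of marked-up text.

    Missing markers yield an empty header or footer; the body is whatever
    remains once both marked spans have been removed.
    """
    header_match = HEADER_RE.search(text)
    footer_match = FOOTER_RE.search(text)
    header = header_match.group(1).strip() if header_match else ""
    footer = footer_match.group(1).strip() if footer_match else ""
    body = FOOTER_RE.sub("", HEADER_RE.sub("", text)).strip()
    return header, body, footer


if __name__ == "__main__":
    sample = (
        "@@HEADER_START@@{{rh|20|Southern Historical Society Papers.|}}@@HEADER_END@@\n"
        "Body text of the page.\n"
    )
    print(split_page(sample))
```

Keeping the three parts separate until the last moment means the same text file can still be edited with plain search and replace before any upload.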
BTW, your syntax for {{hwe}} is wrong. See Page:Southern_Historical_Society_Papers_volume_35.djvu/86. {{hwe|con|fidence.}} should be {{hwe|fidence.|confidence.}}
Good catch, I've updated my script accordingly. I may start a new topic with questions about the mediawiki link above, but you've given me a huge amount of help already, so I'll let you get back to your own texts for awhile before I bug you again. Thanks so much, and if there's anything I can do for you in return, just let me know. Dictioneer (talk) 22:06, 17 May 2015 (UTC)

Have used pywikibot to upload v. 36

Hi Mpaa, thanks for pointing me at the upload script and the pywikibot ecosystem. After a bit of fumbling (including about 30 reverts -- power tools can be dangerous!:) I've managed to upload Volume 36 of SHSP and I'm pretty happy with the results. Two questions: first, I've produced a "clean" version of the underlying djvu file (no wiki-tags except for cross-page hyphenation), do you think it's worthwhile to upload the file to commons, or do you think the x-page hwe/hws tags are too much formatting as well? If it's still too much formatting, I can write a script to strip them, but am unsure what to change them to: an unhyphenated word at the bottom of the first page, an unhyphenated word at the top of the next page, or go back to how it was. I am happy to follow whatever direction you give on this. Second, should I create a separate bot account for running this script and register it, or is that unnecessary? Thanks in advance for this advice and at the risk of repeating myself, I really appreciate the technical help you've provided. Dictioneer (talk) 13:19, 29 May 2015 (UTC)

I would not touch the djvu text layer. If you extract the xml structure, each word has coordinates, etc., so I do not know how much value it has just to add the text. Regarding the bot, you need to ask the community to grant you a bot flag for that task and usually create an account for that, see Wikisource:Scriptorium#BOT_approval_requests. And there must be a policy somewhere, should be easy to find but right now I do not have time. Bye--Mpaa (talk) 15:17, 29 May 2015 (UTC)

Strange thing

Hi Mpaa,

I don't really know my way around Wikisource. I made a small correction at Page:The Spell of the Yukon and Other Verses.djvu/60, changing "one" to "none" (there was none could place the stranger's face), which should have increased the size by one byte, but for some reason it went down by 113 bytes or something like that. Obviously something going on I don't understand. Would appreciate it if you'd take a look. --Trovatore (talk) 06:08, 16 June 2015 (UTC)

Hi. I do not know, some internal MW magic ... I wouldn't bother, your change looks fine anyhow.— Mpaa (talk) 07:36, 16 June 2015 (UTC)

A wikisource database question please

Post was moved to User talk:Ineuw#A wikisource database question please unsigned comment by Ineuw (talk) .

Apologies for stealing your topic, Mpaa! Please follow on as I expect your input/experience is going to be essential! AuFCL (talk) 03:31, 5 July 2015 (UTC)

Attempted a validation of this, but I'd appreciate a second view as whilst I've been very cautious, I'd like to be sure I've caught everything. Going to give it a second pass in any event. ShakespeareFan00 (talk) 19:51, 27 July 2015 (UTC)

Help with data extraction

Hi. I am asking you for help to extract some text data from the Wikisource text databases because so far I wasn't successful in achieving this goal.

The data I need is the first 105 characters of every paragraph, from this page to this page. This may or may not already contain a wiki link and an anchor, but the necessary part of the text is the word "Page nnn" followed by the reference number. From these I convert and create links and generate the anchors in the text pages.

Reference anchor:
{{fs90/s}}{{anchor|463-1}}[[Page:The Conquest of Mexico Volume 1.djvu/53#53-1|Page 9 (<sup>1</sup>)]].—

Main text source:
{{anchor|53-1}}[[Page:The Conquest of Mexico Volume 1.djvu/463#463-1|<sup>1</sup>]]

Not to confuse you, both ends are, or will be anchored and linked for convenience. — Ineuw talk 19:25, 25 August 2015 (UTC)
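The request above could be sketched roughly as follows. This is only an illustration, assuming paragraphs are separated by blank lines and that the reference appears as "Page nnn (m)" with or without the sup tags; the function names and the regular expression are hypothetical and do not describe how the extraction was actually done.

```python
import re

PREFIX_LEN = 105  # number of leading characters asked for in the request

# Assumed shape of a reference: "Page 9 (1)" or "Page 9 (<sup>1</sup>)".
NOTE_RE = re.compile(r"Page\s+(\d+)\s*\((?:<sup>)?(\d+)(?:</sup>)?\)")


def paragraph_prefixes(page_text, length=PREFIX_LEN):
    """Return the first `length` characters of every non-empty paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text)]
    return [p[:length] for p in paragraphs if p]


def find_note_refs(page_text):
    """Return (page_number, reference_number, prefix) for matching paragraphs."""
    results = []
    for prefix in paragraph_prefixes(page_text):
        match = NOTE_RE.search(prefix)
        if match:
            results.append((int(match.group(1)), int(match.group(2)), prefix))
    return results
```

Paragraphs whose first 105 characters carry no "Page nnn" reference are simply skipped, which makes duplicates and omissions easier to spot in the resulting list.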

Done: User:Ineuw/Sandbox. Hope I got you right.— Mpaa (talk) 21:44, 25 August 2015 (UTC)
Perfect, many many thanks.— Ineuw talk 22:36, 25 August 2015 (UTC)
@Mpaa: Could you extract a new data set with the same parameters. Using the previous list I found a number of duplications which I corrected. Thanks in advance, and don't forget to bill me.— Ineuw talk 09:06, 29 August 2015 (UTC)
Done: User:Ineuw/Sandbox.— Mpaa (talk) 10:08, 29 August 2015 (UTC)
Much thanks :-) — Ineuw talk 18:22, 29 August 2015 (UTC)

Volume 2

@Mpaa: Could you please extract the data like above but for volume 2 and place it again in User:Ineuw/Sandbox? The page range is BEGINNING HERE and ENDING HERE. I am sure that after I made corrections I will come back for another data extraction. Thanks in advance.— Ineuw talk 03:50, 5 September 2015 (UTC)

done.— Mpaa (talk) 07:16, 5 September 2015 (UTC)
Thanks. — Ineuw talk 19:45, 5 September 2015 (UTC)

Categories by gender

Hi, there! There is a small problem in current version of author template with fetching gender data from wikidata. The problem arises when wikidata gender value is set to "unknown value" (see for example Author:C. E. Brewster). Cheers, Captain Nemo (talk) 01:10, 1 September 2015 (UTC).

Tried to fix it. I added Category:Author pages with unknown gender in Wikidata.— Mpaa (talk) 20:45, 1 September 2015 (UTC)

Publication year of Shakespeare's sonnets

Hi,

Back in 2012 you added |year=1598 to all of Shakespeare's sonnets (see e.g. Sonnet 4 (Shakespeare)). As far as I know the sonnets were all first published in 1609, except 138 and 144, which had previously appeared, probably through piracy, in The Passionate Pilgrim in 1599. Is there any particular reason you have it down as 1598 here? --Xover (talk) 19:51, 10 September 2015 (UTC)

I just moved the year from being an explicit category to a parameter of the header template (which does automatic categorisation by year), see https://en.wikisource.org/w/index.php?title=Sonnet_4_%28Shakespeare%29&type=revision&diff=4047849&oldid=3745136 so I guess you need to find out who put Category:1598 works on the page.— Mpaa (talk) 18:35, 11 September 2015 (UTC)
Thanks. --Xover (talk) 15:51, 12 September 2015 (UTC)

Updated scripts

Hi Mpaa. I edited your common.js, Regexp toolbar.js, and works.js to update you to the latest version of TemplateScript. You were using a much older version called regex menu framework, so you should notice a lot of improvements. A few of the big changes:

  regex menu framework TemplateScript
regex editor ✓ an improved regex editor which can save your patterns for later use
compatibility unknown ✓ compatible with all skins and modern browsers
custom scripts limited ✓ much better framework for writing scripts
supported views edit ✓ add templates and scripts for any view (edit, block, protect, etc)
keyboard shortcuts ✓ add keyboard shortcuts for your templates and scripts

I also updated deprecated functions. Let me know if anything breaks. :) —Pathoschild 05:02, 12 September 2015 (UTC)

Hi. Thanks a lot!— Mpaa (talk) 08:00, 12 September 2015 (UTC)

use of djvutext.py

Hi. Are you using djvutext.py to populate pages? I tried

 python pwb.py djvutext.py -lang:en -family:wikisource -djvu:A_cyclopedia_of_American_medical_biography_vol._1.djvu -index:A_cyclopedia_of_American_medical_biography_vol._1.djvu

then it spat out a string of error messages. Before I start on the fuller diagnosis, wondering if you have used the tool, or whether it is compat mode, and I am bashing my head against the desk. Thanks. -- billinghurst sDrewth 15:23, 23 September 2015 (UTC)

There is a bug. Will look into it.--Mpaa (talk) 17:01, 23 September 2015 (UTC)
Actually there were a couple. I submitted a patch. Hopefully will be merged soon (I hope ..., it is difficult to make a forecast on approval time). It is a good thing that someone starts using these scripts, as they have very few users and are either new or ported to core, so new bugs might pop up. @John Vandenberg: might help with the approval.— Mpaa (talk) 18:00, 23 September 2015 (UTC)
Merged. Hope it is OK now.— Mpaa (talk) 19:23, 23 September 2015 (UTC)
Thanks. I will have a go later. I sit in Freenode's IRC #pywikibot so always happy to prod them for coding checks, just wasn't comfortable on a lesser used and specific script.

My plan is to get some of these biographical compilation works pushed through as that makes them findable in search, and may therefore get some people chipping away at them. [@Charles Matthews: FYI.] I am also thinking that we can set up some mini/sub-projects that may be self-sustaining if we get the parent guidance and components right; leveraging what we learnt from the DNB. Once that is going, I then want to look at some of those publications like Gentleman's Magazine, etc. which are otherwise referenced in these biographical works. I am not looking to use it on chaptered works, of either fiction or non-fiction. -- billinghurst sDrewth 05:35, 24 September 2015 (UTC)

Potentially crossing the line with this question.

Hi again,

Not that your answer will prevent my support for you getting the 'crat bit either way -- nor does it preclude the possibility that I'm being outlandishly cautious in my own little world of concerns -- but I'd feel better if I knew for sure; so here it goes...

Are you a resident of Australia?

I know how that may read but the ONLY reason I ask is I'm fairly sure the other 'higher-than-sysop' bit holders are residents of that great nation and I'm a bit concerned one good natural disaster there coupled with some local problem taking place at the same time here could present response issues given the right timing.

I do think you're well suited for the bit no matter how you answer (if at all - a simple 'y/n' will do) and plan to support you given better to have someone competent in place in spite of the slim likelihood of any such confluence of events taking place. Apologies in tenfold if you feel this kind of question is beyond something you should feel you ever need to answer never mind the seemingly 'poor-taste' in my asking of it in the first place may appear to some. All I'm trying to really establish is if there is any chance there may be a "gap" in observation & coverage while moving forward since WS is a 24 hour, 7 days a week endeavor and is mainly why I ask.

Sincerely. -- George Orwell III (talk) 20:59, 23 September 2015 (UTC)

No, in the "Old World" (EU). But there might be a "gap" all the same as I cannot commit to a daily supervision.— Mpaa (talk) 21:14, 23 September 2015 (UTC)
GOIII to note that we have one Oz CU, one Oz 'crat and we are as close together as the UK is to Russia; and I would think that we would be on very different information pipelines. The other CU, and the other 'crat are in the US, and I have no idea of their location to each other. — billinghurst sDrewth 06:03, 24 September 2015 (UTC)

Do you have a bot to auto add unproofed text from OCR?

I was wondering if it was possible to look into mass-adding text for this Index:Ruffhead - The Statutes at Large - vol 3.djvu so that BD2412 and others can run automated cleanup scripts. ShakespeareFan00 (talk) 23:48, 23 September 2015 (UTC)

I am experimenting with Wikisource-bot, as a proof of concept at this stage, where I am targeting a few biographical works that users can: 1) find in a search, 2) may wish to fix for a biographical reference at WP, or 3) for a cross reference for a work here; as such they can dip in and out of as their time permits, and transclude in small parts, and I think that is justifiable. Our previous issues with chaptered works just being added and forgotten about was considered somewhere between valueless and innocuous.

I sense an eagerness for its use, and I would think that the general discussion would need to again be raised to what and where the community thought would be of value to apply non-proofread text; and how it would be envisaged that people are encouraged to proofread its text. In short, that it is being curated, and transcribed, not set and left. Also the benefit in applying the text layer in that form against the issues about having it placed but not progressing in proofread status. So let me see if I can get the tool working, and approved by the community for my plans, then we can look at other uses and tasks with solutions. — billinghurst sDrewth 05:57, 24 September 2015 (UTC)

Fwiw, I have the following workflow with pywikibot:
  • load non-existing pages with pywikibot: the 'preload' functionality will fetch the text layer from the djvu file for me.
  • save all pages in a file that can be used by pagefromfile.py
  • do the typical clean-up work offline (rh, typos, blanks in punctuation, etc. using text editors, offline scripts, etc. whatever is best in that case)
  • once the result is good enough, I bulk-upload it
  • pywikibot would benefit if it could read/write files containing pages (there is ongoing work on this).
Another option is to work directly online, interposing a clean-up function between fetching the (not existing) page and saving it.
Mpaa (talk) 18:04, 24 September 2015 (UTC)
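The offline clean-up step of the workflow above could look roughly like the sketch below. It assumes the pages were saved with the markers {{-start-}} and {{-stop-}} and with the title in triple quotes on the first line, which is believed to be the default layout read by pagefromfile.py; the clean-up rules themselves are purely illustrative and would need tuning per work.

```python
import re

# Assumed page delimiters and title markers (pagefromfile.py defaults).
START, STOP = "{{-start-}}", "{{-stop-}}"
PAGE_RE = re.compile(re.escape(START) + r"(.*?)" + re.escape(STOP), re.DOTALL)
TITLE_RE = re.compile(r"'''(.*?)'''")


def clean_text(text):
    """Illustrative clean-up rules only; real works need their own patterns."""
    text = re.sub(r"[ \t]+([,;:.!?])", r"\1", text)  # blanks before punctuation
    text = re.sub(r"[ \t]+\n", "\n", text)           # trailing blanks
    return text


def clean_pages(dump):
    """Apply clean_text to the body of every page, leaving titles untouched."""
    def _clean(match):
        content = match.group(1)
        title = TITLE_RE.search(content)
        if title is None:
            return match.group(0)  # no title found: leave the page as it is
        head_end = title.end()
        return START + content[:head_end] + clean_text(content[head_end:]) + STOP
    return PAGE_RE.sub(_clean, dump)


if __name__ == "__main__":
    dump = "{{-start-}}\n'''Page:Example.djvu/1'''\nHello , world !  \n{{-stop-}}\n"
    print(clean_pages(dump))
```

Because the whole volume sits in one file, each rule is applied once per file rather than once per page, and only the final result needs to be bulk-uploaded.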

Bot substitution & cleanup

Meant to thank you again for the bot work on War... No small favor. I am flying through the pages now, hoping I don't make too many errors as a result! Londonjackbooks (talk) 19:41, 28 September 2015 (UTC)

Happy to be helpful. Should you need help in future, just ask. In cases like this, with a small effort one can simplify a lot.--Mpaa (talk) 20:31, 28 September 2015 (UTC)

Crap!; or, What one says when pages are missing from one's book

Thought I'd ask here first; pages 278 & 279 are missing from War. I have images of the missing pages at the ready from another online version (same edition, different printing). The reason that pagination appeared to be squared away is because pages 296 & 297 repeat themselves after p. 297. All is well with pages after that. Do you know how this can be fixed? Sorry, and Thanks! Londonjackbooks (talk) 19:35, 29 September 2015 (UTC)

Yes, it can be fixed. Are you familiar with manipulating djvu files? You should remove the two duplicate images and insert the two new ones in the proper place. Then, we should shift pages correspondingly. If you can refer to djvu page number, it is clearer. If you are not familiar with the process, upload the two images here on WS and we will sort it out somehow.— Mpaa (talk) 20:31, 29 September 2015 (UTC)
I wish I knew how. But DJVU pages 312 & 313 should be removed (they are the pages which repeat), and pages shifted from DJVU pg 294. The pages to insert will fill DJVU pgs 294 & 295. I have uploaded the images. They are located in Category: User images, and are the only images listed. Let me know if I can do anything else, and thanks! Londonjackbooks (talk) 20:50, 29 September 2015 (UTC)
Should be OK now. Make a null edit (top right menu) if images are not aligned yet.--Mpaa (talk) 22:10, 29 September 2015 (UTC)
Great! Thank you so much for the fix... Glad it was repairable. Londonjackbooks (talk) 22:29, 29 September 2015 (UTC)
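The page shift discussed in this thread can be worked out with a small sketch like the one below. The numbers come from the thread (duplicates at DJVU pages 312 and 313, two new pages to land at 294 and 295); the function names are hypothetical, and this only computes the renumbering, it does not edit the file or move any wiki pages.

```python
def new_position(old_page, insert_at=294, n_inserted=2, removed=(312, 313)):
    """Map a page number in the old file to its number in the repaired file.

    Returns None for pages that are removed as duplicates.
    """
    if old_page in removed:
        return None
    # Position once the duplicate pages before this one have been dropped.
    pos = old_page - sum(1 for r in removed if r < old_page)
    # Pages at or after the insertion point make room for the new images.
    if pos >= insert_at:
        pos += n_inserted
    return pos


def move_plan(first, last):
    """List (old, new) page moves, highest first so no target is overwritten."""
    plan = []
    for page in range(last, first - 1, -1):
        target = new_position(page)
        if target is not None and target != page:
            plan.append((page, target))
    return plan


if __name__ == "__main__":
    for old, new in move_plan(290, 316):
        print(f"/{old} -> /{new}")
```

With these numbers only old pages 294 to 311 move (each by two), while everything before 294 and after 313 keeps its number, which matches the observation that pagination was fine after the repeated pages.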

crat

Howdy,

You are now a bureaucrat. Enjoy the crushing pressure. :-D

Hesperian 02:28, 1 October 2015 (UTC)

Congratulations - good job! BD2412 T 03:39, 1 October 2015 (UTC)
Congratulations, excellent choice. . . . . and as they say in Budapest, today Rome, tomorrow the world. — Ineuw talk 22:36, 3 October 2015 (UTC)

Purging Index: ns

I have started Wikisource-bot on the task of purging all the Index: ns (started ~1200 GMT). That should clear up the issue of page editing not updating Special:IndexPage, and where you had identified that some indices were missing their class. — billinghurst sDrewth 12:10, 2 October 2015 (UTC)

Hello Mpaa,

I prefer you to decide about the namings, whatever system you choose. The French and Canadian system is often (but not always) this one: fr:Livre:Paquin - La cité dans les fers, 1926.djvu. I don't know if Wikidata makes this kind of naming obsolete. Lots of thanks for your help! Regards, --Zyephyrus (talk) 13:18, 8 October 2015 (UTC)

Hi. I just noticed in Central discussions that a bulk page move was required, so I took the task. I had no part in the rest. If a better naming is suggested, I think this should be mentioned in the thread where all this started.— Mpaa (talk) 20:13, 8 October 2015 (UTC)

Nietzsche

Hi, I’ve been busy proofreading Nietzsche the thinker and wondered if you wanted some feedback on your bots’ work? Cheers, Zoeannl (talk) 01:35, 6 October 2015 (UTC)

Yes, that is appreciated, thanks. I actually cleaned up the text offline and just used the bot for fast page uploading.--Mpaa (talk) 21:21, 6 October 2015 (UTC)
The clean-up of footnotes is almost flawless with very few missed. The pre-italicizing is very convenient—to the extent I’ve had to stifle my annoyance when it doesn’t work! almost always because of a scanno. I've noticed 2 minor glitches: it consistently deletes anything on the line before a "Cf." and there have been a couple of missing following footnotes i.e. when split over 2 pages the footnote on the 2nd page isn't there.

Also, if you have done any clean-up on PSM so it's set up for easy proofreading, please let me know. I find such pages very soothing at the end of a long day—when Nietzsche is a bit much. :) Cheers, Zoeannl (talk) 08:08, 7 October 2015 (UTC)

@Zoeannl:, you might want to take a look at PSM 55.— Mpaa (talk) 20:58, 13 October 2015 (UTC)
Lovely. Cheers, Zoeannl (talk) 00:17, 14 October 2015 (UTC)
PSM 56, 58 and are good to go as well. BTW, thanks for proofreading Nietzsche.--Mpaa (talk) 15:23, 25 October 2015 (UTC)

Wikisource Conference

Hi Mpaa, I'm Aubrey from it.ws. As you probably know, a bunch of wikisourcerors (and Wikimedia Austria) are organizing a conference from 20 to 22 November, in Wien, Austria. We are trying to reach out to experienced Wikisource editors, because the conference will be (hopefully) an important event for the Wikisource international community. Will you be able to attend? Do you have a chapter who could provide you a scholarship? If you're interested, please write to me at andrea.zanni@wikimedia.it. Thank you very much. --Aubrey (talk) 17:24, 18 October 2015 (UTC)

Hi Aubrey. Thanks for the invitation. Unfortunately, I do not think I will be able to attend.— Mpaa (talk) 17:43, 18 October 2015 (UTC)

A minor request for the text cleanup process

Hi. Would it be possible to alter the <ref>footnote</ref> to <ref>*</ref>? When the asterisk is removed, it's difficult to find at a glance. Thanks — Ineuw talk 17:06, 2 November 2015 (UTC)

Sure, I will do that from now on.— Mpaa (talk) 17:58, 2 November 2015 (UTC)
Thanks.— Ineuw talk 20:53, 3 November 2015 (UTC)

Farinelli

Dear Mpaa, I am contacting you since I found you interested in music and committed to the Grove Dictionary. I have found an interesting pair of essays on Senesino and Farinelli, the two most famous castratos in London during the time of Handel, in The Westminster Magazine: or, The Pantheon of Taste, vol. 5 (1777), pp. 396-397 and would like to submit both to the English wikisource in the form of texts (of course a scan of the pages can accompany the essays). So far however I have only contributed to the German version and only reference sites listing links to pdfs of old musicological and general newspapers. So I do not know how I can possibly create a new site here containing an article from the Westminster Magazine, the more so since no single article from there seems to be contained in Wikisource yet. I have uploaded the first of the two texts on a separate page in my wikisource profile. The "references" are original footnotes from the text.

Not really, probably you have seen some occasional contribution of mine ... :-)

Three questions on this:
1) Can you explain to me how to submit this to the English wikisource best

Same as on de.wikisource. Download the scan to Commons (File:xxx.djvu or pdf, depending on what format you have) and create an Index:xxx.djvu (or pdf).


2) Would you take care of this for me? The errors in the "text only version" supplied by Google Books have been corrected with the exception of two minor things: a) what the ominous "lö" in "Jupiter & lö" stands for and b) what amount is meant by 50 (unknown symbol) L. (50,000 pounds?). The text version has been proofread by me against the scan.

You need to proofread against the scan. Since you already did it, copy-paste each portion of text into the corresponding Pages in Page: ns.
Texts should be faithful to the original, so errors should stay there; maybe you can use {{SIC}}.


3) Is it ok or general usage in the English wikisource to include links to wikipedia to make clear what person or building etc. is meant in the text?--Haendelfan (talk) 11:10, 10 November 2015 (UTC)

Yes, without overlinking. When you transclude a page in Main ns, in the author header (or via wikidata) you can link from there as well.
If you link here the index page you will create, I will keep an eye on it. Mpaa (talk) 23:01, 10 November 2015

A quick mass subst

https://en.wikisource.org/w/index.php?title=Special:WhatLinksHere/Template:Rh/l&limit=500

The template concerned should have been subst (and note to self: this needs documenting). ShakespeareFan00 (talk) 11:37, 28 November 2015 (UTC)

Done.— Mpaa (talk) 17:21, 28 November 2015 (UTC)