User talk:Xover

From Wikisource


The story of Prague.djvu

Hello Xover. I have uploaded File:The story of Prague.djvu which I converted from File:The story of Prague.pdf, but the quality of the djvu file is very bad. May I ask you to convert it in a way that preserves the original quality of the scan? Thanks very much. --Jan Kameníček (talk) 15:58, 26 September 2020 (UTC)

Hello again. Meanwhile I found out that although visually the scans in .djvu are very poor, IndicOCR works very well, so it is not that urgent. It may still be good to convert it in a better way for visual purposes, but if you have better work to do, just forget it, it is really not necessary. --Jan Kameníček (talk) 09:35, 27 September 2020 (UTC)
Oh, now I see that I was too slow in writing to you; I should have made up my mind to write earlier. Now it looks awesome, thank you very much!!! --Jan Kameníček (talk) 09:38, 27 September 2020 (UTC)
@Jan.Kamenicek: Done. The Internet Archive had the same scan so I used the scan images from there, simply because it is more convenient to download there.
I also found that IA had a scan of the 1920 second reprint of the work (which looks to be entirely identical) but in much higher scan resolution (about 2.7x) and generally good quality. So I uploaded that too at Index:The Story of Prague (1920).djvu. Since this is an image-heavy work I recommend prioritising the higher resolution scan unless there are specific reasons for preferring the first printing (beyond it being the first printing).
Incidentally, I also strongly recommend tacking on the year of publication in parentheses after the work's title when uploading, even when you don't anticipate there ever being multiple editions transcribed. Multiple editions can come up for any number of unpredictable reasons, and even when they do not, the year of publication in the file name helps put the file in context in a number of situations (telling what's what at a quick glance in a file list, for example). --Xover (talk) 09:40, 27 September 2020 (UTC)
Ad images: That is true, I will consider this. Unfortunately, I have already manually processed and uploaded the images acquired from the older edition, which was quite a lot of work :-( Had I noticed the edition and copy you have pointed to, I would definitely have used that one for the images.
Ad parentheses: True, I will keep this advice in mind. --Jan Kameníček (talk) 09:56, 27 September 2020 (UTC)
So finally I decided to work on the 1920 edition whose scans have better resolution and which you uploaded too (thanks very much for that). May I just ask whether you changed the position of the map in the book? The copy at IA has it a couple of pages later. Currently the position of the map causes a small problem: the list of illustrations says that an image of View of Prague in 1606 faces page no. 206, but in the uploaded copy it is the map that faces this page instead. According to the list of illustrations the map should face page 212. It is true that such a position (inside the book’s Index) does not make much sense; maybe it could be moved just in front of the Index (i.e. directly behind the other three plates). It would not solve the problem of facing page 212 (which would have to be handled e.g. by the SIC template) but it would solve the problem of facing page 206. What do you think? --Jan Kameníček (talk) 17:38, 17 November 2020 (UTC)
@Jan.Kamenicek: The details are hazy, but I seem to recall the map was placed in a way that was problematic for some reason, and I had to make a judgement call on where to put it. As I recall there were several such issues with the scan, but most had an obvious resolution. In any case, I think I still have the files sitting around so I'll take a look and see if I have anything intelligent to contribute. Your judgement will probably be much better than mine on this though, since you're more familiar with the work. --Xover (talk) 17:53, 17 November 2020 (UTC)
@Jan.Kamenicek: Oh, hmm, it comes back to me now… I moved the map to where it is mainly based on the page numbering: with the map the four illustrations cover pp. 207–210, with the last page at 206 and the index starting at 211. Without it we're one page short. In view of the list of illustrations pegging the map to be on p. 212 (which I probably didn't notice at the time) I would be inclined to move it back to that position (well, facing 212, not on p. 212, but that's a minor quibble) and insert a dummy blank page after the three other plates (before p. 211). What do you think? --Xover (talk) 18:07, 17 November 2020 (UTC)
Yes, I agree with this. Unfortunately, I am not able to manage djvus, may I ask you to handle it? It would be really helpful. --Jan Kameníček (talk) 18:57, 17 November 2020 (UTC)
@Jan.Kamenicek: Of course! I’ll try to get it done some time tomorrow. --Xover (talk) 19:00, 17 November 2020 (UTC)

Phab ticket you might be interested in: phab:T267617

Hi! Quick heads up for a phab ticket that might interest a gadgety person: phab:T267617 Index page's page links should have the page index-in-file in them (e.g. as attribute). Inductiveloadtalk/contribs 10:30, 10 November 2020 (UTC)

@Inductiveload: Thanks. Incidentally, you can watch components and projects in Phabricator and be notified whenever a new task is registered for it. So for e.g. Wikisource and ProofreadPage, click the tag in an existing task, then navigate to its overview page in the left nav menu (by default you get its work board), and then use the watch button in the top right. --Xover (talk) 14:38, 10 November 2020 (UTC)

New OCR tool

Hello Xover,

I have tried your new gadget at the following pages with very good results:

  • Page:The Queens Court Manuscript with Other Ancient Bohemian Poems, 1852, Cambridge edition.djvu/110: Very good text recognition, only it inserts empty lines between most of the lines of the poem (not all). The same problem appeared also in 117, but these are only exceptions, other pages of the book were OK and the OCR was sometimes even better than in the original OCR layer. I do like the curly quotes and apostrophes, although other people may not be so happy about them (I guess it would be too difficult to let the user choose in some preferences).
  • Page:The Story of Prague (1920).djvu/206 Very good OCR competing with the original OCR layer. I like the empty lines between paragraphs which the original OCR layer did not have. Both of them have problems with acutes above some Czech vowels and they both transcribe "mediæval" as "medieval".
  • Page:The Bohemian Review, vol2, 1918.djvu/217 Your gadget has no problem reading the text in columns and beats the original OCR in line recognition again. The only problem is the header of the newspaper, whose text is quite well recognized in the original OCR, but causes problems for your gadget.
  • Page:The Bohemian Review, vol2, 1918.djvu/237 This page is an extremely hard test for any OCR, as two upper columns belong to one article and two lower columns to another article. The original OCR layer failed to recognize this and so did yours, but in fact I did not expect any success here and I would be really astonished if the result were different.

To sum it up, I do like your gadget as it proved its usefulness in my tests. Although there is some room for improvement, imo it can replace Phe’s previous gadget, and I do thank you for its creation. It would be great if the gadget were not only an external tool that is difficult for anyone other than you to repair should problems arise in future, but were open to the wider community; ideally it would be part of MediaWiki, so that a potential failure could not be ignored as easily as happened with Phe’s tool. --Jan Kameníček (talk) 20:32, 11 November 2020 (UTC)

@Jan.Kamenicek: Thank you: that was exceedingly thorough!
First, I need to clarify that what we're here talking about is all Phe's code. The new script I asked you to test is a copy of MediaWiki:Gadget-ocr.js, which adds the "OCR" button to the toolbar, sends the request to the https://phetools.toolforge.org/ backend service, and then adds the result to the text box. Much of the discussion in the Phabricator task was regarding various fixes to that backend service. You can see the sum total of the changes I made to the script here (all of it is tweaking how the script deals with the whitespace in the OCR output from the backend service). So all credit here goes to Phe; I've just been doing minor tweaks to try to get it working again.
In addition to this I've been working on my own, completely independent, OCR gadget, which I have mentioned in passing but not really shown to anybody yet (it's too primitive and buggy). That was motivated primarily by making something to tide the community over until WMF Community Tech comes up with a new and (hopefully) better supported OCR tool. Now that Phe's OCR is (hopefully) fixed the need for that is probably not as great, but I may still keep working on it in order to experiment with giving the user some more options. For example whether to output curly or straight quotes, whether to unwrap lines within paragraphs, and possibly other such transformations. I am also looking at letting the user specify a primary and one or more secondary languages for a given page. Right now Phe's OCR assumes all text requested from enWS is in English, and so it will mostly not recognise any runs of text in other languages, except insofar as they are written in characters in common with English. For Chinese, Cyrillic, etc., or languages with extensive use of accents and ligatures (e.g. Polish), this is almost guaranteed to give poor results. By specifying that "This page is mostly in English, but it also contains some words in Polish" it is possible that we can get better OCR results for these pages.
In any case… Based on your testing and feedback above it sounds like the fixes I made to Phe's OCR have been about as successful as we can hope for, and we're at the point where we can update the main Gadget and announce that Phe's OCR is back up. --Xover (talk) 13:19, 12 November 2020 (UTC)
@what we're here talking about is all Phe's code: Ah, I see :-) Nevertheless, it does not make your credit any smaller! Thanks a lot for getting the tool to work, hardly anybody hoped it could still happen :-) --Jan Kameníček (talk) 22:07, 13 November 2020 (UTC)
@Jan.Kamenicek: Yeah, I had mostly given up hope of a fix, so when an opportunity presented itself I jumped at the chance. Hopefully this will tide the community over until Community Tech can build a new tool that is at least less dependent on a single contributor, even if there are limits to how many resources they can give it once it's built. --Xover (talk) 09:30, 14 November 2020 (UTC)
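The user-selectable transformations mentioned above (curly vs. straight quotes, unwrapping lines within paragraphs) could look roughly like the following. This is only a sketch: the function names and the options object are hypothetical and do not come from MediaWiki:Gadget-ocr.js or the phetools backend, and both the quote heuristics and the hyphen re-joining are deliberately naive.

```javascript
// Hypothetical helpers for post-processing OCR text; not part of any existing gadget.

// Join hard-wrapped lines inside each paragraph; blank lines stay as paragraph breaks.
function unwrapLines(text) {
    return text
        .split(/\n{2,}/)                        // paragraphs are separated by blank lines
        .map(function (para) {
            return para
                .replace(/-\n(?=[a-z])/g, '')   // naive: re-join words hyphenated across a line break
                .replace(/\n/g, ' ')            // remaining line breaks become spaces
                .trim();
        })
        .join('\n\n');
}

// Convert straight quotes to curly ones using simple position heuristics.
function curlyQuotes(text) {
    return text
        .replace(/(^|[\s(\[{])"/g, '$1\u201C')  // double quote at start or after space/bracket opens
        .replace(/"/g, '\u201D')                // any remaining double quote closes
        .replace(/(^|[\s(\[{])'/g, '$1\u2018')  // single quote at start or after space/bracket opens
        .replace(/'/g, '\u2019');               // remaining single quotes: closing quote or apostrophe
}

// Apply only the transformations the user asked for (assumed option names).
function postProcessOcr(text, options) {
    var out = text;
    if (options.unwrap) { out = unwrapLines(out); }
    if (options.curly) { out = curlyQuotes(out); }
    return out;
}
```

For poetry one would presumably leave `unwrap` off, since the line breaks there are meaningful.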

Thou hast sprung my trappe carde

I see you there, fiddling in Marlowe! Do you think this could become a PotM or maybe make Christopher Marlowe a collab when the current one fades? He's kind of a "thing", but all his works here are a hot mess and need scan backing (except Ignoto!), and we could do with some "olde worlde" originals if possible too.

Re. curly quotes they are straight in the OCR and I usually use straight through sheer laziness (Compose+<+' is fiddly) and inertia. I have no philosophical objections to them, and I do think they look better. Inductiveloadtalk/contribs 18:01, 12 November 2020 (UTC)

@Inductiveload: Yeah, the Early Modern classics are woefully patchy here. Marlowe is probably a good collab since his oeuvre is a reasonable size, unlike, say, Shakespeare or Middleton (thank god for poets who get themselves killed young!). On curlies, I automate it with a script cribbed from Sam, and may eventually get around to adding it as a per-work auto-fixup ala that header thingy. --Xover (talk) 18:08, 12 November 2020 (UTC)
@Inductiveload: Oh, I meant to mention… I found a couple of instances of {{dhr|$1}} in there (the title page I think) that looked like a buggy helper script at work. You may want to go looking for that one.
And, while I'm teaching grandpappy to suck eggs, since {{ts}} took the worst pain out of formatting tables, I've completely stopped using {{TOC begin}} and friends. Plain tables give better control, less messy markup, and don't require recalling template-specific syntax for structure (with a table, the structure is explicit and in your head, and you look up any formatting you need; vs. the opposite for the various TOC templates). It was a bit of a pain to start, but it turns out there aren't that many variations in the tables so it quickly overtook the TOC templates in efficiency. I heartily recommend it! --Xover (talk) 08:07, 13 November 2020 (UTC)
Thanks, I fixed it. I got 99 problems and regexes are 98 of them (forgot the parens, so there was no capture group 1 >_<).
Re TOC, those templates certainly aren't ideal, for various reasons including bad interactions with ProofreadPage and the MW parser (cf. phab:T232477, which you know already). I wonder if TemplateStyles + CSS classes on the TR elements might be worth a try for a slightly more semantic feel? "Direct formatting" with {{ts}} or similar is a fairly blunt weapon IMO, though the blunter weapons can be more reliable, and the TS+class approach might tip towards overwrought? At least it's not quite as fraught as {{TOCstyle}}. Inductiveloadtalk/contribs 20:52, 13 November 2020 (UTC)
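The stray {{dhr|$1}} mentioned above is what a JavaScript replacement produces when the pattern has no capture group: the `$1` in the replacement string is left as literal text. A minimal sketch of the failure and the fix follows; the `@@…@@` input marker and the `toDhr` name are made up for illustration, since the real helper script's pattern is not shown in this thread.

```javascript
// Hypothetical input marker "@@<height>@@"; the actual helper script is not shown here.
var buggy = /@@[0-9.]+em@@/g;     // no parentheses, so there is no capture group 1
var fixed = /@@([0-9.]+em)@@/g;   // parentheses capture the height as group 1

// Replace each marker with a {{dhr}} call, inserting capture group 1 as the argument.
function toDhr(text, pattern) {
    return text.replace(pattern, '{{dhr|$1}}');
}

var withBug = toDhr('Title@@2em@@Author', buggy);  // "Title{{dhr|$1}}Author"
var withFix = toDhr('Title@@2em@@Author', fixed);  // "Title{{dhr|2em}}Author"
```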
@Inductiveload: Hmm. Since what we're doing is essentially direct formatting, reproducing the original work rather than applying our own style to indicate the same semantics, I don't think @class is a good match. By using a template ({{ts}}) we get the same benefits of abstraction as a CSS class, but retain the convenience. I suppose we could create {{trs}} that emits @class, but I think the problem there is more that CSS styling tables is quirky as heck (or, it was when last I tried, but I'm not up to date).
But all this reminds me of an… experiment… I have ongoing: {{sbs}} and {{sss}} (with accompanying {{sbe}} and {{sse}} closing templates). The mnemonic is for "styled block start" and "styled span start", and both of them boil down to spitting out a div or span with the provided arguments as CSS class names, styled by TemplateStyles. They've got a couple of different goals, but the initial impetus was to find a better approach to styling poetry (I hate the poem tag, and detest long rows of br). In addition I grew tired of the scattershot of templates with inconsistent naming conventions and arguments, and spotty documentation, and annoying syntax weirdness when you try to nest templates or put all the text in a template argument or…
So for a typical poem I would do something like: {{sbs|fine centered-block pre-wrap}} … lines of poetry … {{sbe}}. Or for a title page where all the text is centered: {{sbs|centered-text}} … normal formatting, except no need for {{c}} … {{sbe}}. Since they're just div or span they can be arbitrarily and predictably nested, and the block vs. inline semantics are explicit in the template. And since what we're doing with the templates is applying styles, using classes is a pretty natural fit, and lets us reuse general CSS knowledge rather than inventing our own style language again for each template.
There's stuff they can't solve (hanging indent for wrapped poem lines being the standard example), and they're a bad fit for anything needing flexibility (no parameterized TemplateStyles). But a surprising amount of our most used templates mainly just apply a static formatting for which these are straightforward replacements. The lack of knobs and dials may also encourage a healthy shift away from obsessively trying to reproduce details that really aren't important and on which an inordinate amount of volunteer time and frustration (see SF00's periodic bursts of exasperation) is wasted. I'm envisioning the docs to be a list of the available classes, each documenting standard workarounds for common issues, and with side-links to traditional formatting templates where knobs and dials can be tweaked if needed.
I'm trying these out on works I work on myself to get a feel for how well they work and what the "standard" workarounds for various problems will have to be. So far I'm pretty happy with {{sbs}} but find myself using {{sss}} comparatively little, mainly because the syntax gets more verbose than the old way for inline use (maybe it should be a meta-template and have a suite of wrappers applying each effect?). I'm still not completely convinced it will work to mix and match CSS classes this way without running into the same conflicts inconsistent templates do.
In any case, thoughts and input on these and this approach are very much welcome. If you want to try it out then keep in mind I don't really consider them stable so you'll need to be prepared for breakage. --Xover (talk) 09:22, 15 November 2020 (UTC)
For TOC tables (specifically TOCs) with row-based classing, my thinking is that something like this:
|- class=toc_row_1-1-1
| I
| Chapter 1
| 2
is simpler and the intent more "visible" than:
|-
| {{ts|vtp|ar|wnw}} | I
| {{ts|vtp|pl1|wa}} | Chapter 1
| {{ts|vbm|ar|pl1}} | 2
{{TOC begin}} and {{ts}} are roughly contemporaneous, and the reason for the former is that the style-spam in TOCs gets tiresome, repetitive and tricky to adjust later.
In nearly all other cases, my main concern is that centralising all the CSS into global classes, while clearly better from a DRY perspective, is also somewhat fragile, as the CSS classes will be shotgunned throughout thousands of pages and can break, and break silently due to how TemplateStyles works, if someone makes a well-meaning edit to, say, the fine class. This is why I have generally stayed away from "global" CSS (à la Template:Table class) and leaned more towards work-specific CSS like Template:Os Lusiadas (Burton, 1880)/errata.css.
Re poem, I suspect that a new extension or a new tag in the existing extension (say <ppoem>, where p stands for "proper") that does span-per-line and p/div-per-stanza is better than anything we can hack up on the wikicode side, even with module support. Inductiveloadtalk/contribs 12:02, 15 November 2020 (UTC)
@Inductiveload: Apples and oranges. {{ts}} is a shortcut for adding @style to table cells, and the equivalent would be {{trs}} (or whatever) to add table row styles. Because table rows are, by virtue of their semantics, more general than table cells, the arguments for having {{trs}} emit @class rather than @style are stronger. Personally I am not convinced @class makes sense at any level more granular than the page (and the most natural fit is at the work level), but at the table row level I am at least prepared to entertain an argument.
On CSS I agree on the general point, but I think that's a longer term issue of better CSS support (PRP support for per-work CSS, maybe something like LESS and a hierarchy of CSS to cover cases in between MediaWiki:Common.css and inline styles, beyond just TemplateStyles). Definitely agree a MW extension to replace the current poem tag is needed for a real solution, but I don't think that's realistic in any reasonable timeframe so I'm focussing on stuff that can (hopefully) be made to work within the current limitations. The CSS stuff in {{sbs}} being one prong, and a Lua module a possible alternative approach.
Of course, I am not at all sure anything short of an extension will work: the parser and remex insert themselves so aggressively that they tend to sabotage any even moderately complex markup and styling. --Xover (talk) 13:38, 15 November 2020 (UTC)

Gadget in progress

Just a quick note for something to play with if you have some cycles to spare one day (no action required, just for interest).

It's a "re-imagining" of the popups gadget. Using a slightly different plug-in-like architecture, I hope it can be a bit more flexible than the enWP-centric popups gadget. To try it:

mw.loader.load("/w/index.php?title=User:Inductiveload/popups_reloaded.js&action=raw&ctype=text/javascript");
mw.loader.load("/w/index.php?title=User:Inductiveload/popups_reloaded.css&action=raw&ctype=text/css", 'text/css');

Probably will spew a few errors to console on occasion and the UX is a bit jarring sometimes, but it's already better than the old popups for my nefarious purposes IMO. Inductiveloadtalk/contribs 00:23, 21 November 2020 (UTC)

@Inductiveload: Neat! Upgrading or replacing Popups has been on my wishlist for a long time; with the two main issues being improved styling and better support for previewing PRP-backed pages. I probably won't have time to play with it any time soon, but when things improve I'd love to take it for a spin. --Xover (talk) 12:33, 22 November 2020 (UTC)

The History of the Bohemian persecution (1650)

Hello. May I ask you to convert File:The History of the Bohemian persecution (1650).pdf into djvu? There is absolutely no hurry, I have enough work to do :-) --Jan Kameníček (talk) 19:08, 21 November 2020 (UTC)

@Jan.Kamenicek: File:The History of the Bohemian Persecution (1650).djvu. The OCR quality is… not great. You may want to try the Google OCR gadget to see if it does better on the worst pages. But at least the image resolution is ~2x the PDF version. Let me know if there are any out of order pages or other such issues that need fixing. --Xover (talk) 00:49, 22 November 2020 (UTC)
Thanks very much! I expected the OCR layer would be bad, so it did not surprise me. However, comparing e.g. [1] with [2] I can see that in the PDF version the OCR recognizes long ſ, while in the DjVu version it replaces it with f, which did surprise me. I am mentioning it just as a curiosity; it is not a problem at all, as it needs to be replaced with "s" anyway, and maybe the whole OCR needs to be replaced, e.g. using the Google gadget (which seems sliiiiggghtly better). Thanks again. --Jan Kameníček (talk) 09:29, 22 November 2020 (UTC)
@Jan.Kamenicek: Tesseract (the OCR engine I use) does not recognise long s, so these will never have that right. It's trained on more modern texts so pre-18th century texts will be pretty hit and miss. Sorry. --Xover (talk) 12:29, 22 November 2020 (UTC)
I see, I did not know that you exchanged the OCR layer. I noticed that it was better than in the PDF (except the long s), but I thought that it was due to better OCR extraction from djvu than from pdf by Mediawiki. So I thank you for this too. --Jan Kameníček (talk) 13:10, 22 November 2020 (UTC)

ES6 in JS that may end up in a gadget

Heads up: if you use ES6 syntax (let, fat arrow, etc. etc.) in scripts, it will choke if you try to make it a gadget and you'll spend ages unpicking your shiny new hotness and replacing it with old and busted. Inductiveloadtalk/contribs 12:12, 1 December 2020 (UTC)

@Inductiveload: Hmm. You sure that's not just the normal scoping issues? What is it that breaks exactly? --Xover (talk) 12:34, 1 December 2020 (UTC)
@Xover: something in the ResourceLoader stack rejects it. You get errors something like
JavaScript parse error (scripts need to be valid ECMAScript 5): Parse error: Missing ; before statement in file 'MediaWiki:Gadget-sandbox.js' on line 4
You can try it out by enabling the "Sandbox" gadget in your user preferences. Line 4 of MediaWiki:Gadget-sandbox.js is the let x = 1; line. Inductiveloadtalk/contribs 12:53, 1 December 2020 (UTC)
@Inductiveload: Argh! Yeah, as usual the MW situation is a mess. phab:T75714 will give you the gist of it, but the issue seems to be the lack of a JS minifier written in PHP that supports ES6 combined with lack of priority to the task due to IE still providing 3% of global hits on WMF sites. In essence I think that means ES6 is effectively blacklisted until the WMF raise the Grade A browser support criteria to include ES6. --Xover (talk) 13:23, 1 December 2020 (UTC)
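For reference, "de-ES6-ifying" a script mostly means mechanical rewrites like the ones below. This is a generic sketch rather than code from any gadget here; the variable names are arbitrary.

```javascript
// ES6 form, which the parse error quoted above would reject when served as a gadget:
//   let x = 1;
//   const doubled = [1, 2, 3].map(n => n * 2);

// ES5 equivalent: `var` instead of let/const, function expressions instead of arrows.
var x = 1;
var doubled = [1, 2, 3].map(function (n) {
    return n * 2;
});

// `let` in a for loop gives each iteration its own binding. In ES5 the usual
// substitute is an immediately-invoked function that captures the current value.
var callbacks = [];
for (var i = 0; i < 3; i++) {
    (function (j) {
        callbacks.push(function () {
            return j;
        });
    }(i));
}
```

Without the wrapping function every callback would see the final value of `i`, which is the "normal scoping issue" that `let` was introduced to avoid.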
@Inductiveload: The "shiny new hotness" is now at mw.loader.load("//en.wikisource.org/w/index.php?title=User:Xover/loupe.js&action=raw&ctype=text/javascript"); if you want to play. No testing to speak of, and written in full "scratching my own itch" mode, so expect breakage. It probably needs a toggle to turn it on and off, and the layering is off at the edges (that's prolly PRP's fault though), and the size is hard-coded, and… But, anyway, feel free to play with it (and to steal any bits you want obviously: I tipped over into actually cobbling this together when I saw your code for grabbing the thumbnail URL from the API, which I'd been procrastinating on figuring out, in the index grid thingy), or laugh and point derisively (it ain't pretty is what I'm saying). --Xover (talk) 15:14, 1 December 2020 (UTC)
Awesome! Very stylish!
I'm still unsure of the One True Way (TM) to configure gadgets (e.g. width) - so far the only response I got at MW is "use the options API", which is a good way in terms of UX and also when the data is available (i.e. right from the start), but perhaps rather limited in terms of being ale to drive the configuration programatically.
Furthermore, after digging about in PageNumbers.js I'm also unsure if mw.cookies or the Options API are a better bet for storing things like current visibility state.
BTW, I've made some notes at User:Inductiveload/Script development about "offline" development which can be a bit less frustrating than saving every typo into a page's history! Inductiveloadtalk/contribs 17:36, 1 December 2020 (UTC)
@Inductiveload: I think the options API is a good fit for storing user preference, active choices that rarely change, while cookies are a better fit for state and things that can change based on an ad hoc toggle. For PageNumbers.js I'd think of user.options as analogous to {{default layout}}, and mw.cookie as parallel to the toolbar toggle. But don't think too hard about this with PageNumbers.js in mind: that code is a mishmash of different things bolted together and written under the constraints of what MW provided half a decade ago (and affected by cultural factors like fights over the default stylesheet etc.). It needs a thorough rethink before it's an apt use case for anything.
You might also keep in mind mw.storage for other kinds of things you want to stash away. My auto-header scripts stores the current chapter title there, in a per-work key. It's semi-persistent (webstorage is size-capped) and limited to current browser, but for a lot of use cases it's plenty good enough.
All of that is to say, I'm not seeing the use case you have in mind when you say perhaps rather limited in terms of being ale to drive the configuration programatically (too much ale? ;D). I'd love a more full-fledged preferences framework (ala what Apple gives you in Cocoa), but for what it is, the stuff in MW seems… adequate. I feel far more stifled by the design and limitations of OOUI (and MW HTML output when viewed as an API). But it seems likely I'm missing something there?
On offline development, I'm mostly just lazy and find the solutions both awkward and overkill for my needs (I don't write all that much code here). Your notes will be a great help overcoming the "lazy" bit though, if I ever feel I need to take the plunge. They're very helpful and we should stash them somewhere prominent where others in need can find them.
PS. The loupe has been updated, with some general cleanup and fixes after a bit more testing (also de-ES6-ified). It turned out nicely enough that I'm starting to think of Gadget-ifying it. It currently breaks other interactions with the page image so it probably needs some way to toggle it on and off. (definitely needs more testing too, I'm half-arsing this in stolen snatches of time) Thoughts, on this that or the other? --Xover (talk) 18:04, 5 December 2020 (UTC)
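A per-work key of the kind described above for the auto-header script could be wrapped along these lines. This is a sketch under assumptions: the key format and the helper names are hypothetical, and `storage` is taken to be any object with `get(key)` and `set(key, value)` methods, which is the shape `mw.storage` is assumed to have. An in-memory stand-in is included so the example runs outside MediaWiki.

```javascript
// Hypothetical wrapper around a get/set storage object, keyed per work.
function makeChapterStore(storage, workTitle) {
    var key = 'autoheader-chapter:' + workTitle;   // hypothetical key format
    return {
        save: function (chapterTitle) {
            return storage.set(key, chapterTitle);
        },
        load: function () {
            var value = storage.get(key);
            // Treat anything that is not a string (missing key, unavailable storage) as "nothing stored".
            return typeof value === 'string' ? value : null;
        }
    };
}

// In-memory stand-in with the same get/set shape, for running outside the browser.
function makeMemoryStorage() {
    var data = {};
    return {
        get: function (k) {
            return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
        },
        set: function (k, v) {
            data[k] = String(v);
            return true;
        }
    };
}

var store = makeChapterStore(makeMemoryStorage(), 'The Story of Prague (1920)');
store.save('Chapter I');
```

In a user script one would presumably pass `mw.storage` itself instead of the stand-in, keeping actual preferences in the options API and toggle state in a cookie as outlined above.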
Sounds sensible re cookies vs options API. I feel like there's probably a fairly wide grey area where neither is wrong.
Re "perhaps rather limited in terms of being ale to drive the configuration programatically": What I mean is things like being able to do very general setup like (pseudocode)
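// pseudocode: pagename, namespace and config stand for whatever the gadget exposes
if (pagename.startsWith("Foo") || namespace === "Page") {
    config.replacements.push([
        /myregex([a-z])/,
        // wrap the value passed in by the gadget in wikilink brackets
        function(x) { return "[[" + x + "]]"; }
    ]);
}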
where allowing all possible such things would be awkward to express in a way that can be stored and manipulated only through an HTML form (unless it's eval'd JS or something evil like that). Setting strings or ints is one thing, but the beauty (and curse) of allowing JS gadgets is you can have some really powerful configuration options.
I'll investigate the loupe more when I have a mo Inductiveloadtalk/contribs 17:16, 6 December 2020 (UTC)
@Inductiveload: Ah, I see. Well, to me, that's really neither settings nor state; we're talking something akin to a plugin or an extension. And, no, MW doesn't really have any good facility for that use case. The best you can probably do there is a JSON content model page in the user's user space with a Special:BlankPage subpage and a JS GUI for managing it. Which is needlessly complicated and labour-intensive, admittedly. Or possibly you could do it using runtime hooks, where userspace scripts can add regexes to the gadget code through the window object (which I don't think they fence, but I could be wrong) or custom events (ditto). But, yeah, no great options.
In fairness, I'm not really sure how you would design a good system for that in the MW architecture. Our needs here are pretty unique, and they are distinctly non-regular (varying from work to work), creating a need for far more flexibility at the end-user level than most other use cases. --Xover (talk) 18:07, 6 December 2020 (UTC)
The thing I've done/seen done before without going through window is a MW hook fire()/hook(), but, like I was fretting about before, this will only work if you can send the hook() before the fire(). Which admittedly is not so bad when the action fires upon a user's action (e.g. a toolbar click, save, etc), but gives me pause when it happens right away (e.g. page load), as the gadget/user JS relative load ordering is AFAIK totally undefined, though if the gadget waits for DOM ready, user common.js "almost" certainly would be complete. But, if the user config happened, say, in a callback from some external AJAX load (e.g. checking if that page had an IA descriptions available via the File info page), it could easily come after the DOM settling. Inductiveloadtalk/contribs 11:13, 7 December 2020 (UTC)
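The ordering worry above can be shown with a small stand-in for a hook. In this sketch `makeHook` imitates a "memorised" hook, where a handler added after `fire()` is still called with the most recently fired data, which is how `mw.hook` is understood to behave. It is not MediaWiki's implementation, and `runGadget` is a hypothetical gadget that reads its configuration immediately after firing.

```javascript
// Stand-in for a memorised hook: late handlers are replayed the last fired data.
function makeHook() {
    var handlers = [];
    var lastArgs = null;
    return {
        add: function (handler) {
            handlers.push(handler);
            if (lastArgs) { handler.apply(null, lastArgs); }  // replay for late subscribers
            return this;
        },
        fire: function () {
            lastArgs = Array.prototype.slice.call(arguments);
            handlers.forEach(function (h) { h.apply(null, lastArgs); });
            return this;
        }
    };
}

// Hypothetical gadget: offers its config to handlers, then uses it right away.
function runGadget(hook) {
    var config = { replacements: [] };
    hook.fire(config);
    return { config: config, countAtStartup: config.replacements.length };
}

// Hypothetical user configuration, as it might sit in a user's common.js.
function userConfig(config) {
    config.replacements.push([/foo/, 'bar']);
}

// User script loads first: the gadget sees the user's replacement at startup.
var early = runGadget(makeHook().add(userConfig));

// Gadget runs first: the handler still fires on add(), but only after the
// gadget has already read its config.
var lateHook = makeHook();
var late = runGadget(lateHook);
lateHook.add(userConfig);
```

Here `early.countAtStartup` is 1 and `late.countAtStartup` is 0, even though `late.config.replacements` ends up with one entry. A gadget that acts at page load would therefore need to re-read its configuration, or defer the work, for late registrations to take effect.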

Index:Duke University Libraries (IA carysnewitinerar01cary).pdf

Started in good faith, but in places the scans aren't readable. Once again, viewing the book directly on IA shows no quality degradation.

PDF compression issues? ShakespeareFan00 (talk) 20:22, 5 December 2020 (UTC)

@ShakespeareFan00: I've regenerated a new DjVu from the source scans (at 3.5x resolution) and migrated the index and pages. I didn't spot the quality issues that triggered your request (didn't look that hard), but if the new version is significantly better you may want to point Fæ at the before and after (specific pages); both because I happened to see they've discussed the relative quality of PDF vs. DjVu, and because it's relevant to the IA upload project. IA's PDFs are not particularly high quality, and even their DjVus are sometimes pathologically bad. For master data you really really want the original .jp2 scan files (in a pinch converted to plain .jpeg, but preferably not); the DjVus (and PDFs) are derived formats to make it convenient for various secondary purposes. (I'd follow up directly but I just don't have the spare cycles these days). --Xover (talk) 14:18, 6 December 2020 (UTC)
Generally, the IA PDFs or DjVus have been a LOT better than ex-Google Books scans, so there are only 3 works from IA I've had to ask for regeneration on.

(The purpose of Fae's projects on Commons was a "backup" option; individual scans can be regenerated as needed. See also the Phabricator ticket I lodged earlier. If it were possible to find an existing IA PDF and do a high-quality re-gen semi-automatically, I'd be happy, but at the moment manual requests are a suitable workaround.) ShakespeareFan00 (talk) 15:25, 6 December 2020 (UTC)

Sandboxes and pagenumbers

Daily WTF updates:

  • I have adjusted the sandbox gadget to pull from "User:" + wgUserName + "/gadget-sandbox.js" so anyone can play.
  • I may (may) have found a way to make it work. It's a hack and I don't know why it works.
  • I found having Firefox dev tools open and the HTTP cache disabled prevents the bug from manifesting, even when it's loading as a gadget. And even then it doesn't happen very often for me.

Such fun.

Also I made a Hathi OCR downloader and changed my HOCR parser to a push parser like the cool kids and after getting it all done and discovering that all the text layers are coming out misaligned >_<, discovered Hathi's OCR is great at words and terrible at page segmentation so it's still better to just throw it at Tesseract. Humph. Inductiveloadtalk/contribs 05:08, 16 December 2020 (UTC)

@Inductiveload: It's definitely timing-triggered: the first time the offsets are calculated the page geometry is not in its stable state, so that final .refresh_offsets is needed to shift them into the correct position. I was able to reliably reproduce the issue yesterday (Safari, macOS, sandbox gadget, testing on B's test case) so I'll look for a less-timing-dependent way to fix it. I have a hunch the basic issue may be that we're modifying the DOM during pagenumbers.init while running inside $(document).ready(), but with no gate to make sure the DOM has (re)coalesced before calculating the offsets.
Loading an arbitrarily named page in wgUserName's userspace as a gadget sounds… iffy. It has security implications, performance implications, and it affects the timing (which is one major reason to load as this gadget in the first place) by, among other things, waiting on mw.util. It's exceedingly clever, but I think it's also excessively clever: the gadgets have barely been touched in years, and right now you and I are the only ones touching anything here. Sam might if they ever get any spare time again, but other than that the need for this is close to zero. I'm not feeling the cost—benefit on this one, is what I'm saying.
Hathi's OCR quality: that comports with my experience. Both Hathi and IA (and Google) have some aspects that are better on some works, but there is no clear general winner for all works. --Xover (talk) 07:32, 16 December 2020 (UTC)
Re the gadget sandbox: you have to enable the gadget and ignore the "DONT USE THIS GADGET" notice in the process. And only interface admins (i.e. me) can edit other users' JS anyway so loading your own JS should be safe enough, considering we also load JS from all over the place, including other wikis (like mulWS and ruWP). And if I were to go rogue, I'd hit MW:common.js, not a gadget with probably 0 or 1 users on a good day. The reason I did it was so I could load something as a gadget too without co-opting the sandbox.
Re the timings - a MutationObserver might be the thing to use, but the question is which mutations? Inductiveloadtalk/contribs 07:46, 16 December 2020 (UTC)
@Inductiveload: Sandbox: Yeah, don't read too much emphasis into those comments. I'm not sending up a red flag, just saying the cost—benefit doesn't add up for me vs. just having two or three user-specific sandbox gadgets. The calculus would change if there was a bigger need (we don't want 5+ sandbox gadgets). Your observation about cross-loading JS is a much bigger concern regarding security and attack vectors and potential impact.
PageNumbers: yeah, watching mutations is one track I plan to explore, but I'm not sure we have a good place to watch or that these events will necessarily reflect stable page geometry (I'm a bit fuzzy on how closely DOM and rendered geometry track each other in modern browsers). I'm thinking it's more likely we can do something like hoisting the .wrapAll() out so it runs earlier, and either attach a .load handler sooner (so it actually gets run), hook into .ready() in a way that triggers off the .wrapAll(), or possibly a Promise-based solution.
BTW, I've started adding the dynlayout-exempt class to the key templates we want excluded: {{header}}, {{license}}, and {{authority control}}. My thinking is this is a determination that belongs in the template, when we actually control the generation of the page element; and we'll save the JS manipulation for things we don't control (like the edit box). It'll be a cleaner separation and make for cleaner code. I'm also toying with some ideas we could explore long-term to make MW and PRP directly support this use case (but we should think that through well before bugging the devs; maybe see it in connection with sidenotes?). Things would be much easier if we had a PRP equivalent to #mw-content-text to glom onto, and a dedicated left and right column (div) to stuff things into without needing the .wrapAll() stunt in local JS. If that came from MW or PRP it could more easily be Grid or Flexbox based, and save us some trouble. But I digress…
The separation of concerns track also got me thinking about how to handle the various styles involved in PageNumbers.js. I'd originally thought to make everything stylesheets in MediaWiki:-space that we just apply from JS. But some of this is stuff where we are not just applying a style, but toggling or changing values for a property (i.e. we might need to unapply a given property rather than rely on it getting overwritten). This especially goes for properties that are not really part of the individual layouts (most of what's in the current gadget .css), so it may be that that's a dividing line. I haven't concluded (or thought all that deeply) on it, but figured I'd toss it out there in case you had any thoughts. Having a dynamic layout boil down to having a MediaWiki:dynlayout-someid.css and possibly a simple configuration (for things like displayed name) in the gadget is very tempting (and I think it might just be possible to avoid hardcoding anything regarding individual layouts in JS too, but haven't gotten around to testing that out yet). --Xover (talk) 09:15, 16 December 2020 (UTC)
@Inductiveload: Oh, and I'm guessing the reason this works is that you're attaching a new .ready() handler, and these are executed in the order they were attached, so it ends up executing sequentially after the current .ready() handler; which just happens to be late enough that the page geometry is final. I don't see any obvious reason that variant should be any more gated after final rendering than just plain calling the function there would. --Xover (talk) 13:25, 16 December 2020 (UTC)
I get it, and the absurdity of adding a ready hook inside the ready hook doesn't escape me. So, I guess what I'm saying is...
setInterval(pagenumbers.refresh_offsets, 1000);
:trollface: Inductiveloadtalk/contribs 14:33, 16 December 2020 (UTC)
@Inductiveload: Heh heh! Don't think it didn't occur to me! :-)
BTW, MutationObserver won't work: it triggers off DOM changes, not property changes (like .offsetTop). --Xover (talk) 14:48, 16 December 2020 (UTC)
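A middle ground between a fixed setInterval and a DOM-event gate is to poll the measured property itself and act once two consecutive reads agree. The sketch below is only an illustration under assumptions: whenStable, its parameters, and the fake readings are hypothetical names, and two equal reads do not prove the layout is final.

```javascript
// Hypothetical helper: poll a measurement until two consecutive reads agree.
// measure  - returns a comparable value (e.g. an element's offsetTop)
// schedule - defers a callback (e.g. requestAnimationFrame or setTimeout)
// onStable - called once with the settled value
// maxTries - bounds the polling so it cannot spin forever
function whenStable(measure, schedule, onStable, maxTries) {
  var previous = measure();
  var triesLeft = maxTries;

  function check() {
    var current = measure();
    if (current === previous || triesLeft <= 0) {
      onStable(current);
      return;
    }
    previous = current;
    triesLeft -= 1;
    schedule(check);
  }

  schedule(check);
}

// Demo: a fake offset that moves twice before settling at 55,
// driven by a synchronous scheduler so it runs outside a browser.
var readings = [10, 40, 55, 55, 55];
var readIndex = 0;
var settledAt = null;
whenStable(
  function () { return readings[Math.min(readIndex++, readings.length - 1)]; },
  function (fn) { fn(); },
  function (value) { settledAt = value; },
  20
);
```

In a page, measure would read something like the .offsetTop of a page-number anchor and schedule would be requestAnimationFrame, with the offset refresh passed as onStable.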

Anglo-Saxon Riddles of the Exeter Book

Can I request a download of a scan from HathiTrust? —Beleg Tâl (talk) 17:24, 16 December 2020 (UTC)

@Beleg Tâl: A 1963 translation, by a translator who died in 1964. Copyright? --Xover (talk) 17:42, 16 December 2020 (UTC)
The work is tagged as {{PD-US-no-renewal}}; it entered PD in 1992. You can view the uploader's rationale here. —Beleg Tâl (talk) 18:21, 16 December 2020 (UTC)
@Beleg Tâl: File:Anglo-Saxon Riddles of the Exeter Book (1963).djvu. As an experiment I tried a relatively aggressive compression profile (it's 4MB). Please check that it didn't destroy the quality before M&S. --Xover (talk) 18:45, 16 December 2020 (UTC)
What settings did you use, out of interest? Works of Bentham come out well over 2GB (I didn't use a size limit, so it's kinda my fault, but still). Inductiveloadtalk/contribs 19:28, 16 December 2020 (UTC)
@Inductiveload: The scan was black and white already, and with very good background separation (color-wise), so I simply converted to PBM and used the cjb2 encoder. I didn't tweak any of the encoder settings: the output was 2.6MB (the rest are the color covers, and one internal page with some gray tones, that I added manually afterwards from JPEG), versus 90MB for the same test using PGM for input. --Xover (talk) 21:01, 16 December 2020 (UTC)

Category:Ready for export

So I finally got round to creating this category and Help:Preparing for export. Which is why I've been hunting things like {| align=center.

Feel free to throw your favourite things into it. Pending phab:T270387 it's not that much use, but one day soonish hopefully we can have an OPDS catalog like an actual library.

So far my main pain points have been:

  • Old formatting like {| align=center, which comes out more like text-align: center; on my reader.
  • Pages with TOCs on subpages
  • Pages using dotted TOC leaders, which, despite some expedient hacks to ws-noexport the most egregiously broken elements, still don't render correctly on all devices. So far I've not put them in the category, but actually they're not nearly as bad as they were, they're just a bit raggedy rather than totally borked.

Inductiveloadtalk/contribs 07:09, 18 December 2020 (UTC)

Re Lua per Template talk:header/year

Thanks for the fix, I will give it a prod in a while.

I understand about conversion to Lua, though it is outside of my knowledge base, the one reason that I hesitate to push. When we do it, I hope that it is numbers of smaller components so that it is easier to identify issues, easier to update and document, and maybe then more usable across the range of places/namespaces that we re-use the logic. I cannot fathom getting right output from Module:WikidataIB <call me simple>. I also want to know that when we do it that we review our metadata aspects, as I think that we are still blurry to Wikidata to inhale our data well. Stuff that is outside my comfort zone. — billinghurst sDrewth 20:27, 22 December 2020 (UTC)

@Billinghurst: Yeah, one main goal in converting to Lua is to re-architect it to reduce duplicate code spread around a million places. It should be possible to have a single Lua header module backing at least {{header}} and {{translation header}}, and with reusable functions that can be invoked elsewhere if needed. I am also hoping we can clean up and modernise the HTML we output, and to let the different formatting between header/translation live in a style sheet, but that may take some unspooling of the spaghetti that's accumulated over the years.
On the metadata I can't really comment intelligently as I haven't really looked into it that closely. But I can say that if Module:WikidataIB confuses you that's probably mostly to do with the information model and interfaces that Wikidata itself provides: I find these deeply weird and confusing every time I try to do anything but the very simplest operation on Wikidata. I'm sure it makes perfect sense to those steeped in it, but to anyone else it just looks plain alien. It's the same sort of feeling I used to get when talking to the RDF folks two decades ago: you can tell they're really really smart, but you're never quite sure they know what planet we're currently on. --Xover (talk) 22:43, 24 December 2020 (UTC)
:-) I will continue to tidy up the drudgery and get our templates in order and that underlying alignment, especially biographical and encyclopaedic. I will also spread my search for some metadata experts who can assist us. More my skillset. — billinghurst sDrewth 23:20, 24 December 2020 (UTC)

So many directions to look...

It is very hard to track all the different methods to alter display of texts. So I'm always looking behind the curtain. I found use of {{sbs}} and then wondered what the hey behind

<templatestyles src="sbs/styles.css" />

After a while I found the Wikipedia: page for templatestyles and a couple other bits, though it is a stumbling block that there is no descriptive write-up for the non-ivory-tower people.

Anyway, finally got back to trying to figure out why this was needed on the particular page, and still don't know 'why'.

But I did notice that Template:Sbs/styles.css has two mentions of

 div.ws-template-sbs.smaller {
   font-size: 83%;
 }
 div.ws-template-sbs.smaller {
   font-size: 83%;
 }

Anyway anyway, is templatestyles usage supposed to be work/genre -specific, or just another collection of personal choices? Shenme (talk) 04:50, 27 December 2020 (UTC)

Found some more hints? Would be nice if centralized:
Shenme (talk) 09:41, 27 December 2020 (UTC)
@Shenme: In the wider Wikimedia and MediaWiki universe, TemplateStyles is not particularly an end-user feature. It is an extension to MediaWiki that was designed specifically to solve the problem of templates hard-coding a lot of physical formatting in a way that cause some technical and practical problems, and which makes it hard to tweak the formatting because you have to edit the convoluted template syntax. So instead, the TemplateStyles extension gives you a html-like tag—<templatestyles src="styles.css" />—that lets you load a CSS stylesheet at that point in the template (usually at the start) and in a way that ensures deduplication (using the same template multiple times will only load the style rules once) and scoping (all the CSS selectors are scoped to only apply to the content area).
On Wikisource however, we have somewhat more need of specific formatting: on Wikipedia all articles should look roughly the same, but each of our works have unique formatting quirks. Which means we have need of the functionality of TemplateStyles as somewhat more of an end-user accessible feature, rather than a template-developer feature. We have feature requests in to extend both the TemplateStyles extension and Proofread Page extension to enable various easy ways to add a per-work stylesheet to our works through the Index: page. One major thing we're missing is the ability to pass variables to the TemplateStyles stylesheets, so that we can do things like let the end user specify a precise letter spacing in em, the way we can with hard-coded formatting in a template. This won't show up in the near future, unfortunately, and in the mean time we can't really go all-in on TemplateStyles; but we're still trying to use it in the places where it makes sense (anywhere we need formatting but we don't need to give the end user unlimited flexibility to tweak values). That is, the main use of it is technical right now.
{{BookCSS}} is a somewhat manageable way to use TemplateStyles to enable per-work stylesheets. It gives the end user a template-based interface to specify the stylesheet for a work, and a centralised location to store the stylesheets. You could in principle use a <templatestyles src="styles.css" /> tag anywhere manually, so {{BookCSS}} is just about user friendliness and manageability.
{{sbs}} on the other hand is currently more of an experiment. While our formatting templates provide unlimited flexibility for tweaking, most of the time we do not actually use that flexibility: we just use default values, or we use a small number of values falling into a broad category (think small, medium, or large; vs. a numeric value from 0–100). These can fairly easily be handled by TemplateStyles at the expense of flexibility that is rarely used and mostly not really needed even when we do use it. In addition I have been bothered by the inconsistent interfaces and names provided by our formatting templates. We have {{center block}} and {{block center}}. Some templates need the text they apply to in a parameter to the template, some automatically fall back to start and end templates, and some provide /s and /e variants. Some need a unit specified for parameters, while others hard code the unit and take only a number ({{bar|2}} but {{gap|2em}}).
{{sbs}}/{{sbe}} (mnemonic: styled block start and end) and their inline siblings {{sss}} and {{sse}} (mnemonic: styled span start and end) are an experiment to see if we can improve this using TemplateStyles and a lightweight template wrapper around a stylesheet where all the interesting stuff is defined. By using CSS selectors ("style names") as the template parameters, what we're effectively doing is just adding CSS classes to the div and span elements. The templates are always start+end templates, and each invocation of them can apply any formatting for which we have rules in the stylesheet. Staying close to CSS also means we can re-use "muscle-memory" and know-how for those who have worked with CSS before, and Google will be an effective help for looking stuff up in general CSS resources.
So far {{sbs}} is a qualified success, but {{sss}} looks like it'll be a bit too verbose to be worthwhile. There are some unsolved problems, and it remains to be seen if the community will take to this approach, so if you want to play with it in mainspace you should be prepared for the possibility you may have to go back and redo pages without it (if it gets actively deprecated or something). That's why there's no documentation for it yet: I am purposefully leaving the bar a bit high so people won't get fooled into using it with the caveats that currently apply.
Hope that was helpful, and please don't hesitate to ask if there is anything else that is unclear or you're wondering about. --Xover (talk) 12:37, 27 December 2020 (UTC)
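The "style names become CSS classes" idea described for {{sbs}} can be pictured with a small sketch. It is written in JavaScript purely for illustration, since the real thing is a wikitext template. Only the ws-template-sbs base class is taken from the stylesheet quoted earlier in this thread; the function name and the filtering rule are assumptions.

```javascript
// Build the opening tag a styled-block start template might emit.
// Hypothetical sketch: only the base class name comes from the quoted stylesheet.
function styledBlockStart(styleNames) {
  var classes = ['ws-template-sbs'].concat(
    styleNames.filter(function (name) {
      // Accept only simple class-name tokens so a parameter cannot inject markup.
      return /^[A-Za-z][A-Za-z0-9_-]*$/.test(name);
    })
  );
  return '<div class="' + classes.join(' ') + '">';
}

// Usage: each parameter is a style name that has a rule in the stylesheet.
var openingTag = styledBlockStart(['smaller', 'center']);
```

A selector such as div.ws-template-sbs.smaller in the stylesheet then matches the element, so adding a new style means adding a rule rather than a new template.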

Zawis and Kunigunde

Hello. May I ask you to have a look at File:Zawis and Kunigunde (1895).djvu? The IA uploader added an extra first page there, and now I have found out that all the OCR layers are also shifted, e.g. page 15 has the OCR layer of page 16. I reported it to the phabricator, but now I need this particular file to get fixed. It would help me very much, but there is no hurry. --Jan Kameníček (talk) 14:02, 30 December 2020 (UTC)

@Jan.Kamenicek: Done. Please check the result. --Xover (talk) 15:26, 30 December 2020 (UTC)
Oh, that was quick, and the result is great. Thanks very much! --Jan Kameníček (talk) 15:49, 30 December 2020 (UTC)

Fonts

I just tried to prod along phab:T166138.

Couple of other fonts we could do with are a sans and serif "Outline" font, because clever though {{font outline}} is, it doesn't look great. E.g. the second line of this page for sans. But I can't immediately see good SIL-licensed candidates. Any ideas? priority: lowest.

Also for reference, phab:T270743 tracks the ability to export used ULS fonts. Inductiveloadtalk/contribs 11:49, 31 December 2020 (UTC)

Yale Shakespeare 2021

The three titles entering public domain in 2021 are listed at Wikisource:Requested texts/1925. If you do not have the time to check the scans page-by-page, then I am quite willing to take the trouble to do so before you generate the files, if that would help.

I do not know when the two linked titles (Henry VIII and Love's Labour's Lost) will be available for download, but would appreciate it if, once they are available, you could generate a DjVu from each. --EncycloPetey (talk) 00:09, 1 January 2021 (UTC)

@EncycloPetey: LLL and H8 are done. I checked them, and they looked fine, but you have a better eye for these than me. Pericles I've failed to find any scans of. --Xover (talk) 21:39, 1 January 2021 (UTC)
I have found a Google scan [3] but I haven't checked its quality. --EncycloPetey (talk) 01:53, 2 January 2021 (UTC)
Note that it should be uploaded to "File:Pericles (Yale) 1925.djvu" without the full title. Using the full titles for the Yale Shakespeare series is needlessly cumbersome, or we'd have had "File:The most excellent and lamentable tragedy of Romeo and Juliet.djvu" --EncycloPetey (talk) 02:04, 2 January 2021 (UTC)

Proper Poem tag

Just a note as you don't appear to be watching that bug or component: I've hacked up a proof of concept for phab:T8419 (semantic poem support). Inductiveloadtalk/contribs 15:47, 1 January 2021 (UTC)

@Inductiveload: Thanks. Not that I've thought this through, but… I'm sceptical of using the wikimarkup indentation syntax. I see the reasoning, but it's designed for a different purpose (which has landed us in trouble before). And I suspect we'll rapidly want more control ("this is a hanging indent line", "this is a centered line", etc.; CSS classes essentially). The wikimarkup table syntax with {{ts}} is one possible model for it. Explicit stanza and line extension tags another possibility. But in any case, thanks for kicking this; and thanks for the headsup! --Xover (talk) 19:13, 1 January 2021 (UTC)
Hmm, yeah, I can see why you might want some better handling. The existing tag (and the new tag) are cheating a bit with just splitting the wikimarkup lines on newlines, which can be a bit suboptimal:
<pp?oem>
line 1<ref>reference line 1
this will be a new line</ref>
</pp?oem>
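The failure mode in that example can be reproduced with a few lines of JavaScript. This is a sketch of the general problem only, not the parser code of either tag: splitNaive and splitKeepingRefs are hypothetical names, refs are assumed not to nest, and stanza breaks are ignored.

```javascript
// Naive approach: every newline starts a new poem line.
function splitNaive(text) {
  return text.split('\n');
}

// Ref-aware sketch: newlines inside <ref>...</ref> do not end a poem line.
// Assumes refs are not nested; self-closing <ref ... /> tags open nothing.
function splitKeepingRefs(text) {
  var lines = [];
  var current = '';
  var depth = 0; // greater than 0 while inside an open <ref>
  var tokens = text.split(/(<ref\b[^>]*>|<\/ref>|\n)/);

  tokens.forEach(function (token) {
    if (token === '\n' && depth === 0) {
      lines.push(current);
      current = '';
      return;
    }
    if (/^<ref\b/.test(token) && !/\/>$/.test(token)) {
      depth += 1;
    } else if (token === '</ref>') {
      depth = Math.max(0, depth - 1);
    }
    current += token;
  });
  lines.push(current);
  return lines;
}

// Usage: the reference spans two source lines but belongs to one poem line.
var sample = 'line 1<ref>reference line 1\nthis will be a new line</ref>\nline 2';
var naiveCount = splitNaive(sample).length;       // the ref is cut in half
var awareCount = splitKeepingRefs(sample).length; // the ref stays intact
```

The naive split yields three pieces for this input, with the reference broken across two of them, while the ref-aware version yields two.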
The thing that attracts me most about a "bodge mark 2" like the ppoem tag, rather than a full-blown table-style syntax is that it would be a drop-in replacement in 99% of cases. Conceivably, you could also think of a syntax like this to apply styling to the lines:
<ppoem>
line 1
|style="text-align:center;"|line 2 (centered)
</ppoem>
A "poem table" syntax like this might work and be more robust since it doesn't use line breaks to break up lines and stanzas, but you'd have to be more careful in applying it to existing pages and it's not so easy to proofread as every line needs markup:
<poem>
| line 1
| line 2
| line 3
|- 
| line 4
</poem>
Inductiveloadtalk/contribs 20:20, 1 January 2021 (UTC)

Eumenides request

Could you also generate a DjVu from (external scan) and upload locally to File:Eumenides (Murray 1925).djvu? You can compare against File:Choëphoroe (Murray 1923).djvu for much of the markup. I have checked the scan, and there seem to be no problems. I just ask that you use straight quotes and apostrophes, rather than smart quotes and curly apostrophes, if you can manage that. The other two volumes in the set have text layers with straight quotes, and it will go faster if I do not have to check them as I go.

There is no rush. I will finish his Agamemnon volume before starting on the Eumenides. --EncycloPetey (talk) 03:28, 2 January 2021 (UTC)

@EncycloPetey: Now up at Index:Eumenides (Murray 1925).djvu. Since this edition used curly quotes that's what the OCR is going to pick up. I do have some hooks where I could manipulate the text in the conversion, but I'm not immediately comfortable doing that since it wouldn't match the edition. But since curly quotes can be unambiguously converted to straight I'm thinking we can find some half-decent client-side way to fix them. I'll get back to you on that score… --Xover (talk) 12:53, 2 January 2021 (UTC)
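Because the mapping from curly to straight quotes is one-way and unambiguous, a client-side fix can be as small as the sketch below. The function name straightenQuotes is hypothetical, and only the four common single and double quotation marks are handled; other typographic characters are left alone.

```javascript
// Map typographic quotes and apostrophes to their straight equivalents.
function straightenQuotes(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'")   // left/right single quotation marks
    .replace(/[\u201C\u201D]/g, '"');  // left/right double quotation marks
}

// Usage: apply to the text of a page before proofreading.
var cleaned = straightenQuotes('\u201CIt\u2019s done,\u201D she said.');
```

In a user script this could be run over the edit box contents on demand, which leaves the OCR layer in the file faithful to the printed edition.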
Thanks. --EncycloPetey (talk) 15:46, 2 January 2021 (UTC)

template Header

Hello. I have noticed that the TOC at The Czechoslovak Review/Volume 1 which is wrapped in the float right template has recently appeared above the header, although it was not causing any problems some time ago. Is it possible that the reason is this change in the {{Header}}? --Jan Kameníček (talk) 11:20, 3 January 2021 (UTC)

Solved :-) I just moved it below the header and now it is OK :-) --Jan Kameníček (talk) 12:06, 3 January 2021 (UTC)
@Jan.Kamenicek: :-) --Xover (talk) 12:08, 3 January 2021 (UTC)

#switch for Template:PD/US ?

Hi. I am on the run, can you please look at Template:PD/US and see if we can easily migrate it to #switch:. All the #ifexpr: are going to be expensive. — billinghurst sDrewth 05:04, 6 January 2021 (UTC)

@Billinghurst: That won't work here because #switch can only compare equality and the logic in the existing #ifexpr is evaluating relative size (greater than, less than, etc.). I can take a look at moving that logic into a Lua module though, to see if the same logic expressed there will be easier to read and maintain. It should definitely be less expensive computationally since we can do variable assignment and relative comparisons natively, but it might not necessarily be more readable and maintainable for people used to template syntax. --Xover (talk) 08:11, 6 January 2021 (UTC)
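The difference between the two kinds of logic can be sketched as follows. This is JavaScript for illustration only (the proposal above is a Lua module), and the thresholds and labels are placeholders, not the rules Template:PD/US actually applies.

```javascript
// Exact-match lookup, the kind of test #switch can express:
// it only succeeds when the input equals one of the listed keys.
function byEquality(years) {
  var table = { 50: 'bucket-50', 70: 'bucket-70', 100: 'bucket-100' };
  return Object.prototype.hasOwnProperty.call(table, years) ? table[years] : 'no match';
}

// Ordered comparisons, the kind of test the #ifexpr chain performs:
// the first threshold the input reaches wins.
// Placeholder thresholds, not the template's real rules.
function byRange(years) {
  var thresholds = [
    { min: 100, label: 'bucket-100' },
    { min: 70, label: 'bucket-70' },
    { min: 50, label: 'bucket-50' }
  ];
  for (var i = 0; i < thresholds.length; i += 1) {
    if (years >= thresholds[i].min) {
      return thresholds[i].label;
    }
  }
  return 'no match';
}

// Usage: 83 is not a key in the lookup table, but it does fall in a range.
var exact = byEquality(83);
var ranged = byRange(83);
```

An equality lookup would need one entry per possible year to cover the same ground, which is why the range logic cannot simply be moved into #switch.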
Thanks, thought it was the case, though didn't have the time to get my brain into that space. It is not native thinking for me. — billinghurst sDrewth 23:49, 6 January 2021 (UTC)

File:Comp3-pre-0415.pdf

Index/Page clean up required. — billinghurst sDrewth 11:47, 11 January 2021 (UTC)

Done. Thanks. --Xover (talk) 11:55, 11 January 2021 (UTC)

GLAM presentation

Hi, [kind of continuing the conversation we sort of started a few months back about sorting a process for GLAM contributors… ] I've just arranged with @Giantflightlessbirds: to go down to their part of the country (Westland) in a month's time and give a seminar on Wikisource to the library staff and volunteers. Covering how it can be used to rescue and make available collections of PD texts. One of the local historical societies has acquired some journals of early settlers of their region, and there is a collection of local publications that aren't available anywhere else. My thinking is that I'll post the seminar somewhere on here after I've delivered it so that others can use it—or parts of it—when they have similar opportunities (pandemics permitting). GFB will also use the content to reach out to GLAM colleagues across the country and see if we can increase the NZ content beyond my own contributions. Cheers, Beeswaxcandle (talk) 07:43, 14 January 2021 (UTC)

Migration templates

Do we have a template for "this version is a dodge-ass PG version. We also have a scan, we should use that one day"? Or should we make a versions page with the PG and scan version (as redlink + {{small scan link}})? The latter involves quite a bit of page shuffling if you want to place the versions page at the base name. Inductiveloadtalk/contribs 13:20, 19 January 2021 (UTC)

@Inductiveload: No. And we have people who are not yet persuaded that scan-less dodgy Gutenberg texts are anything but a legitimate edition on par with real editions. So the options are a) to complete a new scan-backed edition and then speedy the dodgy text as redundant (which the policy allows) and hope nobody notices so that we have to have that discussion again, or b) to complete a new scan-backed edition and transclude it over the dodgy text and hope nobody notices and makes a fuss so that we have to have that discussion again. I'm opting for option b as the, hopefully, least controversial approach (besides, I like preserving edit history when it doesn't cost anything).
But whatever we do, we should definitely not preserve the Gutenberg texts once we have a proper edition. They're grandfathered in, and there's a legitimate argument that they're better than nothing (I disagree, but it is a valid argument), but once we have at least one proper scan-backed edition they definitely need to go.
However, I think we could create a "this text is not scan-backed / Proofread [and should be replaced with one that is]. Here's a (scan|index) to work from." without stepping on too many toes. --Xover (talk) 13:39, 19 January 2021 (UTC)
A wild pair of secret templates appear! {{Project Gutenberg}} and {{Second-hand}} have existed since 2012. Inductiveloadtalk/contribs 14:18, 19 January 2021 (UTC)
@Inductiveload: Ah, yes, but these are talk page templates (nobody checks the talk page). --Xover (talk) 16:42, 19 January 2021 (UTC)
Maybe we should move them to the main pages? After all, that's where the other ones go. Inductiveloadtalk/contribs 16:43, 19 January 2021 (UTC)
@Inductiveload: I'd be leery of doing that en masse, but adding namespace detect and subtly nudging new uses towards mainspace sounds manageable. Maybe. --Xover (talk) 16:47, 19 January 2021 (UTC)
I think all you'd have to do is switch tmbox for ambox in the namespace detector. Inductiveloadtalk/contribs 16:48, 19 January 2021 (UTC)
Also, what do we do with the {{textinfo}}s from transitioned pages? unsigned comment by Inductiveload (talk) 18:35, 19 January 2021‎ (UTC).
@Inductiveload: tmbox → ambox: ayup. {{textinfo}} should go once it no longer reflects reality. If the talk page is otherwise empty we can just nuke it entirely. --Xover (talk) 20:03, 19 January 2021 (UTC)

Fix to Template:Categories by date

My brain is busy doing other stuff, could you please look at this template, as it is not marking BCE in the top runner for 1st C BCE per Category:10s BCE works and others

It is also categorising incorrectly into Category:1st century works whereas it should be in Category:1st century BCE works‎, guessing something about the year parameter being empty and the BCE being outside of the determining criteria.

Also guessing that all "(works|deaths|births) by ..." templates have same issue and this will fix them all. Thanks if you can. — billinghurst sDrewth 10:19, 26 January 2021 (UTC)

@Billinghurst: I'll take a look. Could you point me at one specific page that is incorrectly categorised by the template, and—just to make sure my dumb brain doesn't get me lost on the way—list the incorrect category it is currently in and the correct category it should be in instead? Looking at the cats you already gave, the only obvious problem I saw was Category:10s BCE works which was being categorised into Category:1st century works when it should have been in Category:1st century BCE works. But that one was due to a missing era parameter in the template invocation, so that doesn't tell me much about the bug in the template itself. --Xover (talk) 10:42, 26 January 2021 (UTC)
It was the runner in the category, and you fixed both. Sorry am stuck in fixing some of the NLS works that I have tripped upon when fixing something else, upon something else, while doing something else. <sigh> the depths of issues at times. Thanks. — billinghurst sDrewth 11:16, 26 January 2021 (UTC)
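The effect of a missing era parameter is easy to see in a sketch of the century calculation. The real template is wikitext; the JavaScript below only mirrors the idea, with hypothetical function names, and assumes a year of 1 or greater.

```javascript
// Format a positive integer as an English ordinal (1st, 2nd, 3rd, 11th, 21st).
function ordinal(n) {
  var lastTwo = n % 100;
  if (lastTwo >= 11 && lastTwo <= 13) {
    return n + 'th';
  }
  switch (n % 10) {
    case 1: return n + 'st';
    case 2: return n + 'nd';
    case 3: return n + 'rd';
    default: return n + 'th';
  }
}

// year: positive year number (for a decade such as "10s", its first year).
// era:  'BCE', or empty/omitted for the common era.
// type: category suffix such as 'works'.
function centuryCategory(year, era, type) {
  var century = Math.floor((year - 1) / 100) + 1;
  var eraPart = era === 'BCE' ? ' BCE' : '';
  return 'Category:' + ordinal(century) + ' century' + eraPart + ' ' + type;
}

// Usage: the same decade with and without the era supplied.
var withEra = centuryCategory(10, 'BCE', 'works');
var withoutEra = centuryCategory(10, '', 'works');
```

With the era supplied the 10s BCE land in Category:1st century BCE works; with it left empty the same input falls through to Category:1st century works, which matches the miscategorisation described above.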

Another toy

Handy script for editing without going into edit mode:

// maintain script has no purpose on edit or special
if ((mw.config.get("wgCanonicalNamespace") !== "Special") && ["edit", "submit"].indexOf(mw.config.get("wgAction")) === -1) {
  mw.loader.using(['ext.gadget.utils-difference', 'mediawiki.util', 'mediawiki.api',
      'oojs-ui-core', 'oojs-ui-windows', 'oojs-ui-widgets']).done(function() {
    mw.loader.load("/w/index.php?title=User:Inductiveload/maintain.js&action=raw&ctype=text/javascript");
  });
}

Let me know if you can think of useful things to preload into it if you find it useful. I will eventually make it possible to plug in your own tools. Inductiveloadtalk/contribs 00:05, 28 January 2021 (UTC)

"Invisible" maintenance text

Hi. Magnus has created a template for me that enables me to do some main subject finds and additions (User:Magnus Manske/topicmatcher test). The issue is that I need to poke it into the works for it to function, which I can manage, though it leaves a visible message, and one which I don't wish to impose on the punters. Can we create an invisible class (#mw-hidden-template ?) where a maintenance user can activate its visibility through tweaking the class in their common.css? While here, what would be the means to force a template's text to the outside of the header/footer templates, if that was what I chose to do? Thanks. — billinghurst sDrewth 11:50, 9 February 2021 (UTC)

FWIW I have temporarily stuck MM's template inside template:Compendium of Irish Biography for the means of testing and design. — billinghurst sDrewth 11:53, 9 February 2021 (UTC)
@Billinghurst: Hmm. Making it invisible by default with possibility of override from user css should be doable, but I'll have to dig a bit to see how best to do it. Moving the displayed text outside the other output from the same template is a bit trickier, so there we'll need JS (or possibly global CSS), but depending on the details of what's wanted we can probably find some way to do it. --Xover (talk) 12:03, 9 February 2021 (UTC)
I thought that we had a class to do it. Don't fuss the placement too much, that is the least of the issues, and housekeeping only, as I was just thinking search engines that ignore invisible. — billinghurst sDrewth 12:06, 9 February 2021 (UTC)
@Billinghurst: {{topicmatcher}}. It wraps the link in a span with the id #topicmatcher, and imports a TemplateStyles stylesheet that sets that id to display:none (so it's hidden by default). You can override it in user css using #topicmatcher {display: inline !important;} (don't forget the !important, it's needed to override the templatestyles here). --Xover (talk) 13:42, 9 February 2021 (UTC)
Thanks for all the work, all seems to be working well, I have done some adaptation for the cats and stuff, and some annotations about use. — billinghurst sDrewth 00:22, 10 February 2021 (UTC)

Using a template is good in that it doesn't need any fancy JS or CSS to show, but it does inject it into the page area. Would it make more sense to have it shunted to the .mw-indicators area by JS? Perhaps adding a class like .wd-tool-template or similar so that further tools can be added and get the same treatment (while keeping the ID for granular targeting if a user only wants one)? Then it can just be:

$(function() {
  // Move every tool template into the page indicator area once the DOM is ready
  $('.wd-tool-template').prependTo('.mw-indicators');
});

Inductiveloadtalk/contribs 14:24, 9 February 2021 (UTC)

@Inductiveload: What I really wanted to do was implement the whole thing as a gadget instead of a template so that only those that want it get it. But I couldn't find any sane way to access WD from JS, so this was more of a quick hack solution. I think right now Billinghurst is the only one that wants the topicmatcher link, so having it in every page (in the relevant works) for every user is a bit overkill. --Xover (talk) 14:35, 9 February 2021 (UTC)
Hmm, I'll have a look-see - Wikidata is my next target for User:Inductiveload/maintain.js and User:Inductiveload/popups reloaded.js anyway. Inductiveloadtalk/contribs 14:57, 9 February 2021 (UTC)
Here's a first draft: User:Inductiveload/topic_matcher.js. Seems to work OK, though my rubbish SPARQL skills seem to indicate there are actually not a lot of examples:
SELECT ?item ?label WHERE {
  ?item wdt:P1433 wd:Q19020593.
  MINUS { ?item wdt:P921 [] } .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?item rdfs:label ?label.
  }
}


An auto-item creator based on the page base name would be a natural next step. Inductiveloadtalk/contribs 16:02, 9 February 2021 (UTC)
For most of the works that I am doing we know basepagename and its Q, subpagename, authorname and its Q, volume (if exists) and page number on the view, so we are most of the way there. The label is always subpagename, and the description is usually generic for biographical articles of a work. There are some variations, eg. different contributors, though that will be on the page where known, or stating unknown author. Depending on whether we are doing biographical or some other topic, we can get title and subtitle, for example Andrews, George (Q105143209).

Also noting that the WEF framework has an IMPORT DATA function, so if we can even arrange the underlying data per page so that it can inhale something, that would be fantastic, as I usually then manage the data in article across to biographical article. Noting that I have found no documentation for WEF's import data, and the author has not been responsive to my naïve questions. — billinghurst sDrewth 00:43, 10 February 2021 (UTC)

Re your query, running it on IrishBio is the fail, as much of that work predates WD, and I usually work item by item so we don't have incompleteness. I chose that work as it had the derived template. The Dictionary of Australasian Biography, 1892 (Q19084840) is a better work to run your query on, though as a work it needs to be migrated to its template, which is just on my TO DO list as I work on other priorities.

SELECT ?item ?label WHERE {
  ?item wdt:P1433 wd:Q19084840.
  MINUS { ?item wdt:P921 [] } .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?item rdfs:label ?label.
  }
}


billinghurst sDrewth 00:50, 10 February 2021 (UTC)
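
The two queries in this thread differ only in the item ID of the work, so a small helper could generate them for any work. The following is a minimal sketch, not the code in User:Inductiveload/topic_matcher.js: the function names buildMissingSubjectQuery and buildQueryPageUrl are hypothetical, and it assumes the query service editor at query.wikidata.org accepts a URL-encoded query after the # in the address.

```javascript
// Hypothetical helper (name not from this thread): builds the query quoted above
// for any work item, i.e. items "published in" (P1433) the work that still lack
// a "main subject" (P921) statement.
function buildMissingSubjectQuery(workQid) {
  // Reject malformed IDs so nothing unexpected is spliced into the query text.
  if (!/^Q[1-9]\d*$/.test(workQid)) {
    throw new Error('Expected a Wikidata item ID such as "Q19084840", got: ' + workQid);
  }
  return [
    'SELECT ?item ?label WHERE {',
    '  ?item wdt:P1433 wd:' + workQid + '.',
    '  MINUS { ?item wdt:P921 [] } .',
    '  SERVICE wikibase:label {',
    '    bd:serviceParam wikibase:language "en".',
    '    ?item rdfs:label ?label.',
    '  }',
    '}'
  ].join('\n');
}

// Hypothetical helper: link to the query service editor with the query prefilled.
// Assumption: the editor reads a URL-encoded query from the fragment after "#".
function buildQueryPageUrl(workQid) {
  return 'https://query.wikidata.org/#' +
    encodeURIComponent(buildMissingSubjectQuery(workQid));
}
```

For example, buildQueryPageUrl('Q19084840') should give a link that opens the Dictionary of Australasian Biography query above in the editor, and swapping in Q19020593 gives the earlier one.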

I think QuickStatements is probably the more usual way to import ad-hoc bulk data to Wikidata, and it's explicitly easy to post data into. If there was a tool to generate data batches, that's what I would target. Inductiveloadtalk/contribs 01:03, 10 February 2021 (UTC)
*But* that is only functional if you want to create a batch. I am more looking to a functional and integrated addition of the data, linking to WD, data extraction to the subject item, and a link at WP. I also do them as I create them per transclusion. — billinghurst sDrewth 01:45, 10 February 2021 (UTC)

Capitalization templates[edit]

Hi Xover. Thanks for the detailed edit summary here. I am a bit confused though — the guidelines at Help:Templates#Text Case make no mention of the principles you described; in fact, it seems to suggest the contrary:

While it is OK to omit the use of the upper case, lower case, and capitalize templates altogether, where they are used they should only be inserted where the choice of case in the work is a formatting decision (ex. special formatting in a work's title), rather than as required for spelling or grammar (ex. in an acronym).

The edits I made were precisely where capitalization was used in the source for styling/formatting, rather than because it was semantically meaningful. And the language used in that passage ("it is OK...") hints that omission of the templates is accepted, but not necessarily preferred.

Am I missing some community discussion or additional help page where the position you describe is explained in more detail? Cheers, Waldyrious (talk) 19:57, 10 February 2021 (UTC)

@Waldyrious: That page is a help page for the various formatting templates available. It's trying to tell you that while these templates exist, their mere existence should not be read as any requirement, or even suggestion, to use them. But if they are used, they should only ever be used when text case is a stylistic matter. As for dos and don'ts we (to my great personal annoyance) tend not to document stuff like that in any structured form because the community is generally averse to anything that smacks of "bureaucracy". --Xover (talk) 21:24, 10 February 2021 (UTC)
I don't know why we bother putting {{uc}}, {{lc}} and {{capitalize}} up there in pride of place. They're almost entirely useless (or worse) in 99.9% of actual content-space use and are mostly actually useful in templates for technical chicanery to do with string matching. {{sc}} and {{asc}} are nearly always the correct choice. Inductiveloadtalk/contribs 23:10, 10 February 2021 (UTC)
@Waldyrious: That is a help page in our Help: namespace that talks about our common templates, the place for style and guidance information is WS:Style guide; our policy stuff is housed in the Wikisource: namespace.

@Inductiveload: Part of the reason those templates exist is to stop the use of {{lc:xxx}} and {{uc:xxx}}, which was a particular problem; and to allow a copy and paste of the text without getting all the case. If you think that the order is wrong, or less relevant, then change it. — billinghurst sDrewth 03:45, 11 February 2021 (UTC)

Hmm, I'm intrigued by what you mean with "and to allow a copy and paste of the text without getting all the case" — the main reason I thought it made sense to use these templates in content-space was precisely so that copying the transcribed text wouldn't bring non-semantic casing along (e.g. the all-caps first word after a dropped initial). Am I interpreting this incorrectly? --Waldyrious (talk) 18:25, 12 February 2021 (UTC)
Furthermore, Wikisource:Style_guide#Formatting says (emphasis mine):
Formatting should be flexible and not interfere with access to the document, knowing that we are trying to reproduce works for modern readership, not provide facsimiles of the time and place. See also Help:Adding texts, Help:Beginner's guide to typography, and Help:Editing.
...which (1) to my eyes seems to suggest that stuff like preserving original capitalization is not an explicit goal (am I reading this incorrectly?), and (2) links to Help pages as further guidance, so it's not clear that one's not supposed to follow what they describe. Am I missing some strategy that would allow me to follow the existing documentation without inadvertently stepping over lines? --Waldyrious (talk) 12:04, 13 February 2021 (UTC)
Thanks for the replies, guys. Xover, I can see your reasoning, and I agree that it's quite a sorry matter that these guidelines are not explicitly documented. I've had several edits reverted (including in the Help namespace, where my goal was to provide others the additional help I wished I had had myself!), because I didn't know the unspoken local rules, and although I'm an experienced editor in other Wikimedia projects, it's always an unpleasant experience. I can totally see newcomers with less thick skin being entirely turned off after trying to contribute constructively, only to have their edits reverted. Let me be clear: I am not complaining about your reversion, which I can understand; but the documentation would be really helpful. I still recall that comprehensive guidelines documentation was the main reason I was able to get by smoothly when I started out as an editor in English Wikipedia. Anyway, if you once again find yourself advocating for more explicit documentation of guidelines, I'd be happy to add my voice to that effort. --Waldyrious (talk) 18:25, 12 February 2021 (UTC)
@Waldyrious: You hit it square: for anyone who hasn't been involved in the long slow organic development of the current practice it feels arbitrary and impenetrable, and for those who have it's nearly as frustrating because all these newcomers keep arguing on points you "know" are settled. There are good arguments on the other side of the coin too, of course, but for my part I think we've skewed too far towards one extreme of the scale.
In any case, glad to hear we've not discouraged you; and do please feel free to hit me up here if you need help with anything or something else seemingly doesn't make sense. --Xover (talk) 19:00, 12 February 2021 (UTC)
Thanks — will do! --Waldyrious (talk) 11:52, 13 February 2021 (UTC)

Lang's Fairy Books[edit]

First of all, thank you for putting them together or back together.

Second, wikisource used to have a link at Andrew Lang's Fairy Books which (iirc) I accomplished via a subpage from the main and then "transcluded" that to the main.

Maybe the data link was gone when you removed it.

I would like to do that same thing for other items, like biological species, stars, planets, representations of mathematical formulas, etc. -- items that don't require a whole portal but are lost to the rest of wikidom by being available only at the broader subject (portal) to which they belong.

Was there a wd link to the fairy book subpage?

Will you not collapse things in the future if they have a wd-id?--RaboKarbakian (talk) 14:49, 13 February 2021 (UTC)

Category:Wikipedia message box parameter needs fixing[edit]

Not certain that this worked. The black magic in the template is not something I have dug into. Apart from anything else that it says WP. — billinghurst sDrewth 14:48, 16 February 2021 (UTC)

@Billinghurst: Thanks. It didn't. That's partly why I'm doing a spin through these now and making them more consistent. --Xover (talk) 15:04, 16 February 2021 (UTC)

I need your response please[edit]

At Wikisource:Copyright_discussions#Remarks_by_President_Biden_in_a_CNN_Town_Hall_with_Anderson_Cooper_deleted it seems like you've made a claim that really requires some substantiation and clarification. As an admin, I would hope that you see the value in having a clear understanding of the copyright status of works on this project. In case you didn't get the ping message somehow, I'm requesting on your talk page that you answer my questions at the original post. Thanks. —Justin (koavf)TCM 16:36, 21 February 2021 (UTC)

@Koavf: It's on my list. --Xover (talk) 17:43, 21 February 2021 (UTC)
Thumbs up emoji. —Justin (koavf)TCM 17:48, 21 February 2021 (UTC)