User talk:Xover/Archives/2022

Warning: Please do not post any new comments on this page.
This is a discussion archive first created in 2022, although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

Large numbers, depression and coping mechanisms

[In response to your comments elsewhere, but which I am too lazy to get diffs for, and not wishing to derail unrelated discussions with personal observations]

I find a good coping mechanism for the Red Menace Page Problem is to consider that, fundamentally, a red page and no page are much the same: the text of that page does not exist as far as the mainspace is concerned - either it's trapped in "latent form" in the OCR layer (or even just in the image data), or it's squatting in the Page NS.

We have two main "kinds" of red pages (as you almost certainly know):

  • Copy-dumps: 250k of which are from one particularly dedicated user (and, I assume, a worn-out mouse button)
  • Splits: text that is almost correct, but cannot be said to be "proofread". Functionally many of these pages are yellow at heart. Most of the 2021 bump are these.

I would say that while neither particularly distresses me much personally, the latter is certainly not too much to worry about: it is just another incremental step towards green pages, albeit one that we don't capture as a colour change.

The former is only really a problem when the index needs surgery and now you have 500 red pages to shuffle about for no good purpose just to insert a missing page. Indeed, there are some people who think that OCR-dumping reference works is a good idea for the purposes of searchability, which isn't a completely insane concept to me, since while CirrusSearch _can_ search the text of a PDF or DjVu, it won't then take you to the right Page: page; it just tells you "(matches file content)". I do wonder if there's a way to get PRP to present raw OCR for missing pages, without having to bot-spam thousands of pages, though I doubt CirrusSearch would know how to deal with it.

Anyway, that's How I Learned to Stop Worrying and Love the Bomb. Inductiveloadtalk/contribs 10:45, 21 December 2021 (UTC)

@Inductiveload: Yeah, I've been clinging to that reasoning for a while, but the problem is just growing too intractable. The key difference here is that redlinked pages attract new contributors, and yellow pages attract a few dedicated Validators (too few, but that's a more tractable problem), but "bluelinked" "Not proofread" pages actively turn away contributors. Those pages are never going to get proofread (barring a few minor exceptions). Even the Match&Split pages have this problem: while they may contain slightly "better" text than raw OCR, they still are not actually proofread against the scan (except in the roughest, most mechanical sense), much less consistently formatted. Thus, just Match&Splitting something from mainspace in practice does nothing to improve quality, and prevents (discourages) the rest of the work from ever getting done.
Our Red Menace is now growing at a rate far outstripping our generation of actual quality content, which means that in just a few years a visitor to Wikisource will find that it consists of "mostly garbage, with maybe an occasional nice text or two hidden away in the haystack somewhere".
I am actually a lot less fussed about the need to shift pages because that is a problem that can be reduced through technological means; I just haven't found the time to hack up the scripts yet. For DjVus it is even feasible to make an interactive tool to let arbitrary contributors perform simple surgery on the file. The lack of decent MW support for gadgets (including a decent story for a UI widget library) makes it a lot more fiddly to do than it needs to be, but it is absolutely doable.
In any case, I do try not to let it push me into mouth-frothing ranting (too often), but some days I simply wallow in despair. For example, I was riding a nice warm glowing wave of good feelings after having observed how well the Monthly Challenge had turned out, and in spite of my personal misgivings at its inception, when I realised that even if the MC manages to double its output, the number of "Not proofread" pages will surpass our total number of actually proofread pages within a year or two. Now I'm deep down in The Valley of What's The Point, Really, When It Comes Down To It? and pondering the pros and cons of ice cream-induced diabetes versus late-onset alcoholism (As a wise man once wrote, "Scotch is for sipping, relaxing, and deep thoughts. Bourbon is what you drink to get through the pain.").
The only "comfort" I have is that once we've turned into mostly a big tubful of crap the problem will become obvious to everyone, at which point there will be a mass purge of all such page at the "mere" expense of ongoing bad blood and the loss of chunks of our "abundant" contributor base. Oh well. One day the pandemic will be over, sane people will be voted into political office, and my unbounded optimism will assert itself. Real soon now, I'm sure. Xover (talk) 11:31, 21 December 2021 (UTC)
To be clear, I certainly do not support letting thousands of red pages pollute the mainspace, I just don't really have the emotional energy to cultivate strong feelings about red pages that are quarantined (ugh so bored of that word) in Page namespace. In fact, I can see that allowing the text to exist there might actually draw suckers fresh meat newbies in via, say, Google searches for particular phrases. More than once, my only Google result for something has been a red enWS Page NS page.
That said, I do, in general, support a purge of abandoned crapbooks in mainspace, though my personal preference is to find a scan first and make sure it's in place before deleting so the blue link to a crap page is replaced by a redlink and a blue {{ssl}} on the author page pointing to a nice, tidy, inviting index page.
We do indeed really need to clean up some of the mainspace crap, and also try to transition some almost-crap to "decent" status. This is actually feeling slightly better recently as the MC is, though slowly, shining a pen-light under the dark furniture. Banging on another of my drums that I'm sure everyone finds tedious, Category:Ready for export has potential to help there, since if something's RfE, it's probably also "pretty good" in general.
Also, doing something about the bajillion SCOTUS pages would be nice, because if I press Special:RandomRootPage one more time and see United States v. in the title, I may not be responsible for my actions. I'm not sure what we can do about it, but my Soapbox Of Grumpiness +1 doesn't have space for chicanery like solutions.
Also, some kind of outreach must be needed, because how do projects like PGDP attract so many people when their interface is so dreadful and you have to deal with honest-to-God self-assigned "Project Managers" for each work? Inductiveloadtalk/contribs 11:57, 21 December 2021 (UTC)
@Inductiveload: The way it goes is this: someone finds a page containing some random fragments of a work as barely-formatted-above-OCR, and proposes it for deletion. Two people vote to delete it, one of whom provides a scan and a basic index as an alternative. A third person then shows up and Match&Splits the crap into that index. The bathrobe brigade then votes to keep it because it is now scan-backed. Net result? We have a fragmentary work, with the same crap barely-above-OCR text (ok, I exaggerate for effect; sometimes the quality is more or less decent), sitting in mainspace. The Index: has random red "Not proofread" pages, which, as I keep repeating, turn people off from working on that Index. Ever. The net result? The crap copydump is now a complicated crap copydump that actively discourages improvement. I appreciate the philosophical differences and that reasonable men may disagree on the individual case. I do. (Don't nobody reading feel I am bitching at you, personally!) It's when you look at the net effect in aggregate that the individual little droplets turn into a dark and bottomless ocean that I have no idea how to ever drain.
If all we had to worry about were old pre-proofreading texts I would feel much better. That stuff is a backlog that can eventually be nibbled to death by kittens at some point before the heat death of the universe. It's the continuing and accelerating accumulation of crap that gets me down. We need to set standards and stick to them, instead of this monomaniacal hoarder approach.
Gutenberg has first-mover advantage, and mindshare (wiki = Wikipedia; we're invisible next to big sister); and it operates in a conceptual framework that normal people can relate to. Even the "Project Manager" nonsense is something people understand, and most people will, most of the time, appreciate being led (less thinking, less responsibility, and genuinely less effort: see Monthly Challenge). Our major advantage is our ability to provide higher-quality transcriptions with clear provenance and metadata. The openness and lack of a gatekeeper are good things but mixed blessings: they also turn people off ("Everyone will see what I do!", "Really, they let just anyone do it?!?"). And our lack of a professional-ish standard for conduct, and our tolerance for all sorts, means we get all sorts and newcomers get exposed to all sorts, which turns away a large chunk of the population (enWP wonders why they have a gender gap, in long confrontational discussions where nobody uses any cuss words but all except the borderline psychopaths go to bed crying).
SCOTUS is probably pretty easy to deal with (in theory). Just set the standard that we host published works as published, instead of copies of excerpts from some random website. All SCOTUS opinions are published in big beefy dead-tree volumes, all of which would be subpages under the main series title. If we then allow United States v. … redirects only when there are inbound links to them (i.e. they have been mentioned by name in at least one proofread work here), we get a manageable number of top-level redirects in mainspace that direct visitors to the context in which they were originally published. Elitism and high standards are nice in that they tend to exclude or contain a lot of crap. You just have to be on guard so it doesn't run away with you and turn into rampant snobbery and exclusionary gatekeeping. Xover (talk) 13:29, 21 December 2021 (UTC)
@Xover hey I resemble that remark! But yeah, the trick is the mainspace pages still need deleting until the pages are genuinely in a proofread state. I will admit to some idealistic VK's here, but now I'm more for nuke the mainspace, fix it in Index/Page and then put it back once it's become a complete, proofread "unit" (perhaps not always a complete physical volume, but something with a start, middle and end).
I'm not sure red pages necessarily do discourage proofreading that much, but if they do we should figure out if we can address that somehow (IDK how, stop raising that eyebrow, even just toning down the red fill might be enough - that is literally a CSS one-liner?), because, as I said, there's no practical difference between a red (non-transcluded) page full of OCR and a completely not-there page other than some poor guy spent an entire evening mashing the Save button for some reason.
If the red not-transcluded page has good M&S text from the right edition that's within spitting distance of proofread and basically just needs checking that it's actually on the right page and a chapter title is centred properly, to me that at least feels like progress has been made somewhere (even if it's not recorded on the scoreboard of "mainspace or yellow pages").
So, anyway, my feeling is that swinging the axe harder in NS0, combined with an avuncular tolerance for Page NS high spirits, is a way to inner peace. Inductiveloadtalk/contribs 14:11, 21 December 2021 (UTC)
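As a minimal sketch of the kind of CSS one-liner mentioned above, a user script could tone down the red fill like this. The selector and colour are guesses: the prp-index-pagelist and prp-pagequality-1 class names are assumptions about ProofreadPage's pagelist markup and would need checking against the actual HTML.

// Hedged sketch only: soften the "Not proofread" red in index pagelists.
// Class names are assumptions; adjust after inspecting the rendered pagelist.
mw.loader.using('mediawiki.util').then(function () {
    mw.util.addCSS('.prp-index-pagelist a.prp-pagequality-1 { background-color: #fdd; }');
});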
@Inductiveload: I did indeed have you in mind for one of the roles of my little epistolary novel above. But as mentioned, I don't really complain of any individual acts in any individual discussion by any individual contributor. Different opinions are fine, are mostly a strength of our process, and most of the time I am just grateful for any and all participation (I want lots and lots more of it, not less, irrespective of the positions taken).
I should do the work to produce actual stats on the "Not proofread" → "Proofread" transitions. In my experience poking around these recesses—of which I do a depressing amount—it almost never happens. I also know several contributors that flat out refuse to work on such pages; and I sympathise because I feel much the same, except I have a pretty low threshold for just nuking whatever is there and replacing it with ad hoc OCR text from the gadget (my ocrtoy also does a few banal fixups). Which is, incidentally, a neat little trick when validating: the diff will then show you the usual noise, but also sometimes trouble spots, like punctuation that got mixed up in the proofreading, that are hard to spot otherwise. The one exception is that some contributors like to work in multiple passes, saving draft pages as "Not proofread" as they go, and eventually turn them yellow when they're happy with them. These works tend to get proofread at about the proportion any index worked on by a single contributor does. But that's peanuts compared to all the Match&Split and semi-automated creation of essentially raw OCR pages.
Hey, maybe Match&Split bot should tag each page with a timer template so they get speedied if not Proofread within about a month or three? :)
In any case… Yeah, if we could at least keep mainspace clean—except for the inevitable edge-cases—I'd feel a lot better. In the mean time, I should probably take up meditation or something. Or, hey, pay a professional to listen to my ranting. Xover (talk) 15:24, 21 December 2021 (UTC)
it almost never happens: this is certainly true, but that (rather interesting) research question must be relative: is it more or less true than it is for the literal millions of pages that don't exist at all? I.e. is a red page more or less likely to go yellow than a missing page is to go directly from "missing" to yellow (maybe via red, with the same user)?
Also, we definitely should distinguish between "almost there M&S of actually good text" and "OCR dumping" (with or without an M&S step), because operating on those pages is a very different proposition: the former is more akin to validation, the second is basically the same as if there were no pages at all.
Funnily enough, it would actually be fairly easy to compute such stats with the change-tags page status system that's kicking about half-baked. You might have noticed that edits now get a special tag when the status changes: this is the first step of the system. The second needs a DB script to be run to back-fill the tags for old revisions. Probably that isn't going to happen before I get a proper job though (I kind of thought it would have been done a few months ago, but I've had that optimism smashed out of me now).
While I'm typing semi-formed thoughts: you know what might be a bit less off-putting? A template to go above the page list (a friendly blue one like {{printer errata}}) that says "this work has had a third-party proofread text inserted instead of the OCR, but cannot be marked proofread until it is verified. Proofreading this volume may need fewer corrections to be made than usual for a work marked 'not proofread'". Inductiveloadtalk/contribs 15:43, 21 December 2021 (UTC)
You both have helped to transform this website into a much better place and I'm grateful that both of you are here. As far as I see it, red pages come from four main sources. First, users pressing the OCR button to replace terrible OCR. This can probably be solved by a setting that would automatically re-OCR pages. Second, users match-and-splitting and then not proofreading. Depending on the source text, this can either be a small improvement or a fairly large mess. Third, users too shy to change the status to Proofread. These are often very high quality. Fourth, batch imports from PG, mostly (or exclusively) done by me. That's probably the huge bump in the number of red pages from March/April of this year. These are fully proofread texts that need formatting. After last month's MC, I realized that this does not lead to faster proofreading and I've largely abandoned the idea.
Honestly, I don't think that red pages are such a huge issue. Most of them sit in the Index/Page namespaces and users never see them. The larger issue is the sheer quantity of garbage pages in mainspace. I feel that OCR dumps, unformatted copydumps, and mostly unfinished non-scan-backed texts should all be speedy deleted. Why do we need to spend weeks debating whether a one-page OCR dump should be deleted? Then we can work on replacing all non-scan-backed texts with scan-backed editions, starting with the popular ones. Languageseeker (talk) 16:23, 21 December 2021 (UTC)

Saw this today and thought of you: phab:F34906436. It's actually a pretty good sign: the accelerating width of the yellow band is fairly healthy, no matter what one may think of the red! And while you can read a lack of progress into phab:F34906444, you can also at least say the problem is looking contained (and probably not so bad if you exclude the para-infinite SCOTUS dumps). Inductiveloadtalk/contribs 00:14, 4 January 2022 (UTC)

Fun fact, there are 52k SCOTUS pages, so that's 25% right there (petscan:21124236). Inductiveloadtalk/contribs 00:55, 4 January 2022 (UTC)

How we will see unregistered users

Hi!

You get this message because you are an admin on a Wikimedia wiki.

When someone edits a Wikimedia wiki without being logged in today, we show their IP address. As you may already know, we will not be able to do this in the future. This is a decision by the Wikimedia Foundation Legal department, because norms and regulations for privacy online have changed.

Instead of the IP we will show a masked identity. You as an admin will still be able to access the IP. There will also be a new user right for those who need to see the full IPs of unregistered users to fight vandalism, harassment and spam without being admins. Patrollers will also see part of the IP even without this user right. We are also working on better tools to help.

If you have not seen it before, you can read more on Meta. If you want to make sure you don’t miss technical changes on the Wikimedia wikis, you can subscribe to the weekly technical newsletter.

We have two suggested ways this identity could work. We would appreciate your feedback on which way you think would work best for you and your wiki, now and in the future. You can let us know on the talk page. You can write in your language. The suggestions were posted in October and we will decide after 17 January.

Thank you. /Johan (WMF)

18:14, 4 January 2022 (UTC)

Blackwell

Hi, can you take another look at these? The files at IA may have been updated.

Index:A curious herbal volume 1 blackwell.djvu Index:A curious herbal volume 2 blackwell.djvu

Using the hi-res scan option here makes these readable and it would be nice to match them up as far as possible. :) ShakespeareFan00 (talk) 21:36, 10 January 2022 (UTC)

@ShakespeareFan00: I'm not sure what you mean here. I fixed up and regenerated Vol. 1 of that yesterday (but see the note on the missing plates on the index talk page), and am planning to do Vol. 2 next, based on the scans that were on IA as of the last few days. Are you saying there are now even better resolution scans of this on IA uploaded in the last few hours? Xover (talk) 07:13, 11 January 2022 (UTC)
The original scans on IA appear to be of very high quality compared to the DjVu. Some losses are inevitable in the compression process, but I was wondering if this is a work where a different approach to using the DjVu may be warranted. I've moved the files back to Check file, as you'd put placeholders in the first file, and I will manually check Volume 2 against the IA copy. ShakespeareFan00 (talk) 08:15, 11 January 2022 (UTC)
@ShakespeareFan00: The original upload of these were in utter crap quality: an ~8MB DjVu file for ~800MB+ of scan images, that had clearly been compressed into mush. I have now downloaded the original scan images (after 5 January when they were updated at IA) of Vol. I, cropped out the black borders, and compiled them into a DjVu with high-quality settings instead of high-compression settings. I have also inserted placeholders for the missing plates, uploaded the new version to Commons, and updated the Index: and its pagelist here. I plan to do the same for Vol. II, but it's at least a couple of days worth of work even when IRL commitments don't interfere, and I haven't started yet so, it'll be a while until it's done. --Xover (talk) 08:34, 11 January 2022 (UTC)
(Aside: Do we have a high quality cursive font I could set for this work when formatting?) ShakespeareFan00 (talk) 08:42, 11 January 2022 (UTC)
@ShakespeareFan00: No. We've requested the addition of a cursive webfont, but for various reasons it is not likely to happen in the near future. The most we can do is specify that the style is a generic cursive and hope we get something better eventually. Xover (talk) 08:58, 11 January 2022 (UTC)

Block

Please block 74.75.9.45 (talk · contribs · deleted contribs · logs · block user · block log · SUL) can't find an admin noticeboard. Cheers--Synoman Barris (talk) 21:04, 19 January 2022 (UTC)

@Synoman Barris: Thanks for the notice. Admin noticeboard is at WS:AN. Xover (talk) 21:38, 19 January 2022 (UTC)

1926 Shakespeare

It's that time of year again. Of the Yale volumes entering the public domain, only Titus Andronicus has had its copy released on IA.

Could you please generate a DjVu from this copy, and upload it to Commons as File:Titus Andronicus (1926) Yale.djvu?

No rush, but this is likely the volume I will tackle myself this year, or at least the first one I will tackle, as each year I work to complete at least one of them. Thank you. --EncycloPetey (talk) 05:54, 10 January 2022 (UTC)

@EncycloPetey: Index:Titus Andronicus (1926) Yale.djvu. I've done basic sanity checks, but not a full quality check. Xover (talk) 10:04, 10 January 2022 (UTC)
Thanks. --EncycloPetey (talk) 16:44, 10 January 2022 (UTC)

Shakespeare of Stratford (1926) is now available. This scan has been visually checked by me for completeness and correct sequence. The Commons file should be named File:Shakespeare of Stratford (1926) Yale.djvu. Note: this volume is in a completely different format from the rest of the series, because it covers the sources for information about the Bard. --EncycloPetey (talk) 21:51, 10 January 2022 (UTC)

@EncycloPetey: This copy appears to be missing p. v, the first page of the ToC covering chapters I.–XIX (pp. 1–26). I checked the raw scan images and it looks like it's missing from the copy, and not something that was fatfingered in the scanning. Xover (talk) 08:46, 11 January 2022 (UTC)
Hmm, yes, it's missing an entire page: the copyright page and p. v should be between the title page and p. vi. Xover (talk) 08:55, 11 January 2022 (UTC)
@EncycloPetey: I have multiple other scans from which I can patch in the missing pages, for both the 1926 first edition and the 1947 third reprint, but since we don't have the copyright page from this copy I can't tell which specific printing it is. Most of the other scans are also typical scribbled-on, over-crushed black&white Google scans, but I suppose that's something we might have to live with. I can also crib a somewhat better scan of the two pages from a books-to-borrow scan on IA, but that's from the 1947 reprint so that's only an option if we conclude our copy is the reprint. Xover (talk) 09:15, 11 January 2022 (UTC)
While I would prefer a 1st edition, if we can only get a reprint, then that's what we'll have to do. I suspect this is a reprint, mainly for the fact that the publication year does not appear on the title page in Roman numerals. This is only a guess, however. This is not a volume I intend to transcribe myself; it seems more like one for which you'd have the interest to complete. If that is indeed the case, then making the decision what to do would rest appropriately with you. I am disappointed with myself for having not noticed the missing page v. --EncycloPetey (talk) 13:05, 11 January 2022 (UTC)
@EncycloPetey: Aha! The first edition here does indeed have the roman year on the title page, which makes the IA copy a reprint. I'll look through the options and pick one. The one you'd found looks really nice (both the copy and the scan) except for the missing leaf, so I may land on patching it up from one of the other copies. And, yes, I was going to ask whether you had any particular designs on this one before grabbing it for myself! :-) Xover (talk) 13:24, 11 January 2022 (UTC)
@EncycloPetey: Index:Shakespeare of Stratford (1926) Yale.djvu and Shakespeare of Stratford. Xover (talk) 07:27, 27 January 2022 (UTC)
Wonderful! I've linked it to the WD data item, and added it to the "works about" section on Shakespeare's Author page. --EncycloPetey (talk) 18:18, 27 January 2022 (UTC)

Songs of the Soul

@Xover:

Hallo Xover, may I ask you again to format the pages (pages 7–9) for me? That would be most important to me, because if it is not right I will have to redo the whole book. If you have time: in the contents chapter, the caption for II–IV is on the right. Also, I do not know how to bring the chapters onto the first page as in the other projects. Thank you very much! https://en.wikisource.org/wiki/Index:Songs_of_the_Soul(1923).pdf --Riquix (talk) 09:01, 22 January 2022 (UTC)

@Riquix: I've gone over the existing pages. For poetry you'll want to use {{ppoem}}. It's slightly complicated to get the hang of, but it is still in general the easiest way to deal with poetry here. Feel free to ask if you need help.
But from where did you get the PDF file you uploaded at File:Songs of the Soul(1923).pdf? It is lacking the requisite metadata (the file page should have a {{book}} template with all relevant fields filled in), and the file size does not match the scan of this copy available at the Internet Archive. Xover (talk) 10:37, 22 January 2022 (UTC)
@Xover:

I took it from here: https://archive.org/download/songsofsoul00swam I chose the PDF because I know the format, and it is also a larger file. You get to the overview via "Go to parent directory" at the top. I will work on the other things as usual, piece by piece. Thanks!--Riquix (talk) 12:43, 22 January 2022 (UTC)

@Riquix: The PDF at IA is 2.2MB. The one uploaded here is 32.84MB. So these are not the same file. Xover (talk) 14:05, 22 January 2022 (UTC)
@Xover:

I downloaded it and uploaded it again. I did not use a tool this time because it turned out so badly in the first trials. Both have the "Dedication" at the beginning.--Riquix (talk) 15:42, 22 January 2022 (UTC)

I have downloaded it repeatedly, and after a while the file's info window shows: 34,432,290 bytes (35.3 MB on the volume). That should be right.--Riquix (talk) 15:46, 22 January 2022 (UTC)
@Xover:

Is it OK how I have connected the first three links? I would do the rest. Regards https://en.wikisource.org/wiki/Page:Songs_of_the_Soul(1923).pdf/9 --Riquix (talk) 08:40, 23 January 2022 (UTC)

@Riquix: No, you are linking to wikipages in the Page: namespace. The Page: and Index: namespaces are internal production namespaces; borrowing a theatrical term they are "backstage". Once proofreading is completed we transclude the content from the wikipages in the internal namespaces onto wikipages in the main presentation namespace (the main namespace has no prefix followed by a colon, but it is a distinct namespace all the same). Any links should be to the wikipage in the main namespace where the content will end up once done. The links I had put in the table of contents were to those destinations; they were just red because the proofreading was not complete and so the pages had not been created yet.
I am in the middle of something right now, so it'll have to be later today, but if you want me to I can go over and transclude the first few poems that are finished to illustrate how it will work and what it will look like.
PS. For links to wikipages that exist on this project, it's generally best to use internal links, like so: Page:Songs of the Soul(1923).pdf/9. Once you get used to it it's usually also a lot easier than using external links (the ones with a "http://" and a host name like "en.wikisource.org"). Both kinds work, so don't worry too much about it, but using internal links is a good habit to get into early. Xover (talk) 09:57, 23 January 2022 (UTC)
@Riquix: Ok, I've transcluded the first couple of poems at the correct locations. The links in the table of contents should be ok now. You can do the rest using the examples. Let me know if you run into trouble.
The PDF file is weird. The one at the Internet Archive is definitely just 2.2MB, so where your 32MB file comes from I have no idea. I have reuploaded the file directly from the Internet Archive so that we have a file of known provenance. At the same time I have added the required information template and moved it to Wikimedia Commons (Commons is the central media repository for all the Wikimedia projects, including Wikisource). Xover (talk) 14:17, 23 January 2022 (UTC)
@Xover:

Hallo Xover, on the pages in the book the original is now not shown on the right side, so I cannot compare now. As for the file size change, this is apparently normal for another operating system. Unfortunately, I have not found any English post about it. https://www.mactechnews.de/forum/discussion/PDFs-immer-groesser-als-Quelldatei-Normal-277803.html --Riquix (talk) 07:27, 24 January 2022 (UTC)

@Riquix: Oh, I see the file appears to be broken in some way such that Mediawiki is unable to process it. Strange, it looked fine when I opened it locally on my computer. I'll look into it and find some workaround.
Regarding the file size… The forum thread you linked to discusses the difference in final output file size as a result of using different PDF tools to produce it from the source material. That is, it discusses differences in file size for files that have been modified. Which is indeed my concern: the PDF file that the Internet Archive produced and the PDF file you uploaded are different sizes, so the file you uploaded has been modified in some way. Xover (talk) 12:44, 24 January 2022 (UTC)
@Riquix: Ok, I have no idea what the problem with the PDF file was. I am guessing something went wrong at the Internet Archive when they created it, but it could also be a software error in Mediawiki. To work around the problem I have downloaded the original scan images from IA and generated a DjVu (or w:de:DjVu in German) format file from it, and then migrated the index and all the associated pages there. The index is now at Index:Songs of the Soul (1923).djvu (to match the file name, File:Songs of the Soul (1923).djvu). It should work as expected now. Xover (talk) 14:41, 24 January 2022 (UTC)
@Xover:

Is it all right with you if I also insert two spaces around the page number? From this [15] to this [ 15 ]. https://en.wikisource.org/wiki/Page:Songs_of_the_Soul_(1923).djvu/21 --Riquix (talk) 09:12, 26 January 2022 (UTC)

@Riquix: Yes. The precise formatting of the header doesn't really matter because it doesn't get included when the text is transcluded to mainspace. How faithfully you want to reproduce its formatting is entirely up to you. Xover (talk) 10:47, 26 January 2022 (UTC)
@Xover:

Hallo Xover, Can we do it that way, I look at me on text and then as far as I can? Such as "{{ppoem|start=follow|}}" and "stanzas". I still have to look at it. Think so for a week, I should be done. Would contact you then. So I learn that then, and you have a hopefully to correct a little less. --Riquix (talk) 15:45, 30 January 2022 (UTC)

@Riquix: I didn't understand this message. Could you try to phrase it a different way? Xover (talk) 17:19, 30 January 2022 (UTC)
@Xover:

The changes (Such as "{{ppoem|start=follow|}}" and "stanzas") where you meant in the previous History note is that okay for you if I do the end. When I finish with the text.--Riquix (talk) 18:09, 30 January 2022 (UTC)

@Riquix: Hmm. Not sure I understand still. But let me try to address what I think it might be what you're concerned about:
There is no requirement that all the pages are 100% perfect on your first pass. So long as all the text itself has been corrected, and the formatting is mostly correct, that's fine. The changes I have made, and the associated comments in the edit summary, are mainly intended as explanation and instruction so that you can learn what to do. It is easier (less work) to do everything at once, if you are able to do so. But if you are not, that's ok too. It only means you'll have to go back and fix some things afterward. In this case, you would see some problems when you try to transclude the poems: there would be gaps between lines where there shouldn't be gaps, or there would not be gaps where there should be gaps. It is quite alright to go back and fix any such issues at that point. It is very common to see problems when transcluding and to need to go back to the page or pages involved to fix them.
Did that help at all? Xover (talk) 18:53, 30 January 2022 (UTC)
@Xover:

Hallo Xover, I have done the edits as far as I could and am relatively sure of them. There are two things that are still open, where I'm not sure and do not want to do them wrong, which would then cause more work. One is the "stanzas", because my English is not that good and I'm not sure where one starts and ends. The other one is Part IV: I do not believe the page number is right (e.g., "Foreword" is page 95 but another number shows the page with "Part III"). I used a translation program for writing, so the text is probably sometimes not clearly understandable. --Riquix (talk) 08:52, 9 February 2022 (UTC)

No good deed goes... (part XIX)

((fewer bugs than it initially appeared, but the Gentium Plus font is wonky fer shur, but manageable(?); read at your leisure))


On the one hand, your efforts are noticed quickly.

On the other hand, &@%#%!!   or something like that.   ;-)

I thought my eyes were playing tricks on me, a LAMBDA λ with a KORONIS on top λ᾽ . . . Gasp!


Started dumping the Unicode text (before "Hipp. 316 sq") in binary and delving into mysteries, and it really is a LAMBDA followed by a KORONIS character:

U+03BB GREEK SMALL LETTER LAMDA character (λ)
U+1FBD GREEK KORONIS character (᾽)

Then what font is (now) active:

font-family: GentiumPlus;
font-feature-settings: "cv78"on,"cv83"on;

Then found something purporting to describe those features and font:

Gentium - Font features

(on that page do a find for "Greek alternates" for cv78 & cv83 descriptions)


Now

1) a KORONIS U+1FBD is *not* a combining character! It should not be playing king-of-the-hill. It *is* supposed to look like ᾽, but as a single character by itself. So the font is doing something it shouldn't do, basically converting the KORONIS into a "U+0313 COMBINING COMMA ABOVE" " ̓ "

Note that there is such a thing as "0343 COMBINING GREEK KORONIS" but that is not in my text.


2) cv78: Porsonic circumflex does not work:

plain font:

Ἆ Ἦ ᾯ    ἆ ἦ ᾧ

cv78 with Gentium plus

Ἆ Ἦ ᾯ     ἆ ἦ ᾧ
showing that cv78 is not converting the
form ◌̃ aka COMBINING TILDE 0303
into the
form ◌̑ aka COMBINING INVERTED BREVE 0311
Oh good grief!! This changed in just the last half-hour to now display as spec'd. I switched to an older tab and saw tilde, then refreshed and now see inverted breve. Your change was at "07:46, 7 January 2022". Time now is approx. "00:24 14 January 2022"   Damn you network caching!!!


3) cv83: Capital adscript iota (prosgegrammeni) does not work:

plain font:

ᾼ ᾜ ᾯ     ᾳ ᾔ ᾧ

cv83 with Gentium plus

ᾼ ᾜ ᾯ    ᾳ ᾔ ᾧ
showing that cv83 is not converting the
form ͺ aka 037A GREEK YPOGEGRAMMENI (not a combiner)
into the
form ◌ͅ aka COMBINING GREEK YPOGEGRAMMENI 0345
Oh good grief again!! It changed to as spec'd, which means the display *now* agrees with the normal behaviour as seen in _other_ fonts. The Gentium doc shows as 'standard' a form I've never seen in other fonts, so cv83 is good and necessary. Thank you.


4) However, U+1FBD KORONIS is not even the Unicode-approved way to do a KORONIS mark, it's deprecated. Somewhere (I can't find this right now) it is advised to use a "U+2019 RIGHT SINGLE QUOTATION MARK (’)" instead. I only saw this because the TESSERACT transcription apparatus inserted this combination LAMBDA KORONIS.

If instead of LAMBDA KORONIS λ᾽   I substitute LAMBDA U+2019 λ’ we get the correct appearance. So the Gentium Plus font *is* wrong, but in this one regard it shouldn't matter. (But who knows what else might pop up?)

Notes:

Getting all this (HTML/CSS/fonts/WP/templates/styles/etc./etc.) straight makes ancient Greek look easy, so I'll go back to that.

BTW: να πάθεις, να μάθεις - once it has happened to you, then you know.

Shenme (talk) 00:59, 14 January 2022 (UTC)

Found another example from TESSERACT (advanced mode with Eng and GRC selected) - δ᾽ is supposed to look like δ’. Bleh! Good there's a workaround for the font bug. Shenme (talk) 01:07, 14 January 2022 (UTC)
@Shenme: I'm not sufficiently familiar with polytonic Greek, or Unicode's guidelines for it, to tell off-hand whether Gentium's treatment of the koronis here is reasonable. I'll try to take a closer look when time allows; but in the mean time, if you're sure it's a bug in Gentium Plus you can contact SIL and report it. Presumably they are just treating it as a ligature-eligible form (but if so, I'm not sure we can disable just that ligature).
The reason for the change to the porsonic circumflex and ypogegrammeni you observed is either because the lang team deployed version 6.001 of the font, or because you installed it locally. The ULS webfont repo used to have an ancient version of Gentium that was updated in T298613 (many many thanks to Santhosh and KartikMistry!), but I didn't think that was deployed yet. The old version had several bugs / suboptimal behaviour that bit us, and had no support for the porsonic circumflex and capital adscript iota font features. Xover (talk) 07:07, 14 January 2022 (UTC)
Newer Gentium font should be available in ULS now. -- KartikMistry (talk) 11:07, 14 January 2022 (UTC)


Really it is sufficient to note that there are two different Unicode code points,
◌ͅ U+0345 COMBINING GREEK YPOGEGRAMMENI
ͺ  U+037A GREEK YPOGEGRAMMENI
and that the font is forcing the latter to act like the former, forcing combining action when it is not a combining character. If typing 'A' followed by ':' always got you 'Ä' you'd be negatively impressed, yes? But as I mentioned, using the different character U+2019 RIGHT SINGLE QUOTATION MARK works around that font bug.
As for posting a bug report to SIL, my question would be would they respond / update? I'm dubious since I keep finding font bugs that are decades old. (30 years wrong, yay Microsoft)
Heck, I didn't get finished typing my initial note before I found another bug in the font resolved by "font-family: monospace". The browser says this is Microsoft's Consolas font, a 15-year-old font.
There are two characters
  • (Ἇ) U+1F0F GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI character
  • (Ι) U+0399 GREEK CAPITAL LETTER IOTA character
With a space between them they look like " Ἇ Ι ", which is correct. Placed together they look like " ἏÏ " (<--simulated). Somehow the font is magically inventing an umlaut/diaeresis on the second character. It happens also for UPSILON. Here is that pair in the various fonts:
sans-serif font: ἏΙ Ἇ Ι
Gentium font: ἏΙ Ἇ Ι
monospace font: ἏΙ Ἇ Ι
Hopefully they all work wonderfully on your system.
None of these is a major problem. But each of these is a cautionary tale. Network caching can betray you. 30 year old fonts can betray you. 15 year old fonts can betray you. A change that has upsides, also has downsides, for the very same population you were trying to serve well!
Thank you for persisting anyway. :-) Shenme (talk) 21:46, 14 January 2022 (UTC)

Ugh, I hadn't considered that this would happen. Indeed δ᾽ and λ᾽ should not combine the diacritic with the characters, and this is how Greek does words where the end of the word drops because of interaction with the following word. Think of it as like "I d' know" (for I don't know). Putting the apostrophe above the d is not correct. This sort of thing is fairly common in the Greek texts I transcribe. --EncycloPetey (talk) 22:11, 14 January 2022 (UTC)

@Shenme: I can't see what you did for the workaround, but this is a common thing, and it either means hunting down every instance of this and kludging the workaround, or making some other change. I can't see the difference between δ᾽ and δ’ which you used above. Can you explain? --EncycloPetey (talk) 22:15, 14 January 2022 (UTC)

@EncycloPetey: When Unicode put together the Greek Extended block they were determined to cover *all* the characters not already covered in the Greek and Coptic block. They went overboard, including code points they later regretted.
U+1FBD GREEK KORONIS character (᾽) was one of those characters. It was the 'intended' character for that isolated pause character, a la ἀλλ’ ἀλλ’ and seen wrong here ἀλλ᾽.
Then the Unicode people said, oh no, rather... we really want everyone to use a more normal 'apostrophe' character (so text searches work). Hence (somewhere it says) substitute an apostrophe, but *not* U+0027 ' APOSTROPHE, but rather the typographic apostrophe U+2019 ’ RIGHT SINGLE QUOTATION MARK. So clear all this is...
    (see demo table I'm getting ready to submit bug to SIL)
So the workaround is to *not* use the Greek Extended char (as originally spec'd) but to substitute the newer recommendation U+2019 RIGHT SINGLE QUOTATION MARK.
So, yeah, let's go back and change everything that conformed to their original fiat. Only... let's not, not for old stuff anyway. I'm changing it as I work. My new Ancient Greek IME I need to submit to the ULS people uses U+2019.
As to why not the top variant? I've now done many pages across several works that employed polytonic Koine Greek, including Thayer's Lexicon (present on a lot of 'bible' sites with errors!), Scrivener's NT in Greek (present on a few 'bible' sites with errors), an academic text The New Testament in the original Greek - Introduction and Appendix (1882) having much Greek (!), and others. I have not seen anywhere the use of the pause apostrophe except in the to-the-right-side variant. So no (?) usages of the top variant in the 1800's? (I could have overlooked the crasis case, thinking it was smooth breathing mark.)
So the top variant violates the "match the original scan" goal that I'm obsessed with. To my mind, it is a much more jarring divergence than the tilde circumflex ever was (since, again, that tilde variant is what is seen in _so_ many texts).
Sorry for all the words, thanks for the attention, and I can't believe Koine Greek is so badly supported by 'Western' fonts. Shenme (talk) 23:32, 14 January 2022 (UTC)
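For illustration, a minimal sketch of the substitution described above (deprecated U+1FBD swapped for the recommended U+2019), as it might look in a cleanup user script. This is purely illustrative; any real cleanup would want a human eye on each instance.

// Replace the deprecated GREEK KORONIS (U+1FBD) with RIGHT SINGLE QUOTATION MARK (U+2019).
function fixKoronis(text) {
    return text.replace(/\u1FBD/g, '\u2019');
}
console.log(fixKoronis('ἀλλ\u1FBD ἔγωγε')); // "ἀλλ’ ἔγωγε"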


Submitted a bug to SIL for the chaotic KORONIS behaviour (Greek KORONIS U+1FBD improperly coerced to U+0343 COMBINING GREEK KORONIS), pointing them to User:Shenme/GentiumPlus for examples. Shenme (talk) 00:24, 15 January 2022 (UTC)

And... SIL say the bug is fixed in next release! Oooo, eek! Fixed and released! Shenme (talk) 23:42, 12 February 2022 (UTC)

Stewart's Textual Difficulties in Shakespeare

I set up the Index:Some Textual Difficulties in Shakespeare.djvu --EncycloPetey (talk) 01:57, 11 February 2022 (UTC)

NLS deletions

Per this discussion, please delete those index talk pages, as well. TE(æ)A,ea. (talk) 22:20, 22 February 2022 (UTC)

@TE(æ)A,ea. and @CalendulaAsteraceae: Those index talk pages are not associated with any of the indexes listed as duplicate, and their associated indexes still exist and are not apparently listed for deletion anywhere. What's the rationale for deleting them, and is it only the talk pages or is there an implicit assumption that the associated indexes and their Page: pages should also be deleted? Xover (talk) 07:55, 25 February 2022 (UTC)
Xover: The Index talk: and Page: pages which have been listed by CalendulaAsteraceae are all dependent on the Index: pages which were deleted, and are thus subject to speedy deletion under criterion M.4. Also, re: the most recent comment here, asking about the location, it is from here. TE(æ)A,ea. (talk) 15:03, 25 February 2022 (UTC)
@TE(æ)A,ea.: I must be being particularly dense again today. Index talk:Captain Barnwell.pdf, which is one of the talk pages listed at WS:PD, has an associated index at Index:Captain Barnwell.pdf and 8 associated Page: pages, all marked as validated. Similar is the case for the other talk pages listed there. What am I missing? Xover (talk) 17:09, 25 February 2022 (UTC)
@Xover: Possibly you were looking at a different list? I took the list of index pages which you deleted and replaced Index with Index talk and got the following list of index talk pages.
Extended content
CalendulaAsteraceae (talkcontribs) 00:52, 26 February 2022 (UTC)
@CalendulaAsteraceae: Oh. a dim bulb starts flickering I see now. Not "the index talk pages listed at WS:PD", but "the talk pages of the Index pages listed at PD". Duh! I got confused because at the top of the discussion you listed 8 actual index talk pages, and those have still-extant Index: pages and Page: pages. Apologies again for being so dense, but… dumb admin is dumb, needs spoonfeeding. Thank you both for the hand-holding! --Xover (talk) 07:53, 26 February 2022 (UTC)
@Xover: Glad I could help clear that up! Could I request that you also delete the Page-namespace pages of the Index pages listed at PD?
Relatedly, could you edit the use of {{category handler}} in {{sdelete}} so that adding "nocat = false" will work in other namespaces, like the Page and Talk namespaces? I understand the {{category handler}} documentation well enough to know it can be done, but not well enough to do it myself. —CalendulaAsteraceae (talkcontribs) 23:34, 26 February 2022 (UTC)
@CalendulaAsteraceae: Meh. So it seems I got exactly zero things right in that deletion. Grr! I'll take a look and see if there's any sane way to deal with them short of custom coding (in which case it'll be a bit before I have the spare cycles).
I've tweaked {{sdelete}} to categorise in all namespaces. nocat = false only bypasses the blacklist (regex of page names where cats should not be added, primarily for /Archive pages), but has no effect when it's in a namespace {{category handler}} doesn't know about (and it doesn't know about Page: and Index:). Xover (talk) 08:50, 27 February 2022 (UTC)
@CalendulaAsteraceae: Ok, I think I've got all of them now. Please let me know if I messed up anything else there. Xover (talk) 10:18, 27 February 2022 (UTC)
Thank you! Everything looks good to me. —CalendulaAsteraceae (talkcontribs) 22:33, 27 February 2022 (UTC)

Page end hyphen

The template {{peh}} is not working the way it should. In the Main namespace, it is transcluding as a double hyphen instead of as a single hyphen. I am not sure why. --EncycloPetey (talk) 19:31, 23 February 2022 (UTC)

@EncycloPetey: My quick sandbox test looked ok. On what page are you seeing this? Xover (talk) 20:23, 23 February 2022 (UTC)
On several pages. I noticed the issue when User:Yodin had made several hws/hwe replacements, and those where the word should have had a hyphen preserved in transclusion instead had a double hyphen. He has now reverted the affected pages, but you can see where the problem occurred by looking at their self-reversions Special:Contributions/Yodin. --EncycloPetey (talk) 20:28, 23 February 2022 (UTC)
@EncycloPetey: See User:Xover/sandbox. I am unable to reproduce this problem. Can you find a page where this is currently displaying incorrectly? Xover (talk) 21:29, 23 February 2022 (UTC)
The problem appears only under certain kinds of usage. On Test page 2 it occurs in the first call, where the pages are transcluded separately, but not in the second, where the pages tag is used. I noticed it first in the transclusion where you tried to replicate the problem, but it does not seem to happen there now. I am not sure why. Extra blank lines or carriage returns? A missing carriage return? Something that was cleared upon any edit? --EncycloPetey (talk) 22:39, 24 February 2022 (UTC)
@EncycloPetey: {{peh}} depends on the automatic hyphen handling that is provided by the Proofread Page extension to work sensibly. Or rather, {{peh}} works the same regardless of where it's used, but its behaviour makes no sense unless it's used with PRP's automatic hyphen handling. When you use direct transclusion with {{Page:…}} you completely bypass PRP so neither the automatic hyphen joining nor {{peh}} will work. That's one of the reasons why direct transclusion shouldn't be used.
Similarly, if pages are transcluded using separate instances of PRP's <pages … /> extension tag the automatic hyphen handling will not work since Mediawiki invokes the extension with only the content provided in that instance of the tag. Or put another way, each individual use of <pages … /> lives in its own little world and knows nothing about other content on the wikipage, including any other <pages … /> tags that may be present.
If there were issues with {{peh}} when used through a single <pages … /> tag then I'm unsure what might have caused it. I'm not aware of any code changes recently that seem likely to have caused something like this (not that I necessarily would pick up all such), but if it was a side effect of some change elsewhere in Mediawiki (not directly related to PRP) it's certainly possible. If that's the case, any edit to the mainspace page (the transcluding page) should force any cache to get regenerated in normal circumstances. Most of the time these kinds of changes also manage to invalidate the cache without any end-user action (there are both automatic dependency-based mechanisms and periodical scheduled jobs for this), but some such code changes can end up needing an edit (which is why we sometimes have to run a "touch edit" bot job; these days usually invisibly because the bot can "purge" the page instead of actually editing it). Xover (talk) 07:27, 25 February 2022 (UTC)
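A minimal sketch of the "purge instead of touch-edit" mechanism mentioned above, using the standard action=purge API. The page title is a placeholder, and the snippet assumes the mediawiki.api module is available in the gadget/user-script context.

// Force the parser cache for a transcluding page to be regenerated without a dummy edit.
// forcelinkupdate also refreshes the link tables for the page.
new mw.Api().post({
    action: 'purge',
    titles: 'Example transcluding page',
    forcelinkupdate: true
});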
Gah! There are instances where direct transclusion is still used, such as Tables of Contents on Index pages. And I have seen editors split <pages … /> into multiple calls. I will have a better idea what to look for in future. --EncycloPetey (talk) 17:28, 25 February 2022 (UTC)

Goblin Market moved

I just checked, and you did not say to put sdelete on all of the moved pages, so I didn't... but everything got moved and it looks great on a DjVu! If I should do anything else, let me know. Thanks again!--RaboKarbakian (talk) 04:15, 27 March 2022 (UTC)

@RaboKarbakian: And the old redirects and index have now been deleted. Xover (talk) 08:35, 27 March 2022 (UTC)

Does ppoem support ULS for different languages and font choices?

Example: Page:15 decisive battles of the world Vol 1 (London).djvu/61

I am asking because these would be wrapped in a {{lang block}} for Greek or polytonic under different circumstances, and I felt that ppoem should eventually have the ability to set up language tags and font choices appropriately.

Your thoughts? ShakespeareFan00 (talk) 18:41, 8 April 2022 (UTC)

@ShakespeareFan00: Hmm. Good point. A lang tag should probably be added at some point. @Inductiveload: fyi. Xover (talk) 20:09, 8 April 2022 (UTC)

Div span swapping in references..

I'm going to give up on trying to resolve most of these, for the simple reason that the actually stable repair would be for the Cite extension to properly support block-based references (so that I could rely on a defined and documented behaviour, as opposed to one that only appears to resolve the Linter concern). There is a Phabricator ticket requesting this support, but the lack of response to it suggests that it is not considered a priority currently.

ShakespeareFan00 (talk) 08:33, 8 April 2022 (UTC)

Well, I found a temporary tracking approach - Category:Pages_with_block_based_footnotes for some of them. It seems the DIV/SPAN swap doesn't occur if you have a span immediately after the opening REF tag, hence the {{blockref}} template I've been using as a tracking entity.

This should also demonstrate that it isn't just a few rare instances where the ref tag needs to be able to cope with block/P/DIV etc. level elements :( ShakespeareFan00 (talk) 18:03, 11 April 2022 (UTC)

When this was imported, the importer dutifully imported all the references which had a link in the page. However, it did not import the references that were only given by a range in the text itself: I've encountered missing references which are in the PLOS original but not in the version/edition as presented on Wikisource.

How many 'bad' or incomplete papers imported by this method does Wikisource have? ShakespeareFan00 (talk) 15:37, 9 April 2022 (UTC)

@ShakespeareFan00: Everything under Wikisource:WikiProject Open Access/Programmatic import from PubMed Central is experimental imports, and, as you'll note, not in mainspace. It is annoying that they keep turning up in various maintenance categories, but I don't think there is much we can do about it. I don't believe the community would support just deleting them. So meanwhile we should just try to do the minimal we can to keep them off the maintenance categories. Xover (talk) 13:26, 10 April 2022 (UTC)

I've found that this has a better copy at - https://archive.org/details/abiographicalin01boulgoog

compared to the copy at:- https://archive.org/details/abiographicalin02boulgoog

that's actually on Commons.

And they seem to be the same edition.

As a temporary work-around, I've set the new source at Commons (I haven't uploaded the new file), so the hi-res scans (from the better version) can be used for proofreading.

If you wanted to compare the files in more depth and generate a new version with better scans locally I have no objections, provided the pagelist doesn't need a bulk move as I was planning to work on it. ShakespeareFan00 (talk) 09:20, 16 April 2022 (UTC)

Parallel texts

Hi, this page describes two kinds of bilingual books, (a) and (b), as if these were the only two ways that exist. How would you want to describe the third way, which is really different and new with computers?
BluePrawn (talk) 15:19, 26 April 2022 (UTC)

Template:“ ‘

Isn't Template:“‘ a duplicate of Template:“ ‘ ? unsigned comment by EncycloPetey (talk) 19:37, 7 May 2022‎ (UTC).

@EncycloPetey: Indeed it is. Thanks! --Xover (talk) 17:40, 7 May 2022 (UTC)

I need help

Thanks for the quick response to my CSD tag. I need another favor. I tried reverting this vandalism a few days ago but got blocked by an edit filter that seemed to think I'm a long-term abuser, which I'm absolutely not. I'm autoconfirmed now but I don't know if the filter will still think I'm vandalizing or not. There's also Index:Editing Wikipedia brochure EN.pdf, which needs to be restored to this revision. Coolperson177 (talk) 14:24, 12 May 2022 (UTC)

@Coolperson177: Reverts taken care of. Apologies for the auto-block; that filter is evidently being a little bit too aggressive.
@Inductiveload, @Billinghurst: Filter 42 seems to have triggered when Coolperson here tried to revert a vandalised page back to this before they were autoconfirmed. There are no obvious trigger strings in that diff so I'm guessing it's one of the ones in almatch3 that is hitting too broadly. Since this is an action:block filter it's probably worthwhile tracking it down and making it more conservative, but that ruleset made my head explode just looking at it. Xover (talk) 14:44, 12 May 2022 (UTC)
I think this would be the line with 0nsmatch, which will trigger on any use of "&oldid=" (i.e. linking to any permalink at any wiki) in mainspace. I have no idea if that's intentional, but it seems like a very blunt instrument to me. It dates back to the original import of the filter from en.wikivoyage.
I have removed that part of the filter, as such a filter cannot, in good conscience, be that broad. On the other hand, the filter is actually still being triggered correctly here and there by the LTA it's designed for. Inductiveloadtalk/contribs 17:05, 14 May 2022 (UTC)
Thanks. Yeah, blocking non-autoconfrmed on any permanent link is too aggressive. But maybe this was originally a non-block filter? I seem to recall we promoted one to auto-blocking relatively recently. Xover (talk) 19:00, 14 May 2022 (UTC)
I don't think it was this one: #42 has been set to auto-block since it was imported in early 2021. Inductiveloadtalk/contribs 19:51, 14 May 2022 (UTC)

Transclusion checker

Good work on cleaning and gadget-ifying the transclusion checker! And the other gadgetery!

Just one thing that's probably not working quite right: it looks like it's sucking up the targets for the pages in a way that includes the (page does not exist) bit. For example, I see requests to https://en.wikisource.org/w/api.php?action=query&format=json&titles=Page:The Commentaries of Caesar.djvu/151 (page does not exist)|Page:The Commentaries of Caesar.djvu/152 (page does not exist)|Page:The Commentaries of Caesar.djvu/153 (page does not exist)|.... Which means you cannot see if the non-existent pages are being transcluded.

It would make sense for the extension to add the name of the target page to the link as an HTML data field (so <a href="..." data-page-name="Page:The Commentaries of Caesar.djvu/153" title="The Commentaries of Caesar.djvu/153 (page does not exist)">153</a>). Which might actually be more feasible now that someone has changed the PHP to use the link generator (related: phab:T267617, which is about the page's numerical position-in-index). But until then, stripping the parenthesised text would be brutal and effective. Inductiveloadtalk/contribs 16:50, 14 May 2022 (UTC)
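(Purely for illustration of the "strip the parenthesised text" workaround; the actual gadget is JavaScript, so this Python sketch only shows the shape of the fix.)

<syntaxhighlight lang="python">
import re

# Strip the " (page does not exist)" suffix MediaWiki appends to red-link titles
# before joining them into the API's titles= parameter.
SUFFIX = re.compile(r"\s*\(page does not exist\)$")

titles = [
    "Page:The Commentaries of Caesar.djvu/151 (page does not exist)",
    "Page:The Commentaries of Caesar.djvu/152 (page does not exist)",
]
clean = [SUFFIX.sub("", t) for t in titles]
print("|".join(clean))
</syntaxhighlight>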

@Inductiveload: Fixed. Nice catch!
Yeah, page index, label, quality level, and associated wikipage (i.e. what's currently in @title) would all be nice to have as data attributes. Most of the time you're going to use quality level as a selector, so the existing class is fine, but occasionally you'll come at it from the opposite direction and need to get and use the numerical quality.
PS. So are you getting in some wiki'ing in the leadup to Eurovision, or hiding from it? :) Xover (talk) 18:14, 14 May 2022 (UTC)

Can you take a look at this template? The AccessDate parameter doesn't seem to be working correctly, in that nothing is displayed.

I was attempting to clean up references here: https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/A_Collaborative_Epidemiological_Investigation_into_the_Criminal_Fake_Artesunate_Trade_in_South_East_Asia ShakespeareFan00 (talk) 08:56, 9 April 2022 (UTC)

@ShakespeareFan00: Hmm. Why do we even have this template? The point of the citation templates on enWP is to make all citations conform to the enWP house style; whereas on enWS we should be replicating whatever was used in the original. I don't think we should fix this template so much as put it out of use and delete it. Xover (talk) 11:29, 22 May 2022 (UTC)

use of deprecated pages tag

About this edit: I use pages syntax around plates and the like because it contains the page mark-up that tells e-readers to make a page break there. I can provide a link to the page break css if you require it.

Consider reverting that change?--RaboKarbakian (talk) 13:46, 16 May 2022 (UTC)

@RaboKarbakian: That diff link leads off in the weeds. Xover (talk) 16:32, 16 May 2022 (UTC)
Heh, I am very sorry to have wasted your time. This is the whole url: https://en.wikisource.org/w/index.php?title=St._Nicholas/Volume_40/Number_12&curid=3847330&diff=12322012&oldid=12322007 And the heh was about me trying fancysmancy diff templating in the wee hours of the day. Again, my sincere apologies about the abuse of your time.--RaboKarbakian (talk) 16:48, 16 May 2022 (UTC)
Further, I saw the page template you put in its place. I ended up thinking about the day, which was not so long ago, really, that I was shown the pages tag working. It was the most pleasant interaction I have had here, really. So, removal of the deprecated tag is not such a problem due to the page break template (probably, as I haven't investigated). Lack of such positive interaction with what I was sure were humans, especially since I asked what I am pretty sure was a non-human (that also hit on me) to stop following me and trying to chat--eh, positive interaction among people, has that been deprecated also? --RaboKarbakian (talk) 16:54, 16 May 2022 (UTC)
@RaboKarbakian: No worries about any wasted time. If something isn't clear it's always better to ask (some people would actually rather fume in silence over such things, and I have no idea why). If one of my edits doesn't make sense you should feel free to assume I was insufficiently caffeinated until we can establish otherwise.
Regarding the #tag:pages syntax, the only instance I'm aware of where it's needed is when you have discontiguous pages that must be joined without a line break (a newspaper article that's "continued on p. 42" and similar). Page breaks for paged media are ultimately up to the e-reader in question, but, as you say, we give them a hint with {{pb}} and {{ppb}} and this seems to mostly work.
PS. If bots start hitting on you, you know they're getting close to taking over the world. I for one welcome our robot overlords! :) Xover (talk) 18:20, 16 May 2022 (UTC)

Potential workflow for scanned microfilm documents

I ask this question here, rather than on Scan Lab, because I think you might be able to set up a process, but I’m not sure, so I’ll get to the point. I have scanned in some documents on microfilm, but they need to be processed before they can be uploaded. This was created as a result, but it took a long time, off-hand, and so I wonder if you can help. For the next file, there are ostensibly four pages of the text per TIFF file, which need to be cropped, color-inverted, and combined as a PDF (or DJVU, your choice). I was wondering if there is some way to go through the process more quickly. I can upload the files if you have questions or want to test something out. Thanks in advance for looking into it. TE(æ)A,ea. (talk) 01:06, 1 April 2022 (UTC)

@TE(æ)A,ea.: How much it can be automated depends on the files. Cropping out a square area, or multiple squares, is not a problem so long as the coordinate offsets of the areas are consistent between image files. Very often this is not the case, and then it comes down to how much margin is present (i.e. how sloppy can you be with the rectangles and still avoid cutting out any of the content you want to retain). A solid black or solid white border can be automatically removed, but it has to be relatively uniform and it has to have sufficient contrast with the content to be preserved. Inverting color is not a problem. And once the images are otherwise done, I can obviously create a DjVu from them with my existing tools. Also, depending on the total number of files we'd need to figure out some way to transfer them that's practical. Xover (talk) 06:35, 1 April 2022 (UTC)
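(As a purely hypothetical illustration of the "consistent offsets" case: with fixed crop boxes the whole batch can be cut up in a few lines of Python and Pillow. The box coordinates below are made up.)

<syntaxhighlight lang="python">
from PIL import Image

# Hypothetical crop boxes (left, upper, right, lower) for a four-page-per-image
# TIFF; this only works if the offsets stay consistent across the whole batch.
PAGE_BOXES = [
    (0, 0, 1700, 2200),
    (1700, 0, 3400, 2200),
    (0, 2200, 1700, 4400),
    (1700, 2200, 3400, 4400),
]

img = Image.open("169630001.tif")
for i, box in enumerate(PAGE_BOXES, start=1):
    img.crop(box).save(f"169630001-p{i}.png")
</syntaxhighlight>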
  • The offsets are not exactly the same, but for almost all of the files they are close enough not to really matter. The contrast is black-and-white (white originally, black if color-inversion comes first). I have uploaded one of the files here. This specific work only has 30 TIFF files, but future works which I may scan have more pages; for this, I can just upload the images myself. (The names will be 169630001–169630030.) For the 30-image work, I put in effort to have four pages (or page-areas) per file; however, because that was quite difficult, and led in some cases to horizontal offsets, in the future I will probably only scan for two pages per image. TE(æ)A,ea. (talk) 14:29, 1 April 2022 (UTC)
    @TE(æ)A,ea.: I've looked into the example image a bit to see what's feasible.
    Color inversion is easy and fully automated, so it's just an extra step to throw in that the computer takes care of. Cropping the pages can be partially automated using a fancy algorithm (short version: it tries to find a point in the middle of the text, and then starts scanning outwards until it finds the edges), so long as the scanned images have sufficient contrast between the foreground (the text) and the background (everything else), not too much noise in the image (coffee stains and similar from ageing are a nightmare), and decent margins between the edge of the text and the edge of the sheet. It is preferable to have scans with either one or two pages per image. Three or more (i.e. four) will require an extra processing step that may need a lot of manual adjustment (crop offsets). Non-uniform images (i.e. if the first or last images have just one page but the rest have two) may also need manual processing.
    One of the operations that will be needed (for several purposes) is what's usually called "thresholding". Since these images do not have perfect white (that is, pixels with RGB value 255,255,255) or letters that are perfect black (RGB 0,0,0), but rather they have a lot of shades of gray in between, we need to pick a shade of gray where everything darker than that is considered "black" and everything lighter than that is "white". Once we have that, the software can use logic like "every black pixel belongs to text" in order to straighten skewed pages, crop the edges, etc. This is also the same thing you need to do when converting a grayscale image to actual black & white (which gives very small files). If you look closely at the inverted version of File:169630011.tif you'll notice there are a lot of very dark gray pixels around and behind the text. Probably the ink on the next or previous page that has rubbed off or bled through. On these particular images I was able to just use a default threshold of 0.5 (midway between black and white) and get decent results; but if the darkest non-text parts of the page are significantly darker, or the lightest parts (faded) of the text are lighter, finding a working threshold gets difficult. If the functioning threshold varies between pages you essentially cannot automate this step any more (you need to manually pick the threshold for each page).
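    (A minimal sketch of that inversion-plus-threshold step, using Python and Pillow rather than whatever tooling is actually in use here; the threshold of 128 corresponds to the 0.5 mentioned above, and the filename is just the example image.)

<syntaxhighlight lang="python">
from PIL import Image, ImageOps

def invert_and_threshold(path, out_path, threshold=128):
    """Invert a white-on-black microfilm scan and binarise it to black & white."""
    img = Image.open(path).convert("L")   # flatten to grayscale
    img = ImageOps.invert(img)            # white-on-black -> black-on-white
    # Every pixel darker than the threshold becomes black, everything else white.
    bw = img.point(lambda px: 0 if px < threshold else 255, mode="1")
    bw.save(out_path)

invert_and_threshold("169630011.tif", "169630011-bw.tif")
</syntaxhighlight>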
    I don't have a batch downloader for Commons (or Wikisource) set up, so I'll have to put that together. Easiest is probably if you tag each image with a category (which can be a redlink) and I can work from that. I'll also need a reliable way to order the images correctly. My existing tools are set up to look for a numerical sequence after the filename and before the filename extension: canbeanything-0001.tif. But so long as an alphanumeric sort of the filenames puts them in the correct order it should be fine.
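    (Sketching the kind of category-based batch download described above, using Pywikibot; the category name is hypothetical and a configured user-config.py is assumed.)

<syntaxhighlight lang="python">
import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site("commons", "commons")
# Hypothetical tracking category the uploader would add to each scanned image.
category = pywikibot.Category(site, "Category:Microfilm scans awaiting processing")

for page in pagegenerators.CategorizedPageGenerator(category, namespaces=[6]):
    filepage = pywikibot.FilePage(page)
    # Download the original file under its on-wiki name (minus the namespace).
    filepage.download(filename=filepage.title(with_ns=False))
</syntaxhighlight>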
    Finally, figuring out bibliographic data, licensing details, etc. can be pretty time-consuming (compared to hitting a few buttons and letting the computer do its thing), so it'll help if you prepare the file description page using a template like this (doesn't have to be exactly that, but those are the things I consider important to include). Also, if there is any degree of uncertainty at all about copyright status please let me know up front so we can figure that out beforehand.
    In any case, the long and short is that this should be doable, and hopefully without my available time being too much of a bottleneck. If you upload the first batch I can try to run it through and we can see how it goes and whether you're happy with the results. Xover (talk) 10:03, 2 April 2022 (UTC)
    • Thanks for your lengthy response! I’ve uploaded the images from this batch manually (as 169630001–169630030, as stated). Some images do have other than four pages per image. For other works (if I am able to scan more), I will go for two pages per image. I think (I hope) that the other pages will be fine for thresholding. For the source, I don’t have the reel on me, but I will be able to get it later. All of the works from this reel are from the late 1790s, so copyright shouldn’t really be an issue. Again, if I scan other works (which are longer), it would be easier to deal with off-site zipped uploads rather than dozens of (for me, manual) on-site TIF uploads. TE(æ)A,ea. (talk) 21:25, 4 April 2022 (UTC)
      @TE(æ)A,ea.: I'll have a go and see what we can do.
      Do you particularly want to keep the cover pages added by the microfilm company? We usually delete pages like that (think Google cover page etc.), and since they have a different geometry they'll need special-casing if included.
      A single-file download would be the easiest, yes, and I don't mind if it is on a site outside Wikimedia (just so long as it isn't too too spammy). Xover (talk) 07:22, 5 April 2022 (UTC)
      @TE(æ)A,ea.: Ok, the results are at File:A Letter on the Subject of the Cause (1797).djvu.
      Note that pp. 24–25 were missing from the scan, so I have replaced them with placeholders.
      Most of the time was spent in tweaking the workflow in order to automate it as much as possible (so it'll be faster in the future). A couple of findings / notes / musings:
      A single file download would be much easier than having to manually open a gazillion tabs and doing "Save as…". However, if the most convenient way to upload the images is to do it here—for example because you can then use one of the ready-made bulk upload tools like Pattypan or Sunflower—it should be possible for me to write a script to download all the files from a given category without too much effort. I checked the API and some related existing code, and it doesn't look like it'd be particularly difficult (and such a script would have other uses, so it wouldn't be only for this particular use case). Mostly it comes down to what's most convenient for you.
      The optimal format would be one page per image, followed by a single two-page spread per image, followed by exactly four pages per image. Odd numbers of pages per image will almost invariably lead to a need for manual interventions, and an inconsistent number of pages per image within a single batch / work will similarly need manual work. Which, in addition to being kinda tedious, means it'll take longer to process (because I don't often have the time slots needed to do it). Guessing about your source material and workflow, I think that probably means you should aim for exactly two pages per image (but if something else is better for you then please let me know and I'll see what can be done).
      Finding an automated process to churn through these scans and produce acceptable results was a bit challenging for a couple of reasons. One was variation in geometry and page placement within the image. Most of that was due to the images with a divergent number of pages, but there were also some issues caused by insufficient margins between the page and the edge of the image. The scans also had insufficient contrast between the text and the page background (theoretically white, but in reality it's a shade of gray with lots of noise in the form of near-black pixels), and this varied from page to page. This made it hard to find good threshold values that would work, and would work for all the pages. This was compounded by the presence of various forms of noise (non-black pixels) in the image background. For this particular batch of images I was able to find the right combination of threshold values by simply dithering the images to black and white (i.e. no shades of gray), but this is something that will vary from scan to scan so here's where I expect there will be some need for experimentation and tweaking in the future.
      In any case, the most critical factor is making sure all the images are as uniform as practically possible, along all dimensions (geometry, placement, contrast, margins, etc.). I have the tools to automate a vast number of image manipulation operations, but if I can't apply them identically to every image we'll soon end up where it would actually be faster to do it by hand in Photoshop or similar. But for images that are sufficiently uniform etc., processing anything up to something like a thousand pages shouldn't be a problem.
      Anyways… Have a look at the result and see if it looks ok. Xover (talk) 11:03, 7 April 2022 (UTC)
      • First, thanks! Second, there are no pp. 24–25 (a printer’s error), so no placeholders are needed. The workflow on my end produces the individual files. However, it is much faster for me to ZIP those files and upload them to an off-WMF Web-site; in the future, I will do that. Because the scans in this specific work (not the microfilm reel, but this specific scan) fit very neatly into four pages per image, I tried to stay with that; however, in doing so, I messed up the alignment, which made it take much longer than necessary. For future scans, I will stay with two pages per image, unless they fit so easily that it’s not necessary to mess with alignment. The inconsistency in number of pages/image was annoying, but not universal in this reel; I don’t think it will be too much of a problem. As you can see, the images are displayed on the reel side-by-side, which makes it easiest for me to scan in two pages per image. The “Book” template form I placed on all of the individual scanned images; the only thing I need is the microfilm information, which I can get the next time I see the reel. The work itself looks great! All that’s needed is to remove the placeholders, add the information template, and the file can be moved to Commons and the work on the index started. (The few other works I looked at didn’t have (or didn’t have as much) the variable number of pages per image.) TE(æ)A,ea. (talk) 14:53, 7 April 2022 (UTC)
        @TE(æ)A,ea.: Placeholders are now removed. Xover (talk) 18:52, 7 April 2022 (UTC)

Complete dog's breakfast owing to someone just OCR-dumping across an entire Index that was later replaced entirely with a new scan. I would suggest deleting everything and starting again, with a known and clearly sourced scan. ShakespeareFan00 (talk) 10:46, 18 June 2022 (UTC)

@ShakespeareFan00: Yeah. As I recall, we have several hundred thousand such pages all by the same contributor. But we have direct policy against doing that, so it needs to be handled case by case (and then the contingent that wants to delete nothing, not ever, comes swarming in). Xover (talk) 12:35, 18 June 2022 (UTC)

Vol. 41, a re-upload request; or my terrible mistake

It would seem that last December, I did not search Commons for the IA name, and went to the Scan Lab asking for a lesser version to be uploaded from Hathi.

What I would like to happen is:

  1. 41.1 Index:St. Nicholas - Volume 41, Part 1.djvu to use https://archive.org/details/stnicholasserial411dodg
  2. 41.2 Index:St. Nicholas - Volume 41, Part 2.djvu to use https://archive.org/details/stnicholasserial412dodg

Problems I can see with this are:

  1. quite a bit of 41.1 has been done and the page numbers will not match. I can work out the differences if it will help.
  2. 41.2 is broken and will need further repair. My experience with finding the needed repairs uses the interface here, which would mean a second upload. If avoiding the second upload is the best way to maintain your good spirits, I would be willing to try a different method to find the missing parts.
  3. (might be a problem) I should not be here with this, but at the Scan Lab, or should just keep it to myself and work through these problems I created.

I considered moving on to different volumes, but I am really looking forward to more skyscraper, tunnel, aqueduct engineering. My other problem is (kind of a fun and completely different task) that I have not been able to determine what the name of this building is: Page:St. Nicholas - Volume 41, Part 2.djvu/455, maybe it no longer exists. Thanks for your time!--RaboKarbakian (talk) 14:28, 22 May 2022 (UTC)

@RaboKarbakian: I can easily download those IA scans, generate DjVu, and upload over the existing files. Doing it twice, after repairs, is not a problem (though the repairs themselves may be, depending on complexity etc.). Shifting Page: pages around isn't a big problem either, provided you do the actually hard bit of figuring out offsets and/or the fromto list. I can give you detailed instructions on what input is needed to (semi-)automate the moves. Let me know if you still need this done and I'll try to get to it relatively soonish (hopefully this weekend at the latest). Stuff that doesn't require me to think I can usually squeeze in; it's the stuff that requires the brain to actually engage that may have to wait until I get some sustained stretch of free time.
Feel free to make requests here rather than at the Scan Lab if you prefer (I don't mind); the downside is that then you're at the mercy of my ofttimes erratic schedule. At the Scan Lab you have a bigger pool of contributors to draw on (granted it's a bit shallow right now since everyone's busy IRL, but…). Xover (talk) 06:24, 21 June 2022 (UTC)
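(For what the "(semi-)automate the moves" part could look like: a purely illustrative Pywikibot sketch, with made-up filenames, page range and offset; it works backwards through the range so nothing gets overwritten.)

<syntaxhighlight lang="python">
import pywikibot

site = pywikibot.Site("en", "wikisource")

BASE = "Page:St. Nicholas - Volume 41, Part 1.djvu/%d"  # hypothetical target
OFFSET = 2             # how far each page needs to shift
FIRST, LAST = 1, 400   # hypothetical range of affected pages

# Move from the highest number down so a page is never moved onto
# a page that has not yet been moved out of the way.
for n in range(LAST, FIRST - 1, -1):
    src = pywikibot.Page(site, BASE % n)
    if src.exists():
        src.move(BASE % (n + OFFSET),
                 reason="Realigning pages to the replacement scan",
                 noredirect=True)
</syntaxhighlight>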
Xover In this case (and in many more, I guess) the waiting was a good thing. I have learned how to get the logos off the replacement scans. I have been doing this for individual replacements for a while and should be able to (fairly easily) script it for multiple pages. So all the uploaded TIFFs should be replaced with logoless PNGs. So, more waiting is very good right now.
Stripping the logos from the page scans is not just a pretty upgrade; sometimes that logo is blocking text or covering portions of images.
I am considering Index:St. Nicholas, vol. 40.1 (1912-1913).djvu, where a whole section was replaced, to replace that with logoless scans. But you did the work on this, so your opinion counts more than mine for the redo. It is a good bunch to write my script for, though, either way.--RaboKarbakian (talk) 14:31, 21 June 2022 (UTC)

Bethesda

Hi, I noticed this. I have lost track of where I was at with this, but I appreciate your followup, and I should revisit it to see if we can get a more definitive answer. -Pete (talk) 21:40, 20 June 2022 (UTC)

@Peteforsyth: You're very kind to say so, but in truth the current state is more fairly described as being due to my failure to follow up. If you're able to get all the authors (presumably by proxy of your contacts) to explicitly agree to a compatible license (and for the sake of all that's good and holy, do it to OTRS/VRT so we don't run into any failure to observe formalities trouble!), that would be ideal and would neatly shortcut any niggling doubt (regardless of one's position on the underlying legal issue). Appearances possibly to the contrary, I would find that a great relief, personally. I may eventually follow up on trying to get a policy / community consensus on c:COM:Joint authorship, but as I find the prospect both daunting and stressful it's going to stay well below the actionable line on my todo list for a good long time.
PS. If there's any way I can help with getting it resolved, then do, please, ask. I don't imagine there is, and I've taken note of the frustrations with how I managed the copyvio discussion, but if you should have need of a bull in this particular china shop I am at least happy to try to help. For whatever that's worth. Xover (talk) 06:00, 21 June 2022 (UTC)

For some reason the IA-upload tool mangles the OCR text layer if it's anything non-standard. Any chance of rebuilding the DJVU file so it's actually usable? ShakespeareFan00 (talk) 18:31, 21 June 2022 (UTC)

@ShakespeareFan00: Done This one had rather a lot of weirdness, so I've pruned relatively aggressively. In particular, the protective sheets over the plates seem to have confused the scanner so there were several duplicate and uncropped pages at those points. Instead of doing a lot of manual faffing to restore the otherwise empty protective sheets, I've simply omitted all of these from the DjVu. Xover (talk) 06:31, 22 June 2022 (UTC)

Off by one in respect of the text-layer. It seems the calibration page at the front confused the IA-Upload tool. ShakespeareFan00 (talk) 18:33, 21 June 2022 (UTC)

@ShakespeareFan00: Done I also removed the worst of the black borders while I was at it.
The off-by-one thing is probably due to T219376, but triggered by something ia-upload does when it removes the first page. It's a known issue and relatively common (I've fixed several tens, if not hundreds, of these). Xover (talk) 06:21, 22 June 2022 (UTC)
I will also note here that when I did a massive attempt at page-listing a few years ago I also set up Category:Scans with misaligned text layer to potentially mark works affected by the DjVu handling issues you note in the ticket. No obligation to fix these unless someone expresses interest, but I thought I would mention it in case you have any (rare) spare time. :rofl: :) ShakespeareFan00 (talk) 08:02, 22 June 2022 (UTC)

History of the Indian Mutiny...

Affecting:-

For whatever reason the volumes are split awkwardly... This is the structure as far as I could determine:-

IA volume                   Work volume   Pages
historyofindianm11ball_1    Volume I      1 to 184
historyofindianm12ball_1    Volume I      185 to 376
historyofindianm13ball_1    Volume I      377 to 568
historyofindianm14ball_1    Volume I      569 to 664
{{{1}}}                     Volume II     1 to 112
historyofindianm212ball_1   Volume II     113 to 303
historyofindianm223ball_1   Volume II     304 to 496
historyofindianm234ball_1   Volume II     497 to 664

Perhaps if you ever have some spare time it would be possible to figure out and rename stuff so there's a consistent approach? (And possibly add the later volumes that make up Volume II?)

ShakespeareFan00 (talk) 12:31, 22 June 2022 (UTC)

What is actually wrong with this, because the markup use of {{ppoem}} should not have caused any lint errors? Thanks. ShakespeareFan00 (talk) 10:34, 1 July 2022 (UTC)

@ShakespeareFan00: Nothing, so far as I can tell. What makes you say there is anything wrong with it? Xover (talk) 15:18, 1 July 2022 (UTC)
It was down to an /s part of a template that wasn't needed after I converted to ppoem :( Now fixed. ShakespeareFan00 (talk) 17:00, 1 July 2022 (UTC)

Finding parameters inside SPANS...

This https://regex101.com/r/225QHF/1 was as far as I got.

What I am wanting to do eventually is to have a regexp I can use with listpages.py to give me a list of templates that have parameters that are inside a SPAN (and thus potentially subject to the wrapping quirk you mentioned in the Scriptorium).

I can then use a different set of regexp(s) to collapse the line-breaks supplied to those templates in content pages.

However, finding a reliable regexp is proving elusive. ShakespeareFan00 (talk) 10:46, 2 July 2022 (UTC)

https://public.paws.wmcloud.org/User:ShakespeareFan00/obs_spans_direct was a partial list of templates affected.
https://public.paws.wmcloud.org/User:ShakespeareFan00/linthints.ipynb being my list of various listpage queries...

Center in headings...

https://en.wikisource.org/w/index.php?title=Constitution_of_Malaysia&diff=prev&oldid=12439358 . How many other pages that were auto-fixed have this issue? ShakespeareFan00 (talk) 16:33, 2 July 2022 (UTC)

@ShakespeareFan00: Quite a few. I plan to do a second pass looking for heading markup containing {{c}} and simply removing the heading markup. Oh, hmm, or I may actually be able to do it in one pass for the remaining pages... Xover (talk) 18:28, 2 July 2022 (UTC)
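(A minimal sketch of what such a pass could look like, assuming it is done with a Python regex; the pattern and sample text are illustrative, not the actual bot code.)

<syntaxhighlight lang="python">
import re

# Strip surrounding "=" heading markup when the heading's only content
# is a {{c|...}} call (heading levels 2-6).
HEADING_C = re.compile(r"^(={2,6})\s*(\{\{c\|.*?\}\})\s*\1\s*$", re.MULTILINE)

text = "== {{c|CHAPTER I}} ==\nSome body text."
print(HEADING_C.sub(r"\2", text))
# {{c|CHAPTER I}}
# Some body text.
</syntaxhighlight>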

Cleaning up after obsolete tag replacements...

Wow.. And thanks.

Please consider taking a look at my efforts at resolving the resultant misnesting or stripped tags. :) ShakespeareFan00 (talk) 17:30, 2 July 2022 (UTC)

@ShakespeareFan00: I will. But if you see any specific problems apart from the {{c}} in headings, please let me know. There are rather a lot of them so I won't be able to catch all errors as the edits scroll by. Xover (talk) 18:30, 2 July 2022 (UTC)
I would suggest looking at my recent history
  1. {{smaller|{{c}} should be the other way round.
  2. In some instances <small><small>'''Bold'''</small></small> has matched the first closing small, not the outer one. This hasn't just happened with small tags; it's happened with BIG tags as well.
If I think of or note others, I'll list them here.
ShakespeareFan00 (talk) 18:44, 2 July 2022 (UTC)
@ShakespeareFan00: Thanks. The first one will need a separate fix (possibly manually). For the double-tags leading to misnesting I have a fix for the remaining pages, at least for big and small. Please let me know if you see others. Xover (talk) 18:47, 2 July 2022 (UTC)
@Xover:
https://en.wikisource.org/w/index.php?title=Page:Natalie_Curtis_-_Negro_Folk-Songs_Book_1.djvu/10&diff=prev&oldid=12453030
You might want to check if {{c/s}} needs a line feed after it. Otherwise the content inside might be collapsed.
Also:-
https://en.wikisource.org/w/index.php?title=Page:The_works_of_Horace_-_Christopher_Smart.djvu/79&oldid=12451688
here the center directly centers the wrapped HR tag. I went back to a previous version in my edit:
https://en.wikisource.org/w/index.php?title=Page:The_works_of_Horace_-_Christopher_Smart.djvu/79&oldid=12453201
and will be reviewing pages I did Lint corrections on to look for related situations. ShakespeareFan00 (talk) 19:36, 3 July 2022 (UTC)
  1. <center foo=bar> is not converted but the closing tag is.
  2. In my PAWS I had a list of Page:s with mismatched CENTER tags...
  3. :<div> is a known wrapping quirk; I've been replacing the indentation effect with {{left margin}}
  4. HR tags should be converted to {{rule}}

@Xover: ShakespeareFan00 (talk)

Finding a leaking template parameter marker...

@Xover: https://en.wikisource.org/w/index.php?title=Century_Magazine/Volume_48/Issue_2/Across_Asia_on_a_Bicycle._A_Pause_at_the_Mountain_of_the_Ark&diff=prev&oldid=12443461

I've got a partial regular expression for this - \|([^=|}]+)\<(([^=])*?)\= but at present this ALSO detects every usage of table syntax. How do I narrow this to find ONLY the usages in calls to templates? ShakespeareFan00 (talk) 07:30, 3 July 2022 (UTC)

@ShakespeareFan00: Regexes aren't great at context and "if this match foo otherwise match bar" (which is why regex is not a good approach for parsing any kind of structured or semi-structured data: regexes are designed to operate on a string). But at first glance I would guess that what you're trying to do might be possible by using a negative lookbehind assertion: r'(?<!\{)\|([^=|}]+)\<(([^=])*?)\=' (note that Python regex has some severe limitations on lookbehinds, in particular if you use alternation all the alternatives must match strings of the same fixed length). Alternately you could perhaps look for the double curly brackets that must open any template invocation. The r'regex' stuff is Python syntax for a raw string literal (handy for regexes because backslashes don't need double-escaping), btw, and not actually a part of the pattern. The negative lookbehind syntax is the (?<!pattern) bit. I'm not sure how much Python you know so please ignore if you already knew that. Xover (talk) 08:24, 3 July 2022 (UTC)
But you may also want to consider an alternate approach. You're trying to find a completely general solution to this problem, but the problem isn't actually one uniform one: it's a cluster of similar ones. A completely general solution is going to be hard and might actually be impossible. Instead, I suggest looking for more narrow subsets that are easier to solve; because very often these kinds of problems are the result of a single human being with a particular habit (and there are a finite number of humans here). For example, it's possible you could search for r'{{c|<span style="font-variant:small-caps">(.*?)</span>}}' and get 10%, and so on, so that ten "dumb" patterns get all of them in a much shorter time than it would take you to develop a completely general solution. Xover (talk) 08:24, 3 July 2022 (UTC)
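(A minimal runnable sketch of that lookbehind idea, with a slightly simplified pattern and made-up sample strings, purely to show what the assertion does and does not exclude.)

<syntaxhighlight lang="python">
import re

# A pipe not directly preceded by "{" (so "{|" table starts are skipped),
# followed by a parameter value that opens a tag containing a bare "=".
pattern = re.compile(r"(?<!\{)\|([^=|}]+)<([^=]*?)=")

template_call = '{{smaller|some text <span style="color:red">x</span>}}'
table_start = '{| style="margin:auto"'

print(bool(pattern.search(template_call)))  # True: the pipe belongs to a template call
print(bool(pattern.search(table_start)))    # False: the pipe follows "{", so the lookbehind rejects it
</syntaxhighlight>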
That was kind of the approach I'd used: look for a more limited situation. On some I've replaced {{c|<foo>}} with {{c|1=<foo>}}. However, in places like

https://en.wikisource.org/wiki/British_War_Economy/Chapter_XII, the center is in a heading, in which instance {{heading}} might be a better approach (see https://en.wikisource.org/w/index.php?title=Bulletin_of_the_Torrey_Botanical_Club%2FVolume_35%2FThe_published_work_of_Lucien_Marcus_Underwood&type=revision&diff=12444026&oldid=12437962). You said you were working on some specific fixes earlier, so I can hold off from doing any more manual repairs for a bit, if you want me to. ShakespeareFan00 (talk) 08:36, 3 July 2022 (UTC)

@ShakespeareFan00: The current {{heading}} is... suboptimal (I'll eventually look into replacing it), so my plan for cleanup after this run was just to go looking for instances of {{c}} inside heading markup (=== and friends) and just removing the heading markup. If you're using raw heading wikimarkup you're at the whim of whatever new skins are released and updated for Wikipedia anyway (for example the sticky left-floated sidebar TOC that's about to become default with Vector 2022), so trying to perfectly reproduce the formatting the original contributor had in mind in 2006 is both literally impossible and a fool's errand.
I'm running this as a bot replacement, so mostly your manual fixes should not matter or get in the way (and manual fixes are always going to have higher quality than bot fixes, because humans can take context into account). When working on it you may also come up with other problems or new/better solutions, so from my point of view they're an advantage. But whether you want to expend the effort when a lot of them may get fixed "well enough" by a bot eventually is a different question. Xover (talk) 08:51, 3 July 2022 (UTC)
@ShakespeareFan00: No need to revert manual fixes. As I said, your manual fixes are likely to be better than automated ones because you can look at the context and better approximate the original intent. And just because I personally don't particularly like the current {{heading}} template doesn't mean it's "wrong" or a bad solution for this (it may in fact be better in this case). Xover (talk) 09:11, 3 July 2022 (UTC)
@Xover: {{heading}}, {{span}} and {{p}} share related coding approaches with the former version of {{ts}}. {{ts}} was re-written to use a module approach. So if you were reworking, it may be sensible to consider whether some degree of harmonisation between them was desirable. (I'm not sure they all use the same command sets / short codes :( ). I've reverted some of my 'fixes' to allow the bot to handle the headings issue. ShakespeareFan00 (talk) 09:14, 3 July 2022 (UTC)
@ShakespeareFan00: All these three probably mostly need to be reworked to primarily apply a class name and then delegate all formatting to the Index-styles. But editing index styles is a little advanced (technical) for most people, so it needs careful consideration, good docs, maybe some pre-made examples, etc. And of course migrating existing usages, which is going to be a bit of a slog. I also know Inductiveload has some thoughts on this (and they are much more negative towards the {{ts}} approach than I am), so I plan to brainstorm it a bit with them before landing on a plan there. Xover (talk) 09:27, 3 July 2022 (UTC)
See also the approach I took with {{*/s}} {{*!/s}} and related (as to the classed approach vs short codes). In some examples I was able to do things using that approach that wouldn't be possible without Indexstyles :) ShakespeareFan00 (talk) 09:30, 3 July 2022 (UTC)
I have a set of listpages query on PAWS that may be useful to you.. :-
see https://public.paws.wmcloud.org/User:ShakespeareFan00/linthints.ipynb, in particular the ones I output to rawfoo-style names
ShakespeareFan00 (talk) 09:36, 3 July 2022 (UTC)

Category for {{Center}} content that isn't parameter 1 or text...

Would it be feasible to add a check to see if no text has been supplied (i.e. check whether parameter 1 or text actually contains something)? And then categorise into a suitable maintenance category?

This might help find the bad arguments due to the unescaped = quirk, because parameter 1 wouldn't be defined in those situations? ShakespeareFan00 (talk) 20:23, 4 July 2022 (UTC)

@ShakespeareFan00: Category:Pages using center with no text argument. It's catching some weird cases so we should probably have had this all along. Xover (talk) 07:29, 5 July 2022 (UTC)
Related. I've noticed some instances where someone meant to use {{rh}} and typed {{c}} or related instead. Could you amend your detection code for no text arguments to also detect 'Pages using center with additional arguments'? Most likely it will be a 2 or 3. Catching examples where there's a named parameter is going to need grep or even a full-blown parser effort, as I'm not sure there's currently a mechanism for the parser to indicate it's been passed something the template doesn't in fact use.
ShakespeareFan00 (talk) 08:44, 5 July 2022 (UTC)
@Xover: - Per the documentation for Template:Category_handler#Namespaces you might have to add handling for Page: namespace manually. I suspect most of the instances of 'works' showing up in the category will be where the actual problem line in fact lies in Page: namespace. Thanks for implementing this. ShakespeareFan00 (talk) 09:02, 5 July 2022 (UTC)
@Xover: - Around line 100 of Module:Category_handler/config add <syntaxhighlight lang="lua"> [104] = true, -- page </syntaxhighlight> and it should catch a lot more of the concerns. Alternatively, a special handling line could be added to {{center}}? ShakespeareFan00 (talk) 09:06, 5 July 2022 (UTC)
We also now have template parameter analysis for English Wikisource:- https://bambots.brucemyers.com/TemplateParam.php?action=invalidlinks&wiki=enwikisource&template=Center
I asked the author on Wikipedia to enable it :) It doesn't yet function in Page: namespace, so I was wondering if you'd prepare some kind of technical explanation so that it can be enabled in the tool.
Progress is being made :)
ShakespeareFan00 (talk) 15:07, 5 July 2022 (UTC)

Center template...

I've noted that sometimes the CENTER tag has been used for a 'block center' around tables and other block-based content; that's partly why I added the additional stuff in the CSS. If you are saying this IS a different template, fair enough.

However, determining where CENTER has been used to actually mean block center isn't something that can be automated as easily. ShakespeareFan00 (talk) 07:10, 5 July 2022 (UTC)

@ShakespeareFan00: No, you're right. That's one of the major problems with the old center tag, and why it's been deprecated for yonks. It behaves as both text alignment and block alignment, and it behaves as both an inline element and a block element. In other words, it is fundamentally incompatible with both HTML standards and the CSS box model, and is special-cased from here to hell and back in browser rendering engines. Xover (talk) 07:16, 5 July 2022 (UTC)

Unsigned MassMessages

Hi Xover, Re: your edit summary here, I just wanted to note that MassMessages are generally not signed with 4-tildes, but instead with 5-tildes. This is to prevent a mixture of Left-to-Right and Right-to-Left text on some wikis, and to avoid confusion for new editors trying to reply to/ping/thank the bot-account (and possibly other reasons I'm forgetting). You can see the docs about this in (the last 2 bulletpoints of the "Body" sub-list of) m:MassMessage#Global message delivery. I hope that helps clarify! Quiddity (WMF) (talk) 17:39, 5 July 2022 (UTC)

Addendum: Timestamps/5-tilde signatures are enough for the usual Archive-bots, hence that became the standard. Quiddity (WMF) (talk) 17:46, 5 July 2022 (UTC)
@Quiddity (WMF): Enough for the archive bots, but it breaks the reply tool. And four-tilde signatures are the essentially universal norm on all Movement projects (with the exception of those still using Flow, where the issue is moot). Making up special rules on this for the mass-message bot seems like a bad idea. Contrariwise, signing such messages with the username of an actual human being would seem to have some benefits, as other mass-message senders tend to do. I'd encourage rethinking this practice. Xover (talk) 18:53, 5 July 2022 (UTC)
@Quiddity (WMF): Oh, and PS.: "The subject line is fairly straightforward. It may not exceed 240 bytes (doing so causes truncated edit summaries)." I don't think this is true anymore. The edit summary length was raised to something like 1000 (possibly multi-byte) characters a while ago, so 240 bytes seems excessively conservative even accounting for boilerplate and markup overhead. Xover (talk) 19:02, 5 July 2022 (UTC)
Re: Reply Tool, Ah, foo, good point. I will ask around about that aspect.
Re: 240 bytes limit - I believe this is still the case, because the extension is just using the ==header line== and not the 'edit summary field' itself (or something!). Cf. Description of phab:T164503. Ah, yes, I started to make a lorem ipsum test-delivery, and found that the text-field for the "Subject of the message" at m:Special:MassMessage is limited to 240 bytes. (Sidenote: For context on both the docs and the code, MassMessage is a volunteer-written and -maintained extension.) Hope that helps. Quiddity (WMF) (talk) 21:37, 5 July 2022 (UTC)
@Quiddity (WMF): The limit looks to be configurable. To the degree anyone feels the current limit is hampering (which may not be the case; it was just an observation from the docs), it looks like it might be relatively easy to up it to somewhere closer to the hard limit imposed by MediaWiki and the database schema. If I'm reading the code right the limit is enforced at the front end so that, apart from the bytes vs. characters distinction, increasing it won't touch on any fundamental assumptions or require massive surgery.
And, yes, there are rather a lot of components that both the WMF and the communities rely on that are community supported. For example, did you know that all the Wikisources base their core workflow on the Proofread Page extension which the WMF has allocated no resources to and which survives basically entirely on periodic efforts from pure volunteers and staff developers taking pity on us and fixing stuff in their spare time? During the worst of the Covid shutdowns we had volunteers trying to implement new stuff and get rid of technical debt, but we couldn't even get the patches reviewed because that requires other volunteers to invest years into getting +2 rights and having the time to serve as reviewer. This is wasteful of the donations of time, skill, and effort that volunteers make, is not sustainable over time, and does not reflect well on the WMF as custodian for the Movement and its assets.
My point being, so far as I can tell, the biggest users of MassMessage—judging by what shows up on WS:S—are the WMF and not the community. We've had more (rather bland and corporate-speak, I must say) notifications about various strategy processes than anything very community related (with the exception of the Tech News, whose existence and ongoing reliable delivery is a bright spot) for as long as I can recall MassMessage being a thing. It seems downright negligent for the WMF not to allocate sufficient resources to its maintenance to secure proper functioning of a critical tool and communication mechanism for both strategic and day-to-day communication with the community. And on an ongoing basis it cannot possibly need a lot of developer time; it just can't be zero.
In any case, I perform this little rant (or a variant of it) any time someone gives me the least little opening to do so. Please do feel free to quote it (with or without attribution) anywhere you think it can do some good. Or, you know, just ignore it if you've heard it before. 😎 Xover (talk) 05:48, 6 July 2022 (UTC)

{{Smaller}} is also affected by the parameter issue...

https://en.wikisource.org/w/index.php?title=Page:GeorgeTCoker.djvu/16&diff=prev&oldid=12458633

This needs to be flagged as a specific Linter error, as it's going to be very time-consuming trying to find it for every single template combination there is. ShakespeareFan00 (talk) 20:16, 5 July 2022 (UTC)

Also {{larger}} - https://en.wikisource.org/w/index.php?title=Page%3AHistory_of_Iowa_From_the_Earliest_Times_to_the_Beginning_of_the_Twentieth_Century_Volume_1.djvu%2F11&type=revision&diff=12458758&oldid=12452048 ShakespeareFan00 (talk) 21:19, 5 July 2022 (UTC)
Sorry. I didn't mean to start a process whereby we see more termites before we can remove the nest.... ShakespeareFan00 (talk) 21:19, 5 July 2022 (UTC)
@ShakespeareFan00: All templates with this model of operation (content in first unnamed arg) are affected, and we can't track all of them. But what we can do is pick out common variations of raw HTML tags and replace those with the equivalent template, which will neatly solve the problem through a different avenue. In my experience the variations are relatively limited, even if in theory they could be nearly infinitely varied. Xover (talk) 05:52, 6 July 2022 (UTC)
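(One possible shape of that "replace common raw-HTML variants" pass, as a hedged Python sketch rather than the actual bot code; it also escapes a bare "=" in the content, per the quirk discussed in the next section.)

<syntaxhighlight lang="python">
import re

def to_template(name, content):
    # Escape with an explicit "1=" when the content contains a bare "=",
    # so the template does not mistake the text for a named parameter.
    prefix = "1=" if "=" in content else ""
    return "{{%s|%s%s}}" % (name, prefix, content)

# A couple of the "dumb" literal patterns; real usage would add more variants.
REPLACEMENTS = [
    (re.compile(r"<small>(.*?)</small>", re.DOTALL), "smaller"),
    (re.compile(r"<big>(.*?)</big>", re.DOTALL), "larger"),
]

def replace_raw_tags(wikitext):
    for pattern, template_name in REPLACEMENTS:
        wikitext = pattern.sub(lambda m, n=template_name: to_template(n, m.group(1)), wikitext)
    return wikitext

print(replace_raw_tags('<small><span style="color:red">note</span></small>'))
# {{smaller|1=<span style="color:red">note</span>}}
</syntaxhighlight>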
@Xover: Yes, that's a good strategy. Are you converting smallcaps at the same time? (And I was applying HR -> {{rule}} conversions, as in a number of instances that greatly reduced the amount of 'raw' HTML that was needed to start with.) ShakespeareFan00 (talk) 08:00, 6 July 2022 (UTC)
@ShakespeareFan00: Right now I'm sort of doing them ad hoc as I find them, mainly to fix fallout from the center tag conversions. There are about 15k of those left so I'm using this opportunity to look for patterns that might be quickly fixable or can be handled at the same time as center tags. Xover (talk) 08:20, 6 July 2022 (UTC)

Escaping =

https://en.wikisource.org/w/index.php?title=Page%3AOn_the_Determination_of_the_Index_of_Refraction_of_Glass_for_the_the_Electric_Ray.djvu%2F8&type=revision&diff=12461440&oldid=12449635

Maybe this is overly cautious? ShakespeareFan00 (talk) 15:51, 6 July 2022 (UTC)

@ShakespeareFan00: Yes. Either 1= or {{=}}, not both. Xover (talk) 15:57, 6 July 2022 (UTC)

English Wikipedia

Dear Xover do you edit on English Wikipedia? 152.86.164.35 14:17, 7 July 2022 (UTC)

Occasionally, yes. Why? Xover (talk) 15:50, 7 July 2022 (UTC)

TemplateData

Do we have a list of templates by transclusion count in ns0 and ns104?

I am asking so that I can add TemplateData for the highly used ones to find 'parameter' errors using the tool someone enabled for Wikisource recently. ShakespeareFan00 (talk) 18:16, 7 July 2022 (UTC)

@ShakespeareFan00: I have a todo to look into that, but it's pretty far down just now. Xover (talk) 20:47, 7 July 2022 (UTC)

Illustrated books from Wikimedia Commons

I saw your comment on the discussion; could you please transfer File:MU KPB 033 Rubiat of Omar Rayyam.pdf and File:MU KPB 034 Parsifal.pdf, as I am unable to? TE(æ)A,ea. (talk) 22:51, 12 July 2022 (UTC)

Obsolete tags...

Planning on doing another batch of these at some point? ShakespeareFan00 (talk) 16:52, 13 July 2022 (UTC)

I've done some myself manually but it's going to take a 'l..o..n..g... time..'

ShakespeareFan00 (talk) 16:52, 13 July 2022 (UTC)

@ShakespeareFan00: Yes. I'm just taking a break until I can give it sufficient attention (the replacements are not reliable enough to run unattended). Xover (talk) 16:59, 13 July 2022 (UTC)

Table Class, and related...

Apologies for adding to a lengthy to-do list, but I recently migrated some of my previous Table class stuff over to Indexstyles as they were mostly single 'work' uses.

1. I'd appreciate someone reviewing {{table class}} and its related family, now that Indexstyles seem to be the preferred practice on English Wikisource.

2. It was also suggested that {{ts}} also be re-examined when I asked why a specific page was hitting the Lua equivalent of the expansion size limit.... In many many situations migrating the relevant tables over to IndexStyles would help avoid 'script time out' conditions.

3. I'm in the process of converting parts of {{dtpl}} over to using TemplateStyles and {{optionalstyle}} to make it more readable if anything. I think Inductiveload wrote {{optional style}} with a view to even more templates being migrated to only need the inline styles if there were non-default approaches being applied.

ShakespeareFan00 (talk) 22:38, 13 July 2022 (UTC)

@ShakespeareFan00: On {{tc}}, I'm not entirely sure what you mean by "reviewing" it?
On a purely technical nit, {{#if: {{{2|}}}| {{{2|}}}|}} is exactly equivalent to {{{2|}}}. If that space character had been important (or you needed other wrapping around the actual parameter value) then this might have been necessary, but in {{tc}} you're adding that space between the if-statements anyway so that's just needless complication.
Other than that my main observation at a quick glance is that using {{tc}} to easily add a class to a table element might be useful, but /import and the stylesheets are no longer suited in that form. We need a better way for users to manage their Index:-styles, but {{table class/import}} probably isn't it. I'm thinking it should probably be converted to a Help: page with cut&paste-able snippets of CSS to use in Index:-styles. But I haven't really thought that through.
I am also in general questioning whether tables are really well suited to standardised formatting through style sheets. For use cases like Wikipedia, where all tables should look roughly the same, certainly. But here, where most tables have quirky formatting, combined with the limitations of the table model in HTML (and wikimarkup), I'm just not seeing any clear idea of how to do that in a way that is significantly better than just using {{ts}}. Stylesheets are great for standardizing formatting, but not so much for handling divergent formatting. For tables my thinking currently is that what's needed is actually an interactive tool to let you visually design a table, roughly like what Word does, that spits out the necessary wikimarkup. But if I'd had a really good solution to this problem I would have probably implemented something long ago.
Regarding MW script time and size limits… After I converted {{ts}} to Lua it's pretty rare that we hit those limits. The Monthly Challenge is an exception, because Inductiveload was in full-on Mad Scientist mode when they designed that (it's really pushing the limits of what MediaWiki is designed to do). For other pages we have a small handful that are currently showing up in those tracking categories, and those are pathological cases. I'm looking at those on and off, looking for fixes or optimizations. The rule of thumb is that when we hit those limits the first instinct should be to look at splitting the page up, because anything that hits those limits is too large for a reader to conveniently navigate anyway. One main class of exception is indexes, because those naturally tend to be really long and in one "unit" and splitting them up in a natural way would lead to loads and loads of tedious little pages (i.e. the other extreme).
{{optional style}} made more sense before Lua and TemplateStyles. I haven't looked specifically at what it would entail, but my base assumption is that rewriting templates in Lua would obviate the need for {{optional style}}. It's possible, though, that it'd be useful to have a Lua module version of it to standardise some stuff on the backend. But looking at this is as yet way down on my todo list. Xover (talk) 06:40, 14 July 2022 (UTC)
Could you, as one very specific change, consider changing Module:Table style so it supports class=classlist? (That means there would not be a need for an independent table class template.) Thanks. ShakespeareFan00 (talk) 09:57, 14 July 2022 (UTC)

What was the purpose of this template? Is it supposed to be another Template:Side by side? If so, what's different about it? I was going to create it as a redirect, but saw it was already something else. PseudoSkull (talk) 13:47, 14 July 2022 (UTC)

@PseudoSkull: An experiment in a saner approach to applying arbitrary bits of formatting. It's "styled block start" (there's a {{sbe}} too). It was intended to be more general ("this div is bold italic larger …"), but the main impetus was actually the problem that {{ppoem}} solves (and solves much much better). I haven't quite given up yet, but I am now less certain that approach makes sense. So… it's in limbo? Or something like that. Xover (talk) 13:52, 14 July 2022 (UTC)

Re: center tags and center template

See the examples below for differences:

ALICE’S ADVENTURES IN WONDERLAND


ALICE’S ADVENTURES IN WONDERLAND

ALICE’S ADVENTURES IN WONDERLAND

The three options are, from left to right: center template, center tag, and underline. The line in the original is not right below the text, but it is not as far away as the center template produces, so I use the center tag. So, yes, it is intentional. TE(æ)A,ea. (talk) 19:31, 3 July 2022 (UTC)

@TE(æ)A,ea.: Hmm. Interesting. But why the block center around the already centered text? Xover (talk) 20:44, 3 July 2022 (UTC)
Xover: The block is so that the rule matches the length of the text. TE(æ)A,ea. (talk) 21:02, 3 July 2022 (UTC)
@TE(æ)A,ea.: Oooh, that's quite clever! But, ok, you're not actually using it for (block) centering, just to have a common container for the text and rule that gives them both a width. That makes a lot more sense. :-)
I dug into the reason for that huge vertical gap and, as suspected, it's p-wrapping that's biting us again (long-form/technical explanation available on request). Long story short, your {{c}} is getting wrapped in not just the <div> that the centering needs, but also a <p> that MediaWiki adds; and that <p> is styled by the skin to have a hard 7px bottom margin. And since the <p> is added outside of our control we can't easily suppress it or otherwise fix it "automagically". *sigh* But, knowing it's there we can semi-manually correct for it in these specific cases. I've added support for a nomargin=1 parameter to {{c}} that forces the bottom margin of contained <p> tags to zero. You can see the effect relative to the alternatives in this table:
{{c}} {{c|nomargin=1}} <center> {{u}}

ALICE’S ADVENTURES IN WONDERLAND


ALICE’S ADVENTURES IN WONDERLAND


ALICE’S ADVENTURES IN WONDERLAND

ALICE’S ADVENTURES IN WONDERLAND

Can you check that its rendering is indeed (close enough to) what you needed here? --Xover (talk) 07:57, 4 July 2022 (UTC)
  • Xover: Sorry I missed you with this! Yes, nomargin=1 does the trick quite well. If you don’t mind, could you (1) replace all of the <center></center> usages in the work and (2) explain the p-wrapping problem? Thanks for solving this problem! TE(æ)A,ea. (talk) 00:36, 26 July 2022 (UTC)
    @TE(æ)A,ea.: With how lame my own response time has been lately you should definitely not worry about missing stuff like this. Even more than usually I mean. :)
    Replace is done. Please do a spot check that it looks ok.
    p-wrapping is… Hmm. What we usually call the "parser" in MediaWiki is really a whole bunch of different layers and components whose overall function is to take the wikimarkup you enter into the editing text box and transform it, ultimately, into the HTML that gets sent to the web browser for rendering as a "web page". One of the things that's done in one of the stages is to try to make sure the HTML that gets generated is well formed and won't confuse web browsers. This was first implemented way way back in the early oughts, and with a rather narrow perspective. In particular, it had the base assumption that all page content must be contained within at least one block-level container (div, p, etc.) to be valid (this was before html5, and wasn't really true even then), and it had eyes primarily for Wikipedia, where a page consists of sequential paragraphs of text that should all conform to the house style, and having "automatic" paragraphs was a good thing.
    The result is that the parser uses a relatively dumb bit of code to go through the wikimarkup and wrap chunks that it thinks needs it in <p>…</p>. It's dumb in the sense that it has limited understanding of context, looking mostly on just one line, and gets easily confused. But every time we insert a blank line (technically, two consecutive newline characters) the parser will consider it a paragraph break and wrap things in p tags. Except when it doesn't. For example, if a paragraph of text begins on the same line as a <div> start tag, or ends on the same line as a </div> end tag, that paragraph will be considered to already be contained within a block-level element and not need p-wrapping.
    So the most obvious and common (for us) example of p-wrapping going wrong is templates like {{c}}. It works by taking the text it's fed as an argument, and wrapping it in a <div>…</div> to which some appropriate styling is applied. For {{c|Chapter 1}} that works fine and reliably. But then you have a longer text you want to center so you enter something like {{c|First paragraph.\n\nSecond paragraph.\n\nThird paragraph.}} (\n\n stands in for two newlines, aka. a blank line, obviously). Now it all depends on how the template was written. The very most obvious way to write it is (omitting all actual formatting etc.): <div>{{{1}}}</div>, which with the argument expanded becomes <div>First paragraph.\n\nSecond paragraph.\n\nThird paragraph.</div>. Woops. The first and last paragraphs are on the same line as a div tag so they are ignored, but the second paragraph isn't so it gets wrapped in p tags. Within a single template the three paragraphs now behave differently. So maybe you try working around it by manually inserting newlines before and after the first/last paragraph? Sorry bub, the parser strips leading and trailing whitespace in template arguments. So the only way to get consistent behaviour is to write the template like this: <div>\n{{{1}}}\n</div>. That makes the parser treat all three paragraphs the same, wrapping all of them in a p tag. Great.
    Except, we created a gajillion templates before this issue was understood (and we're still creating more at an alarming clip), and putting those extra newlines in the template changes the output. Often it doesn't matter, but in cases like your Alice heading above you get an extra p tag that you didn't have before, and both the web browser's default stylesheet and MediaWiki's skins (e.g. Vector) have styling for paragraphs based on Wikipedia-like assumptions. The net result is that your Alice text gets wrapped in a p to which the browser applies a 7px top and bottom margin, and when you're fine tuning its distance to the following horizontal rule that really gets in the way.
    And we have a huge huge number of similar cases. They're hard to fix because our templates were slapped together by different people at different times with no real unified design behind them (some of those people had directly contradictory design approaches), and without any particular attention to this particular problem. And we have developed a habit of just using whatever template gives us the visual effect we're after, regardless of whether the particular template makes sense in that use or not. So whenever we try to fix things like this p-wrapping problem we need to go through an endless number of very very weird edge-cases to see if they break, and find some kind of strategy to replace the ones that do (nomargin for {{c}} being one such strategy). A "preventive measure" in that regard is trying to slowly move toward a more "designed" use pattern for templates; for example by using {{c/s}} and {{c/e}} and similar for blocks of content, so that we can have clean block or inline semantics for a given template (but making people use those is hard since our templates are inconsistent in this regard). Of course, since most of the community either doesn't care about technical minutia (very much understandable) or are actively offended when someone tries to tell them what they should and shouldn't do with their templates, that's a pretty slow and uphill struggle. I'm sure we'll get there one day though. :)
    In any case, despite its length the above is a very simplified explanation, but the gist is that sometimes MediaWiki adds extra p tags that can mess things up if we're not careful about how we design and use templates. --Xover (talk) 09:48, 26 July 2022 (UTC)
    • Xover: Thanks for the explanation! Yeah, that “parser” seems really annoying. My lack of knowledge of Lua leads me to believe that modules could fix this problem, by treating every line (of text) in, for example, {{center}} the same way. But that seems like a lot of work, especially with the great number of templates, to fix a problem with MediaWiki. As for your quest for proper template usage, spread the message! I’m sure, one day, people will start using table formatting for tables of contents. I also had another instance, where I almost used the center tag over the template; but in that case they produced the same result, so it was a separate problem. (I was trying to add text at the end of a {{***}}, if you want to know.) The replace looks good; thank you for that, as well. TE(æ)A,ea. (talk) 17:03, 26 July 2022 (UTC)
      @TE(æ)A,ea.: The output from a Lua module would still be processed by the parser, so that in itself wouldn't help. Xover (talk) 17:22, 26 July 2022 (UTC)
      Xover: I was thinking that the module would wrap each new line of Lua-centered text in a div, so that the parser wouldn’t act on the second line (in your case above), just like it doesn’t act on the first and third lines. This (I think) would avoid the problem, but I don’t know either how Lua works or how it interacts with the parser, so I may be wrong. TE(æ)A,ea. (talk) 18:00, 26 July 2022 (UTC)

This no longer seems to generate a 'tab' for me. Did you or Mediawiki change something recently that affects how tabs and 'portlets' are added?

(@Inductiveload:'s Jump to file script seems to have stopped working as documented about the time the Purge tab also stopped working.)

ShakespeareFan00 (talk) 09:04, 18 August 2022 (UTC)

@ShakespeareFan00: The Gadget hasn't changed, but the Desktop Improvements / Talk Pages Project dropped a release in this week's train (see the last few Tech News at the Scriptorium). That being said, I still have all the purge options as expected in the "More" dropdown menu next to the search field. What skin are you using? Does it matter whether you're logged in or not? Have you tried a different web browser? Xover (talk) 14:24, 18 August 2022 (UTC)
Thanks... I solved the 'purge' option by toggling it on and off again in Preferences (Yes, I know, classic meme/trope etc.) ShakespeareFan00 (talk) 15:36, 18 August 2022 (UTC)

Please do not arbitrarily move works solely based on case. If that is what the author chose (using modern titling vs what was in use at another time/standard), we should be letting it stand. That has typically been the community's approach. We have always just said to add redirects. Doing that will just lead to bickering and other moves that are all essentially arbitrary and neither resolve the issue nor create any benefit. Thanks. — billinghurst sDrewth 12:06, 24 August 2022 (UTC)

@Billinghurst: In this particular instance the author actually used title-case for the work's title (there are self-references to it in the prose), and since I had to retransclude most of it and rejig subpages anyway I took the opportunity to correct the capitalisation too. There are redirects in place for the old capitalization so it should be essentially transparent. Xover (talk) 13:31, 24 August 2022 (UTC)
We have not been slave to the title case, and that is covered in my above comments about older and modern standards. There are numerous conversations about retaining the contributor's general approach and not overriding it without good reason, rather than imposing our preference. As for it being essentially transparent, it wasn't, as it came up in a few places for me, and the move is essentially not required when we had redirects in place. — billinghurst sDrewth 10:56, 25 August 2022 (UTC)
@Billinghurst: If something broke I'd appreciate a headsup so I know to check for that in the future. And as already mentioned, in this particular instance I was going to have to perform major surgery on the page structure in any case, and I saw no indication that the page name capitalisation had been chosen as a deliberate expression of the original contributor's preference so I took the liberty of tweaking the page name in the same mode and manner I made any number of other little tweaks that I felt would improve the text. Absent an actual complaint from (one of) the original contributor(s) that says they prefer the original page name I don't quite see why you're making such a (relatively) big deal out of it. Xover (talk) 13:59, 25 August 2022 (UTC)

Grimm days for Rackham files

I have renamed all of the missing files and I would upload the (zip) file to my google drive but this browser is too old for gdrive. So, Tuesday morning (library is closed tomorrow), I can head to the library with the file and share the link here (or there at the djvu file). The replacement files can be put into a one off download for Tuesday, also, making download so much easier.

Also, I am not as good with zip as I am with other archive files. So, if you wait until Tuesday, you have a choice between zip, gz, bz2, or xz. No guarantees for bzip or zip -- I just haven't made that many of those, so I would not try anything fancy with them.

So, in summary: On Tuesday, should you agree, a two file download instead of legion, archived at your choice/whim, within reason.--RaboKarbakian (talk) 20:43, 4 September 2022 (UTC)

Methods for interwiki transclusion?

I notice that you just recently added MediaWiki:Gadget-interwiki-transclusion.js. I've encountered a few different methods for interwiki transclusion so far; is there anything that makes one better than another? For example, there already exists mw:Manual:$wgEnableScaryTranscluding, which seems like it would be the most preferred solution, yet there are evidently other options available. What makes the problem of interwiki transclusion so technically difficult that it has inspired such a proliferation of methods to address it? Shells-shells (talk) 07:21, 11 September 2022 (UTC)

Well, in essence it is just simply not supported. Since the WMF have not implemented support for it in MediaWiki, all ways to do it are hacks that are trying to do something the platform they are operating in does not actually support. They thus all have lots of downsides and weaknesses, and as such alternate approaches tend to proliferate.
As for why it is so difficult… What happens to templates that are used on the remote wiki, but do not exist on the local wiki or have different content? What about categories? Should remote pages show up in searches? How do you reconcile different content policies? Do changes to remote pages show up in the watchlists of local users? What if the remote page is vandalised? How about if it is simply moved to a new name? What about interwiki transclusion of executable code, like a (JavaScript) Gadget?
None of the issues are unsolvable, but it's a relatively big problem and the need is very limited (Wikisource is the only project that really needs this). It is very unlikely that anyone will try to tackle this in any relevant timeframe. However, there is a great need (and an acknowledged need) for global Scribunto modules and templates, and possibly also Gadgets, and it is possible this will get tackled at some point. And since a lot of the underlying plumbing will be the same, this may make it more likely that we get a proper solution to the kind of interwiki transclusion we need.
PS. I didn't write MediaWiki:Gadget-interwiki-transclusion.js, I just migrated it to a Gadget so we could get it out of MediaWiki:Common.js (and so people can turn it off). I plan to rewrite it eventually because the current code is pretty archaic, but that's mostly for general code hygiene reasons. Xover (talk) 07:58, 11 September 2022 (UTC)

Speedy deletion

Well, yesterday I checked the list of translations and the one I marked is really pointless (since there are many other and better translations already out there), that's the reason this page (and the corresponding scan pages) should be deleted. D.H (talk) 06:03, 12 September 2022 (UTC)

@D.H: Thanks for the clarification. Just wanted to check that it wasn't just you throwing your hands up in disgust at our inability to provide clear guidance in this area. :) Xover (talk) 08:43, 12 September 2022 (UTC)

Interwiki gadget

Is there any information on how to use the interwiki gadget? My reasons for using it are mundane. I want to use a single home page for the wikis I am active on. Is this possible? — ineuw (talk) 00:36, 12 September 2022 (UTC)

@Ineuw: You mean your user page (User:Ineuw)? No special magic needed for that: you just create a user page on meta (m:User:Ineuw) and it'll show up on all the projects where you don't have a user page. To show it on the projects where you do have an old user page, just delete the local page and the global one will show instead (most projects will let you ask for speedy deletion of your user page for this reason).
There are some gotchas with links and templates (the page is rendered on meta and the result is displayed on the other projects), but you should be able to figure that out with some trial and error (and feel free to ask for help if you get stuck). Xover (talk) 06:05, 12 September 2022 (UTC)
Thumbs up, thank you. I wasn't sure if I was allowed to do this. — ineuw (talk) 02:25, 13 September 2022 (UTC)

Re: List of Governors

I noticed this when I transcluded the page; would subst:-ing the templates get around this problem? If not, the next best option seems to be to split the list into two pages. (Also, the whole work needs a functional auxTOC, which should get around this whole issue.) TE(æ)A,ea. (talk) 21:18, 12 September 2022 (UTC)

@TE(æ)A,ea.: No, these templates just aren't subst'able.
BTW, keep in mind, the purpose of the dot leaders is to help guide the eye from the text on the left to the page number on the right, and primarily for short "chapter" titles (i.e. large distance). For this particular list, where the distance between the left and right columns is inherently short, the dot leaders no longer serve any actual purpose. They do make the page look a little bit tidier, if a lot busier, but they no longer serve their original purpose. I, as always, recommend just getting rid of them. Xover (talk) 05:27, 13 September 2022 (UTC)

Big, bold red text about wikidata entities!!!

The fables, all 6 or 7 hundred of them, have been indexed. So (as suggested at the portal) I put them in the order of the index and started to link them to Wikidata via {{wdl}}, because they are a "bitch and her whelps" to find there; so, one big round of finding, pasted on the portal, and all is well, etc.

BUT!! I have big red letters within the 500 series. See Portal:Aesop's_Fables#Perry_501–584. Normally, I would just think to rethink it but this error has more entities behind it and is weird to have where it is, if it is indeed a software complaint of the usual kind.

So, 1) is that really a problem or just about that one data? 2) is it going to become a problem (if it isn't already)? 3) what can be done to collect these eh, whelps?

This is nothing like herding cats, btw....--RaboKarbakian (talk) 00:17, 21 March 2022 (UTC)

@RaboKarbakian: I'm not sure why Perry 508, and only Perry 508, is throwing an error there. I see nothing obvious at The Trees Under the Protection of the Gods (Q105289012) to explain it, and if it was the total number of entities used on Portal:Aesop's Fables I would have expected lots more big bold red errors there. When I have the time I'll try to trace what {{wdl}} is doing with Q105289012 in more detail, just in case there's some weirdness that's not apparent at first glance, but so far I see no explanation for what's going on. Xover (talk) 06:11, 21 March 2022 (UTC)
Xover:Not entirely unrelated, but close: Bevis and Butthead Ben and Jerry fix a scan. Much more like Ben and Jerry, I guess.--RaboKarbakian (talk) 00:10, 22 March 2022 (UTC)

Xover I am sorry to bother you, but I am in need of some dialog, I think. The abuse of wdl was naive and accidental; I am not anxious to make the same mistake with {{nsl2}} (which worked very nicely for a simple %s/problem/hack/g). Through my mind have been running some different options. First, just using simple bracket links like I have been (in the 600s and 700s mostly). Second, continue with the nsl2 template. Others: a table where the QNNNNNN is shown and linked to with the row shared by the linked page name here. A Portal:Perry index where Portal:Perry index/001-099, &c. subpages exist. I am actually looking forward to some scripting, much more than wanting a specific format.... So, here is where you inject some guidance and direction, replace my naive with knowledge, or, tell me what to do and where to go if that suits.--RaboKarbakian (talk) 01:37, 8 April 2022 (UTC)

Xover I had some observations about the behavior of that page and where the warning happens. Adding an edition or translation to any of the wdled items would make the warning move up, like from 420-something to 410-something. And that would happen without touching the page. That got me thinking that perhaps the or a module is grabbing "everything" and then picking from that what it wants. And that just using the base module and asking for just the links might fix that page. Maybe I am wrong, but it would explain that behavior. --RaboKarbakian (talk) 01:14, 13 April 2022 (UTC)
@RaboKarbakian: On closer inspection it seems to be simply that the total number of Wikidata entities accessed for the entire page is tipping over a hard limit. But I still don't understand why only a single item is showing an error.
I think we're going to have to rethink the entire Aesop's portal for this. What are we trying to achieve with the {{wdl}} invocations there? Do we actually need to fetch data from Wikidata for most of those links? Are we just trying to document what the Q-number is for each of these? Would it make more sense to generate the entire portal from Wikidata by bot, that is, the page itself is just static wikicode and all the lookups happen offline? Xover (talk) 15:23, 24 May 2022 (UTC)
Xover rethought for sure! I have been thinking about it. It shows on the link where it has reached its limit. I watched it move as I added editions or versions to numbers lower than the "problem" number and the number would move up, meaning, occur at a lesser number.
So, my (unconfirmed) thoughts about the "why" are this: the (or a, since I am not sure which one and I myself have used more than one) module pulls in all of the properties from the data item and then delivers the requested properties. So, if the problem is at fable 14 and I add a version to fable 8, the overload moves to fable 13.
{{wdl}} is a great thing. It is wonderful for linking within articles. I just used it for Yerkes Observatory, for instance. Right now, it points to en.wiki but in a potentially very near future, it might point to a collection of articles here (it has a very famous "largest refractor telescope"). But terrible for a table like the Perry index. But, what would be good for the perry index would also be good for {{wdl}}....
If all that is needed are the links to this or other wiki, then a "simple" module might be made that gets just those inter-wiki links. This module could never be used to build a versions or translations page, for instance, like the other module(s), but would be easier on all systems because of the single purpose.
About the linked Perry index. Yes, it is a very good thing to have all of the fables with their links. I can expound more on this, but I already have somewhat. Not only is the list good, but it should be on every wiki that hosts fables (like el.s, es.s, nl.s, la.s, fr.s....) and on wiki with articles about fables (en.wiki, fr.wiki, el.wiki, etc) with the same cascading way with its links. I am certain that there are other lists of things like the fables that could stand the same thorough treatment to make it easier for the wikis to work together on.
Thanks for getting back to this.--RaboKarbakian (talk) 15:50, 24 May 2022 (UTC)
@RaboKarbakian: Actually, I am a genius and everyone should bow down before me and worship my über leet coding skillz! :)
It turns out that while {{wdl}} doesn't fetch more than it needs from Wikidata, the underlying Lua library it uses does. By tweaking it to call the Wikidata access functions juuuust right, I think I've eliminated the current big red error message on this portal. We'll probably hit a similar limit at some point eventually, but this gives us some more breathing room.
So longer term we probably need a different approach for this portal in any case. Do we really need to lookup anything at all from Wikidata on that portal? Maybe the important bit is just having the Q-number alongside the link? Something like a custom template called as {{perry fable|503|Q7726564|The Cock and the Jewel}}, which just spits out a local link with the provided label (i.e. "Perry 503. The Cock and the Jewel") without ever touching Wikidata. We'd lose automatic linking to other projects when the local page doesn't exist, but performance would improve dramatically. Finding interwikis for each fable is best done from that particular fable's individual page, rather than the portal, in any case. Xover (talk) 16:56, 24 May 2022 (UTC)
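(For illustration only: a minimal Lua sketch of what the backing module for such a no-lookup template might look like. Module:Perry fable is hypothetical, the parameter layout simply follows the example call above, and none of this is existing code.)
 local p = {}
 function p.fable( frame )
     -- invoked from wikitext as {{perry fable|503|Q7726564|The Cock and the Jewel}}
     local args  = frame:getParent().args
     local perry = args[1] or ''  -- Perry index number
     local qid   = args[2] or ''  -- Q-number: kept in the wikitext for reference, never queried
     local label = args[3] or ''  -- local page name to link to
     -- no mw.wikibase call at all, so the invocation costs nothing against the entity-access limit
     return 'Perry ' .. perry .. '. [[' .. label .. ']]'
 end
 return p
Since the output is plain wikitext, it would just render as "Perry 503. The Cock and the Jewel", red-linked until the local page exists, and never touch Wikidata.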
Lovely tweaks! This is a thank you and a warning. The warning being that I am going to put the page completely back onto wdls, so we will know what the limit is, or isn't if the index doesn't reach it. I have no problem with whatever decision is made about how the index is handled; it's just that when I was trying to find which fable I had, I was looking at en.wiki, fr.s, the link that the Perry number goes to -- anywhere I could find it. The same fable can have different characters, and they are also reflective of changes in domestics, i.e. what was a weasel in the 1600s is a cat now. Goats in Greece are sheep in England, etc. 2100 years of fables: if they were ever clear-cut, they aren't now. So, I will always vote for inter-wiki linking. I would have used it had it been available.
Thanks again! --RaboKarbakian (talk) 18:55, 24 May 2022 (UTC)

Good news and bad news!

Xover I have converted the portal to use only the template now. Bad news first

The bad: The limits of the new and improved {{wdl}} are (still) unknown.

The good: The index at Portal:Aesop's Fables is (still) rendering without warning and also, in seconds (where it used to take 10s of seconds!).

Such is the outcome of "our" very fine tweak....--RaboKarbakian (talk) 13:37, 25 May 2022 (UTC)

@RaboKarbakian: That's awesome news! And do, please, take the credit for the improvement: it was your pointer in the direction of a too greedy Wikidata function that made me go looking for this (the mere technical change here was essentially trivial once the issue was spotted). Xover (talk) 16:33, 25 May 2022 (UTC)
Xover Was bragging about the big {{wdl}} fix and a few more module improvements were wished for at WD templates.--RaboKarbakian (talk) 18:05, 29 August 2022 (UTC)

broken wdlness

Xover: I am having {{wdl}}-related problems. A person is removing the category for the book and making it into a shared category (which is very stupid, imo, because it will not ever be a "shared wikimedia category", the publication linking to an article -- Commons is the only place for such a cat!!) As a result, links to categories that contain scans and other matter related to the book are not going to Commons, but instead going to Reasonator. There has been instruction (where it was discussed) about which property needs which tweak, but I would rather the practice of making a "wikimedia category" for every Commons category were a thing of the past.

{{wdl}} is broken for abusive use of Commons categories, and I am not big on fixing it to not be broken in that case. I consider this to be abusive; maybe I am wrong, but shared cats should be made when two or more cats are using it on Wikimedia sites (plural) and not before -- well, I use that for when there is a "character"/"tale named after the character" cross-over mess, to get the publications out of the mess....--RaboKarbakian (talk) 21:27, 29 September 2022 (UTC)

Concerns about contributions from User:JoeSolo22

Checkmark This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. ShakespeareFan00 (talk) 16:27, 28 September 2022 (UTC)

Being:
NIS 9, Spain, Armed Forces
NIS 11, Sweden, Armed Forces
NIS 13A, East Germany, Armed Forces

These are completely unsourced and given their nature may contain portions that would need to be redacted. ShakespeareFan00 (talk) 15:50, 28 September 2022 (UTC)

Source is here: https://www.cia.gov/readingroom/collection/nis. MarkLSteadman (talk) 15:54, 28 September 2022 (UTC)
Thanks.ShakespeareFan00 (talk) 15:58, 28 September 2022 (UTC)
I left them a reminder about sourcing of other items they've contributed (User talk:JoeSolo22#Sources for material you have uploaded and transcribed). ShakespeareFan00 (talk) 16:13, 28 September 2022 (UTC)
My bad. I'm extremely new to Wikisource and probably should have been attaching specific sources from the FOIA Reading Room.
Any still-classified information on those documents is still redacted or censored as per CIA declassification policies.
I'll attach sources ASAP, thanks for letting me know. JoeSolo22 (talk) 16:24, 28 September 2022 (UTC)
@ShakespeareFan00, @MarkLSteadman, @JoeSolo22: Thanks all for being vigilant; and for sorting this out quickly. Much appreciated! Xover (talk) 18:06, 28 September 2022 (UTC)

copyright discussions

I'm mostly not over here, but feel free to give me a cross-wiki ping on my talk page if you need someone who knows "how" to actually search the NUC, CCE, or VCC for you. I'm mainly doing biblio research on Wikidata these days, but absolutely don't mind taking a few minutes to look into something that's actually being discussed. Jarnsax (talk) 06:02, 30 September 2022 (UTC)

@Jarnsax: That's mighty kind of you to offer. Thank you!
But since I'm greedy and tactless (and speaking to talk page stalkers as well :)), the major problem we have with WS:CV and WS:PD is often not researching actually determinable copyright status, but a very limited pool of people willing to participate and offer opinions on the undeterminable cases. We thus get mostly the same old regulars, and the occasional contributor fighting for "their" text, with by now more or less entrenched positions. Not to imply that there's anything wrong with having a particular bent on such issues (although the "preserve old junk at any cost" stance can be a little tiring when you're trying to clean up old sins), but with so few participants these positions do not obviously reflect the community as a whole and a lot of discussions become impossible to actually close. Thus, backlogs for miles (or, well, years, more aptly). So… Anyone willing to systematically look at every copyright discussion or proposed deletion to offer their opinion would be very helpful. And I know that's a very big ask, so I'm mostly mentioning it in the hopes that little by little I'll get a substantial number of people to do a little each and thus get those backlogs back on track and possible for normal people to follow without drowning (because right now those backlogs are hideous).
In any case… I saw elsewhere that you're a Wikidata person, which I don't think I'd really picked up on before. I'm interested in improving our use of and contribution to Wikidata for the bibliographic area, so if you see any issues or run across any opportunities there I'd love to hear about it. At the top of my wishlist is some sort of sane (JS) API I can use to build better tooling here. Books and person-as-author information models are fairly abstruse and hard to deal with without ubiquitous and user friendly visual tools, so I'd really like to make that as smooth and in-your-face as possible on the Wikisource side. We put such a lot of effort into bibliographic research here (not to mention all the copyright research) that it's a real shame we can't better push that back to Wikidata. Xover (talk) 04:34, 2 October 2022 (UTC)
Yeah, how to actually get books working "correctly" on WikiData has quite the learning curve, and for a long time was somewhat pointless, since the tools to actually "use" it were lacking (Cite Q). There's also that actually getting your head around WD's version of the FRBR, and knowing enough about how library cataloging actually "works" to get your head around what it's telling you... a LoC LCCN can actually point at any level of the data model... for something like "Lonely Planet USA", it's at the work level, with 20 years of bi-annual editions cataloged under the same LCCN, but there are also things like collections of pamphlets bound together where each one is cataloged separately. Getting the basic identifiers pointing at the right level is not always obvious, and to actually "deconvolve" stuff you really need to get them at the right level. Merged datasets can really suck.
If you look at History of New York State, 1523–1927 (Q114083307) (which is the work entity) and the edition and volumes, I've actually put some effort into doing that one as 'completely' as possible, with references for everything, and all the info needed to verify its copyright status (hopefully) automagically.... which has included some tweaking of the various properties and such in WD, lol. It's a process.
FYI, I've uploaded the (de-watermarked) PDFs, and scans of the plates, for volumes 1-5 on Commons, and I'm working on volume 6. If you look at the Category over on Commons commons:Category:History of New York State, 1523–1927 (1927) and the ones for the volumes, all the data in the infobox (and the links to the volumes from the edition, and between the volumes) are being handled automatically, from WikiData... the pages just invoke c:Wikidata Infobox with no parameters.
If you look over on enwiki, at w:User:Jarnsax/citations, I have a 'catalog' of automatically generated citation templates... I'm overriding the author/editor names just because of enwiki policy, the "house style" is last, first, otherwise I'd get the "object stated as" that gives the name as stated on the title page. At w:Ulster County, New York you can see it actually used at an article, specifically in the "as elaborate as makes sense" citation style (by which I mean, the Sullivan cite is through a footnote, using the "simplified footnote" to insert a pagelink, and linking to a citation of the specific chapter about the county in the bibliography). While some of that linking is 'forced' now, because Cite Q isn't done yet, eventually these links should go to WS, and be automatically maintained.... I am not sure exactly how it will work, but for existing transcriptions the links can be 'forced' to WS.
An intended future feature for Cite Q, from what I read, is to 'automagically' update the 'title' interwiki links in citations (I'm faking this now, by adding commons as a "full work available at url" and setting it preferred... the same would work for WS for now, to transcriptions, but eventually a regular interwiki to here should take precedence and the full work entry can go away).
I'm actually 'working the crap' out of Sullivan over on enwiki right now (while digging out the plates from volume 6) to replace the many, many times it was cited via a "Holice Deb and Pam" transcription on a now dead webpage, as well as adding the chapters as general references and "founding date" cites for counties, cities, and towns in New York. I hope this book can end up as an 'exemplar' for how to get this shit all flying in formation, down to pagelinks in wikipedia articles (hopefully) being automatically created down to the page number anchor in WS transcriptions... across dozens of articles. :)
And then, wash rinse and repeat a few thousand times. I do want to prioritize stuff that exists here, tho. Jarnsax (talk) 05:47, 2 October 2022 (UTC)

Vaught’s Practical Character Reader Copyright

Search up Vaught’s Practical Character Reader on Wikisource. Its date is 1902. Blahhmosh (talk) 05:04, 2 October 2022 (UTC)

@Blahhmosh: I'm happy to help with the copyright question, but I'm not going to do your research for you. If it is on Wikisource somewhere it should be easy for you to link to it, rather than make me search for it. And I mentioned some factors that are essential to determine copyright status in my previous message. Xover (talk) 05:08, 2 October 2022 (UTC)

Author died in 1903. The book was published in 1902. The place was in the US. Blahhmosh (talk) 05:32, 2 October 2022 (UTC)

{{wdl}} and categories

You asked to be told about issues there... this came up after WD edits I made, which resulted in someone from here yelling at me about broken interwiki links. After some 'unproductive conversation', lol, eventually I figured out there is a bug in this template (actually, the module), which is widely used here. This will take a bit of explanation, and it's worth noting that I don't know Lua, so don't look at me to fix it.

{{wdl}}, which is used by {{header}}, falls through a list of possible interwikis when attempting to find a valid pagetitle to link to. One of the possibilities it considers is a Commons category.... it looks to see if one is directly linked to the WD entity. This is not a valid test... it is incomplete, and doesn't reflect the way that categories are normally linked to "articles", meaning pages in mainspace.

For categories, there should be an entity "instance of" Wikimedia category (Q4167836); that is where category interwiki links (and the "Commons category" property) live. This entity is linked to the main entity about the subject with wikidata:Property:P301 "category's main topic" and wikidata:Property:P910 "topic's main category". A similar mechanism exists for Wikimedia list article (Q13406463), and lists are linked to categories using wikidata:Property:P1753 and wikidata:Property:P1754.

This way of modeling lists and categories about a subject is "ontologically correct", which WD likes. It's also essential to resolving cross- and intra-namespace conflicts, when the 'main entity' needs to be linked to a "article" and/or "list" in mainspace, and/or a category with the same pagetitle. The ability to directly link a Commons category to anything but a "Wikimedia category" was only added after much much much complaining from Commons people, and you shouldn't rely on things being linked that way, as the existence of a Gallery on Commons (collecting plates from a book, for example) will require the existence of a "Wikimedia category" entity to deconflict the pages.

{{wdl}} needs to, when looking for a pagetitle, follow "topic's main category" and look for a Commons category interwiki link there, as well. Otherwise completely valid and legitimate "cleanup" edits to WikiData, adding a category entity, will cause undetectably broken pages here. Supporting list articles (I can dig up the ids, there are "has list" and "has list of works" properties) would probably be a good idea as well, since a lot of "articles" (meaning only pages in mainspace, that are not redirects, part of Proofread Page, or something else odd) here, that are things like lists of editions, actually are IMO "list articles" and should be able to link to a matching "list article" on enwiki first, and then fall back to a "normal article" on the topic (and not break if there is only a list article for some odd reason, and so an enwiki interwiki link doesn't live on the main entity).

None of this "list article" stuff applies to Author pages or Portals, BTW.... they don't live in the main namespace, so are not "Wikimedia list articles", they are their own thing. Jarnsax (talk) 10:41, 2 October 2022 (UTC)

@Jarnsax: Thanks. Questions…
What makes you say {{header}} uses {{wdl}}? It doesn't, so far as I know (I could be wrong).
Hmm. Are you saying that an edition-level item should have a topic's main category (P910) pointing at an item that is a Wikimedia category (Q4167836) (or possibly Commons category (Q24574745)) and which in turn has an interwiki to commonswiki? What then is the role of Commons category (P373)?
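I.e., if I'm reading the proposal right, roughly something like this (just a sketch to check my understanding, not working module code; the helper name is made up, and I'm assuming the standard mw.wikibase Scribunto functions behave as documented):
 local function commonsSitelink( qid )
     -- a Commons sitelink directly on the item itself wins
     local direct = mw.wikibase.getSitelink( qid, 'commonswiki' )
     if direct then
         return direct
     end
     -- otherwise follow P910 ("topic's main category") to the category item
     -- and use that item's Commons sitelink instead
     for _, statement in ipairs( mw.wikibase.getBestStatements( qid, 'P910' ) ) do
         local snak = statement.mainsnak
         if snak and snak.snaktype == 'value' then
             local catLink = mw.wikibase.getSitelink( snak.datavalue.value.id, 'commonswiki' )
             if catLink then
                 return catLink
             end
         end
     end
     return nil
 end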
Also, this may be problematic due to the performance issues with accessing Wikidata. Grabbing interwikis from a single item is reasonably performant; walking the dependency hierarchy for multiple items and grabbing random properties from them is slow, painful, and requires ooodles of boilerplate code (existence and sanity checks for every single level).
Then again, I am not certain linking to Commons even makes sense for what {{wdl}} is intended for. Essentially it is a way to link to English Wikipedia inside texts in a way that will flip back to Wikisource as soon as we create the relevant page. Commons categories, galleries, and other sister projects are borderline under the intent of the linking policy. Xover (talk) 12:14, 2 October 2022 (UTC)
I got the impression header uses it from the things I was told my edits broke... I didn't 're-break' it to check (the person who complained had undone my edits over there) or read {{header}} until just now, and you are right, it does not depend on {{wdl}}. I actually understand Lua well enough to see where the issue is in the module {{wdl}} uses.
It might be an edition item, it might be a work item, it might be volume items as well, it depends on what categories exist over on Commons, if there are multiple editions, and if they have page or plate scans... whatever level of detail of categorization is appropriate depending on what 'they have'.
There is a bot that automatically adds P373 to every entity, regardless of what it is an instance of, that has an interwiki to a Commons category, and also adds the interwiki to any entity that has P373. My impression is that it's to let other wikis pull either one, for simpler code on that end, but I could be wrong. Never bothered to go back and read the bot request.
Where I was told, finally, that the issue was {{wdl}} was on my talk page, after chasing the guy off my WD one, and I don't think there's an easy way to link to it... Portal:Aesop's Fables was the example given of its use (though why, unsure, since I didn't edit those entities) that caused it to fail through to reasonator, and I see how it would cause that behavior if the link was on a "Wikimedia category".... but maybe it's just being misused. Jarnsax (talk) 13:27, 2 October 2022 (UTC)

Jarnsax: Shouldn't you be making the shared category for d:Q114144281 and not bothering others to make it work without a reason? The "stuff" I was working on has been tied up with this, so, please justify it with another category somewhere to share.

Also, don't worry about the bot, that property is not necessarily about shared categories. I use it to show what category the scan should be in, for instance. I "should be" or "am" in communication with the bot author already.--RaboKarbakian (talk) 17:10, 2 October 2022 (UTC)

@RaboKarbakian Please don't start this again over here.... what I "should" be working on is whatever I "choose" to be working on. Right now, I'm not working on that, and have chosen to do other stuff first...actually, specifically to avoid breaking your shit.
I am not, as I have already told you, intending to be adding Wikimedia category entities to any books that might be, or might end up, over here. I'm not doing it, and actually started working on other stuff for a while, because you told me it broke stuff here. The specific complaint you showed up with was not due to any issue with the edits on Wikidata, despite you trying to start a revert war over "housekeeping" edits to WD that are completely valid and only unusual in that they are being linked to by this template.
That I'm "not doing it" does not mean someone else won't, or that I'm going to refrain, forever, from doing it correctly just because a template here is 'broken'. Now that Cite Q is working better, and people will hopefully start fleshing out the dataset of books, anyone not from here that looks at how WD normally does it, and does it like that, will break your pages. Any book with a Gallery on Commons will break your pages, because it will require a Wikimedia category entity to hold the Commons category link.
I'm actually under no obligation, over on WD, to try to hunt down if Wikisource is 'undetectably' linking to an entity, and going to spit out bad results when fed good data, by coming over here and searching the wiki... no other editor would have the slightest idea that you have an issue, and most would have taken you to AN after they told you to get off their page the first time and you ignored them, after you showed up out of nowhere complaining incoherently and calling them a stalker. Jarnsax (talk) 17:47, 2 October 2022 (UTC)
@RaboKarbakian: Your frustration is leaking out as snide hostility. Take a few breaths, listen to some smooth jazz, have a cuppa… Then lets figure out how to fix stuff.
@Jarnsax: Please excuse RaboKarbakian. They are really invested in using Wikidata specifically for the interwiki linking functionality, and put enormous effort into it. As a consequence they tend to get a bit flustered when stuff breaks in seemingly mysterious ways.
The bottom line is that this is not a panacea. Wikidata is its own project, with its own goals, practices, and policies. That MediaWiki shows an interwiki link that goes where you want it to go does not ipso facto mean that what's been put into the Wikidata item is actually correct. And there is absolutely no guarantee that what Wikisource wants is even possible within the constraints of 1) what Wikidata is, and 2) what MediaWiki as a software platform is capable of. Jarnsax gave me some pointers in this thread that might lead to some way to make everyone happy (or they might not), or at the very least actually understand the problem so we can document the limitation and manage expectations. Xover (talk) 19:52, 2 October 2022 (UTC)
Thank you for that, Xover. It's not that I have some mad attachment to the "Wikimedia category" thing, it's that stuff that uses WikiData should, logically, all work the same way, or at least be compatible with the most common way, instead of creating some undocumented special subset of items that only work "the less common way" and will be undetectably (from editing Wikidata) broken, since they are being used on pages here that are not "directly" linked to them. If not, it should be discussed and documented on Wikidata, here, and probably other places.
The categories on every language wikipedia use "Wikimedia category", they have to, otherwise they would have massive conflicts because of how many articles and categories share the same pagetitle, or translations of the same pagetitle on different language wikis. Ignoring it isn't a 'sustainable' answer, and it's not because of me. Look at something like Category:Earth (Q1458521), and imagine the inter-wiki conflict horror. People know it's done that way, and do... there is a very large population of people who are 'wikipedia people' that have been trained for the better part of a decade that it's how it's done, it's really only Commons people that put them directly on the 'main entity'.
Wikipedia, Wikispecies, Wikitravel, etc., people writing citations using Cite Q (which is designed to be 'international') to books transcribed here would randomly break your shit, and have no way to notice, and meta:WikiCite is very much a thing. That it's "an issue" is very much due to the widespread use of the template here, and it's only an issue for here AFAIK (the templates at Commons understand either way, because the people that wrote them know that directly linking categories is not normal, and someone on some random wiki might have a 'functional need' for the Wikimedia category entity). If the "fix" requires editing the pages here, instead of just the template or module, en masse, that looks like a really easy bot (or even AWB) task. I'm not trying to be a jerk, it's not like I haven't edited on and off here for a very long time, and I'm trying to get cites in articles on en that link to shit over here, and hopefully setup so that your incoming interwiki links from those cites that will just work automatically.
TBH, if I had not 'broken this' (Cite Q still isn't used much at all) then someone else would have, probably before long, and the response from someone who was not familiar with or didn't care about Wikisource, or didn't even speak English, would probably have been far less patient and productive than it taking me five minutes to see the problem once I was finally told what template to look at. It's only that I had specifically made a list of entities with hathi ids, knowing it was likely to have wikisource books that were cited on enwiki and could be linked through the template, that made me hit this so soon. Jarnsax (talk) 21:41, 2 October 2022 (UTC)

Xover: Is there a reason you are using {{pbr}} on talk pages? You might consider using the "legacy" experience for talk pages.--RaboKarbakian (talk) 17:10, 2 October 2022 (UTC)

Habit, I guess, mostly? In traditional talk pages it is sometimes needed to avoid breaking stuff, so it's ingrained reflex for me even when using the nifty new reply tool. Does it cause problems on your end? Xover (talk) 19:54, 2 October 2022 (UTC)

@Xover Regarding 'performance' when crawling dependency trees... one of the things I've been using "New York" to nail down as an example is tweaking properties and constraints to be able to make at least the basic info about versions and volumes able to be 'transitive' (a bad term for it) without triggering errors.... for instance, looking at History of New York State, 1523–1927 (Q114083307), the basic 'language date place and publisher' of the edition is stored as qualifiers to the "has edition" statement, and I'm not "expecting" you to trawl upward from the volumes to inherit stuff like author or copyright status from the edition or work, I'm explicitly stating it on each entity that it applies to. If we can get 'like that' (the 'documentation' doesn't really go into that much detail yet) the way to 'do it', that should make it possible to do something like grab a meaningful list of editions from the work without having to crawl down to each one. We should also be able to create a usable tree of "logical" divisions of the work (parts, chapters) if needed because of how stuff is transcluded here, to get linking to work, with them 'dependent from' the volume they are in. I just haven't tried it to see how well it works at that level of detail, since "New York" doesn't have stuff here to link to. Jarnsax (talk) 00:15, 3 October 2022 (UTC)

@Jarnsax: Well, there's two things here.
One is that MediaWiki has a couple of hard limits on Wikidata access: total Lua runtime for a given page is 10 seconds, after which the code gets terminated; and there is a maximum number of Wikidata entities that can be accessed from a single page (I don't recall the number ottomh, but we hit it in lists). mw.wikibase.getEntity is ok-ish when accessing the current page's connected Wikidata item. But when you ask for a different Wikidata item by qid, performance goes into the toilet.
The other is the complexity of the code. mw.wikibase.getSiteLinks just gives us the interwikis straight up, but to do the same "manually" means fetching Commons category (P373) and category's main topic (P301); checking that they are both not nil and not the empty string; then fetching the item they refer to; and then asking for that item's interwikis. If we're going to walk the tree up to the work we're adding many more steps, and accessing percentage-wise a lot more Wikidata items. So a lot more boilerplate code, and reducing by a third the number of items we can have in a list before hitting limits.
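To make the boilerplate point concrete, just hopping one level from an edition item up to its parent work already looks roughly like the sketch below, with every check spelled out (purely illustrative, not the sandbox code; I'm assuming P629, "edition or translation of", is the property to follow, and the helper name is made up):
 local function workItemFor( editionQid )
     if not editionQid or editionQid == '' then
         return nil
     end
     -- P629 ("edition or translation of") points from an edition item at its work item
     local statements = mw.wikibase.getBestStatements( editionQid, 'P629' )
     if #statements == 0 then
         return nil
     end
     local snak = statements[1].mainsnak
     if not snak or snak.snaktype ~= 'value' then
         return nil
     end
     -- the work's Q-id: yet another entity we may have to load before we can ask it anything
     return snak.datavalue.value.id
 end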
That's not to say we can't, won't, or shouldn't do that of course. I'm just explaining why I'm not immediately jumping for joy at hearing of the ontologically correct approach.
In (SQL) databases, this kind of issue is often addressed using "views": virtual tables that combine columns from many tables into a coherent view for specific use cases. Alternately, one builds an API in front of the data layer to abstract away the details one does not care about. I think in the area of bibliographic data we need to start thinking along those lines if we're to let Wikidata stay ontologically pure within their data model and let other projects do easy stuff (like getting interwikis, and author info) in an easy way. Hopefully that would give me a saner JavaScript API to use to make specialised GUIs for Wikisource-relevant Wikidata data. Xover (talk) 08:23, 4 October 2022 (UTC)
@Jarnsax: But this is what that might look like. Does it look to you like I've understood how this is supposed to work? Xover (talk) 16:46, 4 October 2022 (UTC)
Within my incredibly limited understanding of Lua, yes. Don't ask me to debug it, but from how I follow the 'logic' that should give you the right results. There is actually a bot that keeps interwikis and "commons category" statements in sync on specific entities, it will create one if the other doesn't exist, regardless of if it's a "Wikimedia category" or not, so if you are willing to rely on that, and allow possible temporary breakage until the bot catches up it'll reduce what you need to check for.
If you look at the data model for editions, I really don't think all the choices are the best.... an egregious example being that the data model says you don't have to set the author on an edition entity if it's the same as on the work, since if you don't set it, it breaks Cite Q when actually trying to cite.... well, essentially anything, since it doesn't crawl up or down either, at least not yet. You can probably take that into account, but the authors of Cite Q might just say 'no' to implementing that the way the data model says now, that trying to 'inherit' stuff just isn't workable at scale for anyone.
It's also ignoring that an edition, even if it has the same author as the work, still "has" an author, so.... yeah, a little broken. I'm sure it will eventually get sorted, but I think part of that will revolve around Cite Q and the same software limits. It's easier to persuade people when you can point at reality and say "this is what actually works." I suspect that what I was calling "transitive" up there is going to have to be a thing, if people want to use WD to actually create lists.
It's rather the point of my working with this to figure out how to write entities that actually produce correct citations from the current code, even if it means saying the same thing on multiple items, and maybe doing what's just literally wrong in the 'documentation', and having those discussions... it's going to take real world experience with edge cases to figure out what actually works while still making sense, given how weird books can be.
One thing that almost certainly isn't going to change, though, is the whole "work/edition/exemplar" distinction, it's based on the FRBR, and pretty essential to 'deconvolving' things that other databases have conflated. Jarnsax (talk) 09:19, 5 October 2022 (UTC)
@Xover Thinking about this, as far as 'optimization', and playing with it...
Using Portal:Aesop's Fables, comparing the existing code and the sandbox version in previews, the profiler doesn't show me any real difference other than a few 10's of kbs of Lua memory (and it tells me the entity limit is 400, FYI...I didn't know either) and the page runs in almost exactly the same time... which makes sense, since I think most of the items on that page have an interwiki, so the "extra stuff" never happens enough to matter much.
With the sandbox code, I don't see the "number" of loaded Wikibase entities "explode" (from 83 to 121) unless I actually remove the test for interwikis completely (and switching case 2 and 3 didn't really matter). This is on a page that invokes {{wdl}} 521 times, and the 80-90 number seems to be the ones that are falling through to reasonator with the existing code, and 40-ish times there is a 'category' entity to look at. The other several hundred are just finding "Commons category" instead of an interwiki, I guess.
These numbers seem to lie, unless it's "fetching" sitelinks/interwikis without actually "loading" the entity per the profiler, and being found that way doesn't count against the max # of entities loaded.
What this led me to think is that the extra code is only going to really hurt performance at all for the specific invocations using an entity that does NOT have a valid interwiki (i.e., fails case 1), and only if there is actually a wikimedia category entity.... if you've already 'loaded the item' to look for P373, looking for P910 doesn't hurt unless you find (and need to load) it.
I'm also getting the impression, tbh, that it's going to take thousands of invocations of this on one page, or several hundred that don't find an interwiki via case 1, to break page rendering. The extra code doesn't make any difference unless case 1 falls through, also, so... I'm led to think that most pages that break this would have broken it anyhow. The module only loads once, and the template is so small that template expansion just doesn't matter, it's not like one of those tables where every line is an individual template that recursively balloons.
I'm not saying you are wrong, and stuff won't break, but that Portal seems so far from having an issue that I can't figure out where the problem would happen. Is there a page that uses {{wdl}} some truly insane number of times you can point me at?
It seems telling that the Portal is actually invoking {{wdl}}, and running the existing code, actually fetching sitelinks (or attempting to) 521 times.... loading more than 400 items would hit the limit, and it's apparently loading exactly one, and working fine.
I don't think (after really looking at and messing with it) that this will actually cause drama, other than on pages where (the number of fails through to "Commons category") plus (the number of times it fails there and is "successful" at finding a second entity to load) is greater than 400, and even then "time" isn't going to be the failure point.
Is that really going to happen? Not trying to "dispute" that it can, it just seems to me like that would probably be an extreme edge case, unless there are some massive lists of stuff that is just on Commons...any case in which the existing code doesn't go to reasonator, it will make zero difference, and not go toward that limit of 400. If it did, the existing code wouldn't render on the Aesop's Fables portal (since it calls wdl way over 400 times). Jarnsax (talk) 04:13, 8 October 2022 (UTC)
It seems like a page it would break would have to 'currently' have 400 or more links to Reasonator. Jarnsax (talk) 04:28, 8 October 2022 (UTC)
@Jarnsax: I think your analysis here is correct, in the main, but I haven't had the spare cycles yet to dive into it (hence why it's still in the sandbox). The sandbox code doesn't load any extra items unless it first fails to find a regular interwiki, and then only when the main item actually has a property pointing at one of the two relevant category-type items. But what worries me is that once the commonswiki sitelinks are removed from the main item, all those {{wdl}} calls will pass the first gate; and if they are removed because they have been moved to a separate item pointed at by one of those two properties, then they will pass the second gate. At this point, every {{wdl}} call will have to fetch an extra item. And Portal:Aesop's Fables currently contains ~725 fables, it's just that not all of them are linked yet. At two entities fetched per template call, we'll fall over after less than a third of those.
In addition, the sandbox code currently doesn't walk the work→edition→copy tree, which, for some use cases, it probably ideally should. If it does that, pathological cases can end up fetching rather a lot of items from Wikidata and hitting the limit very soon.
That all being said, this is not a new problem, or one unique to categories. Hence the existence of ListeriaBot. I have on my todo list to figure out some way to handle things like the Æsop's portal, and whether a ListeriaBot-like approach can adequately serve our needs. It may not, simply because not all content or organizational elements of that portal are appropriate to represent on Wikidata. For example, the arbitrary bins (Perry 1–100) they are divided into, and the non-Perry classifications of the later listed fables that are ontology from the notoriously fuzzy humanities, not universally applied, and partly reflect local preference on Wikisource / the contributors involved.
I think we also need to seriously consider whether, given the current limitations of both Wikidata and MediaWiki, we are simply going to have to forgo Wikidata for this particular use case. We already, elsewhere, push the absolute limits of what MediaWiki is able to do in its current form, and fetching lots of Wikidata items on a single page is always going to be challenging in the current software architecture. In light of the indirection approach I now understand Wikidata prefers, in addition to our book-related use case (which means we need to deal with the work→edition→copy model), I am leaning towards concluding that this use case simply isn't possible currently.
PS. These numbers seem to lie, unless it's "fetching" sitelinks/interwikis without actually "loading" the entity per the profiler, and being found that way doesn't count against the max # of entities loaded. Yes, I have a sneaking suspicion sitelinks are somewhat magical in some way. I haven't found any docs that spell that out, but I've made similar observations in previous jousts with this problem. Xover (talk) 11:04, 8 October 2022 (UTC)

Reprint copyrights (esp. TOCs)

Before I start a whole "thing" at WS:S, could you cast a wise thought over the idea of reprints? Specifically, some issues of The Criterion are PD-US since they go back to 1922 (i.e. publication over 95 years ago). However, the only copies obviously extant (save one solitary issue from volume 1) are a 1967 reprint. Obviously the preface is copyright if it's from then (presumably slightly before as TSE died in 1965, but in any case, it's not from the 20s). However, what about the TOCs, title pages and publisher's info? (see https://archive.org/details/criterion19221930002unse/page/n7/mode/2up).

Especially the TOCs give me pause. The original issues had an issue-by-issue TOC on the cover. It's not clear to me if they were then issued in volumes with complete TOCs as is common, or if the reprint added a volume-wise TOC as "new" content.

What do you think? Redact it all for safety? Do we have a guideline here? Inductiveloadtalk/contribs 16:59, 9 October 2022 (UTC)

@Inductiveload: Quick reply without looking.
Only creative work is copyrightable. So a compilation of poems is creative, and copyrightable, in the selection included. Facts are not copyrightable (until you amass enough of them to hit database rights). A listing of article titles in alphabetical, chronological, or physical (observed) order is not copyrightable. Adding page numbers is not copyrightable. A TOC with enough creativity in its ordering to be copyrightable would be useless. So… unless we're into stunning visual design or other graphical elements, I find it highly unlikely that anything described as a "table of contents" is copyrightable.
Publisher logos are copyrightable, though. The layout of a title page could conceivably be so, but I'm not sure I would care about that for most typical title pages (centered text of various sizes) absent obvious graphic design elements. And mere information (name of publisher etc.) is not copyrightable.
I'll try to take a closer look tomorrow, but I'd be very surprised if much more than the preface and publisher logo was an issue. Xover (talk) 18:45, 9 October 2022 (UTC)
@Xover thanks! That was kind of my thought (roughly following w:Feist v. Rural) but I thought I'd check in. Luckily the actual article that was requested is in volume 4, and I did find that volume in the original form: Index:The Criterion - Volume 4.djvu. However, the others seem sulkily disinclined to reveal themselves in the OG form. There are only a handful of pre-1927 volumes, so redaction isn't really an issue if it comes to it, but synthesising a TOC is a bit of a faff if it's not actually required! Inductiveloadtalk/contribs 19:51, 9 October 2022 (UTC)

Hathi and copyright

You had made a comment, when talking about the 'degree of confidence' in something, about Hathi not being 'determinative', and you're right. If it's post-1927 and they call it copyrighted, it means one of two things: they checked, and it is, or they didn't check. We have no way to tell the difference.... it just brings "maybe they checked" into question, and in the particular case the renewal might have been hard to find due to the "author" and the college name changing.

If you 'expect' it to be copyrighted, though (post-1927), and they say it isn't, you know they have actually done a copyright clearance on it.... see https://www.hathitrust.org/access_use#pd. Like they also say on that page, their "claim" that they cleared it doesn't absolve you from the responsibility of checking yourself; you should still look it up. They are "telling you" that they found evidence, and you need to find it. I've never found an instance of them being wrong in such a case, though I'm sure they make mistakes.... like they say, you need to look it up. That's part of why I volunteered my services to do so, if needed.... the scans of the CCE are a pain in the ass if you are not familiar with them, way beyond what they would be as 'books', because of how they were issued; indexes are often stuck in the middle. The VCC is far worse, while the NUC is easy. :)

The Stanford renewals, otoh, just "might" tell you where to look. They have made a point, since the beginning, of telling people they are a "bad" source. Jarnsax (talk) 00:14, 4 October 2022 (UTC)

As you can probably infer from wikidata:Wikidata:Property proposal/National Union Catalog ID, it's part of my "plan" to systematically look them up while going down my list of Hathi IDs on Wikidata. Jarnsax (talk) 01:04, 4 October 2022 (UTC)
Are y'all aware of the extraordinary work done at the Online Books Page tracking copyright for books and periodicals? You can often find a clear determination about, for instance, post-1927 renewals or the lack thereof on those pages. A little tricky to navigate, but there's a wealth of info there that can save you some searching. -Pete (talk) 05:14, 4 October 2022 (UTC)
@Peteforsyth: No, I don't recall ever seeing any copyright-determination-relevant info on the (otherwise excellent) Online Books Page. Do you happen to have an example at hand?
Not that we can blindly rely on any external entity for our own determinations, of course, but reusing existing research into publication dates etc. is a definite time-saver. Xover (talk) 07:31, 4 October 2022 (UTC)
I find it especially useful for periodicals. There's an overview page that lets you look up periodicals alphabetically, and also has a concise summary of some of the relevant law in the intro, and links to a couple other pages with further info. Here's an example with renewed issues and renewed contributions for the Oregonian: https://onlinebooks.library.upenn.edu/webbin/cinfo/oregonian -Pete (talk) 17:20, 4 October 2022 (UTC)
@Peteforsyth: Oh wow. I had no idea that existed. Thank you! This is going to be very helpful. Xover (talk) 19:17, 4 October 2022 (UTC)
@Jarnsax: HathiTrust just gives me dumb year-based cutoffs; has some really crappy bibliographic data (and some very good data, lest I be misunderstood); and is not interested in corrections to either. In other words, I rely on them for little more than being a slightly higher-quality version of the Internet Archive. The Stanford database is, for me, "good enough" for the straightforward cases: if you check with both title and author and don't find a renewal there, I'll accept that as sufficient searching to support a claim of {{PD-US-no renewal}} absent evidence to the contrary. Most of this stuff is "balance of probabilities" and risk assessment (to both ourselves and our reusers), especially when you get into cases where you need to know contract details between authors and publishers (or estates) to know who actually owns it and needed to make the renewal.
I think Commons (vs. us or HathiTrust) made a really smart call with c:COM:PRP and the US+country-of-origin policy. Those two in combination set the bar (and expectations) at a level that will fend off most legal risk, and make it possible to weed out all the really iffy cases without endless fights to save every little bit of content. It does get a few babies thrown out with the bathwater, but not too many; and, crucially, it makes keeping the vast majority of content much easier. Commons probably doesn't actually notice that that's the case, due to the sheer volume and variety of content (and chronic understaffing in every aspect of their management), but for our relatively narrow area of content it would have worked wonders. Oh well. US-only is another simplification that, in a lot of cases, makes things easier. But it does lend itself to rather interminably long discussions of the edge cases (as Pete can certainly attest). Xover (talk) 07:50, 4 October 2022 (UTC)
The main issue I see with Hathi's bibliographic data, tbh, is just the standard thing: a lot of old stuff is cataloged with data from the Library of Congress's electronic card catalog, so it has the same issues the Library of Congress does with cards that were only partly copied into the computer, as well as the card actually describing the edition the LoC has instead of the ones the various Hathi libraries have.
The Stanford database has gotten easier to use over time; it used to often be really hard to find stuff if you didn't know the registration number, because searches would give you renewal claimants instead of authors. The error rate isn't 'that' bad; like with the other stuff, you usually find it right where you expect, if it exists. I've just seen enough instances where I was 'sure' it would have been renewed, just because the copyright would still have been valuable, and it was unfindable in Stanford until I knew the renewal number. It's been a while since I used them a lot, though.
I know that we can't do 'real' copyright clearances on stuff... it's impossible without a level of information that is far beyond anything realistic for us. We should just at least make a good-faith effort to look in the normal places, and I tend to be a bit OCD, lol. I actually did participate in a lot of copyright discussions over on Commons, when I was active there (years ago, now) on my old account. I agree with you about the rules; there's just also a huge amount of stuff where nobody really appears to have even tried to realistically check... 'old sins', like you called them. And there at least used to be a massive pushback against even looking in the direction of the URAA backlog, and canvassing of people from "local" Wikipedias to, essentially, rant about the fact that Commons has to care about the URAA at all whenever anyone tried to discuss even the most obvious cases in order to find a consensus about the 'level of evidence' for them.
I'm far less 'bothered' by a long discussion about something that is actually a debatable point (is this 'original enough'?), or something where it's not provable as a 'fact' and so opinion really matters (is this an edict?), than by people just going on and on about irrelevancies. We're supposed to be looking for and discussing evidence in order to reach a consensus, and actually listening to what other people say, not 'litigating a position' to get the 'verdict' we want from the closing admin. Lawyers are obligated to make the best case they can for their client even if they know for a fact it's bullshit.
I also apologize for the length of the Stalin thing, whoever closes it. (sign) Jarnsax (talk) 07:11, 5 October 2022 (UTC)
@Jarnsax: Closing admin is likely to be me, just because I seem to be the only one with the stubbornness and intestinal fortitude to wade through it all. But as I think I alluded to somewhere up above, the main challenge there is insufficient participation to neutrally assess consensus. We have some community members who regularly try to help out (for which I am eternally grateful, even when I vehemently disagree with their positions), but they are too few and, because they are a self-selected subset, they tend towards entrenched positions. Having to dive into the details while also being the closing admin is a rather uncomfortable position to be in, one that engenders a lot of agonising over whether I have juggled the hats correctly. Sheer volume and diversity of participation at both WS:CV and WS:PD would help immensely with that (which is why I am always recruiting when an opening presents itself). Xover (talk) 07:46, 5 October 2022 (UTC)
Regarding bibliographic data I have despaired of any traditional catalogue, and especially any metacatalogue like Worldcat or, to a lesser degree, HathiTrust, achieving even the minimally acceptable level of quality. I think a "Worldcat" must be crowd-sourced and collaborative to have a hope of functioning. And my hope is that Wikidata can become the hub for that, but I despair at the lack of good tools and concerted effort for it. The community at Wikidata seems way way too fond of ingesting huge data sets and, to put it coarsely, ontological wankery, to get any traction on the bibliographic side. I so wish the Internet Archive would stop focussing all their effort on spamming links to their "properties" all over Wikimedia projects, take OpenLibrary out back and put a bullet in it, and then put their collaborative efforts with the WMF into developing the tools needed to let Wikidata kill Worldcat and VIAF. There's such a huge opportunity there, that's being wasted by focussing on selfish goals and competing for the same pool of volunteers.
And, yes, URAA seems to be as anathema at Commons as it always was. But then there's a certain set of our extended community that seem so focussed on the subjective desire to hoard "all the content" that both the URAA and excluding fair use content appear like tyranny, villainy, and oppression. I think it may boil down to whether you view the projects as something we're doing for us, or something we're doing specifically to let others reuse it. Free Software (GPL) vs. Open Source (MIT/BSD), as approaches, is probably another side of that coin. I don't think those positions are reconcilable, and thus I think one side has to "win" in order to resolve it. My impression is that on Commons the anti-URAA side has really lost but is still fighting a guerrilla war that has been mostly extinguished elsewhere. And I think it would be better for everyone concerned if those last remnants were finally stamped out (the admins over there show clear signs of burnout). Xover (talk) 07:46, 5 October 2022 (UTC)
I figured you would be the closer; I'm not canvassing or anything, I just rather feel sorry for whoever has to read it. I will fully admit I got frustrated with someone, and probably went a bit overboard in my explanation of just how completely off base what they were saying was, but... like I said there, w:not even wrong; the people at the USCO know more than us, so listening to them is a good idea. You might decide I became a bit of a jerk with my tone, though I suspect you will understand my annoyance.
Your comment about OpenLibrary literally made me laugh out loud, for several minutes, and I'm still chuckling. I agree wholeheartedly: it's almost impossible to try to edit (the UI is atrocious) and full of garbage data that bots added, such as entries for ISBNs for 'non-bibliographic items' associated with a book (think a store display stand that comes with enough copies to fill it; it has its own ISBN for ordering), which are just examples of why you need a human brain in the loop; obvious is not obvious to a bot.
I fall very much on the "for others" side of the spectrum; otherwise there's really no point, and it's just an endless exercise in navel-gazing, with the occasional explosion of useless drama. If you are just creating your own little walled garden, that you "own", on a public wiki, eventually someone is going to show up and kick down the walls, and redecorate to make it better for other people, because that's just how wikis work. Jarnsax (talk) 09:20, 5 October 2022 (UTC)
You might want to read/watch w:Talk:Edict of government#Georgia v. Public.Resource.Org Inc. and the public policy argument if you're at all vague about that. The conversation here made me look at that article, and it's not great. Hopefully it will get enough discussion that the article can become better, and it's kinda 'relevant' to copyright stuff here, at least for edicts. The vast majority of news articles and such talking about the case totally missed the point, because the context is so obscure. Jarnsax (talk) 04:10, 7 October 2022 (UTC)
@Xover There is a poem from 1932 by w:Beatrice Warde that was widely distributed on a broadside to advertise Perpetua... it was framed on the wall at a lot of 'dead-tree' printers. It's actually on a bronze plaque at the entrance to the US GPO... something about them elsewhere made me think of it.
I've always thought it expresses what the projects are "doing", what this whole thing is supposed to be about, a lot better than anything any of "us" have written.
It's File:This is a Printing Office.jpg.... and yes, Commons has the wrong copyright template (she published the poem in England), but meh, it's still PD. Jarnsax (talk) 00:33, 12 October 2022 (UTC)
This is what the GPO says about it: "Warde’s benedictory words continue to ring true, because they express what we know to be the timeless importance of reproducing and transmitting the written word in our society. Setting down the written word for all to see—whether by applying ink to paper or locking it digitally via public key infrastructure—preserves it, authenticates it, and makes it official, the real thing. This act in turn makes it possible to replicate and disseminate the written word unchanged, providing a common foundation for literacy, education, commerce, the arts, and—perhaps most important of all—the conduct of government in a free society." Jarnsax (talk) 00:42, 12 October 2022 (UTC)
@Jarnsax: Oooh! I'd never actually run across that before. That's pretty perfect for Wikisource, indeed.
But how do you figure it's PD? First published in the UK by an author that died in 1969 doesn't scream "PD" to me (rather it screams "URAA"). And what's the typographic copyright status of Perpetua? And the plaque isn't a two-dimensional mechanical reproduction so the limits of Bridgeman v. Corel need to be considered for the photo of the plaque. But at least the GPO logo is correctly licensed as {{PD-USGov}}. :) Xover (talk) 05:16, 12 October 2022 (UTC)
@Xover I figured you would think it was really cool (I obviously do, lol), and it actually does fit here specifically really well. I'd just never thought about it in this context; it was just random historical trivia floating in my head, actually from reading a history of the GPO (that's where I grabbed the quote, which is absolutely PD, it's their history of themselves with lots of nice photos).
Typefaces themselves aren't copyrightable in the US, just as a rule. See w:Intellectual property protection of typefaces#United States. Not that it would ever really come up here.
I haven't tried to prove it's PD. I just happen to be really sure it is, because I know what it was. :) She, and it, are actually 'famous' in printing, completely apart from this poem. See UC Press at 125 Years: This Is a Printing Office for a whole article about it and her... at the very bottom, it talks about the exact circumstances. Jarnsax (talk) 05:45, 12 October 2022 (UTC)
The typeface companies didn't really have any interest in copyrighting those broadsides, which were usually just random nonsense (the lorem ipsum of ads and title pages, for display type) chosen to show off the typeface. Nobody would ever bother to copy them. They would mail them to every printer on the planet, basically, as advertising. Perpetua was just supposed to somehow "embody democratic ideals"... it ended up on monuments and stuff, somewhat ironically in this case. Jarnsax (talk) 05:45, 12 October 2022 (UTC)
It's not URAA... it was published in the UK, and we've had "bilateral relations" with them since the end of the 1800s.
w:Lanston Monotype was actually a US company (though they had a UK subsidiary, which she worked for) and sold the machines worldwide... I'm doubtful it wasn't also "published" here simultaneously, just by sending it to printers, and it would have had to 'comply with the formalities', which, like I said, they likely didn't bother with... they were ephemera and usually nonsense.
It's actually become a 'thing' for people setting up a new letterpress to print copies of it... if it's copyrighted, a lot of people have violated that copyright... including (I looked back to verify) the US Government in that 2016 history, since they printed the actual text in a PD work, and there's no 'by permission' statement or anything to warn people off.
I think it's ok. :) Jarnsax (talk) 06:46, 12 October 2022 (UTC)
This is actually a catalog (https://archive.org/details/monotypespecimen00lansrich/), and too old to have Perpetua or the poem, but it's the "kind of thing" that they didn't bother to copyright... it would have been 'new stuff' to go along with your catalog; the first page talks about updating "your loose-leaf book." Jarnsax (talk) 07:22, 12 October 2022 (UTC)

Did you mean to leave a stray DIV tag in this? ShakespeareFan00 (talk) 17:35, 14 October 2022 (UTC)

@ShakespeareFan00: No. Fixed. Thanks. Xover (talk) 17:37, 14 October 2022 (UTC)

Template:Header broken

I heard that you were working on such templates, and thought you should know. See here. It seems to only affect court cases, so it may be some specific formatting. The header gets pushed to the bottom of the page. It’s not the “Notes” line, either; see here. TE(æ)A,ea. (talk) 14:50, 15 October 2022 (UTC)

@TE(æ)A,ea.: Ouch. Yes, that looks like something I broke. I'll start looking for the cause. Thanks for the headsup! Xover (talk) 14:53, 15 October 2022 (UTC)
@TE(æ)A,ea.: I think I've reverted the change that caused this. Could you verify? Xover (talk) 15:19, 15 October 2022 (UTC)

leftoutdent

I still have works to move to standard templates, like for Oxford Men and Their Colleges Done and The Indian Biographical Dictionary (1915), and to migrate, so we have large numbers of pages in play here. I have just converted {{TIWW}} to call the css of {{Oxon}}. Maybe I need to point these all to a short biographical entry .css rather than having them tucked away as a subpage of one. — billinghurst sDrewth 00:33, 16 October 2022 (UTC)

and works under special:prefixindex/The Catholic Encyclopedia and its makers/ Done, which is just a real parking lot at the minute. — billinghurst sDrewth 00:37, 16 October 2022 (UTC)
and yes, the migrations of these compilation works have been so we can be more nimble and holistic; similarly why I converted many of the ... link and ... lkpl templates for these biographical works. — billinghurst sDrewth 00:41, 16 October 2022 (UTC)
Probably need to project-manage this somewhere, and I'm guessing somewhere in WS:Maintenance is a better space. — billinghurst sDrewth 00:54, 16 October 2022 (UTC)
And I remember the article type: it was primarily for birth, marriage and death notices, as The Times used that style for a while. — billinghurst sDrewth 03:15, 16 October 2022 (UTC)
And it is equally useful in book indices as seen at Economic Development in Denmark Before and During the World War/Index, so there will need to be some sort of replacement that can inherit and follow on. — billinghurst sDrewth 03:17, 16 October 2022 (UTC)
Template:Men-at-the-Bar, Template:PDBP and Template:Men of Kent and Kentishmen now have their references removed, which should fix all existing templates — billinghurst sDrewth 03:48, 16 October 2022 (UTC)
@Billinghurst: We could make a dummy template (e.g. {{biographical dictionaries style}}) that just attaches a /doc page (explaining it) and hosts a /styles.css that can be either included directly in other templates; or we could make the template itself non-dummy and spit out the TemplateStyles. Or, as mentioned, since this seems relatively widely used, we could add a "Layout 5" with these settings and then just use the dynamic layouts system for those pages. I do very little work in biographical dictionaries and such so I don't know what makes the most sense for them. Xover (talk) 07:17, 16 October 2022 (UTC)

Don't forget to change all the links from Featured Texts, or the Versions page will get featured in January. --EncycloPetey (talk) 17:07, 16 October 2022 (UTC)

Oh. *blush* D'oh! Thanks for keeping me from a rather spectacular embarrassment for new years! :) Xover (talk) 17:15, 16 October 2022 (UTC)
There. I think I got all of them. Thanks again for the save! Xover (talk) 17:26, 16 October 2022 (UTC)

Stripped tags.. (and other LintErrors)

https://en.wikisource.org/w/index.php?title=Special:LintErrors/stripped-tag&dir=prev&offset=1518208&exactmatch=1&namespace=104&titlecategorysearch=

I know you have some higher-priority tasks on right now, but I've reached the limit of what I feel comfortable editing among the remaining items here. Any chance you could take a look, and resolve these or make appropriate requests to other forums as needed?

Once these are resolved, most of the 'content' namespace lint errors are 'Missing tags', which are simpler to resolve. Most of the mainspace ones I've handled so far seem to be unpaired italic syntax, typically italics started on one line and ending on a subsequent one. (Aside: Is this something that can be fixed in a semi-automatic way, to avoid long-term carpal tunnel, as it's quite repetitive?) ShakespeareFan00 (talk) 18:37, 16 October 2022 (UTC)

@ShakespeareFan00: There were just ~100 of them in that category? I fixed those; only a very few of them were bot-fixable (and those were actually left over from a previous incomplete cleanup of mine). But in the main the lint errors aren't easily susceptible to automatic fixes. Xover (talk) 20:33, 17 October 2022 (UTC)
The reason there are so few of them is that a while back I put in a very determined effort to clear the backlog.
(In respect of 'content' namespaces.) If you want to proceed to fixing up lint errors in other namespaces, feel free, but I was "strongly advised" to focus on content ones.
That means the biggest backlog is now Missing tags (mostly unclosed/unpaired italics). :) ShakespeareFan00 (talk) 22:40, 17 October 2022 (UTC)
@ShakespeareFan00: Trust me, I have noticed your herculean effort at reducing this backlog. Very much appreciated! But I was surprised it was the last hundred or so you gave up on, rather than the many thousands at the start. :)
As regards other namespaces, so long as the lint errors do not cause actual current problems, fixing them up is going to remain a very low priority. That's partly due to simply having to prioritize what we expend scarce volunteer resources on, but also because once you get into, say, User: pages or discussion archives you're probably going to need explicit community consensus for some things (not all fixes are going to be perfect replacements, "changing history" is potentially controversial, and editing within another user's user space should generally be avoided). Xover (talk) 05:52, 18 October 2022 (UTC)
Well, the last few were, as I said, ones I didn't feel comfortable editing. Typically I've ended up with a residue because:
  1. I'm UK-based, meaning there were some works I didn't feel comfortable editing because of copyright concerns (i.e. they needed actual proofreading, as opposed to a semi-automated technical repair).
  2. I wasn't able to determine a logical place for the fix, or the technical fix on a specific page meant that an entire work needed to be re-examined (as you found).
  3. The work potentially contained translations, or character scripts my browser had issues editing (such as RTL languages).
  4. The work contained material of a 'sensitive' nature, and required a Wikisource expert to ensure appropriate reproduction or transcription.
ShakespeareFan00 (talk) 07:43, 18 October 2022 (UTC)

2 missing pages? ShakespeareFan00 (talk) 16:23, 17 October 2022 (UTC)

Would you mind sanity-checking something? The AutoRefs part of User:Inductiveload/save_load_actions seems to have stopped functioning as documented. Did you change or remove something that it expects to see when you updated parts of the MediaWiki namespace recently? ShakespeareFan00 (talk) 18:43, 17 October 2022 (UTC)

@ShakespeareFan00: None of the changes I made recently should affect that, but one never knows of course. There have been recent changes to ProofreadPage and MediaWiki that could conceivably affect it though. I see you've gotten it to work by fiddling with Gadgets, so it probably wasn't anything related to those other factors. In any case, in order for me to look into it I'll need to be able to reproduce it, so you'll have to figure out which combination of Gadgets and user scripts it is that triggers the problem. Xover (talk) 20:43, 17 October 2022 (UTC)
My best guess at present is that I had Match and Split tools enabled. ShakespeareFan00 (talk) 22:08, 17 October 2022 (UTC)
@ShakespeareFan00: M&S tools are currently a bit buggy (they've bitrotted and haven't been updated yet), but they haven't had any changes recently, so I don't see why they would suddenly interfere with save/load actions. The only obvious candidate I can think of is the automatic header + footer script. But if you're running a lot of user scripts (like the regex toolkit, Pathoschild's templatescript, etc.) it could be any number of things. You'll need to reduce it down to a reproducible combination before we'll have a hope in heck of identifying the problem. Xover (talk) 05:57, 18 October 2022 (UTC)

prose class for DIV's

<div class="prose">
...
</div>

Do we still need that construction in mainspace, given that works are increasingly using dynamic layouts and {{pagenum}}? ShakespeareFan00 (talk) 11:06, 18 October 2022 (UTC)

Usage of this in mainspace should be balanced (i.e. the opening and closing tags are paired), so removal should be straightforward if you wanted to make a script for doing that. :)

ShakespeareFan00 (talk) 11:07, 18 October 2022 (UTC)

Script Errors - Extensive {{ts}} use may be the cause for some?

I suspect the issue with the Lua script errors you are seeing is due to extensive use of {{ts}}, which is causing a Lua equivalent of the 'template transclusion size limit exceeded' warning/error. In respect of some of the pages, the solution is to convert the {{ts}} usage over to CSS classes defined in an IndexStyles sheet. For Bradshaw, it may be necessary to split the work into the individual tables? ShakespeareFan00 (talk) 09:12, 19 October 2022 (UTC)

@ShakespeareFan00: There are a few of those left, yes, but most of them are fixed. I'm not sure what to do about the remaining ones, because those I looked at were not good candidates for IndexStyles / CSS classes (too much variability). There are some rather extreme optimizations I can make in Lua, but they are a lot of work and have usability drawbacks, so I've not pulled the trigger yet.
However, my biggest concern right now is the use of TOC-row and friends (the dotted ones, in particular), because they bloat the output insanely. There's not much scope for further optimizing these, and little alternative so long as they insist on reproducing the darned dot leaders. I have no good ideas currently for how to approach these (oh, and these are what's currently filling the maint. cats.). Xover (talk) 09:18, 19 October 2022 (UTC)
Hmm... There is a working draft (https://www.w3.org/TR/css-content-3/) for CSS to have leaders natively, so very long-term the current code could be dispensed with entirely. I am personally not that attached to retaining dot leaders in TOCs (with retention of the option to reimplement them once CSS supports them in a cleaner way), and in some more recent efforts of mine I've done the TOC as a simple table. ShakespeareFan00 (talk) 09:30, 19 October 2022 (UTC)

Formatting pages using DIV in header/footer vs classing ...

Not the way I would have done this personally:- Page:Joseph Story, Commentaries on the Constitution of the United States (1st ed, 1833, vol I).djvu/522

I will fix the lint errors, but I'd like a second view on whether to just strip things like this out, or have someone move it to a proper ns104 class? ShakespeareFan00 (talk) 13:17, 19 October 2022 (UTC)

@TE(æ)A,ea.: Was there a specific reason you used raw <div> … </div> instead of {{larger block/s}} and {{larger block/e}} there? Xover (talk) 14:10, 19 October 2022 (UTC)

Uploaded in good faith (because the PDF glitched out), but IA-upload decided to mis-align the text layer. Any chance of shifting it? I'll have a look and see how shifted it is. :( ShakespeareFan00 (talk) 15:28, 19 October 2022 (UTC)

Page scan: Page:She_(1887)_(shehistoryofadve1887hagg).djvu/16; scan is for pp. 1, text layer is for pp. 4

ShakespeareFan00 (talk) 15:38, 19 October 2022 (UTC)

@ShakespeareFan00: Is this still needed if WS:LAB#She is fulfilled? Xover (talk) 17:58, 19 October 2022 (UTC)
The request there is related to why I uploaded a generated djvu.. Not sure why IA-upload misaligns the text layer though :( ShakespeareFan00 (talk) 18:16, 19 October 2022 (UTC)
@ShakespeareFan00: If you look at the processed scan images (the _jp2.zip file), you'll see several misscanned pages (duplicate page images, where the duplicates are uncropped, etc.). The IA tags these pages in its metadata, but doesn't remove them. When IA-Upload processes such files it adds the OCR sequentially as if these pages were not present, and thus gets misaligned by an equal number of pages. In addition, there is a combination of bugs (DjVuLibre accepts invalid OCR input and then chokes when trying to output that same data, combined with a similar accumulating-offset bug in MediaWiki's DjVu handling) that also shows up as an offset text layer. In either case, the only fix is to regenerate the file from the scans using tools that detect or avoid both bugs (which my tools do, hence why I tend to always regenerate DjVu files from source scans rather than try to repair them in place). Xover (talk) 19:12, 19 October 2022 (UTC)

Something broke the UI at proofread page.

See .

But it seems currently to only manifest on pages that have table based markup? ShakespeareFan00 (talk) 08:39, 19 October 2022 (UTC)

@ShakespeareFan00: Hmm. I think this maybe is a PRP change.
@Samwilson: See above screenshot of Page:The London Gazette 28314.pdf/106 (I just reproduced in latest Safari on macOS). I've seen, but not really registered, the same phenomenon on pages where the scan image is not the typical portrait rectangle. There's also a report at WS:S#New ProofreadPage dynamic (viewport) height in the Page namespace that may be related (may also be user error and local CSS issues). I haven't dug into it, but gut feeling is that this may be related to the new sizing / layout approach for PRP Page: editor. Xover (talk) 08:48, 19 October 2022 (UTC)
@ShakespeareFan00, @Xover: That looks bad! Is it happening on other browsers? I'm not able to replicate. Is it the same with syntax highlighting on or off? Sam Wilson 08:53, 19 October 2022 (UTC)
@Samwilson: I reproduce in Safari, Firefox, and Chrome (all recentish), and both logged in and logged out. I suspect screen size may be a factor. These are all on a laptop, so small vertical size (I normally have to scroll vertically a lot to navigate the editor). Xover (talk) 09:00, 19 October 2022 (UTC)
I am using a Firefox Nightly build (which is typically a few branches ahead of the main Firefox release). Disabling/enabling syntax highlighting does not change the botched UI rendering. I also disabled certain gadgets and scripts, to test for an interaction issue. The (IndexStyles) CSS for the index is here: Index:The London Gazette 28314.pdf/styles.css, but as that should be isolated from the UI, I am convinced it should not be affecting it. ShakespeareFan00 (talk) 09:09, 19 October 2022 (UTC)
@Xover, @ShakespeareFan00: Is this the same bug as phab:T321344? And I'm now able to replicate it... it's weird, I was testing the other day with small screen sizes etc. but couldn't make it do it, but now it's right there. Feels like a recent change, and I'd still have been using cached CSS before. Sam Wilson 03:10, 21 October 2022 (UTC)
@Samwilson: Could be. But without really noticing, I imagine I've seen this for much longer. Which is why I was connecting it with the change you made a while back that was supposed to peg the editor UI height to the image height (happened around the time of the OSD work, I think). But, as mentioned, I've only vaguely noticed it, and usually when Commons was flaking out and not loading images, so there was a completely different issue that I was focussed on. The symptoms in the screenshot in T321344 certainly look like the same thing. Xover (talk) 06:09, 21 October 2022 (UTC)

Obadiah Poundage Letter

Hi Xover. You left a source-needed template on the Obadiah Poundage page. I don't usually edit Wikisource so I don't know how to set things up. Here is a source for the letter: [1]. There will be others if you do a Google search for "Obadiah Poundage Letter". If you could pop that onto the page in the appropriate format that would be very useful. Cheers! SilkTork (talk) 01:00, 20 October 2022 (UTC)

@SilkTork: The problem is that, analogous to WP:V, we need an actual scan of the source in order to verify that the transcription is correct; and, analogous to WP:RS, we need that source to be something actually published by a publisher. The link you give is essentially "some dude on the Internet" for these purposes.
That being said, the {{no source}} tag is currently not really a pressing issue since we have about a gazillion old texts with the same problem. There's pretty much zero chance anyone will come along and delete it on that basis. On a really long time-scale it might end up there (community expectations for quality have been steadily increasing), but for the foreseeable future any actual action regarding it will be someone doing the work to find a scan and migrate it to that.
So… If you want to improve its quality / avoid the ugly maint tag / prevent it ever getting nuked by deletionist cleaner-uppers, the most effective effort would be tracking down a scan of the original publication in which it appeared. You can just stuff the link into the notes parameter of the {{header}} template. Extra credits for figuring out how to upload the scan in a suitable format (DjVu, or at a pinch PDF), creating an Index: page for it, migrating the text to Page:-namespace pages, and then transcluding those to Obadiah Poundage. But those bits require knowledge that's Wikisource-specific and even experienced Wikimedians usually need help figuring all that stuff out. The community is usually happy to help with all that—requests go on WS:S/H—if you want to go the extra mile. But the key is linking to a scan of the original publication (the London Chronicle for November 4, 1760, I take it). Xover (talk) 07:45, 20 October 2022 (UTC)
@SilkTork It's in The London Chronicle, vol. 8, pp. 436–437. A scan is available here; I have added this link to the header notes. The text currently on Wikisource appears to omit large portions of the letter. Shells-shells (talk) 23:11, 21 October 2022 (UTC)

Lint errors - Missing tags (ns0)

https://en.wikisource.org/wiki/Special:LintErrors/missing-end-tag?namespace=0&titlecategorysearch=&exactmatch=1

I estimate I'll run out of low-hanging items I feel comfortable making repairs on in the next few days. Any chance you could give the residuals a glance? In some instances I haven't made corrections because what I was seeing was an OCR dump, and it would be better to work from a scan for those works (and in many places I've added links to potential external scans for works I did corrections on).

I detailed my reasons for avoiding some pages previously. ShakespeareFan00 (talk) 19:46, 21 October 2022 (UTC)

PDF links

While diff is definitely removing Fugly with a capital Fug (though TBH I did quite like seeing "this is a PDF" for links, that selector is a crime), you may still like what I have for adding "IA" and "Worldcat" icons to relevant links:

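/* For each external link whose href starts with the given prefix, draw a small icon as a right-aligned background image; the right padding reserves room for it. */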
a.external[href^="https://archive.org"] {
    background: url("//upload.wikimedia.org/wikipedia/commons/thumb/1/13/Internet_Archive_7x8px.svg/7px-Internet_Archive_7x8px.svg.png") no-repeat right;
    padding-right: 10px;
}

a.external[href^="https://www.worldcat.org"],
a.external[href^="https://worldcat.org"]  {
    background: url("//upload.wikimedia.org/wikipedia/commons/thumb/7/76/Worldcat_logo_16px.png/12px-Worldcat_logo_16px.png") no-repeat right;
    padding-right: 14px;
}

phab:F35611359 for an idea of how that works out. Inductiveloadtalk/contribs 14:48, 22 October 2022 (UTC)

Nifty. I wonder if there's a market for a Gadget to add those. Or possibly to teach {{esl}} about it. But I'm thinking this kind of thing is best as a hover action à la Popups. Xover (talk) 16:10, 22 October 2022 (UTC)
I actually do have a pretty shonky script for IA links which takes you to the IA-Upload tool: User:Inductiveload/IaUploadPopup. I was gently wimbling towards adding it to popups as it doesn't need to do much. More extensive IA metadata would be pretty cool, but CORS might mean a need to proxy some of the cross-domain IA API calls. Inductiveloadtalk/contribs 18:18, 22 October 2022 (UTC)

Paired italics over newline..

I've found this works reasonably well with the replace function added by Inductiveload's maintain.js script:

Search string : \'\'((.)*)\n((.)*)\'\'
Replacement rule : ''$1 $3''

It's amazing how some tools become so commonplace in a workflow that you only notice them when you have to disable them. :) ShakespeareFan00 (talk) 11:36, 23 October 2022 (UTC)

@ShakespeareFan00: Yes, that regex should work reasonably well (modulo the false positives). PS. Those inner parentheses are pointless; you can drop them, making the search \'\'(.*)\n(.*)\'\' and the replacement just ''$1 $2''. I'm also not sure you need to escape the single quotes there. I haven't tested in IL's tool, but in this particular context they shouldn't need to be escaped, which would leave ''(.*)\n(.*)''. Xover (talk) 11:42, 23 October 2022 (UTC)

Wikisource:Administrators' noticeboard

@Xover: Hello, could you please review my page protection request on the admin noticeboard? Thanks, Matr1x-101 (talk) 21:12, 25 October 2022 (UTC)

Small-caps template change may affect the page space

Hi Xover, I noticed a recent change: when using small-caps on a page, e.g. Page:EB1911 - Volume 24.djvu/575 (via the {{1911link}} template, which uses small-caps, i.e. the word "Impressment" or "see Navy"), the text no longer displays in small-caps. When transcluded, though, the display is in small-caps. This may be related to your recent change to the {{Small-caps}} template. Until recently the text would always display in small-caps in the Page space. DivermanAU (talk) 05:34, 28 October 2022 (UTC)

@DivermanAU: Not sure why it would display correctly when transcluded. But the lack of small-caps in the Page: namespace was due to a bug in {{EB1911 footer initials}} that combined with a property of MediaWiki's style implementation to cause the small-caps styling to go missing. The problem would have occurred on any page where {{EB1911 footer initials}} was used before {{EB1911 article link}} (or any other use of {{sc}}), because MediaWiki deduplicates such style specifications (only the first use outputs the style specification; subsequent uses just refer back to the first), so when the first is broken the subsequent ones get no styling. The bug in {{EB1911 footer initials}} was that it put the formatting template (in this case {{sc}}) inside the linking wikimarkup, where it is generally not permitted (inside link markup you should generally just have plain text, with some exceptions that do work). The fix was simply to wrap the entire link in {{sc}}. Xover (talk) 10:22, 28 October 2022 (UTC)
Thanks for explanation and the fix! DivermanAU (talk) 19:42, 28 October 2022 (UTC)

{{ref}} and {{note}}

Do these predate the Cite extension, and if so are these still needed, when most works can be converted to use ref tags anyway? ShakespeareFan00 (talk) 08:05, 2 November 2022 (UTC)

@ShakespeareFan00: Well, the current implementation may predate Cite, but in general, no, they're just wrappers around the functionality like many others (including {{smallrefs}}). Whether they should still be used is a better question. While they can be converted to Cite, mostly, they have some different properties, such as being uncoupled from Cite's automatic generation of fragment identifiers. For certain situations this may be necessary or desirable (cf. the authority reference / endnote stuff, that challenges the model Cite depends on). That being said, I haven't really looked at these in detail so it may be I'm missing something and/or would have a more specific opinion if I did. Xover (talk) 09:04, 2 November 2022 (UTC)
Ref / Note do effectively pre-date good <ref> support here, not so much by time as by effective usability. Or maybe it is more accurate to say that the development of mw:extension:Cite, and the sustainable rollout of many extensions, was happening at a similar time, and at that time it was not suitable for our use. We do have a few realistic uses for ref/note, though IMO they are few and far between. We should be using Cite functionality where possible, and only use ref/note where we have a demonstrated good reason not to do so. — billinghurst sDrewth 21:46, 3 November 2022 (UTC)
^---- This. Xover (talk) 07:07, 4 November 2022 (UTC)

AF

Often you will find that added_lines is more effective than new_wikitext, as it checks only the +-prefixed (added) lines rather than the whole resulting page, especially where we are dealing with troll behaviour. It usually gets fewer false positives. new_wikitext is most useful where you want to check for the positive presence of something in the resulting page, or similarly to stop the removal of something from the page. — billinghurst sDrewth 21:39, 3 November 2022 (UTC)

Ah. Thank you! Xover (talk) 07:05, 4 November 2022 (UTC)

Deprecated table attributes..

Namely bgcolor:- https://public.paws.wmcloud.org/User:ShakespeareFan00/bgcolor_ns0.txt

Any chance of doing an automated conversion using the same approach I had been applying manually? ShakespeareFan00 (talk) 14:45, 4 November 2022 (UTC)

FI and FIS ignoring imgwidth=param...

There's something screwed up with {{FI}} and {{FIS}}

{{FIS|file=Fig 29A. A complete course in dressmaking, (Vol. 2).png|caption={{smaller|''Fig. 29A. In laying a pattern on the goods, place the largest pieces on first and then fit in the small pieces''}}|float=left|width=100%|imgwidth=500px}}

{{FI|file=Fig 29A. A complete course in dressmaking, (Vol. 2).png|caption={{smaller|''Fig. 29A. In laying a pattern on the goods, place the largest pieces on first and then fit in the small pieces''}}|imgwidth=500px}}


It is expanding the image to fill the entire space, seemingly ignoring the explicit imgwidth I'm setting independently.

What should happen is that the container width is 100%, but the image is centered at the given imgwidth. With {{FI}} this reduced-size image should be centred; with {{FIS}} it should be floated.

In my use case, Page:A complete course in dressmaking, (Vol. 2, Aprons and House Dresses) (IA completecoursein02cono).pdf/31, I actually need float=center, to do an FI-style layout inline, which isn't currently supported other than by convoluted hacks. What float=center should be implemented as is an inline way to center the image (text-align:center), possibly with the outer span being set to (display:block; width:100%), i.e. 100% of the parent container (in the same way as I fake a blank DIV as a span in {{pbri}}). ShakespeareFan00 (talk) 14:39, 5 November 2022 (UTC)

Also, in looking into this I found that the underlying module is making assumptions about the widths being %- or px-based.
CSS supports many more units than that, and there was an effort by others here to try and move away from fixed px-based layouts.
Would it be possible for you to look into updating the relevant Module so it can properly support a fuller range of CSS units, and for {{FIS}} to have a float=center-like behaviour if that was given as a specific template argument?
Thanks.
ShakespeareFan00 (talk) 16:44, 5 November 2022 (UTC)
@ShakespeareFan00: Going by memory: imgwidth doesn't set the display width, but the size of the thumbnail to fetch from Commons. It's mainly intended for very large images so that you don't fetch ~24MB of image data and then scale it down to 200x200 pixels, not least because it makes ebook exports humongous. That's also why the unit is px: thumbnail size is always specified in pixels.
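Purely to illustrate that distinction (this is a hedged sketch, not the actual module behind {{FI}}/{{FIS}}, whose handling is more involved), the split looks roughly like this:

-- Illustrative only: imgwidth is a pixel count used for the thumbnail
-- request, while the container width is a separate, CSS-level value.
local function renderImage( args )
    -- Thumbnails can only be requested from Commons in pixels, so anything
    -- that isn't a plain "Npx" value is simply not usable as a thumbnail size.
    local thumbPx = tonumber( string.match( args.imgwidth or '', '^(%d+)px$' ) )
    local fileLink
    if thumbPx then
        fileLink = string.format( '[[File:%s|%dpx]]', args.file, thumbPx )
    else
        fileLink = string.format( '[[File:%s]]', args.file )
    end
    -- The container width is independent of the thumbnail size (shown here as
    -- an inline style for brevity; TemplateStyles would be the usual approach).
    return string.format( '<div style="width:%s;">%s</div>', args.width or '100%', fileLink )
end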
And we generally don't want to expose raw CSS in template arguments (reasoning on request), so even for display-oriented parameters it is likely that a hard-coded unit would be deliberate. This is orthogonal to the issue of avoiding pixel-based sizes in order to make layouts adaptable. Xover (talk) 16:59, 5 November 2022 (UTC)
Thank you. So currently em-based image container sizing isn't possible, because MediaWiki doesn't work that way. (HTML needs an intrinsic size (width/height), but does allow the img element to be rescaled in a browser with CSS; MediaWiki doesn't currently have a way to permit that rescaled img behaviour.)
Maybe the Module should give a gentle warning about the issue...
Longer term it would be nice if there was a way to set em-based imgwidths for {{FIS}}, so that the kind of layouts that need centered captioned images in a run of text can be implemented without the kind of convoluted coding some works like St Nicholas have at present.
What I think the user here was trying to do on the relevant page was set up the image region to be 25em wide (centered), within a container that spanned 100% of the parent (page), i.e. trying to do an inline FI without getting a break in the content.
I found this problem because I'd been using some custom CSS to inline H3 headings at the start of certain paragraphs. Without tweaking {{FI}} to {{FIS}}, in places MediaWiki wasn't doing P wrapping in the right places, and hence the headings weren't being displayed correctly.
Can I ask you to dig a little deeper? I think this should be straightforward to solve, if a little time-consuming.
ShakespeareFan00 (talk) 17:41, 5 November 2022 (UTC)
@ShakespeareFan00 What are you trying to do??? At least can you add the page link please? — ineuw (talk) 19:56, 5 November 2022 (UTC) — ineuw (talk) 19:59, 5 November 2022 (UTC)
As a frequent user of the FI/FIS templates, I recreated your page in my sandbox, and repeated it under that with my FI template layout of the same. The div width of "34em" is the read-view width of my Page namespace, and matches very closely the main namespace layouts 2 and 4. I don't understand what you need to float, or to center. I understand that you are most interested in the magic of programming, but the end result of our work may be printed. Before you consider such complexity, print the file to see if these complex layouts don't affect it. This seems too complex for what is needed to be done. — ineuw (talk) 20:47, 5 November 2022 (UTC)
@ShakespeareFan00, @Ineuw: I must be daft or something. What is it about that page that requires more than a plain image and a centred caption, or possibly a centred block? I see it uses FI, which is convenient shorthand for that, but otherwise unremarkable. What is the actual problem here? Xover (talk) 21:17, 5 November 2022 (UTC)
  1. Using {{FI}} within a paragraph causes the text following that FI not to be wrapped (as it's DIV-based), which breaks some other formatting this work uses (such as the subheadings). (I was generating subheadings using {{ph/c}} so that I could set one consistent style for them; putting them as in the original meant they are inlined.) {{FIS}} cannot be directly centered at present, so it's not possible to get an FI-like (centered) behaviour inline currently.
  2. Neither MediaWiki image syntax nor an HTML5 img tag accepts em-based sizing anyway. I am thinking what the other contributor wanted here was an img that was 25em wide on the page. However, to get an FI-like centered behaviour with FIS (including the caption functionality), the width has to be set to 100% to fake a block-like behaviour with a SPAN. I'd originally put these images in using a specific template for the work, but other contributors decided to use FI/FIS. I also note that in places elsewhere {{img float}} was used, meaning that the caption sizing is now inconsistent.
I'm going to withdraw from working on this until someone else is prepared to enforce ONE consistent approach, which was partly why I had set up the CSS in IndexStyles that I had, only for other contributors to do their own thing. ShakespeareFan00 (talk) 00:00, 6 November 2022 (UTC)
@ShakespeareFan00: I see no discussion among contributors on Index talk:A complete course in dressmaking, (Vol. 2, Aprons and House Dresses) (IA completecoursein02cono).pdf to outline the problem and agree on an approach. Why do you expect the approach to be consistent when no attempt at coordination has been made? If un-coordinated contributions create a problem then raise it on a suitable talk page and figure out a common approach. But be prepared for 1) having to explain the problem you see so that others can understand it, and 2) the other contributors possibly not agreeing with your assessment of the severity of the problem, or that any given solution is the best or necessary. You have an eye for spotting consistency problems in particular, which not all contributors do, but not all consistency problems must be reconciled, or at least not at any cost. Xover (talk) 05:57, 6 November 2022 (UTC)
I have, however, commented out the IndexStyles I set, until someone is prepared to demonstrate how it SHOULD be done. ShakespeareFan00 (talk) 00:06, 6 November 2022 (UTC)
See Index:A_complete_course_in_dressmaking,_(Vol._2,_Aprons_and_House_Dresses)_(IA_completecoursein02cono).pdf/styles.css. ShakespeareFan00 (talk) 00:25, 6 November 2022 (UTC)
I'm also now annoyed: Page:A_complete_course_in_dressmaking,_(Vol._1,_Introduction)_(IA_completecoursein01cono).pdf/116. I was attempting to set up ONE common style for the captioning. Why, then, does the one supplied to {{FI}} and the one given to {{img float}} differ so radically? ShakespeareFan00 (talk) 00:57, 6 November 2022 (UTC)
Enough is enough. How do I have ONE consistent caption size? Thanks. ShakespeareFan00 (talk) 01:09, 6 November 2022 (UTC)
{{FIS}} floats everywhere, provided you create a float-center snippet; see the {{ts}} snippets "fll" and "flr" for float left and float right. I can help you better if you add a page link to your page of concern and point out what you want/don't want. I never used the CSS feature of the index page because it didn't exist at the time I inserted many images, but George Orwell III and User:Inductiveload helped me with these issues in major ways. — ineuw (talk) 03:01, 6 November 2022 (UTC)
PS: To continue, what do you mean by "consistent caption size"??? — ineuw (talk) 03:06, 6 November 2022 (UTC)
@ShakespeareFan00: Have you tried .img-floatleft, .img-floatright {font-size: 94%}? Xover (talk) 06:15, 6 November 2022 (UTC)
.img-floatleft, .img-floatright {font-size:83%; font-style:italic;}, actually, but I'll try that on Volume 1. The aim of moving things to IndexStyles is so that I don't have to keep going back and forth adding or removing templates in the cap/caption parameters; it changes in ONE place, and ONE place only. ShakespeareFan00 (talk) 11:56, 6 November 2022 (UTC)


I've had enough of going round in circles on this. Please can someone revert all my changes and impose ONE style, effectively enforced and DOCUMENTED. That way I can be sure I (and other contributors) are actually following something consistently. All I wanted was ONE 'image' and caption approach across the entire set of volumes. In my original proofreading I thought I had actually set that up, but it seems it was unreasonable of me to think that other contributors would at least have the sense to ask why it was set up that way, if it wasn't directly documented.
I've already attempted to explain this twice, but having linked more than a few pages previously, perhaps I am not competent to continue, because I seem to be breaking more stuff than I'm actually fixing. That's why I'd like someone else to set ONE caption and image approach, using ONE template/style rule across the entire work (and the other volumes in this set of books on Dressmaking).
  1. The text shown for the figures on A_Complete_Course_in_Dressmaking/Lesson_2/Simple_finishes_for_edges is not consistent, i.e. different figures/images have different sizes for the captions. This is also true of A_Complete_Course_in_Dressmaking/Lesson_2/Patterns. I had attempted to set up some CSS (currently disabled) to handle this, which I would like to use across the entire work.
  2. I had attempted in A_Complete_Course_in_Dressmaking/Lesson_1/Seams_and_their_uses to give an example of setting up {{imgfloat}} (which can be placed in the center) with a tclass (set on the caption span in img_float) so that the captions were consistent. That approach did not work because in a number of other uses {{img float}} had DIV-based captions, and I had not checked for that specific use case.
(Aside: MediaWiki doesn't use em-based sizing for image-related stuff. I'd updated the documentation for {{FI}} and {{FIS}} to note that.)
However, that means that it is not (currently) possible to use an em-based imgwidth for {{FIS}}; it gets ignored, or set internally to some nonsensical value, meaning that the width value gets used instead. And an inline FIS placed to mimic an FI (inline but in the center) will have its width set to 100% (i.e. across the whole page), and it is this value that is used to set the size of the image, leading to the overly expanded image.
At this point I am not sure how to continue: because of the various differing approaches used, these two volumes of this work are a complete mess of conflicting approaches. I'd rather not make this worse by another attempt at fixing it. That's why I need someone else to look at the various attempted approaches and choose ONE (such as {{imgfloat}}, for example), setting up any relevant CSS as appropriate. I had tried to do this by standardizing on {{imgfloat}} in volume 1, but that didn't work because my approach in relation to how to class the caption made too many assumptions.
@Xover: Can you attempt to summarize from this discussion what you think the actual issues are? There's more than one thing interacting here, and I now have a headache. :( ShakespeareFan00 (talk) 08:31, 6 November 2022 (UTC)
I've now, in respect of Volume 1, reverted my attempts. It doesn't matter if it was the right approach or not, because it seems other contributors DO need to be DIRECTLY and EXPLICITLY told what approach to use, instead of the unreasonable expectation that they would pick up that a specific contributor had standardised things in a specific way when they encountered the same templates or approach across many pages. (Aside: Other contributors have criticised me in the past for ignoring a standardisation 'they' had imposed on a work, even though there wasn't a style guide as such, stating that they reasonably expected me to have some common sense.) (Sigh) ShakespeareFan00 (talk) 08:52, 6 November 2022 (UTC)
And after a lot of effort, I think I've got a standardisation (apart from volume 9, which someone had already completed). I've set up the styles and page headers/footers for the remaining volumes and conformed any pages already proofread. This is now hopefully fixed (apart from Vol. 9), but I'd appreciate feedback if you find anything amiss.

Img templates...

Okay deep breath everyone.

Xover, at some future date, would it be possible for someone to carefully review the various templates/modules Wikisource has for visual media handling (i.e. images) in works, with a view to centralising the functionality into a single Module/Template? The goal would be to have one consistent way of doing it, that can be understood without needing to understand the quirks of each specific one as at present. ShakespeareFan00 (talk) 12:34, 6 November 2022 (UTC)

@ShakespeareFan00 I clearly recall that this was done years ago, {{FIS}} was the necessary result of {{FI}}. The text surrounding FIS flows unbroken as you can see on this page and hundreds of other images, left or right. Please look at the template fields used. — ineuw (talk) 17:07, 6 November 2022 (UTC)
So how do I set up an inline FI then? {{FIS}} has to be left or right; there isn't a mechanism for centering it currently. A now-deleted template set this up in a somewhat hacky way by setting margins IIRC, which were substituted into FI/FIS as needed. I don't recall what it was called though.

ShakespeareFan00 (talk) 17:40, 6 November 2022 (UTC)

Look at this page please, and you can change float=center. — ineuw (talk) 17:47, 6 November 2022 (UTC)
I also recall a previous discussion: Wikisource:Scriptorium/Help/Archives/2019#Page:Tycho_brahe.djvu/111 from 2019. I seem to recall I eventually coded the suggested approach there into a subst template. ShakespeareFan00 (talk) 17:49, 6 November 2022 (UTC)
I was using this: https://en.wikisource.org/w/index.php?title=Index:A_complete_course_in_dressmaking,_(Vol._1,_Introduction)_(IA_completecoursein01cono).pdf/styles.css&oldid=12725584 which I had been using on Volume 2 onward. ShakespeareFan00 (talk) 17:57, 6 November 2022 (UTC)
I can't help that you are overthinking issues that have been resolved long ago.
Template Creation date history:
{{img float}} = 2010-08-13T11:56:40 by Inductiveload.
{{FreedImg}} = 2013-06-10T17:10:06‎ by Theornamentalist and completed by GOIII.
{{FreedImg/span}} = 2013-10-28T13:22:31‎ by Alex brollo and completed by GOIII. — ineuw (talk) 18:08, 6 November 2022 (UTC)
I've now attempted to explain this three times, and it's apparently STILL NOT getting through, with various other issues getting conflated. There's a deficit of understanding somewhere. It's completely pointless bashing my head against a 'wall' any further.

ShakespeareFan00 (talk) 18:29, 6 November 2022 (UTC)

Apologies, I must be having a bad day. The deficit of understanding is on my part, by the way. ShakespeareFan00 (talk)

An Apology

I owe you an apology. I was trying to do something complicated that required me to do a little bit more in-depth reading of the documentation. That is however no reason for me to react in the way I did elsewhere (now struck through). I hope that you can accept that my views were out of frustration at my own lack of ability, and that I can continue contributing in a more positive manner. ShakespeareFan00 (talk) 20:04, 6 November 2022 (UTC)

@ShakespeareFan00: Don't worry about it. We all get frustrated from time to time, and by not only noticing that that had happened but even apologising for it, you are way ahead of most people on this site (myself included). But do let me tack on an unsolicited piece of advice: when you start getting that frustrated it is usually better to drop whatever the issue is, step back for an hour or a day, and then come back to it when you've regained your equilibrium. Nothing here is important enough to subject oneself to frustration and aggravation. If you are able to pursue an intractable problem without excessive frustration then great, otherwise it's usually better to go find some activity that does not cause you such grief. Contributing here is supposed to be fun, remember? :) Xover (talk) 20:52, 6 November 2022 (UTC)

Index page template changes..

Hi. Something changed recently which means I get an "An unexpected error was detected. Please report this error to phabricator.wikimedia.org with logs from the console." message when I try to do a pagelist preview with the relevant button in the UI.

Which logs do you need to further diagnose, and how would I obtain them? ShakespeareFan00 (talk) 19:50, 6 November 2022 (UTC)

@ShakespeareFan00: On which page are you getting that error? ("all Index: pages" is one possible answer)
I'm guessing this is an error message from Soda's new visual pagelist editor, and if so it is probable that it is choking on some change I have made recently. If that's the case then it's something I need to fix and Phabricator won't be any help. Xover (talk) 20:55, 6 November 2022 (UTC)
@ShakespeareFan00: I did some digging and found a workaround. It should work again now. Xover (talk) 06:52, 7 November 2022 (UTC)

A Match and split that didn't quite work out. Any chance you could split this up whilst I take a time out? I'm not working at peak effectiveness at the moment. ShakespeareFan00 (talk) 20:57, 6 November 2022 (UTC)

@ShakespeareFan00: It looks like you figured this one out? Xover (talk) 06:50, 7 November 2022 (UTC)
Yep. Figured this out. This isn't just a move to pagenamespace, it's a Phe-Bot assisted M&S move to page namespace :)... (not that anyone will still get that meme). ShakespeareFan00 (talk) 11:00, 7 November 2022 (UTC)

Split column tables over many pages..

See Index:Algonquian Indian names of places in Northern Canada.djvu. The table in this has its columns split across different pages.

This would be impossible to lay out using MediaWiki table syntax. Do we have a module that could render such tables, or is it a case of manually combining the data from two pages into a single one in the Page namespace? ShakespeareFan00 (talk) 00:22, 6 November 2022 (UTC)

When you need a table spanning many pages, use a single table for each page. I edited two pages and transcluded them here to show you that separate tables are OK and they look continuous on the main page. (The table and FI/FIS guy). — ineuw (talk) 02:41, 6 November 2022 (UTC)
@Ineuw: Just to check: you do realise this is a 6-column table that's just split vertically between recto and verso sheets, right? I ask because while the rendering here is a reasonable approximation, it is not displaying as a continuous table for me as I thought was your intent from your above comment. Xover (talk) 05:39, 6 November 2022 (UTC)
ShakespeareFan00: If you want the table to be continuous (as in the original), you can use section-by-section transclusion, but that can get complicated (see here for an example of something similar). TE(æ)A,ea. (talk) 02:59, 6 November 2022 (UTC)
@ShakespeareFan00: As you say, there's no actually good solution for this, since neither HTML table markup nor MediaWiki's wrapper for it actually supports this kind of thing. Both Ineuw and TE(æ)A,ea.'s solutions are reasonable approximations given imperfect tools. For myself, I think I would have just combined the two table halves on a single Page: page as a pragmatic compromise. The page-by-page fidelity in the Page: namespace is less important than getting a readable and coherent result to our readers. Xover (talk) 05:35, 6 November 2022 (UTC)
@Xover I chose the wrong consecutive pages because the 2nd page has a paragraph spacing above the title. I will proofread two consecutive pages containing the same table later today. I have done this many times, and checked the results in the main namespace page. — ineuw (talk) 16:51, 6 November 2022 (UTC)
Corrected the tables. Because of the odd and even tables, each set of two consecutive pages in the main namespace will look as in User:Ineuw/notes5.

I need to know how it is wanted to be laid out in the main namespace. I can create two separate tables: one with the two columns appearing as a continuous page, and the other a four-column table? — ineuw (talk) 10:43, 7 November 2022 (UTC)

The layout of this table is complicated by the recto/verso split.
In mainspace what is desirable is a table laid out as
{{(!}}
{{%!}}Indian Name.!!Meaning.!!Present Name.!!Lat.!!Long.!!Remarks.
{{%-}}1. Aithinetōs′ekwǎn Saka′higan{{!!}}Indian Elbow Lake{{!!}}1. Elbow Lake{{!!}}54° 50′{{!!}}100° 50′{{!!}}Ithenootosequan (David Thompson)
{{%-}}...
{{!)}}
i.e. rendering as a single six-column table:
  Indian Name. | Meaning. | Present Name. | Lat. | Long. | Remarks.
  1. Aithinetōs′ekwǎn Saka′higan | Indian Elbow Lake | 1. Elbow Lake | 54° 50′ | 100° 50′ | Ithenootosequan (David Thompson)
  ...
But that would need some kind of module/scripted support, as HTML/MediaWiki does not do it natively.

ShakespeareFan00 (talk) 11:07, 7 November 2022 (UTC)

In case you are wondering, {{%!}} and {{%-}} were shorthands for starting a table row without needing to worry about whitespace-handling quirks in templates. (They are also a potential pattern to grep/match for, if a module is used to build an {{rvtable}} module.) Thanks. ShakespeareFan00 (talk) 11:07, 7 November 2022 (UTC)
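For illustration only, here is a minimal Scribunto sketch of what such a hypothetical {{rvtable}} backend could look like; the module name, the recto/verso parameter names and the "!!" cell delimiter are all assumptions, not an existing implementation:
  -- Module:Rvtable (hypothetical): zip recto/verso half-tables into one table
  local p = {}

  function p.merge(frame)
    local args = frame:getParent().args
    -- each half is expected as one row per line, cells separated by "!!"
    local left = mw.text.split(args.recto or '', '\n')
    local right = mw.text.split(args.verso or '', '\n')
    local out = { '{| class="wikitable"' }
    for i = 1, math.max(#left, #right) do
      table.insert(out, '|-')
      local cells = (left[i] or '') .. '!!' .. (right[i] or '')
      table.insert(out, '| ' .. table.concat(mw.text.split(cells, '!!', true), ' || '))
    end
    table.insert(out, '|}')
    return table.concat(out, '\n')
  end

  return p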
Thanks, this is an important addition and a reminder I must study {{%-}} further. I have come across this in the {{ts}} module listing the codes. But I may have another solution. — ineuw (talk) 23:01, 7 November 2022 (UTC)
I completed the first eight alternate tables. On my 24" display it aligns accurately. See the results. I will start the rest tomorrow. Consistency was the key. However, the text needs proofreading.
I also don't know what the etiquette is, but I think we should continue the discussion elsewhere, unless Xover doesn't mind? — ineuw (talk) 06:24, 9 November 2022 (UTC)
@Ineuw, @ShakespeareFan00: I don't mind. Mi talk page es su talk page. :) Xover (talk) 08:48, 9 November 2022 (UTC)

I've done a match and split in good faith. But I have concerns the page scans are a little poor in places. Do you know of any better scans? ShakespeareFan00 (talk) 19:13, 7 November 2022 (UTC)

Also - Page:Alexander and Dindimus (Skeat 1878).djvu/45 looks nice in all layouts except Layout 1, where it was unreadable. I've gone back to the pre-split version in ns0, until someone else can suggest a layout that's stable. (Sigh) :(

ShakespeareFan00 (talk) 20:13, 7 November 2022 (UTC)

@ShakespeareFan00: Have you tried moving the {{sidenotes begin}} to the header? Xover (talk) 21:08, 7 November 2022 (UTC)
Yes, I've tried that. It still looks unreadable in Layout 1. I can set a default layout 2 for this, but I still have concerns that there should be a cleaner way to handle this. It displays fine in the Page namespace.
https://en.wikisource.org/w/index.php?title=Alexander_and_Dindimus/Text&oldid=12729274 if you want to test.
It's as convoluted as it is because of the split-language-on-the-page issue. The pages tag doesn't have an includesection parameter, only an onlysection, which means I have to set things up for 45 or so pages manually :( sigh... ShakespeareFan00 (talk) 21:25, 7 November 2022 (UTC)
@ShakespeareFan00: Setting a default layout exists exactly so you can specify that this text needs this layout to function well. That it would also be nice to figure out what's going on with the sidenotes on that page in Layout 1 is a different matter. Xover (talk) 06:34, 8 November 2022 (UTC)

Modularise {{img float}} and merge with FI/FIS etc?

The problem is here.. Page:A complete course in dressmaking, (Vol. 4, Blouses) (IA completecoursein04cono).pdf/16 It would be nice to have the same file=missing behaviour that you added to {{FI}}.

I'm using img float here rather than {{FIS}} because {{FIS}}, in its current implementation, has to be either left or right. It doesn't have the img-center fakery that {{img float}} does, due to different coding.

Of course the other solution is for the relevant module supporting {{FIS}} to be updated to permit a float=center style behaviour. ShakespeareFan00 (talk) 20:18, 8 November 2022 (UTC)

I just added "float:center;" as "flc" to the shortcuts. — ineuw (talk) 22:05, 9 November 2022 (UTC)
Where was this added, sorry? float:center doesn't exist in CSS. That's why an additional class is needed to do it. ShakespeareFan00 (talk) 22:29, 9 November 2022 (UTC)
My mistake, I will change it to "fln"; "float:none;" is what you might be looking for. — ineuw (talk) 20:11, 10 November 2022 (UTC)

Missing FS/FSI codes

@Xover Are we working with two separate code lists for 'ts'? What happened to the old 'ts' codes? Your list contains only 42 codes and the old one had 142. You dropped codes which are used with hundreds of my images. Is it the number of codes, or the layout style? I would like to include all codes in one table, organized alphabetically. — ineuw (talk) 20:23, 10 November 2022 (UTC)

@Ineuw: What list are you referring to? Xover (talk) 21:01, 10 November 2022 (UTC)
I think it's the list of short codes in {{ts}} (which was modularised), or it could be {{span tag}}, {{p}}, {{heading}} etc.
Was something in another template specifically referencing part of the old version of {{ts}}?
An example page is needed to help figure out WHAT isn't as expected. ShakespeareFan00 (talk) 21:20, 10 November 2022 (UTC)
Also, {{Table style/parse}} should probably be marked as deprecated, as the current implementation of {{ts}} uses Module:Table style/data instead. ShakespeareFan00 (talk) 21:27, 10 November 2022 (UTC)

There are fewer separate codes listed in the module version because it uses aliases. I've implemented 'fic' for freedimg-center. The code is tweaked a little from {{img-float}}; see the bottom of my user page, User:ShakespeareFan00 :) ShakespeareFan00 (talk) 21:58, 10 November 2022 (UTC)
I don't care where the codes come from, or in what form they are formatted. There are many codes that are in use and do not exist on Xover's list. Perhaps you want to convert them? In any case, you can't archive unless you re-create and redirect the codes. — ineuw (talk) 23:57, 10 November 2022 (UTC)
Okay let's slow right down..
  • Firstly, WHAT template are we ACTUALLY talking about?
  • Are you SURE you haven't got {{ts}} confused with one of the others, or with writing CSS styles directly inline?
  • What codes do you say are in use, but not present in the lists for the SPECIFIC template you are actually using? ShakespeareFan00 (talk) 00:18, 11 November 2022 (UTC)
As far as I can determine, ALL the codes/aliases in the original {{ts}} got migrated to the module version. ShakespeareFan00 (talk) 00:20, 11 November 2022 (UTC)
I pasted my copy which was copied last may. User:Ineuw/notes7 — ineuw (talk) 02:29, 11 November 2022 (UTC)
@Ineuw: Well, I certainly didn't change any codes in a list on your computer! :)
In other words, you need to take a couple of steps back and add explanations so the rest of us can follow you. When you refer to "Xover's list", where on-wiki can I find that? In what context (which wikipage) are you trying to use a code that no longer works? And how are you using that code? In the section heading here you refer to, I think, {{FreedImg}} and {{FreedImg/span}}; but in your message you refer to {{table style}}. If you are trying to use a code with one of these templates that is not working, you need to specify which template and which code. Xover (talk) 06:40, 11 November 2022 (UTC)
I am referring to this list. In this list, there exist a float:left and a float:right. The copy on my computer came from an earlier layout of that page. What I called "Xover's table" is Module:Table style/data. This is my first encounter with a Module and I missed it. Are they related? How is a new code added to the module? — ineuw (talk) 08:27, 11 November 2022 (UTC)
Some template code on Wikisource was moved to being written in Lua (the Module: namespace), to make it easier to maintain and to improve performance. Table style, which is widely used, was one of these templates.
The data in the module is a Lua table, JSON-like in appearance.
The coding for fll for example is as follows:
  ['fll'] = {
    style = 'float:left;',
    aliases = {'float left'}
  },
The first line is the short code, such as al, ar, fll, flr, etc.
The second line is the actual CSS. This can be of any length, but must be valid 'sanitised' CSS.
The third line is the aliases, that is the other codes that can be used for same styling.
My added implementation to float an image to the center is:
  ['fic'] = {
    style ='display:table; margin:0 auto;text-align:center;max-width: 100%;height: auto',
    aliases = {}
  },
To add a code, you add the three lines with the relevant parameters.
ShakespeareFan00 (talk) 09:25, 11 November 2022 (UTC)
@ShakespeareFan00: Your "fic" code is outside the scope of {{ts}}. For one thing it makes no sense to set display:table on a table element (which is what {{ts}} is for). For another it is trying to do too many things, and a mechanism like {{ts}} should only be used for one or two style rules to be manageable.
I suspect you're adding this one in order to use {{ts}} outside a table, which is not a supported use case, a bad idea, and will need to be migrated at some point. I would strongly urge you to not use this approach to solve the problem you're trying to solve. We may not have a quick fix for that problem, but it's probably better to wait until we do than to hack up something like this just to patch over it. Xover (talk) 15:15, 11 November 2022 (UTC)
@Ineuw: The list in Template:Table style/doc is generated automatically from the actual code in the module and should always (modulo caching) reflect the actual behaviour of {{ts}}. The codes supported in the module should reflect exactly the codes that were supported by the original template, barring a few codes that have been added subsequently. If any were lost during the conversion then that is simply a bug, and I am not aware that this is the case.
The instructions for adding new codes are in the docs at Module:Table style/data. But also please heed the caution there: the data is in actual Lua code (that is, it is programming language code) so please make sure you know what you are doing before modifying it, and consider requesting assistance if you are not certain you can do so safely.
I would also generally urge restraint in adding new codes. While it is often convenient, the {{ts}} approach to formatting things is not sustainable. It exists because we have no better way to handle table formatting, specifically, and not because this is actually a good way to do it. For things other than tables we usually do have better ways to fill a given need, even if they have not been implemented yet, and (ab)using {{ts}} or one of the templates using a similar approach is a bad idea from a technical and long-term sustainability perspective.
And as I keep reminding ShakespeareFan00, some things are just simply not sufficiently well supported (in web standards, web browsers, or MediaWiki) that we should be relying on it, even if we can find some way to hack it together such that it seems to give the desired effect right now. Dot leaders, and to a lesser degree the TOC templates in general, are a prime example that has caused us no end of grief and will continue to do so for years and decades still. Not to mention things like {{center block}} and {{block center}} (which both do the same thing, but subtly differently, so that only massive manual labour can reconcile and merge the two); or {{brace}} and {{brace2}}; {{redact}} and {{redact2}}; or…
In any case… Speaking as someone who spends an inordinate amount of time trying to reconcile and clean up this mess, I urge restraint in making changes, or adding new templates, or using a template for a purpose different than the one for which it was designed just to solve some there-and-then problem without considering its wider implications and maintainability. We're not exactly spoilt for technical contributors so we have a strictly limited capacity for maintaining such things. Xover (talk) 15:05, 11 November 2022 (UTC)
Thank you.
What I was trying to do is to have {{FIS}} center an image+caption (which it can't do currently). Currently FIS sets "display: inline-block". It could instead set "display:table; margin:0 auto" on the wrapper container for the very specific float=center use case (which is essentially what {{img_float}} DOES in the same use case).
See Page:A complete course in dressmaking, (Vol. 4, Blouses) (IA completecoursein04cono).pdf/24 for an example of the type of image+caption inside a paragraph run that I am talking about.
I agree with you that HTML/CSS doesn't necessarily provide a long-term fix for this yet.
ShakespeareFan00 (talk) 16:36, 11 November 2022 (UTC)
@ShakespeareFan00: What I'm having trouble understanding is why this image needs to be inline? Xover (talk) 16:39, 11 November 2022 (UTC)
Because, if it isn't and I put {{FI}} instead, it breaks the paragraph, which meant an intended approach I had for subheadings didn't work. I ended up abandoning the sub-headings approach entirely. The SPECIFIC image can be relocated. However, I am disappointed that there seems to be a resistance to an easy fix for something that would make FIS and img-float behave with similar functionality, which would be beneficial longer term .... (Sigh) ShakespeareFan00 (talk) 16:44, 11 November 2022 (UTC)
@ShakespeareFan00: I am absolutely willing to listen and to be persuaded otherwise. But to me it currently looks like you're trying to use the inline version of FI as if it were the block version, instead of just using the block version directly. Shoehorning a block-oriented template into working as an inline-oriented template, and vice-versa, does not generally seem like a good idea. Xover (talk) 16:48, 11 November 2022 (UTC)
I can understand your technical objections, and in respect of the SPECIFIC images in the linked SPECIFIC work, most of the images can be relocated to not being within paragraph runs.
As I've explained REPEATEDLY before, {{FI}} uses a DIV, {{FIS}} uses a SPAN. There isn't currently a way to get an image that displays like {{FI}} does (i.e. float=center) using {{FIS}}. {{FI}} cannot be used inline, because a DIV in a P is badly formed HTML. That's now 4 TIMES I've explained it, and it gets more and more tiresome every time. :(
Do I actually have to BREAK some (sandboxed) pages personally, trying to get a float=center behaviour, before someone else implements what should be a relatively straightforward (though not trivial) set of if statements in the relevant Module? (Namely: a test for FIS vs FI usage and args["float"]="center", and changing the display type used on the outer containing span from inline-block to 'display:table' when it's float=center. {{FI}}, being DIV-based already, is in effect already float=center.)
ShakespeareFan00 (talk) 17:04, 11 November 2022 (UTC)
Okay, so I implemented a very dirty float=center here: Module:FreedImg/sandbox, line 183.
I'd appreciate someone writing some testcases for the intended new functionality. ShakespeareFan00 (talk) 17:26, 11 November 2022 (UTC)
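For anyone reviewing, a very rough Lua sketch (not the actual Module:FreedImg/sandbox code; all names are illustrative) of the kind of conditional being described here:
  -- illustrative only: choose the wrapper style for the span-based (FIS) form
  local function wrapperStyle(isSpan, floatArg)
    if isSpan and floatArg == 'center' then
      -- centre the wrapper instead of floating it left or right
      return 'display:table; margin:0 auto; text-align:center;'
    elseif isSpan then
      return 'display:inline-block; float:' .. (floatArg or 'left') .. ';'
    else
      -- the div-based (FI) form is block-level and in effect already centred
      return 'display:table; margin:0 auto;'
    end
  end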

┌─────────────────────────────────┘
@Xover Thanks for your message. I have no intention of adding anything to the module. As long as the codes work, it's fine.

@ShakespeareFan00 There is no need to explain the issue. Perhaps what you want to do is not possible. After all, there is no such thing as a floating center, as you pointed out to me. You mention the caption is not centered? Perhaps you can show an actual printed image comparison? In this work, there are hundreds of images floating in the center between columns. AFAIK we cannot do this. — ineuw (talk) 20:30, 11 November 2022 (UTC)

@ShakespeareFan00: My apologies. As mentioned, IRL is being a bit recalcitrant just now, so I am having trouble keeping up and am not able to spare a lot of brain-cycles for wiki-stuff. You have indeed explained several times; it's just that I haven't been able to understand it yet.
But… The point I am failing to get is why you need to use FIS as if it was a block-based template. None of the examples I've seen appear to require that. Page:A complete course in dressmaking, (Vol. 4, Blouses) (IA completecoursein04cono).pdf/23 for example, works perfectly fine with FI. Ditto for Page:A complete course in dressmaking, (Vol. 4, Blouses) (IA completecoursein04cono).pdf/24.
I perfectly well understand that block elements cannot be used inside inline elements, and that a div cannot appear inside a p element. But I don't see how this applies in any of the examples I've so far seen. And, in fact, that very issue makes FI and other block-based templates preferable to span-based ones: they can contain both, where the span-based ones can only contain other inline elements.
Is the need driven by something that is visually broken? Is it a lint error you're seeing? Is it breaking when used inside another template? When another template is used inside it?
I am truly sorry if I am being dense, but I just haven't been able to grasp what problem we are trying to solve. Xover (talk) 20:18, 11 November 2022 (UTC)
See the code I added to the sandbox version of the Module. I also put some test cases..
What I wanted was a version of FIS that did float=center. The sandbox version of FIS now does that.
This wasn't about flowing text on either side (which is indeed NOT possible in simple HTML/CSS). It was always about an image centred in the page with blank space on either side, without needing to worry about relocating the image outside of a paragraph to cope with where MediaWiki puts P tags.
I'm too tired to go into any more depth on it right now, but as nothing visually broke when relocating the images, that's the pragmatic way to move forward.
I'd still appreciate someone looking at the sandbox code. ShakespeareFan00 (talk) 22:10, 11 November 2022 (UTC)
As an aside, the FI/FIS module probably should throw an error if a nonsensical float value is used?
ShakespeareFan00 (talk) 22:13, 11 November 2022 (UTC)
Errors: yes, most of our templates should ideally give visible errors when given detectably non-sensical input, but in the general case it'd be a lot of boilerplate code (harder to maintain and prone to both false positives and false negatives etc.) for relatively little practical benefit. It's possible we can find a somewhat sane way to do it for our Lua code long term (I have some ideas), but I don't think it'll percolate to the top of the priority list any time soon. Xover (talk) 23:18, 11 November 2022 (UTC)

two scans

  1. The first is at Hathi. It is https://babel.hathitrust.org/cgi/pt?id=mdp.39015071565991 (The Road to Wellville). They included a 1904 version of this in boxes of Grape-Nuts, and I think it should be here. So, I am uncertain if you have access to Hathi or not. If you do have access and are willing to download/upload, then great. If either of those conditions are not met, I can take it to the Scan Lab.
  2. I downloaded a pdf from somewhere in Deutschland and uploaded it here and it is not acting like a pdf. The site has a TIFF download of the scan. Perhaps you could send it through the DjVu machine? The book is gorgeous, really. The files live at https://digital.staatsbibliothek-berlin.de/werkansicht?PPN=PPN780028635&PHYSID=PHYS_0007&DMDID= Same conditions; if both are not met, I can take it to the Scan Lab. Oh, the broken pdf is at File:Grammar Japanese Ornament-PPN780028635.pdf

Further, I am not alone in proofing these, if that matters. Also, I hope all is well with you.--RaboKarbakian (talk) 16:15, 19 November 2022 (UTC)

@RaboKarbakian: File:A Grammar of Japanese Ornament and Design (1880).djvu (minimal checking, so caveat emptor).
I'm afraid I have no special HathiTrust access. If it's geolocked for you it'll be geolocked for me too. Xover (talk) 22:07, 19 November 2022 (UTC)
Xover: Index:A Grammar of Japanese Ornament and Design (1880).djvu <--Looks great! Thank you! Hathitrust is not geolocked, but as a guest, I can only download certain things. I am considering downloading it page by page and getting rid of the watermarks. It could be done by tomorrow, but to upload, it would be Monday. I am thinking it will be worth that extra work for the watermarkless pages. Maybe I will be dropping by with a link to a pack of pages on Monday? Thanks again!--RaboKarbakian (talk) 00:43, 20 November 2022 (UTC)
@RaboKarbakian: Feel free to drop such a link. I just can't promise how quickly I'll get to it.
HathiTrust geolocks individual scans based on whether they think it is in copyright etc. So to access these you can either access them from a US IP address, or you can be logged in through one of their member institutions. Xover (talk) 09:06, 20 November 2022 (UTC)
Xover https://drive.google.com/file/d/1OivWjDBM9d24jgt8Td9SzK_FObp12tmb/view?usp=share_link Access at Hathi makes download easier.--RaboKarbakian (talk) 15:16, 20 November 2022 (UTC)
@RaboKarbakian: File:The Road to Wellville (1926).djvu Xover (talk) 17:19, 20 November 2022 (UTC)

replicating tracker into NIE

Hi. Recently you put a tracker into {{Appletons'}}. {{NIE}} probably has a similar need. Truth be known, we probably have an amount of work to do here, and I almost think that we should be spinning in an intermediary template for biographical compendiums for those listed in here — billinghurst sDrewth 03:48, 20 November 2022 (UTC)

@Billinghurst: I added a similar tracking cat for NIE: Category:NIE template with Wikipedia parameter (5,038).
Work on migrating {{header}} to Lua stopped a while back due to an issue with TemplateStyles, but I think we've got that one licked and can continue. Once that is done I think we can more easily add at least coarse tracking categories for these parameters (since all the bio templates end up calling {{header}}). If we need per-work categories we'll probably need them to pass the module a prefix to use (something like |catprefix = NIE), but it should be doable. Xover (talk) 09:24, 20 November 2022 (UTC)
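As a rough sketch of the coarse-tracking idea only (the category fallback and everything beyond |wikipedia= and |catprefix= are assumptions, not the planned implementation), the Lua side could look something like this:
  -- emit a tracking category when |wikipedia= is set, optionally per work
  local function trackingCategory(args)
    if args.wikipedia and args.wikipedia ~= '' then
      local prefix = args.catprefix           -- e.g. "NIE"
      if prefix == '' then prefix = nil end
      -- "Header with Wikipedia parameter" is a hypothetical fallback name
      local cat = (prefix and (prefix .. ' template') or 'Header')
        .. ' with Wikipedia parameter'
      return string.format('[[Category:%s]]', cat)
    end
    return ''
  end
With |catprefix=NIE this would populate a category like the existing Category:NIE template with Wikipedia parameter.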
Thanks, it is a rough brain fart at this stage, though I was thinking that having a better track of templates that leverage header, and of components of header, would be useful. My brain harks back to the conversations when we went to the current look of header (c. 2009), and its components, and how we have developed it since. There are some maintenance categories around the application of formatting that are a little quirky. As you know I have a specific interest in the biographical compendiums, which have specific constructions around volumes (some as a note, some as an active link), contributors, editions, etc., such that I think that we should consider an intermediate template build (or similar) to make something like {{NIE}} more of a shell to build on, and stylistically able to be modified from the intermediate (though that may be your wish from the module); and here I am thinking of {{authority/link}} <=> {{CFB link}}. My hope with "[[{{{title}}}/{{{1}}}|{{{1}}}]]," in [[{{{title}}}]] is that at some point, when we get around to having a definitive style review, it becomes easy to update. [Wishes! :-)] — billinghurst sDrewth 22:23, 20 November 2022 (UTC)
@Billinghurst: I'm not sure I understand the requirements for the biographical compendiums well enough to be able to reason out the majority of their needs (so you'll need to spoonfeed me the details). But at least {{header}} is likely to become just an #invoke of the Lua module, which means per-work wrappers can just manipulate the parameters and then #invoke the same Lua function. Not sure what we'll do with {{process header}}, {{translation header}}, etc., but I'm hoping they can just be different entry points to the same function. If there are commonalities across the different per-work wrappers it's certainly possible to make helper functions for that (possibly just like {{process header}} etc.). We could also introduce an intermediary template and/or Lua module specifically for these, but my initial assumption (which may well be wrong) is that this would only make sense if they provided an alternate interface for per-work template builders. And with my limited understanding of this area I can't immediately see what that would look like. I'd be happy to look for ways we could reduce pain points, standardise features (like tracking cats), etc. though.
Could you perhaps give a couple of examples of things that are hard or tedious to do with per-work templates today? Tracking cats for |wikipedia= (and possibly the other sister project params) I'm guessing. What else? Xover (talk) 20:08, 24 November 2022 (UTC)

header update thoughts

The author field in header and the override_author are an old solution concept that may be too simple (old?) if we are moving to a module, especially since override_author was meant to capture multiple authors and we at some point suborned it for disambiguated author display. We probably have the scope with a module to look to author, author1, ... authorn. We maybe can get to a better solution for recording and presentation. The display label for the author and the underlying link were done that way for good reasons, but while it can be a complicating factor to explain, it may have positive benefits in a range of spaces. ESPECIALLY where we have anonymous works, or things that are corporate authors so hang off a portal page.

We probably then have similar scope for translators, section contributors, section translators stepping onwards. — billinghurst sDrewth 23:41, 23 November 2022 (UTC)

@Billinghurst: Hmm. |author1=, |author2=, etc. paired with |author1-link=, |author2-link=, etc.? And the same for |translator1= / |translator1-link=, |contributor1= / |contributor1-link=, |editor1= / |editor1-link= and so forth? Yeah, I think we could do that in a fairly straightforward way. If so it would be good to (eventually) migrate all existing uses of |override_author= etc. so we could remove that code from the module. Probably bot'able for most cases, the exceptions being when people have gotten creative with what they put in that field.
Do we want to try for structured data like |author1-last= / |author1-first= for the benefit of web scrapers / metadata, or would that be taking it a step too far? Would probably be finicky given the wide variability of names and auto-linking authors, but for simple cases it might provide some benefit. How about supporting Wikidata QIDs? We'd then get whatever the name at Wikidata was, and be dependent on Wikidata site links for linking the Author: page, but it'd be a nice and tidy way to link our stuff to Wikidata in a machine readable way. We could also conceivably support just giving the QID for an edition item at Wikidata and fetch everything else from there, but I think that would probably require some better (visual, smart) tools for creating Wikidata items and adding bibliographic details to them (and I haven't found any sane API for that as yet; plus there's no real supported GUI library just now while we're waiting for Codex/Vue to get properly deployed).
I plan to have a go at {{header}} again at some point not too distant, so do feel free to drop thoughts here and I'll keep them in mind. Xover (talk) 19:10, 24 November 2022 (UTC)
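Purely as an illustration of the numbered-parameter idea above (not a commitment to any particular implementation, and the function name is invented), the module could collect the pairs along these lines:
  -- gather |author1=, |author1-link=, |author2=, … until the sequence ends
  local function collectAuthors(args)
    local authors = {}
    local i = 1
    while args['author' .. i] do
      table.insert(authors, {
        name = args['author' .. i],
        link = args['author' .. i .. '-link'],   -- optional link target override
      })
      i = i + 1
    end
    return authors
  end
The same pattern would presumably extend to translatorN, contributorN and editorN pairs.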

Something went badly wrong on a match and split... Can you find the CLEAN version and delete the pages in the Index: that got created in error, thanks?

It would be nice to have a way to 'undo' a bad match and split :( ShakespeareFan00 (talk) 23:23, 25 November 2022 (UTC)

Resolved. As I said it would be NICE to have a way to 'undo' a bad one. ShakespeareFan00 (talk) 10:07, 26 November 2022 (UTC)

Removing MediaWiki rules

I poked around Wikisource:Scriptorium and elsewhere, but I can't seem to find why you and other users are replacing the native MediaWiki ---- for a horizontal rule with {{rule}}. My understanding is that MediaWiki > templates/modules > raw (X)HTML. Is there something that I'm missing here? —Justin (koavf)TCM 17:15, 28 November 2022 (UTC)

@Koavf: Yes. We generally prefer templates to raw MediaWiki wikimarkup, because MW and the skin own the styling of the wikimarkup (which can change at any time) and to override it we have to host style rules in site-wide CSS (meaning it gets loaded for every single page load for every single user, even though it has effect only on a fraction of pages). With templates we can add such styling to that template's TemplateStyles (meaning it gets loaded only when needed), and we control the styling, so we can make sure it stays consistent over time. Plus we can add tracking categories and other such facilities as needed.
Note that this is different from the calculus on, say, Wikipedia, where it is desirable to have consistency across articles, but having the specific styling change over time is usually not a problem; vs. our need to have each text be consistent over time, but each text can be completely different.
No rule without exceptions—some changes over time are inevitable, and for some things wikimarkup gives more robust and maintainable results—but as a general rule of thumb it is templates/modules > MediaWiki > raw (X)HTML. Xover (talk) 19:30, 28 November 2022 (UTC)

Hybrid footnotes..

Most of the items in the following residue seem to be using a hybrid footnotes method with SUP.

https://en.wikisource.org/w/index.php?search=insource%3A%2F%5C%3Csu%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1

Any chance of taking a look with a view to converting them to be <ref>...</ref> tags or something more appropriate directly? ShakespeareFan00 (talk) 19:03, 28 November 2022 (UTC)

The aim here was to migrate content uses of sup, so that one font-size could be set up in the template, as opposed to in the global CSS, or indeed on a per-work basis when the ones in ns104 were migrated over time. ShakespeareFan00 (talk) 19:05, 28 November 2022 (UTC)

Migrating stuff...

Check my contributions. I'd been trying to migrate some pages away from using <HR /> directly. (I've tried to replace usages in ns0 and some other non-talk namespaces. I'm not sure if I checked ns104 (Page:), but that's less of a priority as we can set up Index styles as needed on a per-work basis.)

Next step is attempting to migrate uses of <SUP>...</SUP>, <SUB>...</SUB> and <U>...</U> in ns0, and some other namespaces that are quick to do.

Any chance you could do a review of these efforts and migrate things in ns10 as needed? Thanks. ShakespeareFan00 (talk) 20:36, 26 November 2022 (UTC)

Also, I recall having attempted to migrate some ---- rules over to {{rule}}. ShakespeareFan00 (talk) 12:02, 27 November 2022 (UTC)
You may also find some of the queries here of interest - https://public.paws.wmcloud.org/User:ShakespeareFan00/linthints.ipynb

ShakespeareFan00 (talk) 14:48, 27 November 2022 (UTC)

@Xover: Can you have a look into this at some point? I may have to take a break from massive contributions on Wikisource for a bit? ShakespeareFan00 (talk) 21:11, 30 November 2022 (UTC)

Remaining <FONT>...</FONT> tag usage (outside of user/talk/Wikisource namespaces.)

https://en.wikisource.org/w/index.php?title=Special:Search&limit=500&offset=0&ns0=1&ns6=1&ns8=1&ns10=1&ns12=1&ns14=1&ns100=1&ns102=1&ns104=1&ns106=1&ns114=1&ns710=1&ns828=1&ns2300=1&ns2302=1&search=insource%3A%2F%5C%3Cfont%2F

I suspect some of these will be bot-able, although I'm not quite sure what's going on with the 'invisible' POTM markers in the Index namespace. ShakespeareFan00 (talk) 20:27, 28 November 2022 (UTC)

@Billinghurst: cf. this. Do you happen to recall what the <font color=white>POTM</font> stuff is about? Is it still needed? If so, is it something we could migrate to a different mechanism (a tracking cat maybe)? Its genesis seems to be WT:PotM#November 2010 is Validation month, so was used mainly as a search key for Special:IndexPages it looks like? Xover (talk) 07:21, 29 November 2022 (UTC)
Yes, searchable but hidden from view. Definitely bot'able, and probably should be templated, rather than how I did it way back then. — billinghurst sDrewth 23:50, 29 November 2022 (UTC)
@Billinghurst: But do we need to keep these, or should they just be removed? I'm not really up to date on POTM so I could be way off, but I hadn't noticed these still being used so I had assumed they were obsolete. If we're still using such tagging we should whip up a template or make a dedicated Index field for it. And I'm happy to do it, but then someone needs to explain the key points about what it should do first. Xover (talk) 15:53, 30 November 2022 (UTC)

valign in Templates...

Is there a way of sorting these by usage: https://en.wikisource.org/w/index.php?title=Special:Search&limit=500&offset=0&ns10=1&search=insource%3A%2Fvalign%5C%3D%2F

It would seem sensible to convert templates people actually use first? ShakespeareFan00 (talk) 12:21, 29 November 2022 (UTC)

@ShakespeareFan00: Not without writing custom code, I don't think. Xover (talk) 15:48, 30 November 2022 (UTC)

Re: Table style template codes

I studied the module and the template shorthand codes of {{ts}}, and I propose to lock the table style template code list, so that new codes could only be added to the module. Does this sound reasonable? — ineuw (talk) 03:02, 29 November 2022 (UTC)

@Ineuw: Which list are you referring to here? The list in Template:Table style/doc is generated automatically by the module, based on its live configuration, so it will only ever contain codes that are actually extant. Xover (talk) 07:01, 29 November 2022 (UTC)
@Xover, Do you mean that if I currently use a code from the template's existing codes, it is added to the module automatically? — ineuw (talk) 06:06, 30 November 2022 (UTC)
No, I mean that what codes the module (and thus also the template) supports is defined in the configuration file at Module:Table style/data. In the template's documentation page—Template:Table style/doc—there is a list of supported codes. This list used to be manually maintained (and was therefore woefully inaccurate and out of date), but is now actually the output of a call to a special function in the module that just takes the configuration data and formats it as a wiki table. In other words, if you try to edit that section of the documentation you'll find {{#invoke:table style|supported_codes}} instead of the actual list of codes.
That was why I asked what list you were referring to in your initial message in this thread. There is only one way to add new codes (or remove existing ones), and that is to edit Module:Table style/data. Xover (talk) 14:38, 30 November 2022 (UTC)
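A simplified sketch of how a function like supported_codes can work, reading the configuration and emitting a wikitable (the real Module:Table style may differ in details such as sorting and column layout):
  local data = mw.loadData('Module:Table style/data')

  local function supported_codes()
    local out = { '{| class="wikitable"', '! Code !! CSS !! Aliases' }
    for code, def in pairs(data) do
      table.insert(out, '|-')
      table.insert(out, string.format('| %s || <code>%s</code> || %s',
        code, def.style, table.concat(def.aliases or {}, ', ')))
    end
    table.insert(out, '|}')
    return table.concat(out, '\n')
  end
Because the documentation table is generated from the same data the template reads, the two cannot drift apart the way a hand-maintained list can.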

@Xover I can clarify, but have a question as well.

  1. I never knew about Module:Table style/data until ShakespeareFan00 pointed it out to me. I had no reason to know about it because all shortcut codes I use work fine and these past years my contributions were strictly text editing. I was content.
  2. The list to which you referred above was based on Template:Table_style/parse.
  3. I recognized this as a re-design of my earlier version, which was also based on a previous contribution. The history of contributions between 2015 and 2021 is gone.
  4. I don't know why this happened. I noticed it as I was trying to piece together the developments I missed over the years, and to reply to your question with a clear understanding, which I still lack. Do the module codes and the template codes relate to one another in any way with a single call of "ts"? — ineuw (talk) 07:48, 1 December 2022 (UTC)
@Ineuw: It looks like the template wikipage that was at Template:Table style was moved to Template:Table style/parse, and then an entirely new template wikipage was created in the old location (Template:Table style). The original page history (before the page move) is therefore at Template:Table style/parse.
{{ts}} now contains no real logic: all it does is call the main function in Module:Table style. Module:Table style reads Module:Table style/data, looks up what parameters were passed to the template, constructs CSS rules based on the arguments and the configuration, and returns the result to the template, which in turn outputs it into the page it was used in. (I am simplifying away some irrelevant technical details here). The module isn't meant to be used directly from wikipages; it is just a backend for the template.
Template:Table style/parse is no longer used. It just hasn't been deleted yet for various reasons (mostly that I forgot about it). When the module version was created, its list of codes was extracted from Template:Table style/parse, so at that time the intent was that the old "plain template" implementation and the new "template calling a Lua module" implementation should support exactly the same codes (which it did, cf. Template:Table style/testcases). There have been some additions to the module since then, but no removals that I am aware of. Xover (talk) 08:10, 1 December 2022 (UTC)
Ty, ty, ty. Now I understand how it works. Mystery solved. I was using an older offline list of the doc. Thanks again. — ineuw (talk) 08:19, 1 December 2022 (UTC)
@Ineuw: You're very welcome! All the new facilities (like Lua modules) have many advantages, but they are certainly quite a bit more technically complex. Navigating this stuff can be a real challenge even for primarily technical people. Xover (talk) 09:12, 1 December 2022 (UTC)

I can't work out what to look for in respect of the image markup changes...

https://www.mediawiki.org/wiki/Parsoid/Parser_Unification/Media_structure/FAQ

I did ask if there was a table of what's changed but just got a referral back to the FAQ which did not answer the question I asked. (sigh) ShakespeareFan00 (talk) 14:43, 4 December 2022 (UTC)

@ShakespeareFan00: So far as I can tell the FAQ documents all the changes. What is it you're wondering that you didn't find there? Xover (talk) 14:55, 4 December 2022 (UTC)
I was trying to figure out what might be affected locally..

https://en.wikisource.org/w/index.php?title=Special:Search&limit=20&offset=40&ns2=1&ns8=1&ns9=1&ns10=1&ns11=1&ns828=1&ns829=1&search=insource%3A%2Fthumb%2F+intitle%3A%2Fjs%2F https://en.wikisource.org/w/index.php?search=insource%3A%2Fthumb%2F+intitle%3A%2Fcss%2F&title=Special:Search&profile=advanced&fulltext=1&ns2=1&ns8=1&ns9=1&ns10=1&ns11=1&ns828=1&ns829=1

At a rough estimation it's about 10-20 items all told (Module:Chessboard being one of the bigger ones, as it generates a wrapper based on the old style of media wrapper).

Of course the markup update hasn't happened on Wikisource yet, so a competent person (not me) has some time to look into it. ShakespeareFan00 (talk) 15:28, 4 December 2022 (UTC)

@ShakespeareFan00: The old classes aren't going away yet, so when it deploys nothing much will happen. Before they remove the classes they will make a migration path of some kind, and at that point we can decide what we need to do. IOW, unless something breaks this is a backburner item for now. Xover (talk) 15:34, 4 December 2022 (UTC)

what is the problem with global classes

I am not understanding the issue with having global classes like .tablecolhdborder? I utilise it regularly as it is just convenient and standard. — billinghurst sDrewth 01:06, 30 November 2022 (UTC)

@Billinghurst: Well, to be fair it does get kinda esoteric and technical. The gist is:
Things in site-wide CSS get loaded for every single user, on every single page view on every single page, regardless of whether that code is actually used there or not. As such it's actually a measurable fraction of the WMF's bandwidth budget that's pure waste (for people on metered access or slow computers it's also bad, but mostly it's an issue when aggregated). We used to have everything in MediaWiki:Common.css, which doesn't even compress the data for transfer, but GOIII started (and we've now finished) a move to having it in CSS-only Gadgets instead. That alone was a big improvement since ResourceLoader performs concatenation and minification. But we're still loading several hundred kb unnecessarily on every page load.
In addition, things in site-wide CSS can only be edited by people with the Interface Admin right, and most of what we're talking about is not really something that needs to be, or should be, restricted beyond autoconfirmed or sometimes regular administrator. And site-wide styles must be so restricted because there are no real limits to what you can do there: you can completely change the user interface, replace the "Login" link, etc.
And then there's the way style rules in site-wide CSS interacts with other sources of styling. For example, we're seeing bad interactions between our site-wide styles and style rules from the skin, from various extensions (WikiEditor, Score, Math, etc.), from templates that we import from enWP (or sometimes Commons); and even figuring out that the site-wide styles are a factor, much less figuring out what's going on, is a real pain because they're not visible anywhere on the page. It doesn't help that we have such a lot of site-wide style, with very little in the way of documentation. Why is that rule there? What is it intended to do? What's going to break if we change it?
By moving this to (mostly) TemplateStyles we solve all these problems. No unnecessary page load overhead because it only gets loaded on pages that actually use that template. We can set protection as strict or as permissive as we need it to be. Most of the rules are eliminated as a factor for any given problem because they just simply are not present on whatever page is relevant. And any styles are immediately obvious in that you can see the template invocation in the actual page. TemplateStyles are automatically scoped to the content area of a page, so they can't affect the site chrome (menus, side bar, etc.), so they're much safer even if access to edit them is much easier. And in most cases it keeps the style in a unit with the code that uses it, rather than act as a "magic" keyword from "somewhere".
There are some drawbacks too, of course. TemplateStyles can't do everything a CSS Gadget can. Extension tags (like <score>, <math>, <poem>) can't be styled like this without actually wrapping them in a template (even if that is otherwise not needed). And some CSS classes from site-wide styles (i.e. .tablecolhdborder) are used in a context where using a template is going to be more difficult/effort/overhead/less optimal for the user. But the short version is that we ideally want zero styles in site-wide styles (Gadgets), and definitely nothing in MediaWiki:Common.css, but there are some things we can't actually move out any time soon, so everything that we can move we should. Xover (talk) 15:40, 30 November 2022 (UTC)
THOUGH site-wide standard and work-easing CSS are also beneficial, so we need to find the balance of overhead and value in simplicity and effort. Give me a practical solution so that a class like .tablecolhdborder can be easily, universally added and available, then we can talk. Having it added as CSS code by each individual for each use is a rubbish solution ... throwing the baby out with the bath water! And I will note that this style and others are documented in Wikisource:Style guide/Tables
Then there is the fact that someone converting a plethora of templates from some simple code into CSS styles is problematic, adding next to no value and a ridiculous number of edits. Add to that the fact that the templates have some level of protection, while the resulting CSS styles do not. So please let us add value, consideration and explanation to why we do things. I wish for there still to be ease of access, and classes can do that; making people do CSS coding and additions does not. We need a middle way. — billinghurst sDrewth 11:03, 1 December 2022 (UTC)
@Billinghurst: I absolutely do not want users in general to go around adding raw CSS to random pages. That's why we have templates: we can add styles to the template and have it reused everywhere. The problem is the subset of cases where a template is not an obvious and easy replacement, and table formatting is one such. Whether to protect TemplateStyles or not is… well, I have no strong opinion off the cuff. But by having it in TemplateStyles we can protect it or not depending on what we need.
That being said, just because someone once had a whim to add a global CSS class, and it looks convenient for end users, does not mean it is actually a good idea; it just means we have existing uses and ingrained habits to deal with. Sometimes that means we can't actually clean things up and will have to live with the status quo. Sometimes it means we will have to break things and inconvenience those users that were used to the old way. And sometimes we have to design technically complicated solutions in order to achieve the cleanup without either breaking stuff or inconveniencing users. But we can't not address technical debt because while that may be less annoying short term, in the long term it will cause more and more and more problems.
I wrote somewhere that .tablecolhdborder as it currently exists will have to go away. But I don't yet know what the path to that is because I haven't started looking into that yet. One possible path could be adding it as a supported keyword in {{ts}} so that you'd replace class="tablecolhdborder" with {{ts|tablecolhdborder}}. Another thing to look into is creating a {{tablecolhdborder}} template that you'd use instead of {|. There are probably others I haven't thought of yet. Xover (talk) 12:44, 1 December 2022 (UTC)
Well, when there is a plan for a better and universal way, then I don't have an issue with a better way. It is, however, a problem when users feel encouraged to just go and deconstruct existing templates that leverage global CSS and start to embed that code into subsidiary style sheets; and these users feel that they have tacit approval to do it because all they hear is "global css is bad". The message clearly needs to be: we should be moving away from global CSS, rationally and logically. AND the reason why we have stuff in global CSS is due to the mess and complexity that we were seeing coded at the time. And yes, it pre-dates TemplateStyles, and yes it pre-dates tablestyles, and when we can have something equally easy to implement, universally, you will hear me cheer. — billinghurst sDrewth 10:17, 5 December 2022 (UTC)

Superscripts..

https://en.wikisource.org/w/index.php?title=MediaWiki:Gadget-enws-tweaks.css&diff=next&oldid=12775570

Did you intend to remove the generic font-size alteration when you updated the stuff about footnotes and references classes? ShakespeareFan00 (talk) 07:17, 4 December 2022 (UTC)

@ShakespeareFan00: I intended to remove all that was removed in that edit, yes. But your description "the generic font-size alteration" doesn't quite match what I would use to describe that, so I'm wondering if we are talking about the same thing? Is the removal causing problems somewhere? Xover (talk) 09:01, 4 December 2022 (UTC)
I've not noticed any major concerns, but I have added a font-size on the conversions of SUP to SPAN I did in templates.ShakespeareFan00 (talk) 10:04, 4 December 2022 (UTC)

The removal of indentedpage has however seemingly affected {{verse}}; the align=left option now overlaps the numbers on the text. ShakespeareFan00 (talk) 10:04, 4 December 2022 (UTC)

@ShakespeareFan00: If so that happened long ago. The Dynamic Layouts Gadget has been removing .indented-page (and .prose) for a long time, so those classes will have had no effect in the interval. Xover (talk) 10:10, 4 December 2022 (UTC)
Hmm. The next question is how to restyle {{verse}}, as this, as you say, has been there a while. ShakespeareFan00 (talk) 11:10, 4 December 2022 (UTC)
@ShakespeareFan00: Do you have a page where it's broken? Xover (talk) 11:13, 4 December 2022 (UTC)
https://en.wikisource.org/w/index.php?search=insource%3A%2F%5C%7B%5C%7Bverse%2F+insource%3A%2Falign%5C%3Dleft%2F&title=Special%3ASearch&profile=advanced&fulltext=1&ns0=1&ns2=1&ns4=1&ns6=1&ns8=1&ns10=1&ns12=1&ns14=1&ns100=1&ns102=1&ns104=1&ns106=1&ns114=1&ns710=1 lists the broken cases..
https://en.wikisource.org/w/index.php?title=Page:Poeticedda00belluoft.djvu/41&oldid=12545677

was where I noted the problem first. ShakespeareFan00 (talk) 11:14, 4 December 2022 (UTC)

@ShakespeareFan00: The way the Poetic Edda pages are set up is probably not optimal, but I haven't come up with a better approach yet. The other ~217 uses are in bibles and there it appears to work fine.
The Edda is awkward in that the stuff in ref tags isn't really footnotes in the traditional sense. They're more running commentary for each verse, à la what you'd find in a critical edition of Shakespeare (see e.g. here). The challenge is that the notes sometimes span multiple pages, meaning ref=follow is the obvious approach. But I suspect labelled section transclusion is going to be more appropriate there.
Anyways… I'm not seeing general breakage of {{verse}}, just edge cases where it's equally likely that a different approach is the solution. Xover (talk) 06:44, 5 December 2022 (UTC)

the listen template and the several shades of lazy

So, according to {{listen}}, leaving out the filesize when using it is considered lazy. So, I thought, okiedokie, I will just write a script that fills the template in and includes the filesize. And, of course, the file size my simple script gets is in bytes (and here the dragons lurk). Once a decade or so the 1000 vs 1024 problem arises in my life, and the gaps are too long for me to maintain a grasp on that situation.

So, I was wondering if you might be interested in changing "listen" to accept bytes and convert them to MB and kB. And then, regardless of your interest, there is the question of whether it is something that even should be done.

An example of my script's listen template can be found here -> Titan of Chasms/Titan of Chasms. And, as always, thank you for your consideration of this.--RaboKarbakian (talk) 17:38, 4 December 2022 (UTC)

@RaboKarbakian: Making {{listen}} able to convert bytes to SI-prefixed units is a relatively big job, so nothing I'll prioritise short term. But if you already have a script to get it from the file, converting the value to kB or MB should be relatively straightforward. Is this a (JS) user script or something you're running offline? Xover (talk) 20:05, 4 December 2022 (UTC)
Oh, also, I'm not a big fan of the {{listen}} template for any number of reasons, so for my part I have no particularly strong opinion on whether or not to add the file size. Xover (talk) 20:06, 4 December 2022 (UTC)
No javascript here. My parrot is dead!! I have this whole suite of scripts that work together to make broken (single channel, dubiously labelled ((though not as bad lately))) librivox mp3s into 2 channel, nicely labeled, replay gain adjusted (and in my own opinion) beautiful oggs. Making the template within that mess/suite was easy. If you have a template that you like better that will serve the same purpose, let me know. Until then I will be dividing by 1000, etc. soon, I suppose. Yes, offline and pesky as it uses the html and rss from librivox. The Red Fairy oggs/mp3s straight from librivox did not play on my mp3 player, some few years ago, being the mother of the invention.--RaboKarbakian (talk) 20:39, 4 December 2022 (UTC)
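[For illustration, a minimal sketch of the byte-to-prefix conversion such an offline script might use, assuming decimal (1000-based) prefixes; whether 1000- or 1024-based values are wanted is exactly the open question above, so the divisor is a parameter. The function name is made up for this sketch.]

def human_size(num_bytes, base=1000):
    """Convert a raw byte count into a human-readable string (kB, MB, ...)."""
    units = ["bytes", "kB", "MB", "GB"]
    size = float(num_bytes)
    for unit in units:
        if size < base or unit == units[-1]:
            return f"{int(size)} {unit}" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= base

# e.g. human_size(3481600) -> '3.5 MB'; human_size(3481600, base=1024) -> '3.3 MB'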
Template:listen was a magnificent step forward for where we were at the time and the implementations of the tagging back then, well before they improved the available metadata in files; it simply made easier what was already being implemented back then. Peek at the api mw:API:Imageinfo and see what you can manipulate. — billinghurst sDrewth 10:25, 5 December 2022 (UTC)
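[For illustration, a sketch of fetching the size in bytes via the Imageinfo API pointed to above, using only the Python standard library; the file title and User-Agent string are placeholders and no error handling is included.]

import json
import urllib.parse
import urllib.request

API = "https://en.wikisource.org/w/api.php"

def file_size_bytes(title):
    """Return the byte size reported by prop=imageinfo for a File: page."""
    params = urllib.parse.urlencode({
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "size",
        "format": "json",
        "formatversion": "2",
    })
    # A descriptive User-Agent is polite towards the Wikimedia API servers.
    req = urllib.request.Request(f"{API}?{params}", headers={"User-Agent": "listen-filler-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["query"]["pages"][0]["imageinfo"][0]["size"]

# e.g. file_size_bytes("File:Example.ogg") -> size in bytes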

class=pagetext directly in pages...

I was originally intending to use the following: pwb.py listpages -ns:104 -linter:"missing-end-tag" -start:* -grep:"class\=(\"|)pagetext(\"|)\>" -intersect -lang:en -family:wikisource -format:"[[{page.loc_title}]]" >pagetext.txt

To trim back some easily removed stray DIV tags using AWB, once I had a list. However, the query is taking a very long time to run. Do you know of a faster way to do it (against a database dump, maybe)? ShakespeareFan00 (talk) 20:01, 5 December 2022 (UTC)
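[For illustration, a sketch of the database-dump approach: a streaming scan of a pages dump with the Python standard library rather than a live pywikibot query. The dump filename is illustrative, the regex only roughly mirrors the -grep pattern above, and titles starting with "Page:" stand in for ns 104.]

import bz2
import re

DUMP = "enwikisource-latest-pages-articles.xml.bz2"   # illustrative filename
PATTERN = re.compile(r'class=("?)pagetext\1>')         # roughly mirrors the -grep above
TITLE = re.compile(r"<title>(.*?)</title>")

def pages_with_pagetext(dump_path):
    """Yield Page:-namespace titles whose wikitext matches PATTERN."""
    title, hit = None, False
    with bz2.open(dump_path, mode="rt", encoding="utf-8") as dump:
        for line in dump:
            m = TITLE.search(line)
            if m:
                # Starting a new page: report the previous one if it matched.
                if hit and title and title.startswith("Page:"):
                    yield title
                title, hit = m.group(1), False
            elif PATTERN.search(line):
                hit = True
        if hit and title and title.startswith("Page:"):
            yield title

# for t in pages_with_pagetext(DUMP): print(f"[[{t}]]")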

Oversight request..

I leave it to your discretion, but I seemingly got a bit confused here:- https://en.wikisource.org/w/index.php?title=Wikisource:Scriptorium&oldid=12837081 https://en.wikisource.org/w/index.php?title=Wikisource:Scriptorium&oldid=12837083

I did go back and edit it carefully on a re-reading, to make sure it said what I wanted. Not my evening, and as an experienced long-term contributor, this was unprofessional of me. Thanks ShakespeareFan00 (talk) 23:58, 8 December 2022 (UTC)

@ShakespeareFan00: I don't understand what you mean. I see nothing to complain about with your edits to WS:S over the last ~12 hours. You politely and constructively asked a question and, implicitly, stated an opinion on an issue of general community interest on which input had been solicited. Opening a deletion discussion was also the correct way to handle the issue, and, indeed, I was planning to do so myself had you not beat me to it. So unless I'm misunderstanding what you are referring to I see no cause for self-flagellation. Xover (talk) 07:06, 9 December 2022 (UTC)
It was that I got confused (in the specific diffs I linked) about what another contributor had actually been mentioning, and thus my initial response was poorly drafted, and might also be inaccurate. ShakespeareFan00 (talk) 07:30, 9 December 2022 (UTC)
@ShakespeareFan00: Find me one contributor to any Wikimedia project over its entire history that has not at one point or another been confused about what another user was saying… You're being waaay too critical of yourself here. Xover (talk) 07:39, 9 December 2022 (UTC)

Emphasis on index page

Hey,

Is there any reason why, within this edit, you've enclosed the "Pages" string in an <em> tag (line 526)? I don't know if accessibility in the Index namespace is important, but it irks me personally since I'm color-coding to differentiate between <em>, <i>, <cite>, etc. I think the semantically correct tag would be some header, like <h2>. Alnaling (talk) 16:07, 12 December 2022 (UTC)

@Alnaling: Well, in that version of the code it'd probably properly be a separate th for the whole thing there; but as I'm in the process of migrating this away from the table-based layout it will soon become a div containing a classed span. But using em there isn't wrong: it's a legend for the pagelist as much as a heading, and the word "Pages" is emphasised within it. If you have a personal style sheet that assumes em will only ever appear in prose you'll spend a lot of time frustrated. :) Xover (talk) 18:20, 12 December 2022 (UTC)
Huh, I haven't noticed any other ems in the wild yet, but I'll make my CSS more selective if you say so. I always assumed that ems only occur in prose, since it's hard to put a stress emphasis on something if you are not imitating speech to some degree. If we are in a table then I would consider strong or b to be more fitting, since the purpose is more to draw attention to where the pages start (and you even style it as bold :) ), but I don't really care that much if it's just temporary. Alnaling (talk) 19:11, 12 December 2022 (UTC)
@Alnaling: If it'll help you I can try to switch it to some alternate markup (like strong) the next time I sit down with that code (please do remind me if it looks like I've forgotten about it). Permanent switchover to the new implementation will probably require quite a bit of testing so that won't land soon. Xover (talk) 11:19, 13 December 2022 (UTC)

Recent contributions..

Whilst I await the patch to fix LintError updates with respect to Page namespace, I did some cleanup in another namespace.

However, I'd appreciate someone doing a review of my efforts (which I will desist from if you deem them contentious or find incompetence, naturally).

ShakespeareFan00 (talk) 16:05, 17 December 2022 (UTC)

This appears to be a User translation of a work on Spanish Wikisource, so I am not sure what to do.

It doesn't seem to be an obvious copyvio, and as an in-scope work it doesn't meet the criteria for a proposed deletion. If it's a valid translation it should be in the Translation: namespace, but I felt it was more sensible to ask for an admin/sysop decision. ShakespeareFan00 (talk) 16:12, 17 December 2022 (UTC)

@ShakespeareFan00: It's in user-space so just leave it (it's a sandbox, essentially). If it's throwing up errors or something we can just blank the page. Xover (talk) 16:49, 17 December 2022 (UTC)
It's producing relatively minor LintErrors. It's a shame there isn't a process to periodically review 'drafts' in Userspace to see if they are ready to put in ns0 again. ShakespeareFan00 (talk) 20:19, 17 December 2022 (UTC)

Lint errors...

https://en.wikisource.org/w/index.php?title=Special:LintErrors/missing-end-tag&offset=1378734&exactmatch=1&namespace=2&titlecategorysearch=

Any chance you could help reduce these? ShakespeareFan00 (talk) 16:19, 17 December 2022 (UTC)

Must font-size templates span pages?

I am working on the six-volume, 5,400-page project of History of Woman Suffrage - a sample page, with thousands of small-font paragraphs. I am using the {{fs85/s}} set of templates. If a paragraph spans two pages, is it necessary to bury the end code in the footer and the begin code in the header? Instead, could I terminate the paragraph in the text box? Would two adjacent calls to the template cause a problem in the main namespace? I use a keyboard macro to surround the selected text with the template codes. — ineuw (talk) 22:12, 22 December 2022 (UTC)

I have my answer which is no. Thanks for listening. :-) — ineuw (talk) 01:37, 23 December 2022 (UTC)

@Ineuw: {{fs85/s}} is div (block) based, so it will always insert a new paragraph. For lines that span Page: pages this will break when transcluded if you do not use the header/footer. Xover (talk) 09:59, 23 December 2022 (UTC)
From now on, I will check before asking. :-) Happy holidays. — ineuw (talk) 17:25, 23 December 2022 (UTC)

a couple of pamphlets jp2/pbm --> djvu?

This is another Post pamphlet, I think: https://drive.google.com/file/d/1f328kLYyn-86LsLq050ZA8oS4yo9FGm0/view?usp=share_link

And this is a tourist guide for the "Lewis and Clark Exposition" https://drive.google.com/file/d/15FHT2BBDVGxhI9bY5X3YOc5GdBGaSpvb/view?usp=sharing

Or not, I can go to the scan lab....--RaboKarbakian (talk) 21:29, 27 December 2022 (UTC)

@RaboKarbakian: File:Lewis and Clark Centennial Exposition (1905).djvu and File:The Door Unbolted (1906).djvu. Let me know if you need the file names adjusted (before transferring to Commons). Uploaded locally without info and license templates. Please add these and then transfer to Commons (using the "Export to Wikimedia Commons" tab/button thingie; it'll verify that you have a compatible license before transferring). Xover (talk) 10:20, 28 December 2022 (UTC)
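[For illustration, one possible recipe for this kind of jp2-to-DjVu conversion, assuming ImageMagick (built with JPEG 2000 support) and DjVuLibre's c44 and djvm are installed; this sketches the general approach, not necessarily the exact pipeline used for the files above.]

import glob
import subprocess

pages = []
for jp2 in sorted(glob.glob("Lewis_and_Clark_Exposition-Northern_Pacific-*.jp2")):
    ppm = jp2[:-4] + ".ppm"
    djvu = jp2[:-4] + ".djvu"
    subprocess.run(["convert", jp2, ppm], check=True)               # ImageMagick: JP2 -> PPM
    subprocess.run(["c44", "-dpi", "300", ppm, djvu], check=True)   # DjVuLibre: encode one page
    pages.append(djvu)

# Bundle all the single-page files into one multi-page document.
subprocess.run(["djvm", "-c", "Lewis and Clark Centennial Exposition (1905).djvu"] + pages, check=True)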
Quick! Thanks! There is a problem though. File:Lewis and Clark Centennial Exposition (1905).djvu is missing the 5th page of the scan (which is also the title page) and also three pages after scan page 48. I have a copy of the missing pages here; I can upload them if you need them....--RaboKarbakian (talk) 15:42, 28 December 2022 (UTC)
@RaboKarbakian:
Files in the archive:
Lewis_and_Clark_Exposition-Northern_Pacific-001-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-002-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-003-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-004-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-006-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-008-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-009-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-010-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-011-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-012-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-013-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-014-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-015-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-016-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-017-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-018-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-019-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-020-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-021-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-022-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-023-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-024-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-025-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-026-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-027-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-028-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-029-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-030-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-031-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-032-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-033-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-034-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-035-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-036-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-037-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-038-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-039-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-040-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-041-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-042-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-043-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-044-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-045-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-046-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-048-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-049-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-050-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-051-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-055-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-061-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-062-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-064-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-065-000.jp2
Lewis_and_Clark_Exposition-Northern_Pacific-066-000.jp2
If the missing sequence numbers (5, 7, 47, 52–54, 56–60, 63) represent missing pages then I do not have those. BTW, you may want to stick with regular old ZIP files instead of tar'ed and xz'ed archives. ZIP isn't the greatest at anything in particular, but it tends to "just work" and has mature implementations built into pretty much every operating system out there. --Xover (talk) 15:56, 28 December 2022 (UTC)
Okay, so the files are here and not there. I can think of several steps along the way where they could have been lost, but not why -- but I uploaded them: File:Lewis and Clark Exposition-Northern Pacific-054-000.png, File:Lewis and Clark Exposition-Northern Pacific-053-000.png, File:Lewis and Clark Exposition-Northern Pacific-052-000.png, and File:Lewis and Clark Exposition-Northern Pacific-005-000.png. Hmm. I wonder if there are two more missing (due to the page names).--RaboKarbakian (talk) 15:58, 28 December 2022 (UTC)
And the others: File:Lewis and Clark Exposition-Northern Pacific-056-000.png, File:Lewis and Clark Exposition-Northern Pacific-057-000.png, File:Lewis and Clark Exposition-Northern Pacific-058-000.png, File:Lewis and Clark Exposition-Northern Pacific-059-000.png, File:Lewis and Clark Exposition-Northern Pacific-060-000.png (a blank page). I was delighted when you handled the xz files. There were all the Linux curmudgeons who said it was a mistake to let Linux software work on Windows, and me, who said we could work together. The more I have to give up Linux software, the more they were right. They were right a lot of the time. Sorry for this trouble.--RaboKarbakian (talk) 16:12, 28 December 2022 (UTC)
@RaboKarbakian: New version uploaded. Xover (talk) 16:22, 28 December 2022 (UTC)
File:Lewis and Clark Exposition-Northern Pacific-047-000.png and right now, I am so confused! It feels off still, but it is a feeling I cannot prove. I can zip up the whole set and redo the google drop, if it would help....--RaboKarbakian (talk) 16:44, 28 December 2022 (UTC)
@RaboKarbakian: New version with p. 47 uploaded. Xover (talk) 09:49, 29 December 2022 (UTC)
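[For illustration, how a rescued page can in principle be slotted into an existing DjVu with DjVuLibre's djvm; the position argument is illustrative, and this is not necessarily how the new versions above were produced.]

import subprocess

# Encode the rescued PNG (via PPM, since c44 does not read PNG) and insert it
# at position 47 of the bundled document.
subprocess.run(["convert", "Lewis and Clark Exposition-Northern Pacific-047-000.png", "p047.ppm"], check=True)
subprocess.run(["c44", "-dpi", "300", "p047.ppm", "p047.djvu"], check=True)
subprocess.run(["djvm", "-i", "Lewis and Clark Centennial Exposition (1905).djvu", "p047.djvu", "47"], check=True)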
Looks to be all here! Thank you!--RaboKarbakian (talk) 16:23, 29 December 2022 (UTC)