Wikisource:Administrators' noticeboard

Administrators' noticeboard
This is a discussion page for coordinating and discussing administrative tasks on Wikisource. Although its target audience is administrators, any user is welcome to leave a message or join the discussion here. This is also the place to report vandalism or request an administrator's help.
  • Please make your comments concise. Editors and administrators are less likely to pay attention to long diatribes.
  • This is not the place for general discussion. For that, see the community discussion page.
  • Administrators, please use the {{closed}} template to identify completed discussions that can be archived.
Report abuse of editing privileges: Admin noticeboard | Open proxies
Wikisource snapshot

No. of pages = 3,243,046
No. of articles = 822,688
No. of files = 19,747
No. of edits = 10,592,345

No. of pages in Main = 488,602
No. of pages in Page: = 2,337,381
No. validated in Page: = 461,948
No. proofread in Page: = 763,627
No. not proofread in Page: = 907,581
No. problematic in Page: = 33,281
No. of validated works = 4,275
No. of proofread only works = 3,326
No. of pages in Main with transclusions = 280,621
% transcluded pages in Main (pages with transclusions ÷ Σ pages in Main) = 57.43

No. of users = 2,966,152
No. of active users = 373
No. of group:autopatrolled = 471
No. in group:sysop = 26
No. in group:bureaucrat = 2
No. in group:bot = 22

Checkuser requests

  • Wikisource:checkuser policy
  • At this point in time, English Wikisource has no checkusers, and requests need to be undertaken by stewards
    • it would be expected that requests on authentic users would be discussed on this wiki prior to progressing to stewards
    • requests by administrators for identification and blocking of IP ranges to manage spambots and longer term nuisance-only editing can be progressed directly to the stewards
    • requests for checkuser

Bureaucrat requests

Request for interface admin rights

  • @Hesperian:, @Mpaa: I'd like to ask for the interface admin right so I can lend a hand untangling some of the CSS and JS "history". For example, I'd like to make this fix: MediaWiki_talk:Mobile.css, as well as a few tidy-ups for gadgets and assisting with the vestiges of the main page CSS makeover. I already have 2FA enabled. I'm not sure what time-boxing is deemed appropriate for such activities; I have no preference and I don't mind re-applying if it lapses before everything is done. Inductiveload (talk/contribs) 16:38, 10 July 2020 (UTC)
    • I don't think there's much policy / established process for this kind of thing, but you're a long-time trusted admin so... granted for six months. If there is still an interest and need after that, I suggest granting ad-hoc for another six months, and then folding it into your annual admin confirmation. Hesperian 02:49, 11 July 2020 (UTC)
      • Thank you! Inductiveload (talk/contribs) 20:46, 11 July 2020 (UTC)
      • Just for future reference, there kinda sorta is an established process; that's been kinda sorta used twice (including this time; the previous was Billinghurst in September 2018). The discussions from which it is derivable are:
    The summary is that rights are requested somewhere here on WS:AN, and granted by local 'crats analogously to other such temporary rights. We don't actually have any facility for permanently granting the right, but I don't imagine anyone will object in practice if you roll it into Inductiveload's next confirmation. I am a little worried that we only have one active 'crat right now, and no formal written policy to point the Stewards at if we need them, but it seems to be as much formal stuff as the community has the appetite for so far. --Xover (talk) 06:55, 16 July 2020 (UTC)

Bot flag for User:350bot

See User talk:Suzukaze-c#350bot and Wikisource:Scriptorium/Archives/2020-07#User:350bot. Suzukaze-c (talk) 23:38, 17 September 2020 (UTC)

Sorry, I have been away too long, I am not up to date, and I will not be able to be around too much for a while; a 2nd opinion is welcome. IMO, no need for a special bot for such a small task. Mpaa (talk) 20:23, 19 September 2020 (UTC)
@Mpaa: The way I understand it (I could be wrong), Suzukaze-c is actually just requesting permission to mass-create those subpages and wants a bot flag for the operation in order to not flood recent changes. I have not understood this to be a proposal for a specific ongoing bot, and would presume removing the flag once the pages have been created to be ok. --Xover (talk) 18:41, 20 September 2020 (UTC)
@Suzukaze-c:, I have given the bot flag to your account for this task. Please try to do small batches at first, just to make sure the outcome is as wanted. Let us know when you are done, so we can revoke the flag. Mpaa (talk) 20:55, 20 September 2020 (UTC)
I have set a one-month expiry; I hope that is enough and OK for the community. Mpaa (talk) 20:57, 20 September 2020 (UTC)
@Mpaa: Lovely, thank you. Suzukaze-c (talk) 22:00, 20 September 2020 (UTC)
(And I check all output because I don't trust my code, BTW. So no worries there. Suzukaze-c (talk) 22:25, 20 September 2020 (UTC))
@Mpaa: Done. Suzukaze-c (talk) 05:29, 26 September 2020 (UTC)

Page (un)protection requests

Request protection of Main Page templates

According to the very first point under Wikisource:Protection_policy#Special_cases “The main page should always be protected…”, yet this edit took place today. Some care and attention, please? (Normally Phe-bot is the sole updater of Template:ALL TEXTS!) 17:57, 15 October 2018 (UTC)

Thanks. I have placed soft protection on the page. -- billinghurst sDrewth 20:09, 15 October 2018 (UTC)
Comment: To fellow administrators, I have up'd the protection on a couple of templates that won't need updating. I have a question about Template:Highlights: should this be sitting at semi/soft? If we are unlikely to change it, then we should be protecting it further. -- billinghurst sDrewth 20:27, 15 October 2018 (UTC)
Nobody has edited that template since 2013, so it doesn't get changed much. I have therefore upped the protection to +sysop. If anybody disagrees feel free to undo. --Xover (talk) 09:57, 25 July 2020 (UTC)


Resource Loader issue needs outside guidance

The more I read up on this RL change and the subsequent actions needed (or taken?), the more I get the feeling that some of my approach to site-wide & gadget .js/.css organization over the months is going to be behind this week's latest problems. If that winds up being the case, then I'm truly, truly sorry for that. Let me try to document those steps and the reasoning behind them in hopes someone (@Krinkle:) can make sense of our current state and put us on the right path post RL change(s).

Originally, we not only had a ridiculous amount of scripting and .css definitions in our primary site-wide MediaWiki files to begin with, but those primary files also called a number of stand-alone .js/.css files unnecessarily, in addition to calls to various sub-scripts, on top of any User:-selected gadgets being called -- some of which eventually became default-loaded per consensus, etc.

A simple depiction of the key files mentioned minus any Gadgets basically went like this...

Over several months, with the help of other folks, I began to consolidate and/or eliminate as many scripting calls as I could -- creating optional Gadgets whenever possible -- and tried much the same for the .css class definitions. The rationale behind doing this can be found in several places, most importantly: Wikipedia. The premise to keep the MediaWiki site-wide files "lean" goes like this....

 * Keep code in MediaWiki:Common.js to a minimum as it is unconditionally
 * loaded for all users on every wiki page. If possible create a gadget that is
 * enabled by default instead of adding it here (since gadgets are fully
 * optimized ResourceLoader modules with possibility to add dependencies etc.)
 * Since Common.js isn't a gadget, there is no place to declare its
 * dependencies, so we have to lazy load them with mw.loader.using on demand and
 * then execute the rest in the callback. In most cases these dependencies will
 * be loaded (or loading) already and the callback will not be delayed. In case a
 * dependency hasn't arrived yet it'll make sure those are loaded before this.
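
A minimal sketch of the lazy-loading pattern that comment describes. mw.loader.using is the ResourceLoader call named in the quoted text; the wrapper name runWithDependencies, and passing mw in as a parameter rather than using the global, are assumptions made here so the snippet can be read in isolation.

```javascript
// Hypothetical helper (not part of MediaWiki): load the named ResourceLoader
// modules on demand, then run `init` once all of them have arrived.
// `mw` is passed in explicitly so the sketch does not rely on a global.
function runWithDependencies(mw, modules, init) {
  // mw.loader.using() returns a promise that resolves once every module in
  // `modules` is loaded; modules that are already loaded cause no extra wait.
  return mw.loader.using(modules).then(init);
}
```

In Common.js this might be called as runWithDependencies(mw, ['mediawiki.util'], function () { /* code that needs mw.util */ }), with everything that depends on the module placed inside the callback.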

The result of that effort as it stands today can be depicted basically like this....

The predominant change in order to move towards the previously cited rationale & approach is that the bulk of the scripting and class definitions now reside in the default-enabled Site gadget files, MediaWiki:Gadget-Site.js & MediaWiki:Gadget-Site.css. And by no means is the current state the desired final approach; it's been a work in progress as time allowed over several months.

Obviously, now with the recent change to Gadgets and ResourceLoader, either the existing rationale or my attempts (or both) are no longer in harmony -- if they ever were. In my view, we need someone like Krinkle (or maybe the collective minds of Wikitech-l?) to take the time and attention needed to come in here and straighten all this out -- one way or the other. My gut tells me THAT will resolve the reported loss of one thing or another post-RL change(s). Again, if I'm right about my actions exacerbating problems for others, I apologize and take full responsibility. -- George Orwell III (talk) 20:54, 8 August 2015 (UTC)

I've made a few minor changes in addition to yours that hopefully make things work a bit more like you intended. I'm happy to provide further guidance but that probably works better for a more specific need or question. Perhaps bring it up on Wikitech-l or on IRC so I can help you move forward with any unresolved issues. Krinkle (talk) 21:37, 20 August 2015 (UTC)

Interface administrators

Hi. Please see I do not remember if this was already discussed and how it is going to be addressed. Comments and suggestions welcome. Comment: As far as I am concerned I would trust any admin who feels skilled and confident enough to tackle such edits. -- Mpaa (talk) 21:05, 29 October 2018 (UTC)

I can handle the technical aspects of it. However, it can take me a while to get around to tasks that take longer than a few minutes, so I don't want to create a false expectation of being able to handle time sensitive matters on my own. —Beleg Tâl (talk) 02:35, 30 October 2018 (UTC)

We should decide how to address the fact that EnWS has no m:interface administrators. I see basically the following options. Please add/amend as you feel appropriate.

Option A - Assign right on demand when needed

Option B - Assign right permanently to willing Admins, to be reviewed in the confirmation process

As I said above, I am for the simplest one.— Mpaa (talk) 21:28, 30 October 2018 (UTC)

Option C - Assign right permanently to selected Admins, after approval process, to be reviewed in the confirmation process

Option C sounds like you're being volunteered (based on the lack of the word 'willing'). ;) --Mukkakukaku (talk) 06:27, 31 October 2018 (UTC)

Option D - assign the rights to all the admins, who have already been vetted for community approval, and then whoever has the ability and desire can make use of it as they will and as needed. —Beleg Tâl (talk) 13:33, 31 October 2018 (UTC)

Option D would make the most sense for us. For anyone to get themselves to the point that we trust them with the admin tools just so that they can mess around in the interface, they would be playing a very long game. Beeswaxcandle (talk) 22:05, 2 November 2018 (UTC)
I agree with Beeswaxcandle, Option D, although I would also be fine with the right only going to admins who express an interest. BD2412 T 23:00, 2 November 2018 (UTC)
It is so rare I disagree with Beeswaxcandle but this must be one of those times. The whole point of this change is to prevent the ignorant from accidentally screwing up - insulting as the implications undoubtedly are! As such under the new regime trust is no longer enough; perhaps somebody ought to draw up some kind of eligibility examination…? 23:03, 2 November 2018 (UTC)
That hasn't been an issue for us yet, and accidental changes are easily reversed. If we had more users it would be more of a problem, but as it stands this kind of distinction is more cumbersome than helpful in my opinion. —Beleg Tâl (talk) 00:08, 3 November 2018 (UTC)
As much as I like the idea of making all existing admins interface admins, IA was separated from regular adminship specifically to reduce the attack surface (from hackers), and it would be pretty dangerous if the access fell into the wrong hands. I'd rather propose that existing admins request the right from a bureaucrat, to be granted at the bureaucrat's discretion and automatically removed if there is no action after two months. Viztor (talk) 02:13, 10 August 2019 (UTC)
  • Comment: we discussed it when the rights were split, and it was agreed that it could be assigned on a needs basis. That has been done at least once for me with the temporary assignation of the IA rights. -- billinghurst sDrewth 05:58, 10 August 2019 (UTC)
    Note that WMF Legal requires 2FA to be enabled for users who are to be assigned this right, so bureaucrats will have to verify this before doing so. MediaWiki's 2FA implementation is also sufficiently finicky that one may not want to enable it without proper consideration. --Xover (talk) 08:21, 10 August 2019 (UTC)
    What's wrong with the 2FA implementation? I haven't had any issues with it at all. —Beleg Tâl (talk) 22:17, 10 August 2019 (UTC)
    Ah, sorry, I should have been more clear. I am going on hearsay, mostly from admins on enwp (a crotchety bunch if ever there was one), and my own assessment of the documentation at meta. The main complaints are that the implementation in general is a little bit primitive (as is to be expected since WMF rolled their own instead of federating with one of the big providers), and that there is no way to regain access to your account if something goes wrong with the 2FA stuff (if your phone is stolen etc.) unless you happen to know one of the developers personally. None of these are in themselves showstoppers, and many people are using it entirely without issue. The phrasing sufficiently finicky that one may not want to enable it without proper consideration was not intended to discourage use, but merely to suggest that it is worthwhile actually giving it a little thought before requesting it be turned on. --Xover (talk) 17:52, 11 August 2019 (UTC)
    Okay, gotcha. As it happens, Wikimedia 2FA does include emergency access codes for use when your phone is unavailable. —Beleg Tâl (talk) 19:56, 11 August 2019 (UTC)

Formal requirements related to 2FA

Picking up this again…

I finally got so annoyed by our inability to fix even simple stuff that requires Interface Admin permissions that I hopped over to meta to figure out what the actual requirements are (versus the should stuff). As it turns out, the 2FA stuff is (surprise surprise) as half-baked as most such Papal bulls from the WMF: 2FA is required for intadmin, but there is no way for bureaucrats to actually check whether an account has that enabled. The result of this is that even on enwp (where they take this stuff really seriously) they do not actually try to verify that 2FA is enabled before they hand the permission out: they check that the user is in the right group so that they can turn on 2FA, remind the person in question of the requirement, but otherwise take it on faith (trust). There's a request in for the technical capability to verify 2FA (and I think Danny is even working on it), but it seems mostly everyone's waiting for 2FA to be enforced by the software.

Meanwhile, anyone with existing advanced permissions (i.e. +sysop) has the capability to enable 2FA, and anyone with a particular reason (e.g. that they need it to get Interface Administrator permission) can apply to be a "2FA Tester" and thus gain the ability to turn it on.

The net result is that our bureaucrats (ping Hesperian and Mpaa) can assign this permission so long as we somehow somewhere make at least a token effort to make sure those getting the bit have 2FA enabled. Whether that's an addition to, or footnote on, Wikisource:Adminship, or the bureaucrats asking/reminding the user when it comes up, or… whatever… I have no particular opinion on. Since the previous community discussions have been actively adverse to regulating this stuff in detail, and absent objections, I think "Whatever Hesperian and Mpaa agree on" is a reasonable enough summary of consensus.

I still think we should have an actual policy for Interface Administrators (or a section on it in Wikisource:Adminship) and some facility for permanently assigning the permission (à la +sysop; but intadmin tasks are not one-and-done like +sysop tasks, they often require iterative changes over time and need to fit into an overall architecture). But so long as there is no appetite for that, we at least need something that we can point to and say "That's how we handle the 2FA requirement" if the WMF should ever come asking. --Xover (talk) 07:37, 10 February 2020 (UTC)

Judicious cleaning required from Special:UnusedFiles

I was just poking my head into Special:UnusedFiles. There are a significant number of images that utilise {{raw page scan}} that should be checked and, if truly unused, deleted, as the file has been transwiki'd to Commons. Note: always physically check their usage, as I have previously seen that the NOT USED assessment is not always accurate.

Checking and deleting process:

  • Use the Page:… link
  • At Page:… check that there is a Commons loaded image in place (and no use of template:raw image)
  • grab the new filename
  • click back to the local image, delete the image, noting "File transwiki'd" and paste in the new filename (preferred not mandatory)

If admins could do 10 to 20 a session, we should get through them in a month or so. — billinghurst sDrewth 09:26, 14 November 2019 (UTC)

Comment: TIP: when doing the image check you can even take the time to validate the proofread page with the image (very often sitting in proofread status). -- billinghurst sDrewth 09:31, 14 November 2019 (UTC)
@Billinghurst: Considering the list is maxed out at 5k, so there's no telling how many more of them there are (we have over 20k files tagged as raw page scans, possibly more that are untagged), it's highly unlikely we'll get through that in a month. But it's certainly something we need to start chipping away at.
And we should possibly even start considering more drastic measures, like periodically bot-deleting anything in Category:Raw page scans for missing images that isn't used anywhere (including inbound links). The manual processing is tedious and time-consuming, and provides very little additional value compared to an automated approach (linking to the replacement image in the deletion log, mostly, and that has marginal value at best). We'd need to check closely whether the category contains files that could be caught as false positives in such a run, but barring such pitfalls automation may be both the best option and the only realistic way to ever clear out this backlog (we have plenty of other image-related backlogs where human attention is necessary).
Oh, PS, DannyS712 has a neat user script at User:DannyS712/Change status.js that makes cases such as this a lot quicker. I'm not sure they consider it ready for prime-time (I don't think it's been advertised anywhere), so caveat emptor, but I've been using it a good bit today and seen no problems. To use, add importScript('User:DannyS712/Change status.js'); to your common.js. --Xover (talk) 20:18, 14 November 2019 (UTC)
If you want to use the script, it adds a link to the function (next to the "move" function) that will, if the page is "not proofread" or "problematic", mark it as "proofread". If it is already "proofread", and the user can mark it as validated, it marks it as validated instead. Let me know if there are any questions. Thanks, --DannyS712 (talk) 20:34, 14 November 2019 (UTC)
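
The behaviour described above amounts to a small state transition. The sketch below only illustrates that rule as worded in this thread; it is not taken from User:DannyS712/Change status.js, and the function name nextStatus and the canValidate flag are hypothetical.

```javascript
// Hypothetical illustration of the rule described above.
// Returns the status the page would move to, or the current status
// unchanged when the rule does not apply.
function nextStatus(current, canValidate) {
  if (current === 'not proofread' || current === 'problematic') {
    return 'proofread';
  }
  if (current === 'proofread' && canValidate) {
    return 'validated';
  }
  return current;
}
```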
@Billinghurst: I agree with @Xover: that this kind of task should be automated. Mpaa (talk) 21:29, 15 November 2019 (UTC)
Comment: At this point I would think that the task is to start to chip away. I don't see that there is urgency in cleaning this space, so long as we start. So what if it takes three months; heck, I have works that I dip in and out of for years. As I said, I have seen multiple issues of the tool being wrong in the past; if we can demonstrate that this is no longer the issue, then maybe we can look to bot removal. I thought the admin review, and the process of validating, was beneficial.

P.S. Those quiescent admins, and those who find it hard to identify tasks to undertake are given a gift here! — billinghurst sDrewth 22:50, 14 November 2019 (UTC)

Having done about a hundred of these by hand… I'd say the realistic best case sustained rate here is something like 5 admins doing 5 files per day for 5 days a week. That's an aggregate rate of about 500 per month. If the number is 5k that means 10 months to get through it. If the number is 20k that's 40 months, or just shy of 3.5 years. I don't have sufficient data for an accurate estimate of net time, but assuming a range of 30-60 seconds per file, at 5k files that's an aggregate ~40–80 net admin-hours expended. At 20k files that's ~160–320 net admin hours. Assuming an 8 hour work day, that's one dedicated admin working flat out only on this for between one week (5k/30s) and 2 months (20k/60s). With no lunch break, by the way. That's a pretty high cost.
On the upside we have tagging the logs for deleted files with a link to the replacement images. But that only matters if you're actually looking at the deleted file, and for these raw page scans that is essentially never going to happen. Having a human in the loop also helps guard against Mediawiki bugs in categorising etc., but while, yes, that does happen, it's been years since I've run into that kind of bug anywhere that would matter here. What usually happens is that counts and references fail to update properly when pages are deleted, so you get categories saying they have members, but in reality the relevant items have already been deleted; and these eventually get cleared out by periodic maintenance tasks.
In other words, doing this manually is expensive, carries a significant opportunity cost, and brings no concomitant value. Automating it obviously carries risks (automatically deleting up to 5-20k files should never be done lightly). But with appropriate checks (for example, only files listed on Special:UnusedFiles that are also in category Raw page scans and that have no incoming links in WhatLinksHere), manual spot checks, and going in batches… the risk should be eminently manageable. --Xover (talk) 09:16, 16 November 2019 (UTC)
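
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of that calculation: the figure of 4 working weeks per month is an assumption chosen here to match the "about 500 per month" rounding, and the function names are hypothetical.

```javascript
// Inputs taken from the estimate above.
const ADMINS = 5;
const FILES_PER_DAY = 5;
const DAYS_PER_WEEK = 5;
const WEEKS_PER_MONTH = 4; // assumption: rounds to "about 500" files per month

// Months needed to clear `fileCount` files at the sustained manual rate.
function monthsToClear(fileCount) {
  const filesPerMonth = ADMINS * FILES_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_MONTH;
  return fileCount / filesPerMonth;
}

// Net admin time in hours, given a per-file handling time in seconds.
function adminHours(fileCount, secondsPerFile) {
  return (fileCount * secondsPerFile) / 3600;
}
```

With these inputs, monthsToClear(5000) gives 10 and monthsToClear(20000) gives 40, while adminHours ranges from roughly 42 hours (5k files at 30 s) to roughly 333 hours (20k files at 60 s), in line with the rounded figures quoted above.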
I can shortly work out a script that scans category Raw page scans and checks the conditions for deletion (and, if they are met, deletes). If you are OK to test small batches, let me know. Mpaa (talk) 14:52, 16 November 2019 (UTC)
@Mpaa: Can you have it run up a list of files and dump it in a sandbox somewhere so we can spot check the logic? Maybe a hundred or so files that the script thinks should be deleted, and, if relevant, the ones it thinks shouldn't be. Better to find any holes in the logic before we start deleting stuff. --Xover (talk) 15:10, 16 November 2019 (UTC)
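
A dry run of the kind requested here could be sketched as below. This is not Mpaa's script: it works on plain objects rather than the live wiki, the field names (title, unused, categories, inboundLinks) are hypothetical, and the category name is the one mentioned earlier in this thread. The three conditions are the ones proposed above: listed as unused, in the raw page scans category, and no incoming links.

```javascript
// Category named earlier in this discussion.
const TARGET_CATEGORY = 'Raw page scans for missing images';

// A file qualifies only if all three proposed conditions hold.
function isDeletionCandidate(file) {
  return file.unused === true &&
    file.categories.includes(TARGET_CATEGORY) &&
    file.inboundLinks.length === 0;
}

// Dry run: nothing is deleted. Files are only sorted into two lists so the
// logic can be spot-checked by hand before any real run.
function dryRun(files) {
  const report = { wouldDelete: [], wouldKeep: [] };
  for (const file of files) {
    const bucket = isDeletionCandidate(file) ? report.wouldDelete : report.wouldKeep;
    bucket.push(file.title);
  }
  return report;
}
```

Dumping both lists, rather than only the deletion candidates, makes it possible to check for false negatives as well as false positives.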
Here: res sandbox. Mpaa (talk) 16:36, 16 November 2019 (UTC)
@Mpaa: Excellent! I've spot-checked pages from most of the works represented in that list and found none incorrect. I'd have no objection to running that (in batches so it can be checked; there're bound to be some pathological edge-case out there somewhere). --Xover (talk) 17:12, 16 November 2019 (UTC)
@Xover: I have done a small test batch of 45 pages as Mpaa. Mpaa (talk) 18:29, 17 November 2019 (UTC)
@Mpaa: Ok, I've spot-checked 2–3 files from each work in those 45, and find no real problems. The only issue I see is that the deletion log for File:A book of the west; being an introduction to Devon and Cornwall.djvu-453.png links to c:File:A book of the west; being an introduction to Devon and Cornwall.djvu instead of c:File:A Book of the West - ALMS HOUSES, S GERMANS.png; and ditto for File:A book of the west; being an introduction to Devon and Cornwall.djvu-223.png that points to c:File:A book of the west; being an introduction to Devon and Cornwall.djvu instead of c:File:A Book of the West - LAKEHEAD, KISTVAEN.png. --Xover (talk) 21:28, 17 November 2019 (UTC)
@Xover: Thanks, I fixed it and ran another ~40 pages. Mpaa (talk) 18:33, 18 November 2019 (UTC)
I'm wondering if we can set up a scrolling gallery that does nothing but compare our page image side-by-side with the comparable Commons file. An editor could scroll through and eyeball any differences fairly quickly. BD2412 T 22:52, 18 November 2019 (UTC)

Poking at this again…

I found no problems with Mpaa's test bot run, and we still have potentially ~20k files sitting there that it would be a waste of admin resources to process manually. Can we pull the trigger on a mass delete of these? If not, what are the concerns? --Xover (talk) 08:26, 18 February 2020 (UTC)

I ran about 15 pages to check that everything is still OK. Mpaa (talk) 21:47, 18 February 2020 (UTC)
Watch out for files unacceptable on Commons, especially if still copyright-restricted at home.--Jusjih (talk) 05:16, 17 June 2020 (UTC)

Title blacklist updated to prevent invisible characters in page names

Based on this discussion I have added rules to the title blacklist to prevent the creation of pages with invisible Unicode characters in the name. Users hitting this rule should see the custom error message at MediaWiki:titleblacklist-invisible-characters-edit. --Xover (talk) 09:05, 5 December 2019 (UTC)

Should the “Emojis, etc. Very few characters outside the Basic Multilingual Plane are useful in titles” section be there? First, it’s clear that MediaWiki:titleblacklist-invisible-characters-edit is not a helpful message in that case. Secondly, there seem to be some quite useful characters there; the Mathematical Alphanumeric Symbols block is basically just so we can include mathematical titles in a plain text format. As for the rest of them, if we allow Chinese, we should allow all of Chinese; if we do enough academic work, we’re going to have Hieroglyphics and ancient Chinese in article titles. --Prosfilaes (talk) 11:26, 5 December 2019 (UTC)
@Prosfilaes: In page names? Where we recently had community consensus to not even permit curly quotes (as part of the discussion to permit them in page content) because that’d be too fiddly? In any case, both the error message and the rules can be tweaked if needed. The current rules are an attempt to prevent stuff like WORD JOINER and friends that are invisible and cause issues for people trying to work with such pages. I (currently) have no particularly strong opinions on the issue above a vague inclination towards limiting page names to roughly ASCII (on enWS, of course, other language projects have different needs). --Xover (talk) 11:52, 5 December 2019 (UTC)
What’s magical about page names? Page names need to match the names of the works they contain. Curly quotes are a special case; note that there was no restriction on characters in pages, except for curly quotes. If an article title is “Čapek’s works in English” or “Injections from 𝕎 to 𝕁”, then why should the page name be any different? --Prosfilaes (talk) 13:55, 5 December 2019 (UTC)
Page names have technical and practical concerns (people's ability to enter them, display them, search for them, etc.) that mean we should constrain them in some fashion; and at the same time we already do stuff like drop “The” from page titles to facilitate automatic sorting (which I actually disagree with, but that’s neither here nor there). The characters we disallow in the current rules are also exceedingly rare in practice, and the blacklist can be overridden by any admin at need, so I don’t think it is a problem we should expend too much effort on until and unless we start seeing actual cases where it causes problems.
Č (U+010C: LATIN CAPITAL LETTER C WITH CARON) is in the Latin Extended-A block, which is part of the Basic Multilingual Plane, which the current ruleset lets through. 𝕎 (U+1D54E: MATHEMATICAL DOUBLE-STRUCK CAPITAL W) and 𝕁 (U+1D541: … J) are part of the Mathematical Alphanumeric Symbols block of the Supplementary Multilingual Plane (most common mathematical symbols are in the BMP; these are essentially font variations: bold, italic, Fraktur, etc.), the plane that contains all the exotic and ancient stuff (Linear B, Coptic, Hieroglyphs, etc.) plus a good chunk of Emoji, so they would be disallowed by the current rules, but we can whitelist ranges if we discover that we need them.
That all being said, I am by no means married to the current rules, so whatever the consensus is, is fine by me; and I would, of course, be happy to help implement whatever that consensus is if needed. Most of the examples you mention above (the extended maths stuff, hieroglyphics, etc.) are contained in distinct blocks (Emoji aren’t, and I believe Chinese is also split up in inconvenient ways when you want to handle everything, but most of the rest) that should be relatively straightforward to whitelist. --Xover (talk) 13:32, 6 December 2019 (UTC)
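
To make the distinction concrete, here is a rough sketch of how a title could be checked for the two kinds of character discussed above. It does not reproduce the actual title blacklist rules, which are not quoted in this thread: the list of invisible code points is an illustrative assumption, and the function name titleProblems is hypothetical.

```javascript
// Illustrative subset of invisible code points (assumption; the real
// blacklist rules are not reproduced here).
const INVISIBLE = new Set([
  0x200B, // ZERO WIDTH SPACE
  0x200C, // ZERO WIDTH NON-JOINER
  0x200D, // ZERO WIDTH JOINER
  0x2060, // WORD JOINER
  0xFEFF  // ZERO WIDTH NO-BREAK SPACE (byte order mark)
]);

// Returns one entry per character that is invisible or lies outside the
// Basic Multilingual Plane (code points above U+FFFF).
function titleProblems(title) {
  const problems = [];
  // for...of iterates by code point, so characters outside the BMP are
  // not split into surrogate halves.
  for (const ch of title) {
    const cp = ch.codePointAt(0);
    const label = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
    if (INVISIBLE.has(cp)) {
      problems.push(label + ' invisible');
    } else if (cp > 0xFFFF) {
      problems.push(label + ' outside BMP');
    }
  }
  return problems;
}
```

Under this sketch a title beginning with Č passes untouched, while 𝕎 and 𝕁 are each reported as outside the BMP, and a WORD JOINER is reported as invisible. Whitelisting a block such as Mathematical Alphanumeric Symbols would amount to exempting its code point range from the second test.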
I agree that "it is {not} a problem we should expend too much effort on until and unless we start seeing actual cases where it causes problems." I think that we should remove the restriction; it is easy to deal with a few poorly named or spammish pages on patrol, and bad to frustrate innocent users with a misleading and likely irrelevant error message.--Prosfilaes (talk) 01:16, 10 December 2019 (UTC)

Request for an interface admin to edit MediaWiki:Gadget-ocr.js

Since hOCR is currently buggy, please see mul:Wikisource:Scriptorium#Request_for_an_interface_admin_to_edit_MediaWiki:OCR.js : you can edit your local gadget to use the fallback OCR as the default one, just by commenting out the if condition in the hocr_callback() function. --Pols12 (talk) 12:58, 21 December 2019 (UTC)

Speedy policy on raw OCR text?

There are currently 30 pages in Category:Speedy deletion requests, that are posts of raw OCR, requested for deletion by someone (@Ratte:) who wants to recreate them properly. This seems reasonable to me: posts of raw OCR are generally unhelpful, and doubly so if they are interrupting the workflow of someone who wants to proofread.

Does this fall within a speedy delete criterion? Maybe "process deletion"? If so, are edits needed to make that explicit?

(I'm not sure I would support a 'nuke them from orbit' approach to raw OCR posts; I'm just talking about situations like this where the pages are interrupting a workflow)

Hesperian 00:07, 4 March 2020 (UTC)

But is that the raw OCR or the text layer of the file? If it's the text layer in the file, then exactly the same text will show up upon recreation of the page. --EncycloPetey (talk) 00:58, 4 March 2020 (UTC)
In the past when I've deleted pages to restore the text layer, I've used M1-Process deletion. However, @EncycloPetey: is correct, this is the text layer, so there's no point. Beeswaxcandle (talk) 05:27, 4 March 2020 (UTC)
I have scripts that only trigger when creating a page. I could tweak them, sure, but on the rare occasion when I hit a text layer dump, it's easier to delete and re-create the pages. So I am open to the possibility that these pages are interfering with Ratte's workflow.
At this point I am inclined to action the deletions under M1, but leave the policy as it is.
Hesperian 23:40, 4 March 2020 (UTC)
@Ratte: Are you using special scripts, or just working from the existing text layer? Your input would be welcome. --01:13, 5 March 2020 (UTC)

Comment: These are "not proofread" pages without formatting; just delete them. What is the loss? How is it different from a new version being uploaded, or anything similar where we just man-handle the pages to meet needs? — billinghurst sDrewth 04:55, 5 March 2020 (UTC)

(None of the pings were successful; I've seen this discussion by chance.) This has been my usual practice since ru.wikisource: I nominate for SD non-proofread pages created by another user and then recreate them myself with proofreading. Why? Because there is no guarantee that the other user created them without losing text, that's all. Maybe it's the text layer of the file; maybe not, you cannot know. It's just for ease of work. EncycloPetey has rejected my nominations, so there is nothing left to discuss. Thanks. Ratte (talk) 11:32, 5 March 2020 (UTC)

There's no guarantee that recreating the text layer will have all the text either. All proofreading should be done by comparing the scan with the edited text, regardless of its source. You cannot rely on the text layer to have all the text, the correct punctuation, or anything. If you are simply using spellcheck, and not comparing against the original, then you are not proofreading. --EncycloPetey (talk) 01:27, 6 March 2020 (UTC)
Does my contribution lead to the conclusion that I am simply using spellcheck, and not comparing against the original? I just wanted the original raw material (without possible outside interference) for further proofreading. It’s sad that I couldn't get any understanding or help. Ratte (talk) 07:32, 6 March 2020 (UTC)
@Ratte: I feel your frustration. I also find such "not proofread" pages that are just raw OCR dumps a hindrance to proofreading, and see little if any value in them. But the issue is that we do not have a policy to directly address this specific issue. Nowhere that I have found (not even in help pages or style guidance) do we discourage or prohibit these, and there are long-standing community members that have the, at least occasional, practice of doing so (their motivation is incomprehensible to me, but that may just be my failing). Absent that it is not clear that administrators actually are permitted to delete such pages. We sometimes play fast and loose with such strictures when that seems to benefit the project, and the community has generally supported that, but there are limits to how far we can stretch that, and, speaking only for myself, this is an instance that would have given me pause (but I really wouldn't have batted an eye if someone had deleted them either). For that reason I would argue that we should have a policy addressing this, but that would take a community discussion and a formal proposal that may be felt to be more effort than the issue is worth. In any case, frustrating as it may feel, there's actually a reason why you didn't get the help you needed. Please do not be discouraged by this outcome and hopefully we'll do better the next time! --Xover (talk) 10:51, 6 March 2020 (UTC)
Thanks. These are just technical deletions — pages that need to be deleted to perform non-controversial technical tasks. I am surprised that an administrator’s decision is not enough for this. Ratte (talk) 12:03, 6 March 2020 (UTC)

Category:WikiProject NLS

A large number of indexes in this hidden category are being marked as “validated,” even though some pages have not been validated. I have fixed some of them, although it appears there are many more that still have this problem. It would be more helpful for an administrator to notify the editors involved. TE(æ)A,ea. (talk) 21:51, 23 June 2020 (UTC).

@TE(æ)A,ea.: I'm not sure this is something that particularly requires an administrator, but…
I took a look at the first ten indexes in this Petscan search (intersection of indexes that are in Category:WikiProject NLS and Category:Index Validated) and of these 7 had one or two pages marked as "Problematic" due to a missing image, but were otherwise all Validated. Most of the indexes were marked as validated by Kathleen.wright5, and one by Annalang13 at the NLS.
Kathleen: I'm not sure I see the reasoning behind marking these as Validated. Could you elaborate?
Courtesy ping and a question to @Gweduni: is this part of the workflow you had sketched out with Beeswaxcandle? And what are your plans for these images? --Xover (talk) 04:12, 24 June 2020 (UTC)
I thought that it should receive the attention of an administrator, at least. The indexes I found and fixed, perhaps 60 or so, were just those which were recently added to Category:Index Validated. I fear this may be similar to the problem with the Indonesian Wikisource works, where the editors involved are not proofreading/validating a page to the standards required on Wikisource. The main reason I mentioned this here was so that an administrator could correct these changes, and notify the editors who made the mistakes—it would be more appropriate for an administrator to do such, rather than a normal editor such as myself. TE(æ)A,ea. (talk) 14:30, 24 June 2020 (UTC).
We never discussed the image element as part of the original workflow - my understanding was that we could validate and transclude without including the images - it wasn't brought up so I suppose we assumed it was not a critical part of the process and more of a nice extra (we don't have any resource to add these in retrospectively, so not sure what the best approach is now) Gweduni (talk) 12:40, 26 June 2020 (UTC)
@Gweduni: Individual pages can be progressed through Proofread to Validated independently, and works can be transcluded before they are complete, but the work as a whole (through the status on the Index: page) should not be marked as Validated until all its component pages have been Validated. That in particular goes for pages marked Problematic due to missing images. And there are several reasons for this, but perhaps most apposite here is that when marked as fully validated the work disappears from maintenance backlogs so there is little chance they will ever get finished, much less in a systematic way. --Xover (talk) 13:01, 26 June 2020 (UTC)
@Xover: OK, so for now we can keep going as we are as long as we don't set the status on the index page to Validated (for the work as a whole)? Gweduni (talk) 13:05, 26 June 2020 (UTC)
@Gweduni: Without going into nuances or the inevitable exceptions… In principle the Progress field of an Index: should not be set to "To be validated" until all its Page: pages have been Proofread, and it should not be set to "Done" until all pages are Validated. But the most critical part is that the Index: isn't marked as "Done" until the work is actually finished. --Xover (talk) 13:13, 26 June 2020 (UTC)
Also some works like Index:Story teller (4).pdf are marked validated, but not all the pages are validated, and the contents page probably isn't even at proofread level, despite being marked so. It's OK if works don't get validated, they can still be "complete" and marked "proofread". It's less ideal if incomplete works are marked validated without actually being validated. Inductiveloadtalk/contribs 10:10, 24 June 2020 (UTC)
not sure what is wrong with the contents page above? Gweduni (talk) 12:40, 26 June 2020 (UTC)
@Gweduni: At the time Inductiveload posted the above, the page in question looked like this. --Xover (talk) 12:54, 26 June 2020 (UTC)
ah, can see the problem - will feed back to the team Gweduni (talk) 12:59, 26 June 2020 (UTC)
Hi, I see from the discussion above this work shouldn't be marked as 'Done' anyway due to the 'Missing Image', but just wanted to apologise for this slightly different issue, this one's my fault. I marked the work as 'Done' while I was editing the index to create the main-page transclusion link, as I wanted to check how things would look when transcluded, and to make sure the sub-page links on the contents page were correct. I didn't realise it would cause a problem and look like these pages hadn't been worked on, sorry, won't do it again! --Annalang13 (talk) 14:48, 26 June 2020 (UTC)
@Annalang13: There's no real harm done, as long as the pages get done eventually (and that is certainly not a criticism anyone can level at NLS, the amount of completed work is, as they say, uh-may-zing). It's just in general, a "proofread" page that's not actually proofread may well loiter in that state for years or practically forever, and not even show in the immense backlog of pages that need proofreading.
FYI, there's no technical requirement for an Index to be "done", or for individual pages to be "proofread" to be able to transclude to mainspace. The "rules", such as they are, say pages that aren't "proofread" or above shouldn't generally be transcluded, but that is a process thing and not for any technical reason: they're still just pages. A not "proofread" page generally consists of raw OCR or is otherwise not presentable in mainspace, but that's not always true. There are times it's fine to transclude a "not proofread" or "problematic" page, for example if something is just missing and there's no alternative source, or if only one article on a page is proofread. Inductiveloadtalk/contribs 18:16, 26 June 2020 (UTC)
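The rule of thumb given above for the Progress field of an Index: can be written down as a small check. This is only an illustrative sketch: the function name and the lowercase page-status strings are hypothetical, "To be proofread" is an assumed label for the remaining case, and it ignores the nuances and exceptions mentioned in the thread.

```python
def index_progress(page_statuses):
    """Return the most advanced Index: progress the rule of thumb allows.

    page_statuses: iterable of per-page statuses, e.g. "not proofread",
    "problematic", "proofread", "validated" (hypothetical labels).
    """
    statuses = list(page_statuses)
    # "Done" only when every single page has been validated.
    if statuses and all(s == "validated" for s in statuses):
        return "Done"
    # "To be validated" only when every page is at least proofread.
    if statuses and all(s in ("proofread", "validated") for s in statuses):
        return "To be validated"
    # Anything else (including pages marked problematic) is not ready.
    return "To be proofread"
```

Under this sketch, an index with even one "problematic" page (for example, one with a missing image) stays out of both later states, which keeps it visible in the maintenance backlogs.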

Marking some more users as Autopatrolled

Some good candidates for being marked as autopatrolled include:

I don't know, anybody who'd do this for fun bears watching... :-) Shenme (talk) 04:29, 28 June 2020 (UTC)

All four of them have recent edits that show clear understanding of how to contribute. Hopefully this is the right way to propose them for being autopatrolled. JesseW (talk) 22:10, 27 June 2020 (UTC)

Not done. We allocate autopatrolled based on our criteria for knowledge of our editing in multiple namespaces; it is not a trust issue. Any autoconfirmed user here can mark something as patrolled, so that restriction is not an issue compared with other wikis. — billinghurst sDrewth 06:09, 13 August 2020 (UTC)

Time to prune the bot accounts?

The following accounts have the +bot flag but have not been active since ~2015, and have listed operators that have similarly not edited on enWS in the same period or are not so active here that it seems likely they will resume bot operations any time soon.

Since bot accounts' edits do not show up in recent changes, and are exempt from some restrictions on high volume editing, they are high value targets for hijacking (full list of permissions here). Inactive and possibly abandoned bot accounts are also at high risk of actually being hijacked (for example, a user that has moved on from enWS sells an old computer where the bot account credentials are saved).

Bots who have not edited in 5+ years, and whose bot approval is consequently equally old, are also at significant risk of no longer being up to date with current standards and practices, and cannot safely be assumed to still have consensus for their task (the policy actually says these should have admin-style periodic reconfirmations, but, you know…).

I therefore propose that we prune these 7 bot accounts (of 22 total) by removing the +bot flag and blocking the accounts (with a suitable log message making clear that it is a preventative technical measure only and no form of indication the operator has done anything wrong).

  1. BenchBot (last edit: 2011-04-09) operated by Slaporte (last edit: 2016-07-03)
  2. CandalBot (last edit: 2014-01-15) operated by Candalua (last edit: 2018-12-12)
  3. DougBot (last edit: 2011-08-10) operated by DeirdreAnne (last edit: 2018-10-29)
  4. JVbot (last edit: 2011-04-23) operated by John Vandenberg (last edit: 2018-08-11)
  5. JackBot (last edit: 2014-09-17) operated by JackPotte (last edit: 2020-07-15)
  6. LA2-bot (last edit: 2012-03-05) operated by LA2 (last edit: 2020-07-15)
  7. Robbie the Robot (last edit: 2015-04-28) operated by AdamBMorgan (last edit: 2016-04-01)

Operators who are still active here or on other projects (e.g. JackPotte and LA2) and either have plans to resume bot operation or want to hold on to the account just in case (for example if the bot is used for ad hoc tasks) should comment to that effect here. I propose that for any operator that's sufficiently active and interested to respond here that should be sufficient grounds to leave the bot account active.

For any bot whose operator is not currently active and where the bot has not edited in ~5 years, I suggest we should require a quick recheck with the community (in WS:S#Bot approval requests) before resuming operations; but unblocking and re-adding +bot should otherwise be just a simple 'crat request. Or put another way, it's the bot's actions that need rechecking, not the mere technical unlocking and adding the +bot flag. --Xover (talk) 09:48, 25 July 2020 (UTC)

(Last time we did this on WS:S.) For 1 through 4 and 7, as bots of inactive users, I would support the rights removal through inactivity. For 5 and 6, if the operators say they expect to use their bot then the rights can be retained; otherwise, absent such a comment, remove the rights. — billinghurst sDrewth 06:01, 13 August 2020 (UTC)

Need help shifting group of pages

Hi there, I've been transcribing the Social Security Act 2018. Each time this act is amended, a new version is published. I had made these versions sub-pages of the parent act, i.e. Social Security Act 2018/Version 56 and Version 59.

I've been advised the appropriate way to format this would be something like Social Security Act 2018 (Version 56) as its own parent page. I've also been told that the best way to get this page move done is to post here as it's easier for an admin to shift all the necessary pages.

Is this something you can help with? Each version has its own sub-subpages, e.g. Social Security Act 2018/Version 59/Section 8, which should become Social Security Act 2018 (Version 59)/Section 8. I'm unsure if you need me to provide a full list of all sub-pages, but they can be found in the TOC on the Version 56 and Version 59 pages as the non-red links.

Thus far only Version 56 and 59 exist.

Appreciate any assistance you can provide. Supertrinko (talk) 22:39, 14 September 2020 (UTC)

Done. Pages and subpages have been moved. Redirects have been suppressed, so any links pointing to the old names will now be redlinks. Beeswaxcandle (talk) 03:13, 15 September 2020 (UTC)
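For reference, the renaming pattern requested above (version as a sub-page becoming a parenthesised version in the parent title) can be sketched as a title mapping. This is only an illustration under assumptions: the function name is hypothetical, it covers just the title format given in the request, and it does not perform any actual page moves.

```python
import re

def new_title(old_title, work="Social Security Act 2018"):
    """Map 'Work/Version N/...' to 'Work (Version N)/...'.

    Titles that do not match the pattern are returned unchanged.
    """
    pattern = re.compile(r"^" + re.escape(work) + r"/Version (\d+)(/.*)?$")
    match = pattern.match(old_title)
    if match is None:
        return old_title
    version = match.group(1)
    rest = match.group(2) or ""  # any sub-subpage path, e.g. "/Section 8"
    return f"{work} (Version {version}){rest}"
```

Since the redirects were suppressed, a mapping like this is also the kind of thing one would use to update any remaining links that still point to the old names.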


Spam, please delete. Thanks! -- CptViraj (talk) 04:37, 26 September 2020 (UTC)

Done. Beeswaxcandle (talk) 07:43, 26 September 2020 (UTC)

Adolf Hitler

Three years ago the page Author:Adolf Hitler was protected from editing because of edit warring by a user who stopped contributing two years ago. Would it be possible either to end the protection or at least to allow e.g. autopatrollers to edit it? --Jan Kameníček (talk) 11:47, 7 October 2020 (UTC)

@Jan.Kamenicek: I've dropped it down to only autoconfirmed (vs. only admins) so we can see how it goes. But I'll have a hair-trigger on the fully protect button if it starts drawing edit warring again. @EncycloPetey: I modified the protection settings you set in 2016/2017. Please do revert if you disagree! --Xover (talk) 12:45, 7 October 2020 (UTC)

The Constitution of the Czechoslovak Republic locked

I would like to fix the link of Declaration of Independence of the Czechoslovak Nation by its Provisional Government to Declaration of Independence of the Czechoslovak Nation by Its Provisional Government in the header of The Constitution of the Czechoslovak Republic, but it is locked. Is it possible to unlock it? If not, can some admin fix the link, please?

Thanks. --Jan Kameníček (talk) 13:09, 14 October 2020 (UTC)

@Jan.Kamenicek: updated. Not sure what the correct protections should be for non-current FT's, admin-only seems a bit strong to me, especially since only the mainspace page is protected. Inductiveloadtalk/contribs 13:32, 14 October 2020 (UTC)
Thanks. As for the protection itself: the featured status of the work can be understood to mean that the work was so well proofread that no other changes should be necessary. If so, it would make sense if the work in the Page namespace were locked. But the main namespace page is never "finished", because newly created categories, portals, notes, links to other versions… can always be added, and so it should not be locked for the sole reason of featured status. --Jan Kameníček (talk) 14:10, 14 October 2020 (UTC)

OCR change?

Has something changed with the OCR since yesterday? I’m getting a lot of &#(number); showing up that weren’t there before. Especially &#59; for semicolons. Cheers, Zoeannl (talk) 01:17, 15 October 2020 (UTC)

@Zoeannl: Which OCR button (we have two), and on which pages are you seeing this? --Xover (talk) 06:17, 15 October 2020 (UTC)
Oh, hmm, I see. It looks like this happens to the existing OCR text layer in at least DjVu files. Which means this is a change in how MediaWiki extracts the text layer when displaying it in the Page: namespace. In the case I ran across it is displaying &#32; (that's the syntax for an HTML character entity, and 32 is the decimal code for the ASCII space character) on lines containing a single space character, but not for other instances of space characters (in running text say). This looks like a bug to me, and probably introduced in the scheduled MediaWiki release that should have rolled out yesterday. --Xover (talk) 06:38, 15 October 2020 (UTC)
I've filed a bug report for this at phab:T265571. --Xover (talk) 07:38, 15 October 2020 (UTC)
Thanks, it’s been really bugging me… Zoeannl (talk) 09:14, 15 October 2020 (UTC)
  • Just to give an update on this. The change was a rapid-response patch to address a reported security issue. After some discussion in the Phabricator task the current consensus seems to be that that particular security problem is not actually relevant for our use case. There is a patch in place that rolls back these changes, but the discussion hasn't quite concluded yet so it hasn't been pushed to the release train. Provided the current consensus stands up, the patch will eventually be pushed to the release train and then get rolled out to all the Wikisources in the weekly updates (which hits Wikisource roughly every Wednesday). This could happen as soon as next Wednesday, or it could happen in a later update cycle. It can technically also be pushed out out of cycle (think "emergency fix"), but I don't think that's currently in the cards. If I were to guess I would guess either this coming Wednesday or the Wednesday after. --Xover (talk) 08:14, 16 October 2020 (UTC)
    @Xover: is it worth making a quick default gadget or something to replace the codes in the wikitext in the meantime? Something like the Save-Load-Action gadget? Inductiveloadtalk/contribs 08:40, 16 October 2020 (UTC)
    @Inductiveload: I was planning to hold off until I got a better feel for the timeline. If it goes out with the next train or sooner there's probably no point; but if it drags on…
    We know exactly what transformation is applied here (it's a limited subset of characters that are entity encoded) so reversing it should be reliable and relatively straightforward. If you have the time and inclination, go for it; at worst it'll be a useful experience and reference to have if we need something similar in the future. --Xover (talk) 08:55, 16 October 2020 (UTC)
    @Xover: I've cannibalised the Save-Load action script as MediaWiki:Gadget-T265571_fixer.js to do a set of replacements automatically, only in Page namespace. It's under page proofreading preferences, but it's not a default gadget. I'll only make it default if there's consensus to do so. Thoughts? Inductiveloadtalk/contribs 09:53, 16 October 2020 (UTC)
    @Inductiveload: Already testing it, and finding no problems. I also skimmed through the code and it looks good to me.
    This works around an immediate problem that is highly annoying to all users: so I'd say make it default, announce it, and we can drop it if anybody objects. Which I can't imagine anybody will unless a serious problem crops up. --Xover (talk) 10:00, 16 October 2020 (UTC)
    @Xover:, The only issue I can foresee is if someone has used an HTML entity on purpose and we nuke it. But probably highly unlikely in Pagespace. But actually, if we check wgCurRevisionId == 0, that should only apply the fix on pages that don't currently exist and load the OCR text? Inductiveloadtalk/contribs 10:06, 16 October 2020 (UTC)
    Announcement: Wikisource:Scriptorium#Gadget to resolve issues with HTML entities like &#39; in Page OCR. Feel free to make it clearer if it is not clear enough! Inductiveloadtalk/contribs 10:23, 16 October 2020 (UTC)
    @Inductiveload: I don't think that's necessary. I don't think I've ever run across a raw HTML character entity reference in Page:, and we should not be using them so any occurrence will be an expression of a different problem that we should fix. I'm not 100% on the semantics of wgCurRevisionId == 0: it may be true in other cases as well. --Xover (talk) 10:27, 16 October 2020 (UTC)
    They can very occasionally be used for something like a quote or apostrophe at the start/end of an italic section which then becomes bold, but something like ''' works. Perhaps wgArticleId is better semantics, but I suspect they are the same. Inductiveloadtalk/contribs 10:38, 16 October 2020 (UTC)
    @Inductiveload: It looks like the revert was just deployed out of cycle, and my testing suggests this is now no longer an issue. --Xover (talk) 11:54, 16 October 2020 (UTC)
    Well, maybe a handful of page creations got the benefit in the meantime! :-D Inductiveloadtalk/contribs 12:04, 16 October 2020 (UTC)
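As an illustration of the reversal discussed in this thread (decoding a limited subset of decimal character entities back into plain characters), here is a minimal sketch in Python. It is not the gadget's code: the function name is hypothetical, and the set of affected code points is an assumption limited to the two codes mentioned above (32 for space, 59 for semicolon); the real set is whatever the MediaWiki change actually encoded.

```python
import re

# Assumed subset of entity-encoded code points; only 32 (space) and
# 59 (semicolon) are named in the discussion, so the real list may differ.
AFFECTED_CODES = {32, 59}

def decode_entities(text, codes=AFFECTED_CODES):
    """Replace decimal entities like '&#59;' with their characters,
    but only for code points in the given subset."""
    def replace(match):
        code = int(match.group(1))
        # Leave entities outside the subset untouched, in case someone
        # used one on purpose.
        return chr(code) if code in codes else match.group(0)
    return re.sub(r"&#(\d+);", replace, text)
```

Restricting the replacement to a known subset is one way to address the concern raised above about nuking an entity that was used deliberately.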

In what way is (was) this an administrator issue? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:56, 16 October 2020 (UTC)