Wikisource:Bot requests

Bot requests

This page allows users to request that an existing bot accomplish a given task. Note that some tasks may require that an entirely new bot or script be written. This is not the place to ask for help running or writing a bot.

A bot operator performing a task should make note of it so that other bots don't attempt to do the same. Tasks that are permanently assigned or scheduled for long-term execution are listed on Persistent tasks. See also Wikisource:Bots.

Move all subpages of Who's Who in the Far East to use title case

I was informed by User:Beeswaxcandle that I should use title case instead of all caps in article names. So I request to move all subpages of Who's Who in the Far East to use title case. Although I can use a bot to move it myself, that would leave tons of redirects for admins to delete. But if an admin can easily batch-delete a list of pages, I can move it myself and then provide the list of pages to delete. I'm sorry for the inconvenience. Thanks, --Stevenliuyi (talk) 08:58, 6 May 2021 (UTC)[reply]

@Stevenliuyi: Please review the list at Wikisource:Bot requests/sandbox. I notice that there is at least one English name that needs to be fixed, and the Chinese names didn't convert on the regex that I used. Would you fix or create the target (only) in the list in the pair list, and I will get it done. No need to fix those that are broken though you should fix the previous/next links of the articles either side. To note that as I did for your other work, I will look to get a work specific template in place, though will do that afterwards. — billinghurst sDrewth 13:10, 24 May 2021 (UTC)[reply]
I suppose that I really do want to ensure that the Chinese names are capitalised properly. — billinghurst sDrewth 02:57, 25 May 2021 (UTC)
@Stevenliuyi and @Billinghurst: Has this request been actioned (i.e. can it be closed as resolved)? Xover (talk) 10:34, 10 April 2022 (UTC)[reply]
@Stevenliuyi: Please see Billinghurst's request (above) for quality control of the list of targets in Wikisource:Bot requests/sandbox. They have done the legwork to prepare for the move, but it is unable to progress until you've checked and corrected the target page names. Xover (talk) 05:33, 3 September 2022 (UTC)[reply]
@Stevenliuyi: This is blocked on your input here. Xover (talk) 10:19, 7 November 2023 (UTC)[reply]

Wikidata bulk edit

I made a query for works on enWS that have WD items with no "instance of" statement. The criteria I used are:

  • Pages in mainspace
  • No redirects or disambiguation pages (this includes Versions and Translations btw)
  • Does not contain a forward slash in the page name (in order to exclude subpages)
  • Is linked to Wikidata, and linked Wikidata item does not have a P31 statement

This query returns 13889 results, which is more than even QuickStatements can handle. Would it be possible for a bot to update these Wikidata items with P31=Q3331189 (instance of = version, edition, or translation)?

Thanks :) —Beleg Tâl (talk) 13:22, 1 November 2021 (UTC)[reply]
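One way to make a result set of this size manageable is to split it into smaller batches. The following is only a rough sketch: it assumes the tab-separated QuickStatements V1 command format, the batch size of 1000 is an arbitrary choice, and the function name and example item IDs are hypothetical placeholders rather than real query results.

```python
from typing import Iterable, Iterator, List

INSTANCE_OF = "P31"
EDITION = "Q3331189"  # version, edition, or translation


def quickstatements_batches(qids: Iterable[str], batch_size: int = 1000) -> Iterator[str]:
    """Yield QuickStatements V1 text, at most batch_size statements per batch."""
    batch: List[str] = []
    for qid in qids:
        # One statement per line: item, property, value, separated by tabs.
        batch.append(f"{qid}\t{INSTANCE_OF}\t{EDITION}")
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)


# Placeholder IDs standing in for the query results; do not submit these.
example_ids = ["Q1", "Q2", "Q3"]
batches = list(quickstatements_batches(example_ids, batch_size=2))
```

Each yielded string could be pasted into a separate batch. As the replies note, groups that need a more specific value than Q3331189 would have to be filtered out of the input first.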

I think we could be more specific for certain groups, e.g. I have addressed "Presidential Radio Address" articles as "instance of speech". There are several groups of articles that can be identified and then addressed with QuickStatements. After that, the bot can be run on what is left. Mpaa (talk) 23:13, 1 November 2021 (UTC)
@Mpaa: Except they are editions as we host them, the speech would be the parent to the item, per d:WD:Books as there may be other published editions of the same speech. — billinghurst sDrewth 12:17, 5 September 2022 (UTC)[reply]
@Billinghurst I see. I saw others were linked that way and I followed along. If it is not correct, it should be cleaned up, but I have not mastered the Wikidata tools enough to write a bot for it. Mpaa (talk) 21:34, 5 September 2022 (UTC)
We desperately need better Wikidata tools (so we're not dependent on Billinghurst to be on eternal vigilance here). But the current gadget we have for this is loaded from some user's personal page on Russian Wikisource (which is kinda iffy in itself these days), and its code is completely incomprehensible. If anybody knows of or runs across good API docs for how to talk to Wikidata I'd be very interested. As far as I can tell, the only existing API is the main MW:API with some very minor additions for WD, and that's way way too painful to use for our purposes. Xover (talk) 06:15, 6 September 2022 (UTC)[reply]
@Xover: Maybe we should just be bold and create a phabricator task and see where we go. We probably should have put this into the desired toys to be built for 2023, though we have missed that boat as it is currently in final stages of voting (I think). — billinghurst sDrewth 05:40, 22 February 2023 (UTC)[reply]
User:Beleg Tâl why not just do it with Petscan itself? From memory it could make additions. Also note that there is the interwiki Petscan: for these. — billinghurst sDrewth 12:14, 5 September 2022 (UTC)

 Comment wondering whether we need to chip out components of this task. For example, something like petscan:23959659 shows works using {{Act of Congress}} which would not be edition, and would instead be another item, and they also have components that could have other elements added through QuickStatements. Yes, this will still need a large slab of works that need version, edition or translation (Q3331189) added, though at least it will allow for something less than the blunderbuss approach. — billinghurst sDrewth 05:24, 27 February 2023 (UTC)

The three volumes of The Last Man differ only in the title page between the first and second editions. Could the proofread text of the three volumes of the second edition be copied to the scans of the first edition? Languageseeker (talk) 23:29, 16 July 2022 (UTC)

@Mpaa If it is OK to copy also the Page status, better wait for all 3 vols to be validated. Mpaa (talk) 13:53, 18 July 2022 (UTC)[reply]
Makes Sense. Languageseeker (talk) 13:49, 22 July 2022 (UTC)[reply]

Migrate more one- and two-parameter invocations of Template:RunningHeader

Replacements:

  • {{[running header]|[text]}} to {{rh|[text]||}}
  • {{[running header]||[text]}} to {{rh||[text]|}}
  • {{[running header]|[text]|}} to {{[running header]|[text]||}}
  • {{[running header]|[left text]|[center text]}} to {{rh|[left text]|[center text]|}}
  • {{[running header]|[left text]|[center text]|{{[template]|[space or nothing]}}}} to {{rh|[left text]|[center text]|}}
  • {{[running header]|{{[template]|[space or nothing]}}|[center text]|[right text]}} to {{rh||[center text]|[right text]}}
  • {{[running header]|{{[template]|[space or nothing]}}|[text]|{{[template]|[space or nothing]}}}} to {{rh||[text]|}}
  • delete {{rh|| }}
Indexes:

CalendulaAsteraceae (talkcontribs) 03:38, 21 November 2023 (UTC)[reply]
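The simplest of the replacements listed above (bare one- and two-parameter calls) can be sketched with Python's `re` module. This is an illustration under stated assumptions, not the bot's actual code: the set of template names treated as aliases and the function name `pad_running_header` are hypothetical, and arguments containing nested templates, pipes, or named parameters are deliberately left untouched.

```python
import re

# Template names treated as the running header; this alias list is an assumption.
NAME = r"(?:[Rr]h|[Rr]unningHeader)"
# An argument with no pipes, braces, or "=": no nested templates, no named params.
ARG = r"[^|{}=]*"

TWO_ARGS = re.compile(r"\{\{\s*" + NAME + r"\s*\|(" + ARG + r")\|(" + ARG + r")\}\}")
ONE_ARG = re.compile(r"\{\{\s*" + NAME + r"\s*\|(" + ARG + r")\}\}")


def pad_running_header(wikitext: str) -> str:
    """Pad bare one- and two-argument calls out to three positional arguments."""
    # {{rh|left|center}} and {{rh||center}} gain a trailing empty third argument.
    wikitext = TWO_ARGS.sub(r"{{rh|\1|\2|}}", wikitext)
    # {{rh|text}} gains two empty arguments.
    wikitext = ONE_ARG.sub(r"{{rh|\1||}}", wikitext)
    return wikitext


padded = pad_running_header("{{rh|10|SOME WORK}}")
```

Calls that already have three arguments do not match either pattern, so running the function twice should leave the text unchanged.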

Just to note... Since there's no pre-made page generator to give pywikibot all these pages to work on I am going to have to make a custom bot to loop over them, and since I don't have any similar code lying around that's going to have to wait until I have the time to sit down to figure out how to do that. Pywikibot also at some point seems to have dropped the ReplaceBot class, so I may have to reimplement a lot of the basic logic for that too. @Mpaa: you wouldn't happen to have any code like this handy that you could share? Or any advice on how to approach this? Xover (talk) 07:02, 26 November 2023 (UTC)
@Xover it's called ReplaceRobot, is this what you mean? I would use replace.py and do it in chunks; note that you can feed several -prefixindex:<prefix> at a time.

As a side comment, it seems you are changing the signature for {{rh}}, aren't you? If so, I haven't followed the discussion and the rationale, but given the wide and long-established use of the template, I am a bit skeptical (e.g. I have never specified the last | in {{rh|10|SOME WORK}}), so I expect a learning time, with more of this kind of replacement to be done. Mpaa (talk) 11:52, 26 November 2023 (UTC)
@Mpaa: Isn't that an internal class for the replace.py script? There used to be a base pywikibot ReplaceBot class for making custom replace bots, but it got dropped at some point (or at least I wasn't able to find it). My Python-fu is pretty weak sauce so I could just be confused.
I've been using stock replace.py with options, but the list of indexes above is going to be a bear to do that way so I was looking for some way to do it in a foreach (and wrapping replace.py in perl, natch, failed due to PAWS' funky handling of stdin).
And, yes, the bot runs are part of changing the call signature of {{rh}} such that the number of args determines how many cells you get. The change should (I think) be immediately obvious so the learning curve might not be so bad. The most tricky case is when rh only gets two params, because that used to give you left+center and will now give you left+right. Xover (talk) 18:46, 26 November 2023 (UTC)[reply]
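To make the signature change concrete, here is a small model of how the arguments map to cells before and after. It is a sketch based only on the descriptions in this thread: the function names are hypothetical, and the one-argument case (a single centered cell) follows what is said elsewhere in this discussion rather than any confirmed implementation.

```python
from typing import Dict, List


def old_cells(args: List[str]) -> Dict[str, str]:
    """Long-standing behaviour: positions always mean left, center, right."""
    padded = (args + ["", "", ""])[:3]
    return {"left": padded[0], "center": padded[1], "right": padded[2]}


def new_cells(args: List[str]) -> Dict[str, str]:
    """Proposed behaviour: the number of arguments decides which cells exist."""
    if len(args) == 1:
        return {"center": args[0]}
    if len(args) == 2:
        return {"left": args[0], "right": args[1]}
    return old_cells(args)


before = old_cells(["10", "SOME WORK"])
after = new_cells(["10", "SOME WORK"])
```

The two-argument call is the only case where the same wikitext renders differently, which is why existing `{{rh|left|center}}` calls would need a trailing empty parameter added before the switch.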
@Xover: Would it be easier for you to work with a tracking category? If so, we should talk in more detail about the tracking categories I've added to Module:Running header and my thoughts on a two-stage migration. —CalendulaAsteraceae (talkcontribs) 01:06, 27 November 2023 (UTC)[reply]
@CalendulaAsteraceae: Yes. Any list of pages I can reliably get from one of the Page Generators is much easier to work with because then I can just fire off the stock replace.py script. Getting the regexen right still requires some tweaking, but I speak regex natively so that's usually not a big problem.
Making a custom script for stuff like this isn't really that hard either, it's just that I have never done it—or worked with other custom pywikibot code—and combined with not being a Python coder it means it takes sustained attention and effort to learn first (which is the kind of time I have trouble finding).
Incidentally, Mpaa, being able to specify an Index: page in order to work on all the Page:es associated with it, might be a nice convenience for pywikibot. -pagesinindex:"Foo.djvu" or something. -prefixindex:"Page:Foo.djvu/" with manual fiddling works fine, but it is a couple of extra steps. Xover (talk) 07:06, 28 November 2023 (UTC)[reply]
@CalendulaAsteraceae: How did you come up with this list of indexes? Is it something we can automate adding a tracking category to? Or maybe just list all the pages somewhere? It's going to take a while for me to get around to figuring out how to do a custom PWB bot for this, and I have several higher-priority projects. If we can find some way to make the page selection in a way that pywikibot's standard generators can consume, it'd be much easier (read: faster). Xover (talk) 09:05, 30 January 2024 (UTC)
@Xover: I came up with this list of indexes by doing a regex insource search and checking the results manually, but if we switch to using Module:Running header (in its current form, which doesn't change functionality) it will add tracking categories automatically. —CalendulaAsteraceae (talkcontribs) 15:18, 30 January 2024 (UTC)[reply]
@Xover: The category's still filling up, but here you go!
CalendulaAsteraceae (talkcontribs) 00:05, 7 February 2024 (UTC)[reply]
@Xover: Updated request now that there are tracking categories:
  • Category:Running headers with one entry: I took care of these manually!
  • Category:Running headers with two entries
    • {{[running header]|left=[left text]|center=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|center=[center text]|left=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|left=[left text]|right=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|right=[right text]|left=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|center=[center text]|right=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|right=[right text]|center=[center text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|1=[left text]|2=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|2=[center text]|1=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|1=[left text]|3=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|3=[right text]|1=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|2=[center text]|3=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|3=[right text]|2=[center text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|[left text]|[center text]}} to {{[running header]|[left text]|[center text]|}}
CalendulaAsteraceae (talkcontribs) 18:04, 20 February 2024 (UTC)[reply]
@CalendulaAsteraceae this is a huge backlog. To make things easier with regexes, is it possible to create different tracking categories (no named parameters, left-center-right parameters, 1-2-3 parameters)? Mpaa (talk) 15:36, 13 April 2024 (UTC)
@Mpaa, I've created Category:Running headers using explicit parameter names, but I'm not aware of any way to distinguish explicit versus implicit numbered parameters. Breaking these down a bit:
  • Category:Running headers with two entries and Category:Running headers using explicit parameter names: https://petscan.wmflabs.org/?psid=28015778
    • {{[running header]|left=[left text]|center=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|center=[center text]|left=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|left=[left text]|right=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|right=[right text]|left=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|center=[center text]|right=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|right=[right text]|center=[center text]}} to {{[running header]||[center text]|[right text]}}
  • Category:Running headers with two entries and not Category:Running headers using explicit parameter names: https://petscan.wmflabs.org/?psid=28015774
    • {{[running header]|1=[left text]|2=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|2=[center text]|1=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|1=[left text]|3=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|3=[right text]|1=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|2=[center text]|3=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|3=[right text]|2=[center text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|[left text]|[center text]}} to {{[running header]|[left text]|[center text]|}}
CalendulaAsteraceae (talkcontribs) 22:09, 13 April 2024 (UTC)[reply]
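Because the named and numbered forms can appear in either order, a single regex per variant gets repetitive. A callable replacement can handle all of them at once. The sketch below is illustrative only: the alias table follows the description given in this thread (left = 1, center/centre = 2, right = 3), the template-name list is an assumption, and the helper names are hypothetical. Calls with nested templates or unrecognised named parameters are returned unchanged.

```python
import re

# Parameter aliases as described in this thread.
ALIASES = {"left": 1, "1": 1, "center": 2, "centre": 2, "2": 2, "right": 3, "3": 3}

# Matches only calls whose arguments contain no braces (no nested templates).
CALL = re.compile(r"\{\{\s*(rh|[Rr]unningHeader)\s*\|([^{}]*)\}\}")


def to_positional(match: "re.Match[str]") -> str:
    """Rewrite one call as three positional arguments, or leave it unchanged."""
    slots = ["", "", ""]
    position = 0
    for raw in match.group(2).split("|"):
        name, sep, value = raw.partition("=")
        if sep:
            key = name.strip().lower()
            if key not in ALIASES:
                return match.group(0)  # unknown named parameter: do not touch
            slots[ALIASES[key] - 1] = value.strip()
        else:
            position += 1
            if position > 3:
                return match.group(0)  # more than three cells: do not touch
            slots[position - 1] = raw
    return "{{" + match.group(1) + "|" + "|".join(slots) + "}}"


def migrate(wikitext: str) -> str:
    return CALL.sub(to_positional, wikitext)


converted = migrate("{{rh|right=Chapter I|left=12}}")
```

A callable replacement like this needs a custom script; it cannot be expressed as a plain pattern and replacement string passed to a stock replace tool.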
  • Xover: This should not be done; it goes against the consensus stated here. This is especially so as, if I remember correctly, you and the bot action requestor were the only editors who expressed support of the proposal. TE(æ)A,ea. (talk) 22:39, 13 April 2024 (UTC)

CalendulaAsteraceae, Xover could you remind me what happens to explicit params (left=|center=|right=)? Are they going to be phased out and so Category:Running_headers_using_explicit_parameter_names shall be emptied? Mpaa (talk) 21:09, 27 April 2024 (UTC)[reply]

@Mpaa: left is an alias of 1, center/centre is an alias of 2, and right is an alias of 3. I would like to phase these parameters out, since their use is prone to cause errors with duplicate parameters, but it's not a high priority. —CalendulaAsteraceae (talkcontribs) 20:42, 28 April 2024 (UTC)[reply]
cf. this edit. Why are we converting these to {{c}}? These should be perfectly fine running headers both before and after the migration, no? Xover (talk) 05:30, 29 April 2024 (UTC)[reply]
In any case rh/1 behaviour was planned in the new version. It generates a single centered field IIRC. ShakespeareFan00 (talk) 18:11, 29 April 2024 (UTC)[reply]
@ShakespeareFan00 which in my opinion is a bad idea. What does it achieve vs. having {{rh|xxx}}? It will only generate one more template, we will have works that on one page will use rh, on the next rh/1, then rh again ... I thought you valued consistency. Mpaa (talk) 08:36, 1 May 2024 (UTC)
Perhaps someone can sit down and decide on ONE approach then? And then make all existing usage consistent? ShakespeareFan00 (talk) 09:04, 2 May 2024 (UTC)
Perhaps someone should do that and DOCUMENT it so tightly there cannot be arguments BEFORE any further conversions or "repairs" are undertaken? ShakespeareFan00 (talk) 09:04, 2 May 2024 (UTC)
The eventual goal is to have {{rh|<centered heading>}}, so that it's possible to have single-field headings, albeit with a stylesheet to set the 'style' of that heading, something that's not possible with {{c}}. {{rh/1}} has a class param, and my understanding was that once current uses are standardised, {{rh/1}} gets the /1 dropped.
The need to expand 2-param calls to 3-param calls arises because the design intent was to make 2-param calls behave as {{rh|<left heading>|<right aligned heading>}}, which is different from the current implied 2-param usage of {{rh|<left aligned heading>|<center aligned heading>}} (and is hence what caused the creation of {{rh/1}}, {{rh/2}}, {{rh/3}}, etc.). It's to do with CSS hooks for IndexStyles as well.
Once current use is consistently more than 3 params, rh/3, rh/2, rh/1 (with their different behaviours, etc.) can be re-merged.
As I said it needs someone else to sit down and look at all the variants.
(Aside: It would also be a LOT simpler if RH wasn't supplied with formatting templates in its input. The 'styles' applied should be set in IndexStyles, with the params being pure content. The styles are then set via IndexStyles and appropriate CSS selectors.)
ShakespeareFan00 (talk) 14:14, 2 May 2024 (UTC)[reply]
@ShakespeareFan00 I 100% support your last point! I can't stand templates in rh, they make parsing much more complex. Mpaa (talk) 22:23, 3 May 2024 (UTC)[reply]
@Xover sure it would work; I was going through recent changes and it seemed to me that {{c}} is the right template in that case. It is not that we need rh everywhere just because it exists.
Anyhow, some of my edits were reverted e.g. this, so it seems @TE(æ)A,ea. is not happy with this rh migration; I am going to stop until people are happy with it. Mpaa (talk) 20:39, 30 April 2024 (UTC)[reply]
Hmm. Well, for the diff linked I'd used {{rh}} deliberately because it is a running header, and using {{c}} doesn't seem to provide any additional benefit or prevent any problem. {{rh||pageno|}} works fine and is, AIUI, supposed to keep working. So unless I'm missing something I don't think that particular case needs (or should) be changed.
As for the reverts, TE(æ)A,ea. is objecting to {{rh}} (or any other template) being implemented in Lua, because they are somehow under the impression that any random contributor can maintain and modify templates like this if they are implemented in MediaWiki template syntax; and that a mainstream programming language like Lua is somehow harder to maintain. This is of course entirely unrelated to the functionality of the template, which is what we're discussing here. Xover (talk) 07:33, 1 May 2024 (UTC)[reply]
@Xover It is not a big deal, I can revert them back if you prefer that. My line of thinking is that up to yesterday also {{rh||pageno}} was supposed to be OK until it was not ... and I am sure tomorrow someone will like to replace {{rh||pageno|}} with {{rh/1|pageno}} and so on. If we already had support for {{rh|pageno}}, keeping consistency between 'name' and 'function' would probably be OK. Mpaa (talk) 08:29, 1 May 2024 (UTC)
@Mpaa: No no, no need to revert anything. Right now it doesn't matter whether it uses {{c}} or {{rh}}. It's just that it seemed unnecessary, and longer term I have some ideas for how the difference in semantics might be employed for actual functionality (hence why I've elsewhere recommended that we avoid (ab)using {{rh}} for columns or margins in the page body). For example, a Gadget that manages the running headers (that would get confused by {{rh}} in the body, and unable to detect a running header using {{c}}) or a "reader mode" that gives you a book-like view for reading page-by-page that also fetches the running headers and displays them even though it operates in mainspace. Just loose thoughts, so nothing that matters near-term, but that's the context. Xover (talk) 09:00, 1 May 2024 (UTC)[reply]
@TE(æ)A,ea. are you ok with these changes or not? Because you reverted something that is not related to how the template is implemented, but more to the idea behind it. My edit would be acceptable also with the old implementation. Mpaa (talk) 08:40, 1 May 2024 (UTC)
  • Mpaa: As I mentioned earlier in this discussion, I do not believe that the running header conversion scheme (particularly the one where using unnamed parameters 1 and 2 becomes left and right, as opposed to the long-running standard practice of left and center) is justified by community consensus. Bots cannot operate without justification, and here that justification is explicitly lacking. When the changes currently being made were brought up at the Scriptorium, widespread objection was held in regard to the change in unnamed parameters 1 and 2. Despite this, the users who proposed the change are continuing to enact it in the face of community consensus—no new feat for at least one of them. In any case, I vehemently object to any such change to any project in which I am involved and in which I am working, such as Kojiki (by your example). TE(æ)A,ea. (talk) 17:03, 2 May 2024 (UTC)[reply]
    Okay, let me be blunt: what is your "actual" concern with expanding all current usages of rh to 3 or more parameters?
    ShakespeareFan00 (talk) 18:56, 2 May 2024 (UTC)[reply]
    • ShakespeareFan00: It’s completely pointless and a massive waste of resources. People who deal with coding and other Wikisource backend (the above contributors) could do something productive; instead they spend days and weeks trying to change this template, which will require 165,000+ pages to be edited, and will produce absolutely no benefit. This isn’t even about the Lua change, at this point; the two-parameter system currently in place could be carried over. It’s basically two users, without (if not against) consensus, undertaking to make hundreds of thousands of edits and break workflow for, I would say, most active editors, for no benefit to the project. I do, also, object to the change to Lua, for the reasons I have stated before (and which Xover mocked above). The use of modules should be avoided as much as possible, and templates should certainly not be changed to modules when there is no need to do so, as here. TE(æ)A,ea. (talk) 19:30, 2 May 2024 (UTC)[reply]

Remove some uses of override_author and override_contributor

It may be impractical to remove all uses of override_author and co., but we can at least get some of the easy cases. To that end, I would like the following replacements:

  1. Pages matching insource:/override_author[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*.[ ]*\|/
    1. \|[ \n]*author[ \n]*=[ \n]*\|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author> \(<disambig>\)\|<author>\]\]\| author = <author> \(<disambig>\)
    2. \|[ \n]*author[ \n]*=[ \n]*\|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author>\|<authordisplay>\]\]\| author = <author> \| author_display = <authordisplay>
  2. Pages matching insource:/override_(contributor|section_author)[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*and[ ]*\[\[Author:[^\|]*\|[^\|]*\]\].[ ]*[\|}]/
    1. |[ ]*(contributor|section_author)[ ]*=[ ]*|[ ]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author> \(<disambig>\)\|<author>\]\]\| section_author = <author> \(<disambig>\)
    2. [ ]*(contributor|section_author)[ ]*=[ ]*|[ ]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author>\|<authordisplay>\]\]\| section_author = <author> \| section_author_display = <authordisplay>
  3. Pages matching insource:/override_author[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*and[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*.[ ]*[\|}]/
    1. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\]\| author1 = <author1> \| author2 = <author2>
    2. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\]\| author1 = <author1> \| author1_display = <author1display> \| author2 = <author2>
    3. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\]\| author1 = <author1> \| author2 = <author2> \| author2_display = <author2display>
    4. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\]\| author1 = <author1> \| author1_display = <author1display> \| author2 = <author2> \| author2_display = <author2display>
  4. Pages matching insource:/override_(contributor|section_author)[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*and[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*.[ ]*[\|}]/
    1. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\]\| section_author1 = <author1> \| section_author2 = <author2>
    2. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\]\| section_author1 = <author1> \| section_author1_display = <author1display> \| section_author2 = <author2>
    3. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\]\| section_author1 = <author1> \| section_author2 = <author2> \| section_author2_display = <author2display>
    4. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\]\| section_author1 = <author1> \| section_author1_display = <author1display> \| section_author2 = <author2> \| section_author2_display = <author2display>
  5. Pages matching insource:/override_contributor[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\], translated by \[\[Author:[^\|]*\|[^\|]*\]\]. \|/
    1. override_contributor[ ]*=[ ]*\[\[Author:([^\|]*)\|[^\|]*\]\], translated by \[\[Author:([^\|]*)\|[^\|]*\]\]contributor = %1 \| section_translator = %2
  6. Pages in Category:Pages with override author with {{anon}}: insource:/\{\{[Aa]non}}.[ ]*\|/
    1. \|[ \n]*author[ \n]*=[ \n]*\|[ \n]*override_author[ \n]*=[ \n]*{{anon}}\| author = anon
  7. Pages in Category:Pages with override contributor with {{anon}}: insource:/\{\{[Aa]non\}\}[^,] \|/:
    1. [ ]*(contributor|section_author)[ ]*=[ ]*|[ ]*\|[ \n]*(contributor|section_author)[ \n]*=[ \n]*\|[ \n]*override_contributor[ \n]*=[ \n]*{{anon}}\| section_author = anon

Thank you! —CalendulaAsteraceae (talkcontribs) 20:34, 26 February 2024 (UTC)[reply]

@CalendulaAsteraceae: This is somewhat overwhelming to untangle enough to execute. The standard replace script of pywikibot has a facility for performing multiple Python regex replacements for each page processed. Could you maybe try to break your replacements up into more atomic steps?
You can think of the process as 1) select what wikipages to operate on, and 2) perform this series of regex replacements on those pages. insource: searches can be used for selecting pages but are finicky, so prefer criteria like "all pages in Category:Foo", "All pages transcluding Template:Bar", "All pages linked from the wikipage Baz", etc. (I'm not sure what set intersection, disunion, etc. are available for these generator functions, but I'm pretty sure you can do some combinatorial stuff with them). And replacements that are more than trivial benefit from being broken up into a series of match pattern + replacement pattern. In typical regex style, parentheses in the match pattern save away the matched bit, which can be accessed by numerical reference in the replacement pattern: the stuff matched by the first parenthesis is in \1 in Python (and in $1 in JavaScript). Xover (talk) 08:08, 1 March 2024 (UTC)
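As an illustration of the "series of match pattern + replacement pattern" approach with numbered back-references, here is a minimal sketch using Python's `re` module. It rewrites the "translated by" case from the request, with `%1`/`%2` written as `\1`/`\2`. The list name `REPLACEMENTS`, the function `apply_all`, and the sample line are hypothetical, and the pattern assumes both author links sit on a single line.

```python
import re
from typing import List, Tuple

# Ordered (pattern, replacement) pairs; each pair is applied to the whole text in turn.
REPLACEMENTS: List[Tuple[str, str]] = [
    (
        r"override_contributor\s*=\s*"
        r"\[\[Author:([^|\]]*)\|[^|\]]*\]\], translated by "
        r"\[\[Author:([^|\]]*)\|[^|\]]*\]\]",
        # \1 and \2 refer to the first and second parenthesised groups above.
        r"contributor = \1 | section_translator = \2",
    ),
]


def apply_all(text: str, pairs: List[Tuple[str, str]] = REPLACEMENTS) -> str:
    """Apply each regex replacement in order and return the rewritten text."""
    for pattern, replacement in pairs:
        text = re.sub(pattern, replacement, text)
    return text


sample = "| override_contributor = [[Author:Jane Doe|Jane Doe]], translated by [[Author:John Roe|J. Roe]]\n"
result = apply_all(sample)
```

Keeping each pair small makes it easier to test one step at a time before chaining them.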
@Xover: yep, I'm happy to work on breaking up these replacements into more steps. How can I target multiple instances of the same pattern, like \[\[Author:<author> \(<disambig>\)\|<author>\]\]? —CalendulaAsteraceae (talkcontribs) 17:03, 1 March 2024 (UTC)[reply]
Replacements to make in this message once I learn the appropriate syntax:
  • <AUTHORPATTERN> to pattern using [^\n\|]*
  • <DIGITPATTERN> to pattern using [\d]*
  • <PARAMPATTERN> to pattern using (author|section_author)
Replacements to make in work pages:
  1. All pages in Category:Pages with override author or Category:Pages with override contributor:
    1. \s+\n\n
  2. All pages in Category:Pages with override author:
    1. \|[\s\n]*author[\s\n]*=[\s\n]*\|[\s\n]*override_author[\s\n]*=[\s\n]*([^\n]*)\n\| override_author = $1\n
  3. All pages in Category:Pages with override contributor:
    1. \|[\s\n]*(contributor|section_author)[\s\n]*=[\s\n]*|[\s\n]*override_(contributor|section_author)[\s\n]*=[\s\n]*([^\n]*)\n\| override_section_author = $3\n
  4. All pages in Category:Pages with override author or Category:Pages with override contributor which transclude {{anon}}:
    1. \| override_(author|section_author) = {{anon}}\n\| $1 = anon\n
  5. All pages in Category:Pages with override author or Category:Pages with override contributor:
    1. \| override_(author|section_author) = \[\[Author:([^\n\|]*)\|([^\n\|]*)\]\]\n\| $1 = $2 \| $1_display = $3\n
    2. \| override_(author|section_author) = \[\[Author:([^\n\|]*)\|([^\n\|]*)\]\] and \[\[Author:([^\n\|]*)\|([^\n\|]*)\]\]\n\| ($1)1 = $2 \| ($1)1_display = $3\n\| ($1)2 = $4 \| ($1)2_display = $5\n
    3. \| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN> \| <PARAMPATTERN><DIGITPATTERN>_display = <AUTHORPATTERN>\n\| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN>\n
    4. \| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN> (\([^\n\|]*\) \| <PARAMPATTERN><DIGITPATTERN>_display = <AUTHORPATTERN>\n\| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN> $1\n
CalendulaAsteraceae (talkcontribs) 21:00, 1 March 2024 (UTC)[reply]
Replacement #1 will match pretty much every text and the change has no functional benefit, so that on its own is not a good idea. I'm assuming your thinking is to normalise that to make other replacements easier, but arbitrary whitespace surrounding parameters and values is something that will need to be handled in the matching patterns in any case.
[] creates a character class, with the typical example being something like [a-z4-8] to match every lower-case English letter plus the digits 4 through 8. The escape sequences \s and \n match a pre-defined character class consisting of all whitespace characters (space, horizontal tab, etc.) and a literal newline, respectively. You don't generally put pre-defined character classes inside []. The places you've put \n are also places where I would not usually expect to find newlines, even in messy human-entered input, so I'd leave those out and if necessary do a separate run to fix whatever remains. You can also assume these patterns match within a single line if it's clear that that's what you're doing and I can take care of tweaking it accordingly (waving hands over explanation too long for the context)
In replacement #2, are you really intending to remove |author= when there is a |override_author= present? That seems backward to me. Ditto for #3. And is |contributor= semantically equivalent to |section_author=?
You can't reuse patterns as if they were variables, because we're feeding config / command line options to a pre-made tool here. To do that you need programming-level control and that means writing a custom bot. If you do, then passing patterns around and constructing them from strings is straightforward of course. But for this you'll have to repeat the subpatterns for every replacement. I think you'd also better explain your logic in prose so I don't have to reverse-engineer it from your regexes. I think I get roughly what you're aiming at here, but I could easily also be confused and I'm missing the context of what you've implemented in {{header}}. Are we merging disparate forms into just |override_author= and |override_section_author=, and then splitting those into |override_author= + |override_author_display=? And is the final step then to drop the "override" prefix where not needed?
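To illustrate what "constructing them from strings" could look like in a custom bot, here is a rough Python sketch. The names PARAM, DIGIT, AUTHOR and drop_redundant_display are hypothetical, loosely modelled on the placeholders in the original request, and the logic (drop a display parameter that merely repeats the value) is one reading of that request rather than a confirmed specification.

```python
import re

# Hypothetical building blocks, reused by plain string concatenation.
PARAM = r"(?:author|section_author|contributor)"
DIGIT = r"\d*"
AUTHOR = r"[^\n|]*?"

# Matches "| X = value | X-display = value" on one line, where both values
# are identical, using named groups and named backreferences.
REDUNDANT_DISPLAY = re.compile(
    r"\|\s*(?P<param>" + PARAM + DIGIT + r")\s*=\s*(?P<author>" + AUTHOR + r")\s*"
    r"\|\s*(?P=param)[-_]display\s*=\s*(?P=author)\s*\n"
)


def drop_redundant_display(text):
    """Remove display parameters that only repeat the parameter's own value."""
    return REDUNDANT_DISPLAY.sub(r"| \g<param> = \g<author>\n", text)
```

With a pre-made replacement tool, the same subpatterns would instead have to be written out in full inside every pattern.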
PS. For my sanity, please don't introduce more parameter names with underscores. They're a holdover from ancient ancient template code, are easily confused with MediaWiki's underscore-for-space replacements (they were in fact originally for template parameters with literal space characters in the name), and these days we should be using hyphens for this purpose. In fact, while we still have underscore names they should be made aliases to a canonical name using a hyphen. Xover (talk) 10:09, 2 March 2024 (UTC)
@Xover: Thank you for all this detailed feedback!
I've made the parameters with underscores aliases of parameters with hyphens, and contributor was already an alias of section-author.
The ultimate goal of these replacements is to make the following changes, and ditto for section_author, without generating duplicate parameters:
  • |override_author = {{anon}} becomes |author = anon
  • |override_author = [[Author:Authorlink|Authordisplay]] becomes |author = Authorlink | author-display = Authordisplay
  • |override_author = [[Author:Author1link|Author1display]] and [[Author:Author2link|Author2display]] becomes |author1 = Author1link | author1-display = Author1display | author2 = Author2link | author2-display = Author2display
Ideally, I'd prefer not to have a bunch of display parameters that produce the same result as the automatic display, which is why I wanted to reuse patterns, but I suppose it's not a big deal if it's easier to just leave those in.
Some simpler proposed replacements, which assume the entire match is on one line:
  • \|\s*override_(author|section_author|contributor)\s*=\s*{{anon}}\s*\n → \| \1 = anon\n
  • \|\s*override_(author|section_author|contributor)\s*=\s*\[\[Author:([^\|]*)\|([^\|]*)\]\]\s*\n → \| \1 = \2 \| \1-display = \3\n
  • \|\s*override_(author|section_author|contributor)\s*=\s*\[\[Author:([^\|]*)\|([^\|]*)\]\] and \[\[Author:([^\|]*)\|([^\|]*)\]\]\s*\n → \| (\1)1 = \2 \| (\1)1-display = \3 \| (\1)2 = \4 \| (\1)2-display = \5\n
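As a rough check of the three replacements above, they can be expressed with Python's re module. This is only a sketch under assumptions: the names PARAM, LINK and convert_overrides are invented here, the character classes additionally exclude newlines (and, for the display text, a closing bracket) to keep each match on one line, and the replacement syntax of the actual bot tool may differ. One Python-specific point: parentheses in a replacement string are emitted literally, so a group reference followed by a digit is written as \g<1>1 rather than (\1)1.

```python
import re

# Shared subpatterns; group 1 is always the parameter name without "override_".
PARAM = r"override_(author|section_author|contributor)"
LINK = r"\[\[Author:([^|\n]*)\|([^|\n\]]*)\]\]"

REPLACEMENTS = [
    # {{anon}} becomes the bare value "anon".
    (re.compile(r"\|\s*" + PARAM + r"\s*=\s*\{\{anon\}\}\s*\n"),
     r"| \1 = anon\n"),
    # One author link becomes a link target plus a display value.
    (re.compile(r"\|\s*" + PARAM + r"\s*=\s*" + LINK + r"\s*\n"),
     r"| \1 = \2 | \1-display = \3\n"),
    # Two author links joined by " and " become numbered parameters.
    (re.compile(r"\|\s*" + PARAM + r"\s*=\s*" + LINK + r" and " + LINK + r"\s*\n"),
     r"| \g<1>1 = \2 | \g<1>1-display = \3 | \g<1>2 = \4 | \g<1>2-display = \5\n"),
]


def convert_overrides(text):
    """Apply the three override replacements in order."""
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text
```

Because the link subpattern cannot cross a pipe, the single-author pattern does not match a two-author value, so the order of the last two entries should not matter.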
Replacements to run after the above replacements, to fix duplicate parameters:
  • \|\s*author\s*=\s*\| author(1?) = → \| author\1 =
  • \|\s*section_author\s*=\s*\| section_author(1?) = → \| section\-author\1 =
  • \|\s*contributor\s*=\s*\| contributor(1?) = → \| section\-author\1 =
CalendulaAsteraceae (talk • contribs) 05:35, 4 March 2024 (UTC)
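The follow-up pass for duplicate parameters can be sketched the same way in Python. The names DEDUPE and drop_empty_duplicates are hypothetical, and the sketch assumes the empty original parameter sits directly before the newly generated one, as the patterns above require.

```python
import re

# Each pattern matches an EMPTY parameter immediately followed by the newly
# generated parameter of the same name (optionally numbered 1), and keeps
# only the second one, renamed to the hyphenated form where applicable.
DEDUPE = [
    (re.compile(r"\|\s*author\s*=\s*\| author(1?) ="),
     r"| author\1 ="),
    (re.compile(r"\|\s*section_author\s*=\s*\| section_author(1?) ="),
     r"| section-author\1 ="),
    (re.compile(r"\|\s*contributor\s*=\s*\| contributor(1?) ="),
     r"| section-author\1 ="),
]


def drop_empty_duplicates(text):
    """Collapse an empty parameter followed by its freshly generated twin."""
    for pattern, replacement in DEDUPE:
        text = pattern.sub(replacement, text)
    return text
```

A non-empty first parameter is left alone, because the pattern requires the next pipe to follow the equals sign with only whitespace in between.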

Update DNB00 and CE13 contributors

I would like to update some uses of contributor. Specifically, I would like these replacements:

CalendulaAsteraceae (talk • contribs) 16:54, 9 May 2024 (UTC)