Wikisource:Bot requests

Bot requests

This page allows users to request that an existing bot accomplish a given task. Note that some tasks may require that an entirely new bot or script be written. This is not the place to ask for help running or writing a bot.

A bot operator performing a task should make note of it so that other bots don't attempt to do the same. Tasks that are permanently assigned or scheduled for long-term execution are listed on Persistent tasks. See also Wikisource:Bots.

Move all subpages of Who's Who in the Far East to use title case

I was informed by User:Beeswaxcandle that I should use title case instead of all caps in article names. So I request to move all subpages of Who's Who in the Far East to use title case. Although I can use a bot to move it myself, that would leave tons of redirects for admins to delete. But if an admin can easily batch-delete a list of pages, I can move it myself and then provide the list of pages to delete. I'm sorry for the inconvenience. Thanks, --Stevenliuyi (talk) 08:58, 6 May 2021 (UTC)[reply]

@Stevenliuyi: Please review the list at Wikisource:Bot requests/sandbox. I notice that there is at least one English name that needs to be fixed, and the Chinese names didn't convert with the regex that I used. Would you fix or create the target (only) in the pair list, and I will get it done? There is no need to fix those that are broken, though you should fix the previous/next links of the articles on either side. Note that, as I did for your other work, I will look to get a work-specific template in place, though I will do that afterwards. — billinghurst sDrewth 13:10, 24 May 2021 (UTC)
I suppose that I really do want to ensure that the Chinese names are capitalised properly. — billinghurst sDrewth 02:57, 25 May 2021 (UTC)
@Stevenliuyi and @Billinghurst: Has this request been actioned (i.e. can it be closed as resolved)? Xover (talk) 10:34, 10 April 2022 (UTC)[reply]
@Stevenliuyi: Please see Billinghurst's request (above) for quality control of the list of targets in Wikisource:Bot requests/sandbox. They have done the legwork to prepare for the move, but it is unable to progress until you've checked and corrected the target page names. Xover (talk) 05:33, 3 September 2022 (UTC)[reply]
@Stevenliuyi: This is blocked on your input here. Xover (talk) 10:19, 7 November 2023 (UTC)[reply]

Wikidata bulk edit

I made a query for works on enWS that have WD items with no "instance of" statement. The criteria I used are:

  • Pages in mainspace
  • No redirects or disambiguation pages (this includes Versions and Translations btw)
  • Does not contain a forward slash in the page name (in order to exclude subpages)
  • Is linked to Wikidata, and linked Wikidata item does not have a P31 statement

This query returns 13889 results, which is more than even QuickStatements can handle. Would it be possible for a bot to update these Wikidata items with P31=Q3331189 (instance of = version, edition, or translation)?

Thanks :) —Beleg Tâl (talk) 13:22, 1 November 2021 (UTC)
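As a rough illustration of the request above, the statements could be generated in smaller batches rather than as one 13889-line job. The sketch below assumes the tab-separated QuickStatements command format (item, property, value) and an arbitrary batch size of 1000; the function name and the way the item IDs are obtained are hypothetical, and only P31 and Q3331189 come from the request.

```python
from typing import Iterable, List

# "version, edition, or translation", as named in the request above.
EDITION_QID = "Q3331189"


def quickstatements_batches(qids: Iterable[str], batch_size: int = 1000) -> List[str]:
    """Return command batches, one 'instance of' (P31) statement per item.

    The tab-separated layout is an assumption about the QuickStatements
    input format; the batch size is an arbitrary illustrative value.
    """
    commands = [f"{qid}\tP31\t{EDITION_QID}" for qid in qids]
    return [
        "\n".join(commands[start:start + batch_size])
        for start in range(0, len(commands), batch_size)
    ]


if __name__ == "__main__":
    # Placeholder item IDs; a real run would read them from the query results.
    for batch in quickstatements_batches(["Q1", "Q2", "Q3"], batch_size=2):
        print(batch)
        print("---")
```

Batching like this would also make it easier to carve out groups that need a different value before the remainder is processed.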

I think we could be more specific for certain groups, e.g. I have addressed "Presidential Radio Address" articles as "instance of speech". There are several groups of articles that can be identified and then addressed with QuickStatements. After that, the bot can be run on what is left. Mpaa (talk) 23:13, 1 November 2021 (UTC)
@Mpaa: Except they are editions as we host them, the speech would be the parent to the item, per d:WD:Books as there may be other published editions of the same speech. — billinghurst sDrewth 12:17, 5 September 2022 (UTC)[reply]
@Billinghurst I see. I saw that others were linked that way and I followed along. If it is not correct, it should be cleaned up, but I have not mastered the Wikidata tools enough to write a bot for it. Mpaa (talk) 21:34, 5 September 2022 (UTC)
We desperately need better Wikidata tools (so we're not dependent on Billinghurst to be on eternal vigilance here). But the current gadget we have for this is loaded from some user's personal page on Russian Wikisource (which is kinda iffy in itself these days), and its code is completely incomprehensible. If anybody knows of or runs across good API docs for how to talk to Wikidata I'd be very interested. As far as I can tell, the only existing API is the main MW:API with some very minor additions for WD, and that's way way too painful to use for our purposes. Xover (talk) 06:15, 6 September 2022 (UTC)[reply]
@Xover: Maybe we should just be bold and create a phabricator task and see where we go. We probably should have put this into the desired toys to be built for 2023, though we have missed that boat as it is currently in final stages of voting (I think). — billinghurst sDrewth 05:40, 22 February 2023 (UTC)[reply]
User:Beleg Tâl why not just do it with Petscan itself? From memory, it can make the additions. Also note that there is the interwiki Petscan: for these. — billinghurst sDrewth 12:14, 5 September 2022 (UTC)

Comment: wondering whether we need to chip out components of this task. For example, something like petscan:23959659 shows works using {{Act of Congress}}, which would not be an edition and would instead be another item, and they also have components that could have other elements added through QuickStatements. Yes, this will still leave a large slab of works that need version, edition or translation (Q3331189) added, though at least it will allow for something less than the blunderbuss approach. — billinghurst sDrewth 05:24, 27 February 2023 (UTC)

The three volumes of The Last Man only have a different title page between the first and second edition. Could the proofread text of the three volumes of the second edition be copied to the scans of the first edition? Languageseeker (talk) 23:29, 16 July 2022 (UTC)

@Mpaa If it is OK to copy also the Page status, better wait for all 3 vols to be validated. Mpaa (talk) 13:53, 18 July 2022 (UTC)[reply]
Makes Sense. Languageseeker (talk) 13:49, 22 July 2022 (UTC)[reply]

Migrate more one- and two-parameter invocations of Template:RunningHeader

Replacements:

  • {{[running header]|[text]}} to {{rh|[text]||}}
  • {{[running header]||[text]}} to {{rh||[text]|}}
  • {{[running header]|[text]|}} to {{[running header]|[text]||}}
  • {{[running header]|[left text]|[center text]}} to {{rh|[left text]|[center text]|}}
  • {{[running header]|[left text]|[center text]|{{[template]|[space or nothing]}}}} to {{rh|[left text]|[center text]|}}
  • {{[running header]|{{[template]|[space or nothing]}}|[center text]|[right text]}} to {{rh||[center text]|[right text]}}
  • {{[running header]|{{[template]|[space or nothing]}}|[text]|{{[template]|[space or nothing]}}}} to {{rh||[text]|}}
  • delete {{rh|| }}
Indexes:

CalendulaAsteraceae (talkcontribs) 03:38, 21 November 2023 (UTC)[reply]

Just to note... Since there's no pre-made page generator to give pywikibot all these pages to work on, I am going to have to make a custom bot to loop over them, and since I don't have any similar code lying around, that's going to have to wait until I have the time to sit down and figure out how to do that. Pywikibot also at some point seems to have dropped the ReplaceBot class, so I may have to reimplement a lot of the basic logic for that too. @Mpaa: you wouldn't happen to have any code like this handy that you could share? Or any advice on how to approach this? Xover (talk) 07:02, 26 November 2023 (UTC)
@Xover it's called ReplaceRobot, is this what you mean? I would use replace.py and do it in chunks; note that you can feed several -prefixindex:<prefix> at a time.

As a side comment, it seems you are changing the signature for {{rh}}, aren't you? If so, I haven't followed the discussion and the rationale, but given the wide and long-established use of the template, I am a bit skeptical (e.g. I have never specified the last | in {{rh|10|SOME WORK}}), so I expect a learning time, with more of this kind of replacement to be done. Mpaa (talk) 11:52, 26 November 2023 (UTC)
@Mpaa: Isn't that an internal class for the replace.py script? There used to be a base pywikibot ReplaceBot class for making custom replace bots, but it got dropped at some point (or at least I wasn't able to find it). My Python-fu is pretty weak sauce so I could just be confused.
I've been using stock replace.py with options, but the list of indexes above is going to be a bear to do that way so I was looking for some way to do it in a foreach (and wrapping replace.py in perl, natch, failed due to PAWS' funky handling of stdin).
And, yes, the bot runs are part of changing the call signature of {{rh}} such that the number of args determines how many cells you get. The change should (I think) be immediately obvious so the learning curve might not be so bad. The most tricky case is when rh only gets two params, because that used to give you left+center and will now give you left+right. Xover (talk) 18:46, 26 November 2023 (UTC)[reply]
@Xover: Would it be easier for you to work with a tracking category? If so, we should talk in more detail about the tracking categories I've added to Module:Running header and my thoughts on a two-stage migration. —CalendulaAsteraceae (talkcontribs) 01:06, 27 November 2023 (UTC)[reply]
@CalendulaAsteraceae: Yes. Any list of pages I can reliably get from one of the Page Generators is much easier to work with because then I can just fire off the stock replace.py script. Getting the regexen right still requires some tweaking, but I speak regex natively so that's usually not a big problem.
Making a custom script for stuff like this isn't really that hard either, it's just that I have never done it—or worked with other custom pywikibot code—and combined with not being a Python coder it means it takes sustained attention and effort to learn first (which is the kind of time I have trouble finding).
Incidentally, Mpaa, being able to specify an Index: page in order to work on all the Page:es associated with it, might be a nice convenience for pywikibot. -pagesinindex:"Foo.djvu" or something. -prefixindex:"Page:Foo.djvu/" with manual fiddling works fine, but it is a couple of extra steps. Xover (talk) 07:06, 28 November 2023 (UTC)[reply]
@CalendulaAsteraceae: How did you come up with this list of indexes? Is it something we can automate adding a tracking category to? Or maybe just list all the pages somewhere? It's going to take a while for me to get around to figuring out how to do a custom PWB bot for this, and I have several higher-priority projects. If we can find some way to make the page selection in a way that pywikibot's standard generators can consume, it'd be much easier (read: faster). Xover (talk) 09:05, 30 January 2024 (UTC)
@Xover: I came up with this list of indexes by doing a regex insource search and checking the results manually, but if we switch to using Module:Running header (in its current form, which doesn't change functionality) it will add tracking categories automatically. —CalendulaAsteraceae (talkcontribs) 15:18, 30 January 2024 (UTC)[reply]
@Xover: The category's still filling up, but here you go!
CalendulaAsteraceae (talkcontribs) 00:05, 7 February 2024 (UTC)[reply]
@Xover: Updated request now that there are tracking categories:
  • Category:Running headers with one entry: I took care of these manually!
  • Category:Running headers with two entries
    • {{[running header]|left=[left text]|center=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|center=[center text]|left=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|left=[left text]|right=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|right=[right text]|left=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|center=[center text]|right=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|right=[right text]|center=[center text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|1=[left text]|2=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|2=[center text]|1=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|1=[left text]|3=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|3=[right text]|1=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|2=[center text]|3=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|3=[right text]|2=[center text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|[left text]|[center text]}} to {{[running header]|[left text]|[center text]|}}
CalendulaAsteraceae (talkcontribs) 18:04, 20 February 2024 (UTC)[reply]
@CalendulaAsteraceae this is a huge backlog. To make things easier with regexes, is it possible to create different tracking categories (no named parameters, left-center-right parameters, 1-2-3 parameters)? Mpaa (talk) 15:36, 13 April 2024 (UTC)
@Mpaa, I've created Category:Running headers using explicit parameter names, but I'm not aware of any way to distinguish explicit versus implicit numbered parameters. Breaking these down a bit:
  • Category:Running headers with two entries and Category:Running headers using explicit parameter names: https://petscan.wmflabs.org/?psid=28015778
    • {{[running header]|left=[left text]|center=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|center=[center text]|left=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|left=[left text]|right=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|right=[right text]|left=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|center=[center text]|right=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|right=[right text]|center=[center text]}} to {{[running header]||[center text]|[right text]}}
  • Category:Running headers with two entries and not Category:Running headers using explicit parameter names: https://petscan.wmflabs.org/?psid=28015774
    • {{[running header]|1=[left text]|2=[center text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|2=[center text]|1=[left text]}} to {{[running header]|[left text]|[center text]|}}
    • {{[running header]|1=[left text]|3=[right text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|3=[right text]|1=[left text]}} to {{[running header]|[left text]||[right text]}}
    • {{[running header]|2=[center text]|3=[right text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|3=[right text]|2=[center text]}} to {{[running header]||[center text]|[right text]}}
    • {{[running header]|[left text]|[center text]}} to {{[running header]|[left text]|[center text]|}}
CalendulaAsteraceae (talkcontribs) 22:09, 13 April 2024 (UTC)[reply]
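One way to read the two-entry mappings listed above is as a single rule: place each argument in its cell (left, center, right) and write all three cells out positionally. The sketch below is only an illustration of that rule on simple calls. The template name rh in the examples, the function name, and the decision to skip anything containing nested templates or wikilinks are all assumptions, not part of the request.

```python
import re

# Which output cell each explicit parameter name refers to.
SLOTS = {"left": 0, "1": 0, "center": 1, "2": 1, "right": 2, "3": 2}

# A call with no nested braces: {{name|arg|arg}}
SIMPLE_CALL = re.compile(r"\{\{([^|{}]+)\|([^{}]*)\}\}")


def to_positional(call: str) -> str:
    """Rewrite a simple two-argument call into three positional cells.

    Anything that is not a plain two-argument call (nested templates,
    piped wikilinks, other argument counts) is returned unchanged.
    """
    match = SIMPLE_CALL.fullmatch(call)
    if match is None or "[[" in call:
        return call
    name, raw_args = match.group(1), match.group(2).split("|")
    if len(raw_args) != 2:
        return call
    cells = ["", "", ""]
    position = 0  # next cell for an unnamed argument
    for arg in raw_args:
        key, sep, value = arg.partition("=")
        if sep and key.strip() in SLOTS:
            cells[SLOTS[key.strip()]] = value.strip()
        else:
            cells[position] = arg
            position += 1
    return "{{" + name + "|" + "|".join(cells) + "}}"


if __name__ == "__main__":
    for sample in ("{{rh|left=A|center=B}}", "{{rh|right=R|left=L}}", "{{rh|A|B}}"):
        print(sample, "->", to_positional(sample))
```

A function like this only fits a custom script; with the stock replace tool each mapping would still have to be written out as its own pattern.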
  • Xover: This should not be done; it goes against the consensus stated here. This is especially so as, if I remember correctly, you and the bot action requestor were the only editors who expressed support for the proposal. TE(æ)A,ea. (talk) 22:39, 13 April 2024 (UTC)

Deletion request of a damaged duplicate

The pages of this file, Index:Jesuit education, its history and principles viewed in the light of modern educational problems.djvu, contain pencil notations which make the scan illegible. I uploaded another identical publication, Index:Jesuit Education.djvu, which makes this one unnecessary. Thank you in advance. — ineuw (talk) 14:26, 22 January 2024 (UTC)

@Ineuw: Done --Xover (talk) 07:45, 30 January 2024 (UTC)[reply]

Remove use of override_year in Once a Week (magazine)

In the following pages, please replace override_year[ ]*=[ ]*([0-9]*)\-([0-9]*) with year = %1–%2: insource:/override_year[ ]*=[ ]*[0-9]*\-[0-9]*. \|/ CalendulaAsteraceae (talk • contribs) 19:58, 26 February 2024 (UTC)
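For reference, the requested replacement can be tried out locally before any bot run. This is a minimal sketch using Python's re module, where the %1 and %2 placeholders of the request become \1 and \2; the function name and the sample line are made up for illustration.

```python
import re

# Same match pattern as in the request; the hyphen needs no escaping here.
OVERRIDE_YEAR = re.compile(r"override_year[ ]*=[ ]*([0-9]*)-([0-9]*)")


def replace_override_year(wikitext: str) -> str:
    """Turn 'override_year = 1861-1862' into 'year = 1861–1862' (en dash)."""
    return OVERRIDE_YEAR.sub(r"year = \1–\2", wikitext)


if __name__ == "__main__":
    sample = "| override_year = 1861-1862\n"  # hypothetical header line
    print(replace_override_year(sample), end="")
```

A page that already carries its own year parameter would end up with two of them after this substitution, so checking the result for duplicate arguments seems worthwhile.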

Or for a less expensive page list, Special:PrefixIndex/Once a Week (magazine). CalendulaAsteraceae (talk • contribs) 17:06, 1 March 2024 (UTC)
@CalendulaAsteraceae before moving on, what has changed compared to this? Why are we undoing it? Mpaa (talk) 08:34, 13 April 2024 (UTC)[reply]
@Mpaa: I assume the change was made because the header template had less capacity to deal with non-numeric years, since it was still using {{header/year}} rather than the Lua module. (@Levana Taylor, do you remember why you wanted this change?) In any case, I'm hoping to phase out the override_year parameter, and Module:Header/year can certainly handle years in that format now. —CalendulaAsteraceae (talkcontribs) 22:19, 13 April 2024 (UTC)[reply]
@Levana Taylor: Another question is: should these be using a date range? The year range on the volumes is more descriptive of what they collect—a subtitle rather than a bibliographic year of publication—but the actual physical volumes were published in one specific year. Let's call it "the 1862 edition of the short story first published in Once a Week no. nn in 1861", or whatever. Xover (talk) 07:48, 18 April 2024 (UTC)[reply]
@CalendulaAsteraceae done. Mpaa (talk) 17:57, 19 April 2024 (UTC)[reply]
@Mpaa: Category:Pages using duplicate arguments in template calls Xover (talk) 18:51, 19 April 2024 (UTC)[reply]

Remove some uses of override_author and override_contributor

It may be impractical to remove all uses of override_author and co., but we can at least get some of the easy cases. To that end, I would like the following replacements:

  1. Pages matching insource:/override_author[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*.[ ]*\|/
    1. \|[ \n]*author[ \n]*=[ \n]*\|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author> \(<disambig>\)\|<author>\]\] to \| author = <author> \(<disambig>\)
    2. \|[ \n]*author[ \n]*=[ \n]*\|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author>\|<authordisplay>\]\] to \| author = <author> \| author_display = <authordisplay>
  2. Pages matching insource:/override_(contributor|section_author)[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*and[ ]*\[\[Author:[^\|]*\|[^\|]*\]\].[ ]*[\|}]/
    1. |[ ]*(contributor|section_author)[ ]*=[ ]*|[ ]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author> \(<disambig>\)\|<author>\]\] to \| section_author = <author> \(<disambig>\)
    2. [ ]*(contributor|section_author)[ ]*=[ ]*|[ ]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author>\|<authordisplay>\]\] to \| section_author = <author> \| section_author_display = <authordisplay>
  3. Pages matching insource:/override_author[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*and[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*.[ ]*[\|}]/
    1. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\] to \| author1 = <author1> \| author2 = <author2>
    2. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\] to \| author1 = <author1> \| author1_display = <author1display> \| author2 = <author2>
    3. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\] to \| author1 = <author1> \| author2 = <author2> \| author2_display = <author2display>
    4. |[ \n]*author[ \n]*=[ \n]*|[ \n]*override_author[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\] to \| author1 = <author1> \| author1_display = <author1display> \| author2 = <author2> \| author2_display = <author2display>
  4. Pages matching insource:/override_(contributor|section_author)[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*and[ ]*\[\[Author:[^\|]*\|[^\|]*\]\][ ]*.[ ]*[\|}]/
    1. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\] to \| section_author1 = <author1> \| section_author2 = <author2>
    2. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2>\]\] to \| section_author1 = <author1> \| section_author1_display = <author1display> \| section_author2 = <author2>
    3. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\] to \| section_author1 = <author1> \| section_author2 = <author2> \| section_author2_display = <author2display>
    4. |[ \n]*(contributor|section_author)[ \n]*=[ \n]*|[ \n]*override_(contributor|section_author)[ \n]*=[ \n]*\[\[Author:<author1>\|<author1display>\]\][ \n]*and[ \n]*\[\[Author:<author2>\|<author2display>\]\] to \| section_author1 = <author1> \| section_author1_display = <author1display> \| section_author2 = <author2> \| section_author2_display = <author2display>
  5. Pages matching insource:/override_contributor[ ]*=[ ]*\[\[Author:[^\|]*\|[^\|]*\]\], translated by \[\[Author:[^\|]*\|[^\|]*\]\]. \|/
    1. override_contributor[ ]*=[ ]*\[\[Author:([^\|]*)\|[^\|]*\]\], translated by \[\[Author:([^\|]*)\|[^\|]*\]\] to contributor = %1 \| section_translator = %2
  6. Pages in Category:Pages with override author with {{anon}}: insource:/\{\{[Aa]non}}.[ ]*\|/
    1. \|[ \n]*author[ \n]*=[ \n]*\|[ \n]*override_author[ \n]*=[ \n]*{{anon}} to \| author = anon
  7. Pages in Category:Pages with override contributor with {{anon}}: insource:/\{\{[Aa]non\}\}[^,] \|/:
    1. [ ]*(contributor|section_author)[ ]*=[ ]*|[ ]*\|[ \n]*(contributor|section_author)[ \n]*=[ \n]*\|[ \n]*override_contributor[ \n]*=[ \n]*{{anon}} to \| section_author = anon

Thank you! —CalendulaAsteraceae (talkcontribs) 20:34, 26 February 2024 (UTC)[reply]

@CalendulaAsteraceae: This is somewhat overwhelming to untangle enough to execute. The standard replace script of pywikibot has a facility for performing multiple Python regex replacements for each page processed. Could you maybe try to break your replacements up into more atomic steps?
You can think of the process as 1) select which wikipages to operate on, and 2) perform this series of regex replacements on those pages. insource: searches can be used for selecting pages but are finicky, so prefer criteria like "all pages in Category:Foo", "All pages transcluding Template:Bar", "All pages linked from the wikipage Baz", etc. (I'm not sure what set intersection, disunion, etc. are available for these generator functions, but I'm pretty sure you can do some combinatorial stuff with them). And replacements that are more than trivial benefit from being broken up into a series of match pattern + replacement pattern. In typical regex style, parentheses in the match pattern save away the matched bit, which can then be accessed by numerical reference in the replacement pattern: the stuff matched by the first parenthesis is in \1 in Python (and in $1 in JavaScript). Xover (talk) 08:08, 1 March 2024 (UTC)
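To make the "series of match pattern + replacement pattern" idea concrete, here is a small sketch in plain Python (no pywikibot involved) that applies an ordered list of steps to a page's text. The one step shown is adapted from the "translated by" item in the request, with %1 and %2 written as \1 and \2; the function name and the author names in the sample are placeholders.

```python
import re
from typing import List, Tuple

# Each step is (match pattern, replacement pattern); groups captured by
# parentheses in the first are available as \1, \2, ... in the second.
TRANSLATED_BY: Tuple[str, str] = (
    r"override_contributor[ ]*=[ ]*\[\[Author:([^|]*)\|[^|]*\]\], "
    r"translated by \[\[Author:([^|]*)\|[^|]*\]\]",
    r"contributor = \1 | section_translator = \2",
)


def apply_steps(wikitext: str, steps: List[Tuple[str, str]]) -> str:
    """Apply every (pattern, replacement) pair in order and return the result."""
    for pattern, replacement in steps:
        wikitext = re.sub(pattern, replacement, wikitext)
    return wikitext


if __name__ == "__main__":
    # Hypothetical header line with placeholder names.
    line = ("| override_contributor = [[Author:Jane Doe|Jane Doe]], "
            "translated by [[Author:John Roe|J. Roe]]\n")
    print(apply_steps(line, [TRANSLATED_BY]), end="")
```

Because the steps run in order, a later step can rely on the normalised output of an earlier one, which is what makes small atomic steps easier to review than one large pattern.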
@Xover: yep, I'm happy to work on breaking up these replacements into more steps. How can I target multiple instances of the same pattern, like \[\[Author:<author> \(<disambig>\)\|<author>\]\]? —CalendulaAsteraceae (talkcontribs) 17:03, 1 March 2024 (UTC)[reply]
Replacements to make in this message once I learn the appropriate syntax:
  • <AUTHORPATTERN> to pattern using [^\n\|]*
  • <DIGITPATTERN> to pattern using [\d]*
  • <PARAMPATTERN> to pattern using (author|section_author)
Replacements to make in work pages:
  1. All pages in Category:Pages with override author or Category:Pages with override contributor:
    1. \s+\n to \n
  2. All pages in Category:Pages with override author:
    1. \|[\s\n]*author[\s\n]*=[\s\n]*\|[\s\n]*override_author[\s\n]*=[\s\n]*([^\n]*)\n to \| override_author = $1\n
  3. All pages in Category:Pages with override contributor:
    1. \|[\s\n]*(contributor|section_author)[\s\n]*=[\s\n]*|[\s\n]*override_(contributor|section_author)[\s\n]*=[\s\n]*([^\n]*)\n to \| override_section_author = $3\n
  4. All pages in Category:Pages with override author or Category:Pages with override contributor which transclude {{anon}}:
    1. \| override_(author|section_author) = {{anon}}\n to \| $1 = anon\n
  5. All pages in Category:Pages with override author or Category:Pages with override contributor:
    1. \| override_(author|section_author) = \[\[Author:([^\n\|]*)\|([^\n\|]*)\]\]\n to \| $1 = $2 \| $1_display = $3\n
    2. \| override_(author|section_author) = \[\[Author:([^\n\|]*)\|([^\n\|]*)\]\] and \[\[Author:([^\n\|]*)\|([^\n\|]*)\]\]\n to \| ($1)1 = $2 \| ($1)1_display = $3\n\| ($1)2 = $4 \| ($1)2_display = $5\n
    3. \| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN> \| <PARAMPATTERN><DIGITPATTERN>_display = <AUTHORPATTERN>\n to \| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN>\n
    4. \| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN> (\([^\n\|]*\)) \| <PARAMPATTERN><DIGITPATTERN>_display = <AUTHORPATTERN>\n to \| <PARAMPATTERN><DIGITPATTERN> = <AUTHORPATTERN> $1\n
CalendulaAsteraceae (talkcontribs) 21:00, 1 March 2024 (UTC)[reply]
Replacement #1 will match pretty much every text and the change has no functional benefit, so that on its own is not a good idea. I'm assuming your thinking is to normalise that to make other replacements easier, but arbitrary whitespace surrounding parameters and values is something that will need to be handled in the matching patterns in any case.
[] creates a character class, with the typical example being something like [a-z4-8] to match every lower-case English letter plus the digits 4 through 8. The escape sequences \s and \n match a pre-defined character class consisting of all whitespace characters (space, horizontal tab, etc.) and a literal newline, respectively. You don't generally put pre-defined character classes inside []. The places you've put \n are also places where I would not usually expect to find newlines, even in messy human-entered input, so I'd leave those out and if necessary do a separate run to fix whatever remains. You can also assume these patterns match within a single line if it's clear that that's what you're doing and I can take care of tweaking it accordingly (waving hands over explanation too long for the context)
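The two points about character classes can be checked quickly in Python. The sketch below shows the [a-z4-8] example and then compares a pattern written with [\s\n]* against the same pattern written with \s*; the sample strings are made up, and the comparison is only a spot check on those samples rather than a proof.

```python
import re

# The example class from the explanation: lower-case a-z plus digits 4-8.
EXAMPLE_CLASS = re.compile(r"[a-z4-8]")


def in_example_class(char: str) -> bool:
    """True if the single character belongs to [a-z4-8]."""
    return EXAMPLE_CLASS.fullmatch(char) is not None


# \s already covers the newline, so adding \n to the class changes nothing.
NEWLINE_IS_WHITESPACE = re.fullmatch(r"\s", "\n") is not None

WITH_REDUNDANT_NEWLINE = re.compile(r"\|[\s\n]*author[\s\n]*=")
SIMPLER = re.compile(r"\|\s*author\s*=")

# Made-up samples covering tight, spaced, and multi-line parameter layouts.
SAMPLES = ["|author=", "| author =", "|\n author\t=", "| override_author ="]
SAME_RESULTS = all(
    bool(WITH_REDUNDANT_NEWLINE.search(s)) == bool(SIMPLER.search(s))
    for s in SAMPLES
)

if __name__ == "__main__":
    print(in_example_class("q"), in_example_class("9"))
    print(NEWLINE_IS_WHITESPACE, SAME_RESULTS)
```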
In replacement #2, are you really intending to remove |author= when there is a |override_author= present? That seems backward to me. Ditto for #3. And is |contributor= semantically equivalent to |section_author=?
You can't reuse patterns as if they were variables, because we're feeding config / command line options to a pre-made tool here. To do that you need programming-level control and that means writing a custom bot. If you do, then passing patterns around and constructing them from strings is straightforward of course. But for this you'll have to repeat the subpatterns for every replacement. I think you'd also better explain your logic in prose so I don't have to reverse-engineer it from your regexes. I think I get roughly what you're aiming at here, but I could easily also be confused and I'm missing the context of what you've implemented in {{header}}. Are we merging disparate forms into just |override_author= and |override_section_author=, and then splitting those into |override_author= + |override_author_display=? And is the final step then to drop the "override" prefix where not needed?
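For comparison, this is roughly what reusing subpatterns looks like once there is programming-level control, i.e. in a custom script rather than in options passed to the stock tool. The sketch builds one pattern from string pieces and uses a backreference to drop a display parameter only when it repeats the author value exactly. The variable names mirror the placeholders used earlier in the thread, and the "identical value" rule is an assumption made for the example.

```python
import re

# Reusable pieces, playing the role of the <...PATTERN> placeholders.
PARAM = r"(?:author|section_author)"
DIGIT = r"\d*"
AUTHOR = r"[^\n|]*"

# Named groups let the second half of the pattern refer back to the first:
# the display parameter must belong to the same parameter and carry the
# same value for the line to match.
REDUNDANT_DISPLAY = re.compile(
    rf"\| (?P<param>{PARAM}{DIGIT}) = (?P<name>{AUTHOR}) "
    rf"\| (?P=param)_display = (?P=name)\n"
)


def drop_redundant_display(wikitext: str) -> str:
    """Remove '<param>_display' when it repeats the '<param>' value exactly."""
    return REDUNDANT_DISPLAY.sub(r"| \g<param> = \g<name>\n", wikitext)


if __name__ == "__main__":
    # Placeholder values for illustration.
    print(drop_redundant_display("| author = Jane Doe | author_display = Jane Doe\n"), end="")
    print(drop_redundant_display("| author = Jane Doe | author_display = J. Doe\n"), end="")
```

With the stock replace tool the same effect needs the subpatterns spelled out in full for every replacement, as described above.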
PS. For my sanity, please don't introduce more parameter names with underscores. They're a holdover from ancient ancient template code, are easily confused with MediaWiki's underscore-for-space replacements (they were in fact originally for template parameters with literal space characters in the name), and these days we should be using hyphens for this purpose. In fact, while we still have underscore names they should be made aliases to a canonical name using a hyphen. Xover (talk) 10:09, 2 March 2024 (UTC)[reply]
@Xover: Thank you for all this detailed feedback!
I've made the parameters with underscores aliases of parameters with hyphens, and contributor was already an alias of section-author.
The ultimate goal of these replacements is to replace |override_author = {{anon}} with |author = anon, |override_author = [[Author:Authorlink|Authordisplay]] with |author = Authorlink | author-display = Authordisplay, and |override_author = [[Author:Author1link|Author1display]] and [[Author:Author2link|Author2display]] with |author1 = Author1link | author1-display = Author1display | author2 = Author2link | author2-display = Author2display, and ditto for section_author, without generating duplicate parameters. Ideally, I'd prefer not to have a bunch of display parameters that produce the same result as the automatic display, which is why I wanted to reuse patterns, but I suppose it's not a big deal if it's easier to just leave those in.
Some simpler proposed replacements, which assume the entire match is on one line:
  • \|\s*override_(author|section_author|contributor)\s*=\s*{{anon}}\s*\n to \| \1 = anon\n
  • \|\s*override_(author|section_author|contributor)\s*=\s*\[\[Author:([^\|]*)\|([^\|]*)\]\]\s*\n to \| \1 = \2 \| \1-display = \3\n
  • \|\s*override_(author|section_author|contributor)\s*=\s*\[\[Author:([^\|]*)\|([^\|]*)\]\] and \[\[Author:([^\|]*)\|([^\|]*)\]\]\s*\n to \| (\1)1 = \2 \| (\1)1-display = \3 \| (\1)2 = \4 \| (\1)2-display = \5\n
Replacements to run after the above replacements, to fix duplicate parameters:
  • \|\s*author\s*=\s*\| author(1?) = to \| author\1 =
  • \|\s*section_author\s*=\s*\| section_author(1?) = to \| section\-author\1 =
  • \|\s*contributor\s*=\s*\| contributor(1?) = to \| section\-author\1 =
CalendulaAsteraceae (talkcontribs) 05:35, 4 March 2024 (UTC)[reply]
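The simpler replacements proposed above can be dry-run on sample text with Python's re module before anything is fed to a bot. The sketch below adapts them in three ways, all assumptions on my part: braces are escaped in the match pattern, the pipe is written unescaped in the replacement (Python would otherwise emit the backslash literally), and (\1)1 is written as \g<1>1 so that the group reference is not read as group 11. The function name and the sample header are hypothetical.

```python
import re

# Step 1: rewrite override_* parameters (patterns assume a one-line match).
OVERRIDE_STEPS = [
    (r"\|\s*override_(author|section_author|contributor)\s*=\s*\{\{anon\}\}\s*\n",
     r"| \1 = anon\n"),
    (r"\|\s*override_(author|section_author|contributor)\s*=\s*"
     r"\[\[Author:([^|]*)\|([^|]*)\]\]\s*\n",
     r"| \1 = \2 | \1-display = \3\n"),
    (r"\|\s*override_(author|section_author|contributor)\s*=\s*"
     r"\[\[Author:([^|]*)\|([^|]*)\]\] and \[\[Author:([^|]*)\|([^|]*)\]\]\s*\n",
     r"| \g<1>1 = \2 | \g<1>1-display = \3 | \g<1>2 = \4 | \g<1>2-display = \5\n"),
]

# Step 2: remove the empty parameter left in front of the rewritten one.
DEDUPE_STEPS = [
    (r"\|\s*author\s*=\s*\| author(1?) =", r"| author\1 ="),
    (r"\|\s*section_author\s*=\s*\| section_author(1?) =", r"| section-author\1 ="),
    (r"\|\s*contributor\s*=\s*\| contributor(1?) =", r"| section-author\1 ="),
]


def migrate_overrides(wikitext: str) -> str:
    """Apply the override rewrites, then the duplicate-parameter fixes."""
    for pattern, replacement in OVERRIDE_STEPS + DEDUPE_STEPS:
        wikitext = re.sub(pattern, replacement, wikitext)
    return wikitext


if __name__ == "__main__":
    # Hypothetical header with placeholder values.
    header = ("{{header\n | title = Foo\n | author = \n"
              " | override_author = [[Author:John Smith (1800-1870)|John Smith]]\n"
              " | year = 1850\n}}\n")
    print(migrate_overrides(header), end="")
```

Running the steps on a handful of real headers copied into a test file would show quickly whether the one-line assumption holds for the pages in the tracking categories.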