User talk:Mpaa

Welcome to Wikisource

Hello, Mpaa, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:

You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{Collaboration/MC}} to your page for current Wikisource projects.

On your user page you can put a brief description of your interests and mention your contributions to other Wikimedia projects, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your username if you're logged in (or IP address if you are not) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! -- billinghurst sDrewth 12:00, 7 April 2011 (UTC)

Shakespeare's sonnets

Hello. I have reverted an old edit by MpaaBot, which I guess made sense at the time it was made, but the situation with various redirects and version pages has changed since. I did the same here as well, but then I realized it would be much better to employ the bot again. Do you think you could do it? -- Jan Kameníček (talk) 18:12, 13 February 2023 (UTC)

@Jan.Kamenicek done. Mpaa (talk) 21:11, 17 February 2023 (UTC)

How's your SQL-fu?

Out of around 600k texts on enWS, around 200k are not scan-backed and the number is not really decreasing (we're adding new non-scan-backed texts as fast as we scan-back old texts). See this graph.

I have a vague long-term ambition to get this down to somewhere around zero, probably through packaging the task up in some kind of structure that can be attacked like any other large maintenance backlog that the community can be encouraged to divide and conquer, and chip away at over time.

But right now the only real tool we have for that is Special:PagesWithoutScans (provided by PRP). This only shows 5k at a pop, doesn't filter out sub-pages, can't be sorted, updates only once a week, etc. So I think we need something better.

I haven't figured out the optimal form of that "better thing" yet. It could be a dedicated Toolforge tool with a sortable and filterable list, maybe letting you copy it as wikitext for on-wiki tracking and supporting multiple Wikisourcen (if the other sisters would like the same). Or it could be a huge table on-wiki that's updated by a bot on some interval.

But as a first approach I'm thinking the easiest and most useful (value for effort put in) would be to simply bot-add a category to all top-level mainspace pages that are not scan-backed. That'd be our "these need to be checked by a human" backlog category; and then we could build a process on top of it with per-step templates to indicate progress like "there's a scan at IA for this", "scan is on Commons", "Index: set up", "text Matched&Split but needs proofreading" etc.
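The bot-tagging idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the category name is hypothetical, the "top-level" test is a crude check for a slash in the title (so a work whose own title contains a slash would be skipped), and `tag_pages` merely expects pywikibot.Page-like objects that offer `title()`, a writable `text` attribute and `save()`.

```python
BACKLOG_CATEGORY = "Texts without a source scan"  # hypothetical category name


def is_top_level(title):
    """Crude check: treat any title containing '/' as a sub-page."""
    return "/" not in title


def with_backlog_category(wikitext, category=BACKLOG_CATEGORY):
    """Return wikitext with the category link appended, unless already present."""
    link = f"[[Category:{category}]]"
    if link in wikitext:
        return wikitext
    return wikitext.rstrip("\n") + "\n" + link + "\n"


def tag_pages(pages, summary="Tag text that is not scan-backed"):
    """Tag every top-level page in `pages`; return the titles that were changed.

    `pages` is expected to yield pywikibot.Page-like objects, i.e. objects
    with a title() method, a writable `text` attribute and a save() method.
    """
    changed = []
    for page in pages:
        title = page.title()
        if not is_top_level(title):
            continue  # skip sub-pages such as "Work/Chapter 1"
        new_text = with_backlog_category(page.text)
        if new_text != page.text:
            page.text = new_text
            page.save(summary=summary)
            changed.append(title)
    return changed
```

Keeping the text manipulation in its own function means it can be checked offline before any bot run touches the wiki.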

I asked Samwilson for tips and he pointed me in the direction of checking the templatelinks table for pages that have no transclusions of pages in the Page: namespace. Special:PagesWithoutScans does roughly that in SpecialPagesWithoutScans.php.
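A minimal sketch of that templatelinks idea, run against a toy SQLite database so it can be tried without replica access. Several things here are assumptions rather than facts taken from SpecialPagesWithoutScans.php: that the Page: namespace has id 104, that templatelinks points at a linktarget row via `tl_target_id` (older schemas stored the namespace on templatelinks itself), and that a slash in the title is a good enough sub-page test. The real special page may apply further filters.

```python
import sqlite3

PAGE_NS = 104  # assumption: numeric id of the Page: namespace on enWS

# Mainspace, non-redirect, top-level pages that transclude nothing from Page:.
# "?" is the SQLite placeholder; other database drivers use a different marker.
QUERY = """
SELECT p.page_title
FROM page AS p
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND p.page_title NOT LIKE '%/%'
  AND NOT EXISTS (
    SELECT 1
    FROM templatelinks AS tl
    JOIN linktarget AS lt ON lt.lt_id = tl.tl_target_id
    WHERE tl.tl_from = p.page_id
      AND lt.lt_namespace = ?
  )
ORDER BY p.page_title
"""


def pages_without_scans(conn, page_ns=PAGE_NS):
    """Return titles of top-level mainspace pages with no Page: transclusion."""
    return [row[0] for row in conn.execute(QUERY, (page_ns,))]


def toy_database():
    """Build an in-memory stand-in for the three tables the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                           page_title TEXT, page_is_redirect INTEGER);
        CREATE TABLE linktarget (lt_id INTEGER PRIMARY KEY,
                                 lt_namespace INTEGER, lt_title TEXT);
        CREATE TABLE templatelinks (tl_from INTEGER, tl_target_id INTEGER);
        INSERT INTO page VALUES
            (1, 0, 'Scan_backed_work', 0),
            (2, 0, 'Unbacked_work', 0),
            (3, 0, 'Unbacked_work/Chapter_1', 0),
            (4, 0, 'Old_title', 1),
            (5, 104, 'Some_scan.djvu/1', 0),
            (6, 0, 'Template_user', 0);
        INSERT INTO linktarget VALUES
            (10, 104, 'Some_scan.djvu/1'),
            (11, 10, 'Header');
        INSERT INTO templatelinks VALUES (1, 10), (1, 11), (2, 11), (6, 11);
    """)
    return conn
```

Calling `pages_without_scans(toy_database())` lists only the two top-level pages that use ordinary templates but transclude no scan page. On a real replica the placeholder syntax and the escaping of the `%` characters would need adjusting for the driver in use.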

But my Python-fu is barely toddler-level, and my SQL-fu even worse, so I'm kinda daunted by the thought of trying to hack something like that up myself. I could probably do it, but whether I'd get there before the heat death of the universe I am less certain. :)

So… Is this something you'd be interested in helping out with? Does PWB have existing facilities for interrogating these tables (i.e. is the info exposed by the Action API, I guess)? Alternately, is your SQL-fu strong enough to make this reasonable / trivial to tackle? Do you have any thoughts on how best to tackle the problem?

No rush: it's a long-term / backburner / not-fully-formed thing, and as sketched out above I'm not entirely sure what the best approach is yet. I'm just fishing for pointers, recruiting ~~unwitting victims~~ generous volunteers, etc. :) Xover (talk) 07:40, 31 August 2023 (UTC)

@Xover Sorry for the late response; I haven't spent much time here lately. Anyhow, some quick feedback for now:

1. pywikibot supports special pages as a page generator (there is no 'official' generator available via the CLI, but it is easy to make one):

import pywikibot

# Connect to English Wikisource.
site = pywikibot.Site('en', 'wikisource')
# Ask for the first 10 entries of Special:PagesWithoutScans.
gen = site.querypage('PagesWithoutScans', 10)
list(gen)

(If you find the right SQL query, pywikibot can also run SQL queries.)

2. My SQL is pretty basic; what I would do is try to mimic the query in SpecialPagesWithoutScans.php in Quarry.

That's all for now; if I have the chance I will dig a bit deeper. Mpaa (talk) 16:16, 23 September 2023 (UTC)
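The remark in this thread that pywikibot can run SQL queries might look roughly like the sketch below. It assumes a configured database replica connection (for example on Toolforge) and uses `pywikibot.data.mysql.mysql_query`; the helper names `decode_titles` and `run_on_replica` are made up for illustration, and the assumption that titles come back as underscore-separated bytes should be checked against real output.

```python
def decode_titles(rows):
    """Turn result rows into readable titles (first column of each row)."""
    titles = []
    for row in rows:
        value = row[0]
        if isinstance(value, bytes):
            # Assumption: replicas return title columns as UTF-8 bytes.
            value = value.decode("utf-8")
        titles.append(value.replace("_", " "))
    return titles


def run_on_replica(query, params=None):
    """Run `query` through pywikibot and return the decoded titles."""
    # Imported lazily so decode_titles() can be used without pywikibot.
    from pywikibot.data import mysql
    return decode_titles(mysql.mysql_query(query, params=params))
```

A query prototyped in Quarry could then be passed to `run_on_replica` as a string once it returns sensible results there.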

Merger problem?

I noticed a problem (or possibly my own confusion) with what appears to be a six-month-old merger of two different scan versions of Index:The_Ramayana_Of_Tulsidas.djvu. I see that the transcription on the individual chapter pages is still intact: https://en.wikisource.org/wiki/The_R%C3%A1m%C3%A1yana_of_Tulsi_D%C3%A1s/Introduction

But I can no longer find the index where the individual pages are transcribed. The link on my user page now goes to the Index page linked above, which is empty, with all pages red, even the ones that appear transcribed on the chapter pages. I can no longer find the transcribed table of contents page, etc., either. Where are they? TryKid (talk) 20:53, 8 September 2023 (UTC)

@TryKid See Index:The Rámáyana of Tulsi Dás.djvu Mpaa (talk) 16:23, 23 September 2023 (UTC)

Thank you for the help now and for the merger. I was somehow confused by the old link on my user page. Regards, TryKid (talk) 22:40, 23 September 2023 (UTC)

Category:Scans_with_misaligned_text_layer

You might want to take a look at these if you get a spare moment. ShakespeareFan00 (talk) 22:24, 19 January 2024 (UTC)

List of scans with match-and-split pages

Here’s the list. Sorry for the delay; my computer deleted the list I had started.

Almost none of these have any work done. TE(æ)A,ea. (talk) 03:35, 25 February 2024 (UTC)

Page:The Yellow Book - 03.djvu/329

This page is orphaned and seems to be a duplicate of page 328. Can it be put up for speedy deletion? -- Beardo (talk) 20:50, 17 March 2024 (UTC)
