Wikisource:Scriptorium

Scriptorium

The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help.

Some announcements and newsletters are subscribed to Announcements.

Project members can often be found in the #wikisource IRC channel webclient. For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 367 active users here.

Announcements

31 Oct: WikidataCon workshop on bibliographic data

On Sunday, 14:30-15:30 UTC (10:30 Eastern North America, 15:30 Central Europe, check for other time zones), a workshop at WikidataCon will be looking for fields to add to better capture the challenges posed by translations, series of works, and works serialized in installments. We see it as a way of kickstarting what we hope will be an ongoing conversation between data nerds of all stripes regarding books and periodicals.

Looking forward to meeting as many of you as possible tomorrow! Marianika (talk) 08:21, 30 October 2021 (UTC)

Thank you for the heads-up.
Notifying all members of Data (more info · opt out): (User:Inductiveload, User:MarkLSteadman, User:Xover, User:Languageseeker) One for these folks! Inductiveloadtalk/contribs 10:26, 30 October 2021 (UTC)
Also since it occurs to me that it's actually not that clear how to access the workshop if you're not already in WikidataCon, you have to register at https://pretix.eu/WDCon21/WDCon21 (free), then you can join Room 3 at https://wikidatacon2021.venueless.events/rooms/ccc25bfd-f080-4add-8082-fc25038b5c5c. Inductiveloadtalk/contribs 12:37, 31 October 2021 (UTC)
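
For anyone checking other time zones, the start time stated above can be converted with a short script. This is only a sketch: the year 2021 is taken from the signatures in this thread, and the zone names America/New_York and Europe/Berlin are chosen here as stand-ins for "Eastern North America" and "Central Europe".

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Workshop start as announced: Sunday 31 October, 14:30 UTC.
# The year (2021) is an assumption taken from the thread's signatures.
START_UTC = datetime(2021, 10, 31, 14, 30, tzinfo=timezone.utc)


def local_start(zone_name: str) -> datetime:
    """Return the workshop start time in the given IANA time zone."""
    return START_UTC.astimezone(ZoneInfo(zone_name))


if __name__ == "__main__":
    # Zone names are illustrative picks, not part of the announcement.
    for zone in ("America/New_York", "Europe/Berlin"):
        print(zone, local_start(zone).strftime("%H:%M"))
```

Any other IANA zone name can be passed to local_start in the same way.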

October Monthly Challenge


The Monthly Challenge has had another excellent showing, with 3672 pages proofread, validated, or marked "no text", smashing the target of 2000, and claiming a new monthly record by a margin of over 500 pages! This represents 12% of all the ~30000 pages processed at enWS in October.

Works completed include:

Validated works:

Only one work expired without being completed:

The November Monthly Challenge is in full swing and includes continuing works from last month, as well as:

...as well as many more. This month is testing the hypothesis that larger Monthly Challenges result in greater participation due to more works of interest being available even at the end of the month. If you agree with this idea, there's only one way to show it: make it come true!

There's also a new idea to add a few un-transcluded works to the MC to drive down the proofread-but-not-transcluded backlog.

Thanks especially to Languageseeker for the leg-work in setting this month's data up and shepherding October's challenge to record heights!

Come on in, the weather is getting cold for some of us, but the Monthly Challenge water is nice and warm. Inductiveloadtalk/contribs 15:47, 1 November 2021 (UTC)

New image viewer (aka OpenSeadragon)

As mentioned in the last Tech News, there is a new image viewer, based on the very capable OpenSeadragon viewer, which was deployed yesterday (should have been Wednesday but there was some unrelated blockage). This has been in review for several months. Only the basic implementation has made it in so far; the following features are complete but still to come, pending review and merge:

  • Rotation control (merged, but missed the cut-off: will land next week)
  • Marker lines (you can preview this here, along with rotation control): 737973
  • Region-based OCR to allow OCRing of a part of a page/column/etc (also in the above preview): phab:T294903
  • Image zoom and position persistence on reload so you don't need to re-zoom when previewing: 737788
  • Configuration of zoom speed and smoothness: 740133

Something else that it enables is future touchscreen and mobile work, since it natively supports pinch-to-zoom (though it is not yet enabled in the mobile skin).

The new viewer does not (yet) implement the old click-to-toggle between scroll and zoom. There is a ticket for this (phab:T296079), but the main question there is: was the old behaviour actually optimal in the first place, or are there more fluid methods to do this?

The very old 2006 toolbar is not working with it: this has now been fixed, but again is pending review and deployment: phab:T296033.

The Jump to file (aka hi-res loader) script and the page_carousel scripts have been updated, and the former even loads "IIIF" tiled zoomable images from the IA. I am not personally aware of any other gadgets or scripts that are likely to explode, but if there are any, let me know. Inductiveloadtalk/contribs 14:01, 19 November 2021 (UTC)[reply]

Update: Due to unrelated breakage (a memory leak) the entire wmf.9 release has been rolled back. OSD will be unavailable until that release is re-attempted, maybe on Monday. If that does not go ahead, the next opportunity for an OSD release will be wmf.11 in a little under 2 weeks' time.
In the meantime, a demo site with (some) unmerged features is here and Beta Wikisource always shows the latest merged patches. People interested in improving the OSD "experience" before then are welcome to give it a go there and reply here or on Phabricator with bugs and suggestions. Inductiveloadtalk/contribs 20:35, 19 November 2021 (UTC)

Proposals

New Request for Comment on Wikilinking Policy is open

I have just opened Wikisource:Requests for comment/Wikilinking policy. You will find there a proposed complete overhaul/rewrite of the current policy, which is now ready for review by the wider Wikisource community. It is proposed that the RfC will be open for two weeks. Please make your comments there rather than here. Beeswaxcandle (talk) 08:33, 14 March 2021 (UTC)[reply]

@Beeswaxcandle: I think 2 weeks / 72 hours is a little bit too aggressive, even for a presumed uncontroversial policy proposal like this. I understand the reasoning, but I just don't think the community is able to move that fast. For example, we have several long-time contributors that are currently in a phase where they check in only every couple of weeks. And I know for my own part that the local Covid status could easily make me too busy to check in here for weeks on end. We could still have an accelerated timeline (just not quite as accelerated as 2/72) if we notify of the proposal in a site notice and maybe even a talk page message to any established contributor that has been active in the last three months (or similar).
PS. And let me repeat my previous private kudos in public: you took my ongoing whining about the old policy and turned it into a concrete proposal for a new policy. Great work, for which I am extremely grateful! --Xover (talk) 09:25, 14 March 2021 (UTC)[reply]

Blocking policy regarding spammers

Context: Following on from Wikisource:Administrators'_noticeboard#Spamming_as_a_block_reason.

A very high proportion of the undesirable edits at Wikisource are spam: people or bots adding screeds of text to pages, usually their own user pages, usually containing links to external websites. We do not, as it happens, have a specific clause in the WS:Blocking policy to cover this. This means that the standard admin actions of blocking spammers may technically be outside the policy, depending on how you interpret "vandalism", and no-warning spam-bot blocks are counter to the "friendly warning" guidance.

I am extremely unsympathetic to single-purpose spam accounts where the only action is spamming and the account is either one of the auto-generated names like BobSmith56 or has a particular connection to the spammed link or some other business.

I suggest the following section under "Justifications for blocking" to be added:

Spamming

Promotion of third-party websites, products or companies is forbidden. This includes, but is not limited to, inserting links to external websites. This applies even if the spam is accompanied by text unrelated to the spammed material. This does not apply if the links or material are relevant to Wikisource (for example a link to a bookshop relating to a specific work under discussion).

Users who create accounts for the sole purpose of spamming may be blocked indefinitely and without prior warning, and the associated IPs blocked for a short period (the autoblock default is 1 day) to prevent immediate account recreations.

Spam may be reverted by any user without warning, and the revision may also be hidden by an administrator. If the user is not a single-purpose spammer, the revert should be explained on their talk page.

Users who have made constructive edits, either before or after the spam edits, should not be blocked without warning.

Users should not be blocked from editing their own talk pages, unless they continue to insert spam there after being warned not to do so.

Blocks may also be performed by administrators who see attempted spam in logs of tools such as Special:AbuseFilter, even if the edit is prevented by the filter, as if the edit were made.

Some notes and rationale:

  • Allowing instant-blocks: single-purpose spammers generally do not engage or check for replies, so it's a waste of time to attempt to warn, and wastes editor time on the follow-up. Meanwhile the IP is available for further spamming. Blocks here also show in the SUL log for users at other wikis, so this can help other wikis too if it's a cross-wiki spam account
  • Allowing instant-revert: this lowers the ping-pong overheads associated with handling spam
  • Any constructive edit prevents instant blocking: I have never seen a bot spammer make useful edits
  • Allowing talk page access by default: users should be allowed to appeal (no bot spammer will bother, this is an escape hatch in case a mistake is made)
  • Unrelated text: lots of spammers disguise the spam with some junk text like a fake mini-biography.
  • Abuse filter: often, spammers get stuck in one of the local or global abuse filters before they make a successful spam edit. Blocking at that time avoids having to later check back to see if they have eventually been successful example.

Inductiveloadtalk/contribs 09:27, 28 September 2021 (UTC)

Support. Languageseeker (talk) 21:17, 28 September 2021 (UTC)
Support. --Jan Kameníček (talk) 21:19, 4 October 2021 (UTC)
comment - expressing your "extremely unsympathetic" view as a summary block can become slippery. where would you include admin "wriggle room" for spammy newbies who have not been onboarded? could a bot revert, and add to a work list? it's good to document standards of practice, but a link to an appeal process would be helpful for the small number of clueless newbies. Slowking4Farmbrough's revenge 10:30, 6 October 2021 (UTC)
@Slowking4 the wiggle room is in the (quite deliberate) use of "may" rather than "should", "shall" or "must" in the phrase may be blocked. Putting it another way, the above authorises a block for spamming, but does not mandate it. Administrators are, as always, expected and required to use common sense when using their powers. Fortunately, it's also abundantly clear who is a capital-S Spammer and who is merely clueless.
Also, there is often nothing to revert, because the first sign of many spammers is hits on the spam filter or abuse filters, which act as something of an early warning system. Blocking them at that point avoids having to re-patrol for edits they do manage to slip through later on. Inductiveloadtalk/contribs 17:06, 13 October 2021 (UTC)
filters are opaque, and you should provide a path for audit of actions and appeal. veterans from time to time wonder what the red wall of text is preventing a save. if the filter prevents the disruption, then your rationale of "block because convenient" seems rather weak. you are creating a system of expanding no go zones, with no review or rationalization. i.e. you are going to get "willy on wheels" forever, even beyond the grave. --Slowking4Farmbrough's revenge 20:41, 15 October 2021 (UTC)[reply]
Veterans will not hit the (local at least) spam filters because they only apply to IPs and users with very small (single digit) edit counts. Mostly the spam appears to be overspray from other wikis, mostly NTSAMR, SEO services, or, oddly specifically, Indian politicians and entertainers.
Filters prevent the disruption in the first place, but relying on them completely means that spammers are left free to experiment with filter-circumvention, because the spam filters purposely do not auto-block, no matter how many times someone hits them. Blocking when noticed also 1) prevents re-use of the account in future when no-one is looking, 2) prevents re-use of the IP for other accounts for the next day and 3) makes a record of the activity in the SUL log, which makes it clear to other wikis that the user is identified as a spammer. Inductiveloadtalk/contribs 14:31, 3 November 2021 (UTC)
Support. PseudoSkull (talk) 17:31, 6 October 2021 (UTC)
Support, modulo the talk page exception. Some of the spam bots specifically use the talk page, and the activity that is the target of this policy is so unequivocal that there is no need to permit them talk page access. There are in any case off-wiki options for appeal (e.g. through VRT, or approaching an admin on a different project (and we have unblocked users through this path that have been blocked under other provisions)) if by some miracle, or bad admin judgement, someone with a valid appeal reason should be caught in such a block. Or put another way, if there is a genuine need to permit them talk page access they probably shouldn't be blocked under this particular policy provision in the first place (it's "disruption" not spamming). --Xover (talk) 05:39, 12 October 2021 (UTC)
@Xover I understand the point, but currently there is not a major issue of spamming on their own talk pages after a block. I'd personally rather reactively amend the policy if needed. Inductiveloadtalk/contribs 16:55, 13 October 2021 (UTC)
@Xover also we have a shiny new AbuseFilter (#48) that disallows edits by blocked users only that contain non-wiki-relevant links in the User_talk namespace. So if a user has been blocked, they can still appeal on their user talk page. If they're a spammer and try to abuse their talk page access to post any spam that's got an external link in it (i.e. all of it), it will be disallowed. If disallowed, a specific reason will be given: MediaWiki:Abusefilter-blocked-user-adding-links. The block will not be extended, the user will just be told to try again, but without the external links.
I don't anticipate this being hit very often (most spammers move on after blocks, those who don't are more likely to be vandals or LTAs than garden-variety spammers), but I think it should cover the bases while also ensuring that the chance of a bad block being un-appealable on-wiki is as low as possible. While that is extremely unlikely (due to spam being generally very obvious), IMO we should make an effort to ensure we do not have a policy that results in un-appealable blocks.
Again, if spammers do start trying to game it, we can revisit, but I don't see evidence that they are reactive to us specifically since we're a small target and mostly just receive overspray spam from the big boys. Inductiveloadtalk/contribs 10:43, 30 October 2021 (UTC)[reply]
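
The behaviour described above for AbuseFilter #48 can be sketched roughly as follows. This is not the filter's actual rule (real filters are written in AbuseFilter's own rule language, not Python), and the list of "wiki-relevant" hosts and all names below are hypothetical, since the thread does not say which links the filter treats as relevant.

```python
import re
from dataclasses import dataclass

# Hypothetical allowlist: the thread does not state which hosts the
# real filter treats as "wiki-relevant".
WIKI_RELEVANT_HOSTS = ("wikisource.org", "wikimedia.org", "wikipedia.org")

# Captures the host part of any http(s) URL in the added text.
URL_HOST = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)


@dataclass
class Edit:
    user_is_blocked: bool
    namespace: str
    added_text: str


def has_non_wiki_link(text: str) -> bool:
    """True if the text contains a URL whose host is not on the allowlist."""
    for host in URL_HOST.findall(text):
        host = host.lower()
        if not any(host == h or host.endswith("." + h) for h in WIKI_RELEVANT_HOSTS):
            return True
    return False


def should_disallow(edit: Edit) -> bool:
    """Disallow only blocked users adding non-wiki links in User_talk."""
    return (
        edit.user_is_blocked
        and edit.namespace == "User_talk"
        and has_non_wiki_link(edit.added_text)
    )
```

Under these assumptions, a blocked user can still post a plain-text appeal on their own talk page, and only an edit that adds an external link is refused; nothing in the sketch extends the block.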

Amending WS:WWI

As suggested by Inductiveload in a thread below, I propose amending WS:WWI to create a new section "Defining what is not included" to state "Wikisource no longer accepts any new texts from Project Gutenberg, or similar second-hand transcriptions, of any sort", even if "scan"-backed by a DjVu, PDF, or any other format accepted by the Proofread Page extension created from that text. Languageseeker (talk) 00:25, 21 November 2021 (UTC)

I would say "no longer accepts any new" rather than "does not include any new". --EncycloPetey (talk) 00:37, 21 November 2021 (UTC)
Good idea. Updated. Languageseeker (talk) 02:02, 21 November 2021 (UTC)
Support --EncycloPetey (talk) 19:29, 23 November 2021 (UTC)
I generally support, but I wonder if there should be an exception for when there is genuinely no extant scan at all and no possibility of getting our greasy paws on one. WWI follows w:WP:IAR in that it can always be overridden on a case-by-case basis with some level of community agreement. Maybe we should call out that possibility explicitly here (as opposed to a tricky-to-write-clearly policy carve-out)? Inductiveloadtalk/contribs 09:56, 22 November 2021 (UTC)
Why? Cygnis insignis (talk) 12:06, 22 November 2021 (UTC)
@Inductiveload: There already is "Some works which may seem to fail the criteria outlined above may still be included if consensus is reached. This is especially true of works of high importance or historical value, and where the work is not far off from being hostable. Such consensus will be based on discussion at the Scriptorium and at Proposed deletions." in the policy. This would take care of your case. Languageseeker (talk) 12:28, 22 November 2021 (UTC)
@Languageseeker Fair enough. Support. Inductiveloadtalk/contribs 12:34, 22 November 2021 (UTC)

Enable mobile talk page tabs (and page NS prev/next/index) for anonymous users

Currently, anonymous users on the mobile site do not see the talk page tabs. This means they also do not see the next/previous/index page tabs.

I suggest that we enable this via a configuration change: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/741097. Config changes generally need some level of community support, so please support (or oppose) below. Inductiveloadtalk/contribs 10:05, 24 November 2021 (UTC)

Bot approval requests

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

Other discussions

Policy on substantially empty works

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way a very, very long time. I think there might perhaps be a case to host a mainspace page for such a work, even though there is zero, or almost zero actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. Also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • Do not have a mainspace page until there's some content. Only host this in terms of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds it also supplies some content, so I think we could have a think about what we really want to see here. Inductiveloadtalk/contribs 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) are not very helpful, and are entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)[reply]
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveloadtalk/contribs 20:55, 3 July 2020 (UTC)[reply]
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)[reply]
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4Rama's revenge 01:55, 5 July 2020 (UTC)[reply]
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)[reply]
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4Rama's revenge 00:15, 6 July 2020 (UTC)[reply]
The entire point of moving this here, and not staying at WS:PD is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head all those future deletion proposals off at the pass, because deletion proposals often cause friction. Inductiveloadtalk/contribs 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4Rama's revenge 11:53, 9 July 2020 (UTC)[reply]
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality of the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion, the second half is. Inductiveloadtalk/contribs 14:18, 9 July 2020 (UTC)
deletion threat has been an habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual following in the Gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan backed works, but the wishlist was not supported. Slowking4Rama's revenge 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC)

I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
  4. EITHER
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)[reply]

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)[reply]
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it is redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than being handled within these other general policies.
    For the sake of example; applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked: active being operative, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do, that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)[reply]
  • Symbol oppose vote.svg Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)[reply]
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveloadtalk/contribs 14:40, 6 July 2020 (UTC)[reply]
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)[reply]
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)[reply]
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)[reply]
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice work and good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)[reply]
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sorts of tricky bits like tables of stocks or weather tables and writing into policy that those must be proofread in order to get the "real" articles into mainspace would be a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just in blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)[reply]
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)[reply]
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)[reply]

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)[reply]

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)[reply]

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)[reply]

Proposal

Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)[reply]

Since the proposal has now slipped off the main page (to here), with vague support for the first part (collective work inclusion criteria) and a fairly consistent opposition to the second (no-content pages), my plan is to transfer the first part, as guidelines rather than policy, to Wikisource:Periodical guidelines. As non-binding guidelines, they can then be worked on further in situ. Sound OK? Inductiveloadtalk/contribs 08:10, 16 April 2021 (UTC)[reply]
The example given in Wikisource:Periodical guidelines might be improved; PSM is and was an exercise that has gone its own way (no offense to @Ineuw:, this is a site under development and that is only one example). CYGNIS INSIGNIS 13:05, 17 April 2021 (UTC)[reply]
@Cygnis insignis: You would be wrong to think that I am offended. Remember that when I started, I knew everything. By now, so much of that knowledge is lost that I am happy to listen. Would you elaborate please? — Ineuw (talk) 19:50, 17 April 2021 (UTC)[reply]

I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)[reply]

@Pigsonthewing: The links in the toc on that page appear non-functional. Also, depending on just exactly which templates were the culprit, it is possible that you may be able to put all the content you wanted onto one page now due to some recent technical changes (template code moved to a Lua module which drastically improves performance and prevents hitting transclusion limits until much later). Xover (talk) 11:17, 14 September 2021 (UTC)[reply]
Create the Draft namespace to hold substantially empty works? Then delete if no improvement after months?--Jusjih (talk) 19:22, 1 November 2021 (UTC)[reply]
The issue is that the "substantially empty works" can have useful and complete content that stands alone. For example, an article from a scientific journal.
I would not want to see that either shunted into a Draft namespace to rot or deleted a few weeks down the line.
Index and Page namespaces provide our long term staging areas, and works can and do remain unfinished there for years. But what do we do when a self-contained piece of a larger work is ready? Inductiveloadtalk/contribs 20:29, 1 November 2021 (UTC)[reply]

Universal Code of Conduct News – Issue 1

Universal Code of Conduct News
Issue 1, June 2021
Read the full newsletter


Welcome to the first issue of Universal Code of Conduct News! This newsletter will help Wikimedians stay involved with the development of the new code, and will distribute relevant news, research, and upcoming events related to the UCoC.

Please note, this is the first issue of UCoC Newsletter which is delivered to all subscribers and projects as an announcement of the initiative. If you want the future issues delivered to your talk page, village pumps, or any specific pages you find appropriate, you need to subscribe here.

You can help us by translating the newsletter issues in your languages to spread the news and create awareness of the new conduct to keep our beloved community safe for all of us. Please add your name here if you want to be informed of the draft issue to translate beforehand. Your participation is valued and appreciated.

  • Affiliate consultations – Wikimedia affiliates of all sizes and types were invited to participate in the UCoC affiliate consultation throughout March and April 2021. (continue reading)
  • 2021 key consultations – The Wikimedia Foundation held enforcement key questions consultations in April and May 2021 to request input about UCoC enforcement from the broader Wikimedia community. (continue reading)
  • Roundtable discussions – The UCoC facilitation team hosted two 90-minute-long public roundtable discussions in May 2021 to discuss UCoC key enforcement questions. More conversations are scheduled. (continue reading)
  • Phase 2 drafting committee – The drafting committee for the phase 2 of the UCoC started their work on 12 May 2021. Read more about their work. (continue reading)
  • Diff blogs – The UCoC facilitators wrote several blog posts based on interesting findings and insights from each community during local project consultation that took place in the 1st quarter of 2021. (continue reading)


unsigned comment by SOyeyele (WMF) (talk) 22:37, 10 June 2021‎.

Index:Robert Carter- his life and work. 1807-1889 (IA robertcarterhis00coch).pdf

First run through is done, and it's transcluded. Needs validation. Thanks in advance for any help. Jarnsax (talk)

Request for comment notification

Here is a link to a RFC on Meta concerning all Wikimedia projects. unsigned comment by Lionel Scheepmans (talk) .

Camera Work

Hi, Just in case someone here might be interested, I am uploading to Commons all the 49 volumes of Camera Work, the quarterly photographic journal of Alfred Stieglitz. These are high quality scans. I am mostly interested in the images, but maybe someone would like to transcribe the text here. Regards, Yann (talk) 22:33, 17 September 2021 (UTC)[reply]

That's a really pretty work. If I had more wall space, I'd be down to the print shop this lunchtime! Inductiveloadtalk/contribs 11:01, 20 September 2021 (UTC)[reply]
@Yann: Please continue to upload the volumes and I will make a page for them. Languageseeker (talk) 11:18, 20 September 2021 (UTC)[reply]
It will take some days to have the complete collection. Already 24 (+ 1 supplement) uploaded. Regards, Yann (talk) 12:56, 21 September 2021 (UTC)[reply]
@Yann: I've started work on transcribing the first issue. Also of note are the high quality scans of Camera Notes (1900–1) from the Camera Club of New York available from Rijksmuseum Amsterdam. - ei (talk) 11:31, 23 September 2021 (UTC)[reply]
Hi, Did you create a page for the whole set? Yann (talk) 20:47, 25 September 2021 (UTC)[reply]
@Languageseeker, @Einstein95, @Inductiveload: All files uploaded. Could you please create a page here for the whole set, so it could be linked in WD? I will look at Camera Notes in a few days. Regards, Yann (talk) 21:52, 4 October 2021 (UTC)[reply]
@Yann Yes check.svg Done : Camera Work. Enjoy, and thanks for the work so far! Inductiveloadtalk/contribs 22:05, 4 October 2021 (UTC)[reply]
Also I set up {{Camera Work link}} and {{Camera Work volumes}} for you. Inductiveloadtalk/contribs 22:20, 4 October 2021 (UTC)[reply]

Camera Notes

I couldn’t get Camera Notes from Rijksmuseum, as each page has to be downloaded separately. It would take ages. But I got complete scans on Hathitrust in 3 files. First file is on Commons: c:File:Camera Notes, v. 1-2, 1897-1899.pdf. The others will follow. Regards, Yann (talk) 20:48, 5 October 2021 (UTC)[reply]

I can’t upload the other volumes from Hathitrust due to a bug somewhere when uploading large files (714 and 864 MB). However I managed to upload File:Camera Notes, 1897-1903.pdf, which is only 163 MB, but still 1,662 pages! Not sure about the quality, so maybe it is better to wait for the other files. What do you think? Yann (talk) 20:03, 6 October 2021 (UTC)[reply]
Welcome to the upload bug party! If you can upload to the Internet Archive (or anywhere I can reach them), I can try to upload them for you via a secret method (involving Toolforge, SSH, duct tape and prayer). And if that still fails, I can file a server-side-upload request for you, but that can take "some time" to be actioned. Inductiveloadtalk/contribs 21:33, 6 October 2021 (UTC)[reply]
It seems at least one bug was fixed, and the server-side uploads were done: File:Camera Notes, v. 3-4, 1899-1901.pdf, File:Camera Notes, v. 5-6, 1901-1903.pdf. Regards, Yann (talk) 20:00, 31 October 2021 (UTC)[reply]

291[edit]

While waiting for Camera Notes, I found some scans of the 291 magazine: c:File:291 magazine, No. 1, March 1915.pdf. Text is available at IA. Yann (talk) 20:50, 7 October 2021 (UTC)[reply]

Localisation requests ...[edit]

Localization is requested due to additional/new material consisting of editorial notes and Commentary by Robert William Chapman (1881-1960), meaning that regrettably the work is not PD in the UK yet, and so cannot be hosted on Commons. ShakespeareFan00 (talk) 09:00, 9 October 2021 (UTC)[reply]

Thank you for finding out about this. I would add Index:Fragment of a novel written by Jane Austen.pdf to the list because it was also edited by R.W. Chapman. Languageseeker (talk) 00:00, 10 October 2021 (UTC)[reply]
Bar the PDF for Chapman's Jane Austen, the other five files have been made local. The other file should be brought to Commons as a djvu and I will move it over here. The pdf is just dying when I try to migrate it and after three attempts that is enough time-wasting. — billinghurst sDrewth 11:31, 13 October 2021 (UTC)[reply]

in addition, Localization is requested :

--Slowking4Farmbrough's revenge 01:48, 2 November 2021 (UTC)[reply]

on hold recent configuration changes have got us half-pregnant. — billinghurst sDrewth 12:07, 2 November 2021 (UTC)[reply]
is "upload-by-URL for autoconfirmed users" enabled, so we can just do the upload without pestering admins? maybe a noticeboard would be good for tracking. --Slowking4Farmbrough's revenge 16:31, 2 November 2021 (UTC)[reply]
This is now done. There was a small constellation of bugs and config snags that made this difficult, because copy-upload has only just been enabled, so this has never been done before. AFAICT it's all resolved now. For the record of a fun process:
  • phab:T294824: allow upload-by-URL from Commons to Wikisource
  • phab:T294825: allow PWB to retry if an upload-by-URL fails (e.g.: the domain is not in the copy-upload whitelist, which it was not, but is now)
  • phab:T294916 PWB hits an error when uploading-by-URL a file that has a warning (e.g. file already existed on Commons)
upload-by-URL is now enabled, and since phab:T294824 was deployed today, you can do that from Commons. However, the benefit of a "proper" import with imagetransfer.py is that the original file history is added to the page for the record. Inductiveloadtalk/contribs 14:10, 3 November 2021 (UTC)[reply]
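For anyone curious what an upload-by-URL request looks like at the API level, here is a minimal sketch that only builds the POST fields for MediaWiki's `action=upload` with the `url` parameter; it does not send anything. The helper name, the file name, the source URL and the token value are all placeholders for illustration, and a real request would need a logged-in session and a CSRF token. It also does not copy the file history, which is what imagetransfer.py is for.

```python
from urllib.parse import urlencode


def build_upload_by_url_params(filename, source_url, csrf_token, comment=""):
    """Return the POST fields for an action=upload request that copies from a URL.

    Hypothetical helper: it only assembles the fields and performs no request.
    """
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,    # target file name on the receiving wiki
        "url": source_url,       # the domain must be allowed for copy-upload
        "comment": comment,
        "token": csrf_token,     # CSRF token from an authenticated session
    }


# Placeholder values, not a real file or token.
params = build_upload_by_url_params(
    filename="Example work.pdf",
    source_url="https://upload.wikimedia.org/example/Example_work.pdf",
    csrf_token="PLACEHOLDER_TOKEN",
    comment="Localising file from Commons",
)
body = urlencode(params)  # form-encoded body for the POST request
```

The encoded `body` would be sent as a POST to the wiki's `api.php` endpoint by whatever HTTP client or bot framework is in use.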

Category:Authors with honorifics in title[edit]

FYI: I've created a maintenance template and category to help track Author pages where there is a honorific in the page title. Normally we'd just remove the honorific by moving the page, but sometimes that would leave us with no more than a surname, or sometimes an Author page should have a honorific in the name for whatever reason. Anyway, you can now use {{honorifics}} and Category:Authors with honorifics in title to keep track of such pages. —Beleg Tâl (talk) 18:14, 13 October 2021 (UTC)[reply]

I would be interested in any feedback, especially with regards to the functionality I built in to allow for pages where the honorific is supposed to be there (based on what I've seen billinghurst do with {{initials}}) —Beleg Tâl (talk) 19:06, 13 October 2021 (UTC)[reply]
The problem is that Beleg Tâl says that it is only a tracking template, but in fact it distributes a message about the alleged inappropriateness of the title of the page. If the title is inappropriate, it should be moved. If it is decided that the honorifics can or should be kept in the page title for some reason, then it is the message that is inappropriate. I have nothing against tracking such pages without the message; in such a case the tracking and categorizing template can be put at the bottom of the page where other categories are located and where it would be less disturbing to editors. --Jan Kameníček (talk) 19:24, 13 October 2021 (UTC)[reply]
That's no different from any of the other maintenance tracking templates we use ({{initials}}, {{incomplete}}, {{migrate to djvu}}, etc) - and it encourages editors to improve Wikisource while showing them how to do so - so I don't see how this is a bad thing? —Beleg Tâl (talk) 19:44, 13 October 2021 (UTC)[reply]
That is different. E.g. {{incomplete}} asks readers to improve an incomplete page. Similarly, {{initials}} says that the page should use the full name but we do not know it, so if any reader knows more, they can improve it. However the situation with honorifics is different. In all cases where it is possible or desirable to remove honorifics, it should be done immediately. But how can readers help us in cases when we simply decided to keep the honorifics? You have admitted that such cases may occur, so why should we inform readers about the alleged inappropriateness of honorifics in titles if we decided not to remove them? What do we ask the readers to do in such cases? --Jan Kameníček (talk) 20:25, 13 October 2021 (UTC)[reply]
If we decide to keep the honorifics, we can put the {{honorifics}} tag on the Author Talk page (so it does not make the page ugly, just like we do with {{initials}}), and I have added a parameter to the template to differentiate this situation (it removes the page from the maintenance category, and adds text to the template confirming that the honorific should be kept).
In fact, most of the Author pages in Category:Authors with honorifics in title should not be kept with honorifics indefinitely, but the honorifics can't be removed immediately either. For example, if we remove the honorific from Author:Miss Morrison all we will have left is Author:Morrison, which is not a usable title for an Author page. More research is needed to find out her full name so that we can remove the honorific, and in the meantime it is tracked with a maintenance template and tracking category. —Beleg Tâl (talk) 20:46, 13 October 2021 (UTC)[reply]
If something is considered OK, then it is OK. A message is needed only if we want to encourage people to do something, e.g. to do research and correct something. Placing the template on the talk page would remove it from direct sight but would not make it less redundant. Besides that, some contributors might be unnecessarily confused about where to place it. I prefer simple solutions, and not using the template in such cases seems much simpler and cleaner to me, although if it were placed on talk pages, I could live with it. --Jan Kameníček (talk) 22:07, 14 October 2021 (UTC)[reply]
@Billinghurst: I wonder if the "ignore=yes" parameter could be helpful for {{initials}} as well, in case of individuals such as Author:Harry S. Truman whose middle name is "S." (i.e. it's not short for anything), or for Author:T. S. Eliot depending on the outcome of the discussion regarding that page title, or for pseudonyms like Author:Franklin W. Dixon and other Stratemeyer personas —Beleg Tâl (talk) 13:29, 15 October 2021 (UTC)[reply]
To do something like this we would change the text inside {{smaller}} ... Additional research is required to move this Author page to its full proper title, at which point we can remove this notice. to be {{#if:{{{ignore|}}}|PUT HERE THE IGNORE TEXT.|Additional research is required to move this Author page to its full proper title, at which point we can remove this notice.}} unsigned comment by Billinghurst (talk) .
Yes, exactly - that's what I did with {{honorifics}}, with the ignore text being: "Community consensus has been established to retain the honorifics on this page." —Beleg Tâl (talk) 01:36, 18 October 2021 (UTC)[reply]
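To make the `{{#if:{{{ignore|}}}|…|…}}` switch above concrete, here is a rough Python model of how the notice text is chosen. This is only an illustration of the parser-function logic, not the template's actual source; the function name and the dictionary of parameters are made up for the example. The two strings are the ones quoted in this thread.

```python
KEEP_TEXT = (
    "Community consensus has been established to retain the honorifics "
    "on this page."
)
RESEARCH_TEXT = (
    "Additional research is required to move this Author page to its full "
    "proper title, at which point we can remove this notice."
)


def notice_text(template_params):
    """Model of {{#if:{{{ignore|}}}|KEEP_TEXT|RESEARCH_TEXT}}.

    {{{ignore|}}} falls back to an empty string when the parameter is not
    passed, and #if treats an empty or whitespace-only value as false.
    """
    ignore_value = template_params.get("ignore", "")
    return KEEP_TEXT if ignore_value.strip() else RESEARCH_TEXT


default_notice = notice_text({})                  # parameter not given
kept_notice = notice_text({"ignore": "yes"})      # consensus to keep
blank_notice = notice_text({"ignore": "  "})      # blank value counts as unset
```

One thing the model makes visible: `#if` only checks for a non-empty value, so `ignore=no` would select the "retain" text just as `ignore=yes` does.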
Unfortunately removing honorifics is not always possible: the given name of Lady Zhang in 16th-century China is not known, as I checked at zh:明史/卷209#沈束. Some ancient Chinese women had no given names.--Jusjih (talk) 20:55, 9 November 2021 (UTC)[reply]

Splitting of works into chapters[edit]

As I understand it, it's "conventional" to split book-like works into units about the size of chapters, rather than transcluding in one huge lump.

Benefits of this, to me (partly as an export nerd), include:

  • Pages are "tractable" in terms of scrolling, rather than being hundreds of screenfuls long. Especially annoying on mobile.
  • It's possible to link to chapters. A minor factor for most books, as opposed to collective works like encyclopedias.
  • It exports to formats like Epub with a table of contents which allows the reader to navigate the text more easily (admittedly this could be resolved by allowing intra-page ToC entries; phab:T270612).
  • Each chapter starts on a new page on export without needing manual page breaks.

The downside, obviously, is that it's a right old tedious pain to transclude into many pages rather than doing it all in one massive go.

With reference to pages like Impressions of Theophrastus Such, Essays and Leaves from a note-book/Theophrastus Such (which is also published as single book, but here published in a volume alongside other works), is that merely "convention", or something more towards "expected"? The existence of {{split}} and pages like Help:Subpages indicate one thing, but on the other hand it's never indicated what, if any, the general expectations around actually splitting of works at all are.

Courtesy @Cygnis insignis: since that's !your page and you opposed the use of {{split}} there. Inductiveloadtalk/contribs 20:36, 13 October 2021 (UTC)[reply]

I don't think there is a clear-cut line. It seems to me that there is a basic standard, in that a work which is not easily navigable when transcluded onto a single page should be broken up. Also if the work is long enough that it starts hitting template transclusion limitations. —Beleg Tâl (talk) 20:48, 13 October 2021 (UTC)[reply]
  • This work is at the edge of what needs to be split (on desktop computer); I think it should be. Another work of a similar or slightly shorter length, without natural breaks, might not need to be split. TE(æ)A,ea. (talk) 21:04, 13 October 2021 (UTC)[reply]
  • After removing the tag I said "I see no reason to split this, or any other work, merely because it could be done that way; it hardly makes it easier for the reader." The could split this text emerges from 'something to do' with copy pastes of PG texts, it has become something that must be done, to the most discrete fragment of text in any circumstance (breaking a lot of links in works that I chose not to split). Navigating is more annoying than scrolling for me, especially when using multiple windows to search, skim, or read texts. Linking chapter, section and verse is possible with the page numbers in their larger context. Is having the software recognise book's inherent convention such a trick, I don't imagine the largely unsplit works at PG have a problem with exporting content that doesn't suit readers and their readers. Cygnis insignis (talk) 21:32, 13 October 2021 (UTC)[reply]
    You haven't put in any ToC or page breaks though, so it's functionally an amorphous lump of 58,000 words.
    PG do their exports manually, as far as I know, after the work is "locked": they ensure the work has a navigable TOC at that stage. We have no such luxury because we keep the works open for editing forever (that's, like, our whole "thing"), so the exports have to be automatically generated "live" every time.
    I do not recommend "splitting to the most discrete fragment"; in fact I have recently advised against splitting a legal work into 400+ sections. However, chapters do seem a natural split level (I normally do not split finer than that, even if the chapter has "subparts"), and failing that you should at least make sure your exports are navigable. Even if phab:T270612 were done today, you would require a ToC which links to some kind of anchors (even if just page numbers) and probably some page-break mechanism (which could be with {{tl|classed header}} and break-before:always in index CSS, or with {{page break}} of some sort). Inductiveloadtalk/contribs 21:55, 13 October 2021 (UTC)[reply]



Barging in, I was going to ask for help in this same question area. I'm interested in suggestions/opinions for another work I'm helping with, which *cannot* be easily split into simple 'chapters' as it rather has an academic outline form that does *not* have nice regular like-sized divisions. Please see notes and summary at Looking ahead at subdividing text for publishing to main space.

The work does need to be split since it is 325+ pages, but has an idiosyncratic organization at best. One division has 2/3 of the whole work! It is ultimately divided into small 'paragraphs' (sections, e.g. § 204), and we need to be able to link from paragraph references to main space. But to do that we need a somewhat regular system of dividing into named subparts. No nice hierarchy suggests itself.

How to divide into named main space subdivisions, when *not* amenable to 'chapter' divisions? And I note above and agree with the "most discrete fragment" comment above. Such would completely distress the work. Some intermediate rule is needed. Shenme (talk) 22:01, 13 October 2021 (UTC)[reply]

This is a debate between the lumpers and splitters and I think that both sides have valid arguments. Texts on a single page are easier to read through in one session, while a text in separate chapters is easier for longer reads, when exported, and on mobile. Generally, I prefer split texts. frWS usually presents both options: a split text and a single-page text. I was wondering if it would be possible to modify the auxtoc and TOC templates to allow for a single-page option. In this way, even if a text is split, a user would have the option to view the text on a single page. By default this single-page option would transclude the entire index, but there would be an override parameter to set the page range manually.

Something like this:

  • Chapter 1
  • Chapter 2
  • ....
  • Chapter N

View on Single Page


Languageseeker (talk) 01:01, 14 October 2021 (UTC)[reply]
yeah, i would support a reader option of lump or split. we do not know if readers are desktop or phone with limited connectivity / memory issues. a flexible response would be nice.--Slowking4Farmbrough's revenge 19:34, 14 October 2021 (UTC)[reply]
Well I've wandered around and seen 'chapters' of over 100 pages in peeking at even a few works from the "New Texts" page. So my 325 pages split into only ~7 parts sounds fine, except that two of the parts are 97 and 85 pages long. It would be nice to know what the comfortable limits are for mobile devices, as a guide here at WS. Shenme (talk) 19:13, 15 October 2021 (UTC)[reply]
Splitting a work may inhibit searching, so Template:Engine will be vital.--Jusjih (talk) 20:13, 7 November 2021 (UTC)[reply]
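As a rough illustration of the "split plus single-page option" idea discussed above, here is a small Python sketch that turns a list of chapter page ranges into one `<pages>` transclusion per subpage, plus one extra entry covering the whole range. The index name, work title, subpage names and page ranges are all invented for the example, and the "Single page" subpage is just one way such an option could be laid out, not an existing convention.

```python
def pages_tag(index, first, last):
    """Build a <pages> transclusion tag for one page range of an index."""
    return f'<pages index="{index}" from={first} to={last} />'


def split_plan(index, title, parts, single_page_name="Single page"):
    """Map each mainspace subpage title to the wikitext that transcludes it.

    `parts` is a list of (subpage_name, first_page, last_page) tuples.
    An extra subpage spanning the first to the last page of all parts is
    added so readers could choose the lumped view (hypothetical layout).
    """
    plan = {
        f"{title}/{name}": pages_tag(index, first, last)
        for name, first, last in parts
    }
    overall_first = min(first for _, first, _ in parts)
    overall_last = max(last for _, _, last in parts)
    plan[f"{title}/{single_page_name}"] = pages_tag(
        index, overall_first, overall_last
    )
    return plan


# Invented example: three chapters of a hypothetical scan.
plan = split_plan(
    index="Example work.djvu",
    title="Example work",
    parts=[("Chapter 1", 9, 30), ("Chapter 2", 31, 58), ("Chapter 3", 59, 97)],
)
```

Whether the lumped view is practical still depends on the transclusion limits of the wiki, so very long works might need the override range rather than the full index.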

Localisation request: File:A Catalogue of the Birmingham Collection - 1918.pdf[edit]

Request because of a lack of clarity concerning the applicability of the PD-UK-anon license at Commons. As a 1918 work it is PD-US and so could be hosted locally, irrespective of any UK status. ShakespeareFan00 (talk) 17:03, 15 October 2021 (UTC)[reply]

@ShakespeareFan00: Any chance of getting a djvu to localise? PDFs are still ugh in my opinion. — billinghurst sDrewth 00:45, 16 October 2021 (UTC)[reply]
It is the same as the archive.org copy. The work is "compiled under the direction of Walter Powell [see below] and Herbert Maurice Cashmore [who died in 1972.]". As a catalogue, however, this would also fall below the threshold of originality, me thinks, except for any preface. The author of the preface (before the introduction) is E. Marston Rudland, chairman of the Public Libraries Committee, who was born in 1875 (he would have been 75 in 1950, so not impossible he was alive). The introduction is by Howard S. Pearson, of whom a portrait in 1906 (12 years earlier) shows quite an elderly man ([1]). Safe to say, (if the UK term of 70 years after death is taken into account), that he died before 1951. As to the author of the "Note on the catalogue", he is named as Walter Powell, chief librarian, who looks like this guy, died 1928. RandomCanadian (talk) 01:11, 16 October 2021 (UTC)[reply]
Addendum: Rudland published some works in the early 1950s, so he would have been alive at that time. This document seems to confirm a death date in 1958 (if it can be trusted). Of course, none of that alters the unquestionably PD nature of the work in the US. I'm not quite sure how much this affects the UK status, though, as it is a collective work. RandomCanadian (talk) 01:28, 16 October 2021 (UTC)[reply]
http://moseley-society.org.uk/wp-content/uploads/2018/12/Ernest-Marston-Rudland.pdf - Ernest-Marston-Rudland (1875-1958). ShakespeareFan00 (talk) 07:06, 16 October 2021 (UTC)[reply]
http://calmview.birmingham.gov.uk/CalmView/Record.aspx?src=CalmView.Catalog&id=MS+2724%2f5 - Howard Shakespeare Pearson (1838-1928) ShakespeareFan00 (talk) 07:06, 16 October 2021 (UTC)[reply]
@ShakespeareFan00: I am not going to move that pdf version over as it is giving rubbish layer extraction. We can move over a djvu now or later, I care not. @RandomCanadian: enWS only looks to comply with US copyright, so the death dates of the authors and their mates are irrelevant for this work. — billinghurst sDrewth 11:39, 3 November 2021 (UTC)[reply]
FWIW Ernest Marston Rudland died in 1958. — billinghurst sDrewth 11:41, 3 November 2021 (UTC)[reply]

Constitution of India[edit]

Hi to all. Seems that this document has random fixes relating to later versions of the document. We need to work out what to do with it, and agree to a means that makes this a static document, rather than a dynamic document. My thoughts are that getting a scan of the document, and having that in place will greatly resolve the issues that we are having. Otherwise we are going to have people pick through all the pages and revisions. — billinghurst sDrewth 07:26, 20 October 2021 (UTC)[reply]

Same as always: find relevant scans/born-digital version (as appropriate) for each revision.
Was there a conclusion to the matter of whether the "updated" editions (eg [2]) published by the Ministry of Law and Justice (Legislative Dept) were PD, or only the legislation that makes the change? Inductiveloadtalk/contribs 07:43, 20 October 2021 (UTC)[reply]
@Billinghurst, @Inductiveload: -- EdictGov template is not applicable for such updated files (as per the provisions of the EdictGov-India template in Commons), but there exists a way to host such files. Commons has thousands of files under the GODL license. This license is not actually applicable to any of those files, but Commons persists in its policy that any file sourced from a government website of India is hostable under GODL (see note to reviewer here). We can also take advantage of this Commons policy, if the community agrees; and then such files can be hosted here. Regards. Hrishikes (talk) 07:22, 17 November 2021 (UTC)[reply]
@Hrishikes: The page is rooted. We should start again with a disambiguation page, and subsequent scan sourced pages. then you can license appropriately. — billinghurst sDrewth 11:55, 21 November 2021 (UTC)[reply]
@Billinghurst: -- In that case, you can possibly undelete either 1 or 2 under GODL, and we can proceed from there. Note: all the amendments to the Indian Constitution are already present in this Index, and the original is at 1 and 2. Hrishikes (talk) 12:16, 21 November 2021 (UTC)[reply]

Sister project links to Wikipedia pages of the same name that aren't connected through Wikidata or added by {{header}}[edit]

At least on Pollution and Wernher von Braun. Is this a known issue? —CalendulaAsteraceae (talkcontribs) 03:32, 31 October 2021 (UTC)[reply]

But it is connected at wikidata, via "Main subject". If there is no exact match (like a en.wiki page for the poem) then the template pulls in the page for the main subject. Not an issue; more like an intended set of instructions....--RaboKarbakian (talk) 04:15, 31 October 2021 (UTC)[reply]
@CalendulaAsteraceae: Detail at Template talk:Plain sister and there was some mention here at one point. — billinghurst sDrewth 11:30, 31 October 2021 (UTC)[reply]
Cool, thank you! Yeah, it's not a thing I have a problem with, but it was a bit surprising, so it's good to know what's going on! —CalendulaAsteraceae (talkcontribs) 22:15, 31 October 2021 (UTC)[reply]

Translation redirects[edit]

I notice that {{translation redirect}} has been deleted. What's the current consensus on linking mainspace to Translation space? Is it an exception to WS:CSD M3: Cross-namespace redirects - if so can we update the deletion policy accordingly? Or do we prefer to leave the work title redlinked in mainspace? —Beleg Tâl (talk) 13:26, 1 November 2021 (UTC)[reply]

IMO, cross-namespace redirects from Main→Translation, as well as Author→Portal when the person has been "portallifed" and Main→Portal for periodicals and similar without any mainspace content (yet) should all be allowed.
Cross-namespace redirects only increase the ability of a reader (who may well not even know about portals, translation namespaces) to find the text. Even as a Wikisource editor, if I see a redlink, I will assume there is no content for that work/author, and I will continue on my way, blissfully unaware of a translation or portal page. Inductiveloadtalk/contribs 14:02, 1 November 2021 (UTC)[reply]
Gospel? Cygnis insignis (talk) 14:25, 1 November 2021 (UTC)[reply]
Agreed, this seems to be the wisest course of action, but is that the consensus that was established when the soft redirects were removed? —Beleg Tâl (talk) 15:26, 1 November 2021 (UTC)[reply]

We don't need them. The search functionality treats Main, Author, Portal, and Translation as content namespaces so they all show up in searches. Why would you see a redlink? Any redlinks should be fixed if someone is doing the job properly. These links should only exist for limited periods per {{dated soft redirect}}. — billinghurst sDrewth 10:19, 2 November 2021 (UTC)[reply]

I still do not understand the need to delete, say Printing Times and Lithographer instead of simply redirecting to Portal:Printing Times and Lithographer pending mainspace content. When the mainspace page Printing Times and Lithographer is eventually made, many incoming links to Portal:Printing Times and Lithographer should be changed back. What's the point? It seems to me to be make-work, and designed specifically to trip people up who want to link the text "Printing Times and Lithographer" in some other work (example: this page), but do not expect there to be a portal and just settle for the red link. Inductiveloadtalk/contribs 12:10, 2 November 2021 (UTC)[reply]
@Billinghurst: The High Mountains is a red link. If I have a work and I want to link it to "The High Mountains", I will just put The High Mountains and assume that no one has added this work yet. If I upload a work called "The High Mountains", I will put it at The High Mountains and not bother to disambiguate. Why would I check to see if Translation:The High Mountains exists? Why would anyone? (Inductiveload's point on periodical portals is essentially the same, but the caveats about portal-space pages not being "Works" does not apply when we are talking about WS:T) —Beleg Tâl (talk) 13:34, 2 November 2021 (UTC)[reply]
What is the purpose of main namespace? Can we stick to it? What is the purpose of a redlink ... CONTENT! People will come here via external search, via internal search, or via links from other wikis. I highly doubt that a significant portion of our traffic arrives by someone typing the title name component into a url. When we have a redlink in a work, it is meant to be pointing to the published work, not our listing of various scans in the portal namespace. While I appreciate volunteers' translations of works in the Translation: ns, we should not be creating direct links from published works in main ns to Translation ns. That was never part of the reasoning for setting up that namespace for those user-focused and unchecked works. Think through what that is explicitly saying about a foreign language work and what you are saying.

Every time someone thinks that it is a finding aid and you flick people all over the site you are diluting main ns's purpose. By your logic, we should create redirects for every author so you find those. We would need to create redirects for each version of the name of a publication. That is just a maintenance nightmare and linkrot waiting to happen. If you don't like redlinks to a journal name here, then create the content. If you think that we have an issue with how our landing pages are working then create a better solution of what it could look like besides disambiguation pages, version pages, translation pages, not go the lazy option of cross namespace redirects. Please take a step back and maintain the quality and integrity of the system. How many times have we had situations of main ns works coming in that match the name of works in the Translation ns and causing conflict? Have we captured and resolved those? Is there a means by which we can report and resolve those? Please don't give hypothetical edge cases where we have a system that looks at such additions and resolves, or where we can fix those by alternate processes through reports. — billinghurst sDrewth 22:10, 2 November 2021 (UTC)[reply]

If that's the consensus that replaced the use of {{translation redirect}}, then that's fine with me - Translation space is a wild west anyway —Beleg Tâl (talk) 23:22, 2 November 2021 (UTC)[reply]

software adding line breaking hyphens in Page namespace[edit]

Reality check please: somebody has introduced line breaking hyphens into the justified text? If the intent was yet another way to keep people busy—such as the blind man in a dark room looking for the black cat … that isn't there!—it is ingenious. Cygnis insignis (talk) 14:39, 1 November 2021 (UTC)[reply]

Meet the new Movement Charter Drafting Committee members[edit]

The Movement Charter Drafting Committee election and selection processes are complete.

The committee will convene soon to start its work. The committee can appoint up to three more members to bridge diversity and expertise gaps.

If you are interested in engaging with the Movement Charter drafting process, follow the updates on Meta and join the Telegram group.

With thanks from the Movement Strategy and Governance team. --Civvi (WMF) (talk) 15:13, 1 November 2021 (UTC)[reply]

@Civvi (WMF) Why is the WMF using a closed, commercial, app-based platform like Telegram (which requires my phone number and doesn't allow a dedicated account) for this? Inductiveloadtalk/contribs 11:58, 2 November 2021 (UTC)[reply]
because the other media channels have adult supervision, unlike wiki-talk. they are merely acknowledging where the conversation is. this is a perennial subject, but the community does not acknowledge its own role to driving conversation elsewhere. --Slowking4Farmbrough's revenge 16:17, 2 November 2021 (UTC)[reply]
@Inductiveload: Hi, thanks for asking and sorry for my bad English. Yes, Slowking is right, the discussion is perennial and AFAIK taking place in a lot of different places. I can just add that every language community (but sometimes even single projects) has different preferences so I guess that the choice was to stay where most people are, and this seems to be Telegram. Every alternative has pros and cons. (Personally I would love to go back to IRC...) --Civvi (WMF) (talk) 17:29, 2 November 2021 (UTC)[reply]
@Civvi (WMF) would it not make sense for the WMF to undertake to "open" these disparate channels by formally bridging them - especially in the context of a channel being advertised in a message from a WMF staffer? Please note, this isn't aimed at you personally, it's just that the (WMF) suffix brings with it some level of organisational approval, so I'm talking to the "(WMF)", not the "Civvi" part of the username. A Wikimedia Matrix homeserver springs to mind (used by, e.g. projects like KDE and Mozilla delegate to EMS, or it can be self-hosted) as a flexible set-up, but "just" IRC would work too. Inductiveloadtalk/contribs 08:37, 3 November 2021 (UTC)[reply]
@User:Inductiveload Thanks :-) Indeed the WMF suffix unfortunately is not linked to "WMF-Omniscience" (does this word exist?) I don't have answers to this specific and quite technical question but I am happy to try to find them, perhaps @User:Xeno (WMF) might know more about this and can help. --Civvi (WMF) (talk) 14:36, 3 November 2021 (UTC)[reply]
i kind of sympathize with WMF in that they tried meta:Flow and Liquid Threads, and were roundly criticized, but then started and abandoned Wikimedia Space. we are in thrash mode, and need some sustained messaging leadership. until then we will just patch together ad hoc channels; and the open failure to deliver newbie friendly comms will lead people to try everything. --Slowking4Farmbrough's revenge 22:10, 3 November 2021 (UTC)[reply]

Tech News: 2021-44[edit]

20:28, 1 November 2021 (UTC)

Exhibition of the Impressionists[edit]

This seems to be a translation from French, but there is no translator information. Huhu9001 (talk) 04:36, 2 November 2021 (UTC)[reply]

This looks to be from 1946 by Rewald Google Books, which may be a copyright violation (Renewal: R570616) MarkLSteadman (talk) 04:55, 2 November 2021 (UTC)[reply]

Unstrip size limit exceeded (5,000,000)[edit]

I'm getting this error on A Dictionarie of the French and English Tongues. Does anyone know what it means? Languageseeker (talk) 01:03, 3 November 2021 (UTC)[reply]

@Languageseeker Basically, you asked the server to process more text than it's configured to (5MB). You should split the work up into smaller chunks. Inductiveloadtalk/contribs 08:15, 3 November 2021 (UTC)[reply]
That's true, but there's something more going on too. If it was simply too big the error would just reference the "Post-expand include size" (which is the 5MB limit). In this case, something is causing unstripping of strip markers to exceed that limit after transclusion. It may be a simple potato vs. potato thing, but it could also mean that there's a suboptimally behaving template, or module, or extension tag, or invocation of either in there somewhere. For example, if large chunks of text are stuffed into a template param, and especially if that text itself contains templates, you might get this effect. To quote T189416: unstrip-size-warning is shown when the maximum expansion size for nested parser extension tags is exceeded. "Unstrip" refers to the internal function of the parser, called 'unstrip', which recursively puts the output of parser functions in the place of the parser function call. If that's the case it might be worthwhile tracking it down and fixing it (or at least understand it) to avoid problems down the road. Xover (talk) 08:54, 3 November 2021 (UTC)[reply]
@Xover I'm pretty sure that this is caused by the massive use of <poem> (or maybe even <pages>), which is, in total, being fed megabytes of text here (I'm not sure if all the poems share the same StripState). It's a "fun" edge case, but it looks to me like just another way that you can fall off the edge of the world when transcluding excessively large amounts of text. Inductiveloadtalk/contribs 14:59, 3 November 2021 (UTC)[reply]
@Languageseeker: Who thought that it was a good idea to transclude 900+ pages of not proofread pages to a single page at enWS? How is that considered a reasonable presentation of a work? Do we truly have to even ask that question? — billinghurst sDrewth 11:49, 3 November 2021 (UTC)[reply]

Transcluding pages 1 to 100 gives me

NewPP limit report
Parsed by mw1371
Cached time: 20211103115326
Cache expiry: 1814400
Reduced expiry: false
Complications: [vary‐page‐id]
CPU time usage: 1.190 seconds
Real time usage: 1.625 seconds
Preprocessor visited node count: 4865/1000000
Post‐expand include size: 145366/2097152 bytes
Template argument size: 68010/2097152 bytes
Highest expansion depth: 15/40
Expensive parser function count: 0/500
Unstrip recursion depth: 2/20
Unstrip post‐expand size: 847175/5000000 bytes
Lua time usage: 0.084/10.000 seconds
Lua memory usage: 1524608/52428800 bytes
Number of Wikibase entities loaded: 0/400

billinghurst sDrewth 11:54, 3 November 2021 (UTC)[reply]
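
For anyone who wants to compare reports like the one above across pages, a minimal Python sketch follows. It reads the "used/limit" lines of a limit report and shows how close each counter is to its ceiling. The function names `parse_limits` and `closest_to_limit` are hypothetical, and the sample is a subset of the report above with plain ASCII hyphens.

```python
import re

# Matches lines of the form "Name: used/limit [unit]".
LIMIT_LINE = re.compile(r"^(?P<name>[^:]+):\s*(?P<used>[\d.]+)/(?P<limit>[\d.]+)")


def parse_limits(report):
    """Return {counter name: (used, limit)} for every used/limit line."""
    usage = {}
    for line in report.splitlines():
        m = LIMIT_LINE.match(line.strip())
        if m:
            usage[m["name"]] = (float(m["used"]), float(m["limit"]))
    return usage


def closest_to_limit(usage):
    """Name of the counter with the highest used/limit ratio."""
    return max(usage, key=lambda name: usage[name][0] / usage[name][1])


SAMPLE = """\
CPU time usage: 1.190 seconds
Post-expand include size: 145366/2097152 bytes
Highest expansion depth: 15/40
Unstrip recursion depth: 2/20
Unstrip post-expand size: 847175/5000000 bytes
Lua time usage: 0.084/10.000 seconds
"""

usage = parse_limits(SAMPLE)
used, limit = usage["Unstrip post-expand size"]
pages_until_limit = limit / (used / 100)  # the report covers 100 pages
```

If usage grew linearly with the number of pages (an assumption, not something the report states), the unstrip post-expand size would reach its limit at roughly 590 pages, which would be consistent with a 900+ page transclusion failing.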

I agree, it was a bad idea. But, I tried to first transclude in my user ns to see which pages needed the most help. However, in user NS, there are no page numbers. The text has been fully proofread and mostly formatted by Distributed Proofreaders. So, I was hoping to look through, see what pages needed help, and then split up the work. It seems that I created a royal mess instead. My apologies to everyone. Languageseeker (talk) 14:56, 3 November 2021 (UTC)[reply]
@Languageseeker: Important context that would have been valuable in the beginning. Are you aware that if you look at the page source of any wiki work, it generates a mw:NewPP parser report? Always good reading to help look at pages that are anything outside of vanilla production values. You will also see following that a "Transclusion expansion time report". See also mw:Strip marker, which explains that MediaWiki software adds elements that look and act like XML tags. — billinghurst sDrewth 00:47, 4 November 2021 (UTC)[reply]
@Languageseeker: page numbers should now be enabled in a Wikisource or User namespace page with "Sandbox" in the title. Inductiveloadtalk/contribs 11:27, 4 November 2021 (UTC)[reply]

M&S (Phe-Bot) is Stuck[edit]

It seems that the poor match-and-split bot got stuck. Could someone give it a nudge? Languageseeker (talk) 06:23, 3 November 2021 (UTC)[reply]

@Languageseeker (CC MarkLSteadman): There was an unplanned network outage on WMCS and Toolforge yesterday, so multiple tools hosted there may need a kick to come back alive (Toolforge tools run on a job grid with NFS-mounted home directories; when the network goes, everything breaks and often can't recover without being restarted). In any case, the match & split tool has had the requisite Russian Space Station Fix™ applied and should now be back in working order. You and Mark were the only ones with jobs queued, but those jobs will need to be resubmitted. Xover (talk) 08:35, 3 November 2021 (UTC)[reply]
Yep, it took out the Discord-Matrix/IRC bridge too (though that did eventually recover itself and the Kubernetes pod appeared magically). Inductiveloadtalk/contribs 09:17, 3 November 2021 (UTC)[reply]
It seems as if the bot is running, but not actually creating pages. Languageseeker (talk) 14:48, 3 November 2021 (UTC)[reply]
@Languageseeker: Yeah, it looks like, after the restart, code changes that happened since the previous restart have broken it. I'm looking at it, but it might be a while (it's a big, complex, and completely undocumented codebase that's mainly written in Python 2 and in the process of being updated to Python 3 piecemeal). Xover (talk) 18:55, 3 November 2021 (UTC)[reply]
Ayup. The bitrot has finally caught up with Phetools. Match & Split and several other bits of it are currently either non-functional or will be unstable, and it's going to take major surgery to fix it. In other words, you should plan to be without M&S for a bit. The flip side is that this "surgery" was badly needed anyway, and the prognosis is good. It'll be a bit of a pain short term, but it might end up getting us a more maintainable tool once done. Or, you know, I might mess it up completely and you'll all hate me forever. One of those two, probably. :) Xover (talk) 22:15, 3 November 2021 (UTC)[reply]
Such is life. In the worst case, Phetools shall lie down with the iron and the lamp in silicon heaven. Languageseeker (talk) 02:53, 4 November 2021 (UTC)[reply]
@Languageseeker (CC Beleg Tâl and MarkLSteadman): Ok, I think I have it back up and running at least to test. With the type of changes that were needed the most likely type of bug to occur is various forms of breakage related to non-ascii characters (including accents, greek, fancy quote marks, etc.) and primarily in either the source wikipage name or the target PDF/DjVu filename. Non-ascii characters in page content may also be affected, but these are more likely to be subtle or trivial. Xover (talk) 11:28, 6 November 2021 (UTC)[reply]
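
As an illustration of the kind of non-ASCII breakage mentioned above, here is a small Python 3 sketch. It is not taken from the Phetools codebase; the helper names are hypothetical. It shows that a title survives an encode/decode round trip only when both sides agree on UTF-8, and what the result looks like when the bytes are decoded with the wrong codec.

```python
def roundtrip_utf8(name: str) -> str:
    """Encode to UTF-8 bytes and decode again; the text is unchanged."""
    return name.encode("utf-8").decode("utf-8")


def misdecoded(name: str) -> str:
    """Decode UTF-8 bytes as Latin-1, a typical source of garbled titles."""
    return name.encode("utf-8").decode("latin-1")


title = "Beleg Tâl"
ok = roundtrip_utf8(title)   # same string as `title`
garbled = misdecoded("Tâl")  # the two UTF-8 bytes of "â" become two characters
byte_length = len("Tâl".encode("utf-8"))  # 4 bytes for 3 characters
```

When testing the tool, page names and filenames with accents, Greek, or curly quotes are the cases most likely to show this sort of mismatch.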

Illustrations are not displayed[edit]

@Beleg Tâl: What's wrong with my photos?

@Виктор Пинчук: I put {{image missing}} where the images in the original publication are; if you (or anyone else) wants to upload a copy of those images and insert them into the text, that would be great. —Beleg Tâl (talk) 00:04, 5 November 2021 (UTC)[reply]
@Виктор Пинчук: I have added the missing images to Translation:God is One… in 200 Persons. Please note that Wikisource is not a platform for publishing new versions of these works with new images - we expect works on Wikisource to be faithful to the scans of the original publication. —Beleg Tâl (talk) 02:56, 9 November 2021 (UTC)[reply]
@Beleg Tâl: What about this?

Multimedia content added to texts can greatly improve the quality and presentation. Such content includes not only published illustrations or photographs from or about the book itself which are out of copyright, but also original contributions of audio recordings, diagrams, or other content. https://en.wikisource.org/wiki/Wikisource:What_Wikisource_includes#Multimedia Виктор Пинчук (talk) 06:12, 9 November 2021 (UTC)[reply]

@Виктор Пинчук: This policy also notes that "Multimedia contributions are subject to Wikisource:Image use guidelines", which indicates that "Inappropriate images are those which are not part of the original document, those indirectly related to the work, and should not be included on the page". Perhaps you could put extra images on the Talk page, or on your User page, and put a link to it in the work header? (Not in the work itself) —Beleg Tâl (talk) 01:52, 11 November 2021 (UTC)[reply]
@Beleg Tâl: The Wikisource administrator is not a bot: he should not blindly follow the rules, but should consider each case individually. In this case, the author has not died yet. For example, in the Russian version of Wikisource, I illustrate my articles with author's videos. Of course, they were not in the paper version: video can't be printed in a newspaper. — Виктор Пинчук (talk) 06:14, 11 November 2021 (UTC)[reply]
@Beleg Tâl: Illustrations and videos not published in the newspaper refer to the author of the works, and not to the participant (user) of the project. The user is a temporary concept, the author of literary texts is forever. If required, I can request VTRS permission confirming that the illustrations belong to the author of the text posted on Wikisource. — Виктор Пинчук (talk) 06:38, 11 November 2021 (UTC)[reply]
@Виктор Пинчук: If you want to have an edition of your articles that has extra images and videos, or which omits the "grunge text", there are other websites on which you can do that. On English Wikisource, however, we only welcome editions that are faithful to the original publication. If another publisher agrees to publish your articles with the new images, you can provide a scan of the new publication with the images included in the new publication. Otherwise, as I have suggested, you will need to put your alterations elsewhere, and we can link to them from Wikisource as appropriate. —Beleg Tâl (talk) 15:40, 11 November 2021 (UTC)[reply]
@Beleg Tâl: …you will need to put your alterations elsewhere, and we can link to them from Wikisource as appropriate. Can you show me what it looks like and how it's done? — Виктор Пинчук (talk) 14:58, 13 November 2021 (UTC)[reply]
@Виктор Пинчук: Something like this, perhaps —Beleg Tâl (talk) 02:39, 15 November 2021 (UTC)[reply]
@Beleg Tâl: "(An expanded version of this article can be viewed on the author's [https://www.viktor-pinchuk.example.org/god-is-one-in-200-persons website])". But this link doesn’t work, because I don’t have an "author's website". Maybe the "Discussion" page could be used for this purpose? —Виктор Пинчук (talk) 05:59, 16 November 2021 (UTC)[reply]
P.S. For example, here: https://en.wikisource.org/w/index.php?title=Translation_talk:Notes_of_an_international_tramp&action=edit&redlink=1 Виктор Пинчук (talk) 10:29, 16 November 2021 (UTC)[reply]

Academic journal articles with a "Digital object identifier"[edit]

What is the status for journal articles with a "Digital object identifier" (DOI)? The index page does not appear to support this field. --2db (talk) 16:29, 5 November 2021 (UTC)[reply]

@2db it does now ^_^. In the longer term, this should come from Wikidata. Inductiveloadtalk/contribs 17:04, 5 November 2021 (UTC)[reply]

Captains Courageous source txt origin[edit]

Where did the source txt for Captains Courageous come from? --2db (talk) 04:34, 7 November 2021 (UTC)[reply]

  • 2db: Obviously, it’s not stated, but the text is most likely from Project Gutenberg, which is available here. TE(æ)A,ea. (talk) 04:39, 7 November 2021 (UTC)[reply]
Should this be in the category ready to split and match? --2db (talk) 04:45, 7 November 2021 (UTC)[reply]
see previous followup question 2db (talk) 04:49, 7 November 2021 (UTC)[reply]
  • 2db: No. The process for match-and-split is described here. It requires a scan to be in existence. To request a scan, you may place a request at the Scan Lab. TE(æ)A,ea. (talk) 04:51, 7 November 2021 (UTC)[reply]
Also, because the source edition for this copy is unknown, it is not appropriate for match-and-split and a new scan should be proofread through the normal process. Beeswaxcandle (talk) 05:09, 7 November 2021 (UTC)[reply]
you have three scans to choose from c:File:Kipling - Captains courageous, 1899.djvu; c:File:"Captains courageous" (IA cu31924013493246).pdf; c:File:"Captains courageous", a story of the Grand Banks (IA captainscourageo00kipl).pdf. --Slowking4Farmbrough's revenge 18:21, 7 November 2021 (UTC)[reply]
Please do not match-and-split. The correct text to proofread is Index:Captains Courageous (1897 London).djvu. Languageseeker (talk) 06:18, 11 November 2021 (UTC)[reply]

Doctor Dolittle's Post Office (1923)[edit]

Are we allowed to use the Gutenberg epubs to pdfs for Hugh Lofting's Dolittle books? I know there was talk of getting scans of Doctor Dolittle's Circus (1924) earlier in the year, but nothing seems to have come of it. I ask this because I found a file on Gutenberg that I could use for Doctor Dolittle's Post Office here: https://www.gutenberg.org/cache/epub/58947/pg58947-images.html I have not found a file for Doctor Dolittle's Circus on my own. I am a bit unhappy that all the talk of getting scans earlier this year amounted to nothing. SurprisedMewtwoFace (talk) 23:48, 7 November 2021 (UTC)[reply]

I don't think there's much point using a Gutenberg ebook for proofreading - if you're going to use a Gutenberg edition, then you may as well skip the proofreading process until an actual scan is found. By the way, there is a very bad scan of the book here if anyone wants to take the time to split and reassemble the page images. —Beleg Tâl (talk) 00:48, 8 November 2021 (UTC)[reply]
There's a scan at Hathi (https://babel.hathitrust.org/cgi/pt?id=uva.x002562566) but it needs splitting. Would that work for you? Inductiveloadtalk/contribs 10:02, 8 November 2021 (UTC)[reply]
I think I'll upload the Hathi that you linked to, @Inductiveload:. Thanks for the link! SurprisedMewtwoFace (talk) 13:16, 8 November 2021 (UTC)[reply]
@SurprisedMewtwoFace Can you split the pages, or would you like me to do it? Inductiveloadtalk/contribs 13:52, 8 November 2021 (UTC)[reply]
@Inductiveload I've uploaded it on Wikimedia Commons. Could you do the page splitting, please? Here is the link: https://commons.wikimedia.org/wiki/File:Doctor_Dolittle%27s_Post_Office_(1923).pdf I think you have a much better knowledge of how to page split than I do! Don't feel you have to rush, I'm working on finishing up the Agatha Christie novel for now. SurprisedMewtwoFace (talk) 14:34, 8 November 2021 (UTC)[reply]
@SurprisedMewtwoFace Here ya go: Index:Doctor Dolittle's Post Office - 1923 - Lofting.djvu Inductiveloadtalk/contribs 15:07, 8 November 2021 (UTC)[reply]
@Inductiveload Thank you so much! This is a great help. This will be my next project after the Agatha Christie novel now. SurprisedMewtwoFace (talk) 15:13, 8 November 2021 (UTC)[reply]
Currently there are 2 more Gutenberg books by Agatha Christie marked for speedy deletion: File:MurderOnTheLinks.pdf and File:TheSecretofChimneys.pdf. I am unsure about them. I also do not like the idea of transcribing Gutenberg books, but I also failed to find these two anywhere else. --Jan Kameníček (talk) 16:04, 8 November 2021 (UTC)[reply]
@Jan.Kamenicek I was the one who uploaded those. I was looking for copies of them on Hathitrust and Internet Archive that would have been actual book scans, but I could not find any other than the Gutenberg copies. I'm not sure what people will decide on them.
SurprisedMewtwoFace (talk) 20:07, 8 November 2021 (UTC)[reply]
I think if you have to use a PG edition, the PDF is not adding anything except complexity. The edition is born digital, so the text version is as good a source as any and more easily Wikified.
Still, my general opinion stands that copying PG works without scans is not very useful, since PG is very much still a thing, and Wikisource can do better than being a very limited and incomplete backup of PG. We can just link out to an existing PG work if we really cannot find a scan. Inductiveloadtalk/contribs 20:28, 8 November 2021 (UTC)[reply]
yeah, we need to go find the first edition scans. and provenance seems not to be a priority at PG. it will be a long term quality improvement task. PG is a good placeholder until the scans get done and uploaded. --Slowking4Farmbrough's revenge 22:46, 8 November 2021 (UTC)[reply]
I have a physical copy of the Secret of Chimneys but (a) I won't be able to scan it until at least the first of the year and (b) it's an abridged version from slightly later.--Prosfilaes (talk) 01:11, 9 November 2021 (UTC)[reply]
We already host the PG copy of The Murder on the Links and File:Agatha Christie-The Murder on the Links.djvu. This last was kept as a result of a Copyvio discussion in 2019. We don't need any further copies. Beeswaxcandle (talk) 04:20, 9 November 2021 (UTC)[reply]
So I have deleted MurderOnTheLinks.pdf as redundant to File:Agatha Christie-The Murder on the Links.djvu. What about the File:TheSecretofChimneys.pdf which does not seem to have any other copy available? Delete as well or keep? I am really hesitant. --Jan Kameníček (talk) 08:37, 15 November 2021 (UTC)[reply]

Tech News: 2021-45[edit]

20:36, 8 November 2021 (UTC)

PAGE:subpages' parent[edit]

This page does not exist:

But does have multiple subpages, for example:

Is there ever any need to create the parent page of PAGE:subpages?

How do you automatically watchlist all the subpages, or does each subpage have to be manually added to a watchlist? --2db (talk) 16:28, 9 November 2021 (UTC)[reply]

@2db There is no need to create the parent page (sometimes it can be done when moving pages, but that's an admin-only trick for if you don't use a bot).
There's currently no way to watch all pages in an index in a one-click way, but I am literally currently working on adding that to the server (phab:T289466) and it will then get some kind of UI. Inductiveloadtalk/contribs 16:51, 9 November 2021 (UTC)[reply]
@Inductiveload I suggest making each empty/nonexistent parent in the PAGE namespace into a "police log" that displays a watchlist style report of all the subpages. 2db (talk) 17:03, 9 November 2021 (UTC)[reply]
@2db The underlying task will allow many things like that over the API, but on-wiki, you can already see that particular information via "Related changes" of an index page: https://en.wikisource.org/wiki/Special:RecentChangesLinked/Index:Tarzan_and_the_Golden_Lion_-_McClurg1923.pdf Inductiveloadtalk/contribs 17:11, 9 November 2021 (UTC)[reply]
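
For convenience, the "Related changes" link for any index can be built from its title. The Python sketch below just reproduces the URL pattern shown above; the function name `related_changes_url` is hypothetical and no API call is made.

```python
from urllib.parse import quote


def related_changes_url(index_title, site="https://en.wikisource.org"):
    """Build the Special:RecentChangesLinked URL for an Index: page."""
    # Wiki URLs use underscores instead of spaces in page titles.
    page = index_title.replace(" ", "_")
    # Keep ":" and "/" readable; percent-encode anything else unsafe.
    return f"{site}/wiki/Special:RecentChangesLinked/{quote(page, safe=':/')}"


url = related_changes_url("Index:Tarzan and the Golden Lion - McClurg1923.pdf")
```

The resulting page lists recent edits to every page linked from the index, which covers the same ground as a per-index watchlist until the server-side feature exists.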

A new way to transclude multi-page scores[edit]

I've written an initial attempt (in the form of Module:Mscorewithpipes) to transclude a multi-page score without some of the restrictions previously imposed by Template:Tscore. Namely, it is possible to use pipes and unseparated brackets (and perhaps templates; haven't checked this myself) within the <score> tag on a page. On the other hand, the module expects certain aspects of the overall score structure (such as the ordering of staves and layout preferences) to be marked with specific comments, and at the moment expects notation to be presented in separate contexts after specifying the staff order (these contexts also marked with specific comments). My hope is that the net changes needed compared to the previous attempt at score transclusion lead to a less intrusive experience proofreading scores.

An example of this module's use is at Asleep in the Deep (1898). Comments and criticisms welcome. @CalendulaAsteraceae: as someone who might be interested in this. Mahir256 (talk) 00:15, 11 November 2021 (UTC)[reply]

Question about TOC[edit]

I am working on transcluding The Wealth of Nations, but the TOC is a style that I do not recognize. Can somebody take a look at Page:The wealth of nations, volume 1.djvu/15? Languageseeker (talk) 02:56, 11 November 2021 (UTC)[reply]

I'm wondering about if this will export cleanly and whether or not automatic headers will work with this style of TOC. Languageseeker (talk) 03:11, 11 November 2021 (UTC)[reply]
  • Languageseeker: Yes, this will export cleanly; it was designed with export in mind (and made quite recently, too). Automatic headers aren’t dependent on templates, so that won’t be a problem. TE(æ)A,ea. (talk) 03:15, 11 November 2021 (UTC)[reply]
TE(æ)A,ea. Hmm, automatic headers don't seem to work at The Wealth of Nations/Introduction and plan of the work. Languageseeker (talk) 04:06, 11 November 2021 (UTC)[reply]
@TE(æ)A,ea. Actually {{TOCstyle}} is pretty questionable on export, because every line is a <li> element, which contains a whole separate <table>.
It looks like this in Koreader, for example: phab:F34741733 (the last line has gone wrong). This is one of the better exports for TOCstyle, some other models are a bit less successful: phab:F34568887. Inductiveloadtalk/contribs 07:58, 11 November 2021 (UTC)[reply]
  • Languageseeker: The table of contents needs to be added to the “Table of contents” field in the Index: to work for automatic headers. The reason it’s not fully working is because the Wiki-links have not yet been fully added (the chapters don’t have any, for example). Once that is fixed, there won’t be a problem. TE(æ)A,ea. (talk) 13:17, 11 November 2021 (UTC)[reply]
Thanks! Languageseeker (talk) 15:34, 12 November 2021 (UTC)[reply]

Index:Over the Sliprails - 1900.djvu[edit]

An easy proofread for whoever wants one. —Beleg Tâl (talk) 19:49, 11 November 2021 (UTC)[reply]

Celebrating 18 years of Wikisource[edit]

Hello Wikisource enthusiasts and friends of Wikisource,

I hope you are doing alright! I would like to invite you to celebrate 18 years of Wikisource.

The first birthday party is being organized on 24 November 2021 from 1:30 - 3:00 PM UTC (check your local time) where the incoming CEO of the WMF, Maryana Iskendar, will be joining us. Feel free to drop me a message on my talk page, telegram (@satdeep) or via email (sgill at wikimedia.org) to add your email address to the calendar invite.

Maryana is hoping to learn more about the Wikisource community and the project at this event and it would be really nice if you can share your answers to the following questions:

  • What motivates you to contribute to Wikisource?
  • What makes the Wikisource community special?
  • What are the major challenges facing the movement going forward?
  • What are your questions to Maryana?

You can share your responses during the live event but in case the date and the time doesn't work for you, you can share your responses on the event page on Wikisource or in case you would like to remain anonymous, you can share your responses directly with me.

Also, feel free to reach out to me in case you would like to give a short presentation about your and your community's work at the beginning of the session.

We are running a poll to find the best date and time to organize the second birthday party on the weekend right after 24th November. Please share your availability on the following link by next Friday:

https://framadate.org/zHOi5pZvhgDy6SXn

Looking forward to seeing you all soon!

Sent by MediaWiki message delivery (talk) 09:10, 12 November 2021 (UTC)[reply]

Note for everyone: there's a live stream here: https://stream.meet.google.com/u/1/stream/60c89368-65a7-4f27-8934-2f053e62f7f1. Even if you are not in the Google Meet, you should be able to listen in there. Inductiveloadtalk/contribs 13:24, 24 November 2021 (UTC)[reply]
Nope, doesn't work. Languageseeker (talk) 13:48, 24 November 2021 (UTC)[reply]

on behalf of User:SGill (WMF)

Source digitized by Google[edit]

The following source has been digitized by Google:

Every page of the downloaded PDF is watermarked. Is it OK to upload this PDF to Commons after stripping the top page, even though the other pages are watermarked? --2db (talk) 14:43, 12 November 2021 (UTC)[reply]

@2db: You'd have to ask at Commons to get a definitive answer, but merely having the text "Digitized by Google" at the bottom of each page should not be a problem as there is nothing copyrightable in that text. The first page is different as that contains a lot more text that could theoretically be construed to be copyrightable, and the logo which definitely is. Xover (talk) 15:03, 12 November 2021 (UTC)[reply]
As a wordmark, the Google logo is held to be not copyrightable by Commons (it's still a trademark, obviously). Hence the various denizens of commons:Category:Google logos. I think the watermark is safely void of creative input, though the first pages should indeed be stripped. That said, commons:Book scans with Google Books cover sheets (to remove) has ~400 members and countless others floating about the place I should think. Inductiveloadtalk/contribs 15:08, 12 November 2021 (UTC)[reply]
Three years ago, the google covers were being removed "because they are ugly". My suggestion was to place a |zz]] at the end of the category assignment, but that only puts the "ugly covers" last if there are images. It is weird to see the "ugly covers" reason being fluffed up with trademark/copyright concerns now. I mention it now because if this is just an evolution of a personal preference, it shouldn't have to cause consternation and excessive work for people who are more worried about the end product, ie, the transcription and transclusion of the "ugly scan". I mean, there is already quite a bit to care about....--RaboKarbakian (talk) 15:38, 12 November 2021 (UTC)[reply]

Styling and semantics on requested works[edit]

Hey, this is kind of weird: for some strange reason, pages like Wikisource:Requested_texts/1924 and Wikisource:Requested_texts/1928 have bulleted lists of works that use proper semantics to show that they are lists but weirdly, Wikisource:Requested_texts/1923 doesn't. I propose that all of these pages use actual unordered lists because they are exactly that: lists. Does anyone else think that 1923 is different from every other year and should have a different format? Does anyone else think that listings are somehow better by breaking them up into dozens or hundreds of paragraphs rather than lists? If the community consensus is that these pages should be consistent and that lists should be lists rather than streams of paragraphs, I'll change the 1923 page. Thanks. —Justin (koavf)TCM 21:56, 13 November 2021 (UTC)[reply]

The 1923 page is not different in formatting from the other pages. Some items on the pages are bulleted lists, some are not. Some items are separated from each other by blank lines, and some are not. The 1923 page is not different from the other pages in this respect; the same is true of lists on the other pages. Why does the 1923 page have to have a bullet in front of every single item at every level when the lists on the other pages do not? --EncycloPetey (talk) 22:04, 13 November 2021 (UTC)[reply]
I'm very interested to see which items are not part of bulleted lists on Wikisource:Requested texts/1924 or Wikisource:Requested texts/1928 or Wikisource:Requested texts/1929 or Wikisource:Requested texts/1930 or Wikisource:Requested texts/1931 or Wikisource:Requested texts/1926 or Wikisource:Requested texts/1945 or Wikisource:Requested texts/1927. It seems like we've used proper semantics on all of those pages to make bulleted lists of works to transcribe and then nested lists for things like individual volumes or specific authors of multiple works or commentary. Am I wrong? —Justin (koavf)TCM 22:09, 13 November 2021 (UTC)[reply]
A casual look at those pages will reveal items without bullets and groups separated by blank lines. --EncycloPetey (talk) 22:12, 13 November 2021 (UTC)[reply]
Every single work on all of those pages is part of a bulleted, unordered list and all of the commentary is included as a nested list. Give me an example of a work that is not. It is only on the 1923 page that they are not. —Justin (koavf)TCM 22:24, 13 November 2021 (UTC)[reply]
Old New York volumes on 1924; Great Gatsby editions on 1925; Yale Shakespeare volumes on 1926. . . --EncycloPetey (talk) 22:34, 13 November 2021 (UTC)[reply]
@EncycloPetey: Please actually look at those pages before you spread misinformation. Did you actually look at the Old New York volumes? They have "**: " at the beginning, making them unordered lists nested under an unordered list. As do the Great Gatsby editions in 1925. As do the Yale Shakespeare volumes in 1926. Thank you for proving my point for me. Now, can you actually show me any works in any of the other pages that are not listed in semantically meaningful unordered lists, which I have asked you several times now and you have given me examples of the exact opposite? Why do you think 1923 is different from every other year? —Justin (koavf)TCM 22:56, 13 November 2021 (UTC)[reply]
So, you are fine with an absence of visible bullets? What you are arguing for is invisible formatting because. . . . ? Is there any reason for making the change in terms of Wikisource? Is this a desire to impose a specific format on a select group of working pages alone? --EncycloPetey (talk) 23:11, 13 November 2021 (UTC)[reply]
@EncycloPetey: You obviously did not look at this edit or the one prior to it or else you would not be asking this question. You have also argued that we should follow the existing practice and that the pages should be similarly formatted: that is literally what I am arguing. I never made an argument about style: I made one about semantics. Proper semantics help structure information in web pages (e.g.) for search engines or screen readers, etc. Why are pages enhanced by removing semantic elements? Why is 1923 different from all other years? Can you actually show me any works in any of the other pages that are not listed in semantically meaningful unordered lists, which I have asked you several times now and you have given me examples of the exact opposite? Did you actually even look at the edits I made or did you blindly revert? —Justin (koavf)TCM 23:30, 13 November 2021 (UTC)[reply]
No, I have not made that argument. That is indeed the argument you are making, but I have not made that argument. These are working pages, back-end lists of things to be accomplished, and I see no reason why they should be forced to be consistent. And why would we be concerned with such working space pages in the Wikisource namespace appearing in search engines? But the key question I have been asking is "Is there any reason for making the change in terms of Wikisource?" and you have provided no answer except to say that it should be that way because it is that way, which is circular. Since this discussion is unproductive, I will wait to see what others say. --EncycloPetey (talk) 23:38, 13 November 2021 (UTC)[reply]
@EncycloPetey: Since, as you have shown, all the other pages are semantically correct and since there is no reason why 1923 is somehow different than other years, it follows that it should be semantically correct as well. Plus, as I've already explained, all web pages should have proper semantics, this is a web page, therefore, it should have proper semantics. As I have asked you and you have ignored: Why are pages enhanced by removing semantic elements? Why is 1923 different from all other years? Can you actually show me any works in any of the other pages that are not listed in semantically meaningful unordered lists, which I have asked you several times now and you have given me examples of the exact opposite? Did you actually even look at the edits I made or did you blindly revert? Note also that "e.g." refers to examples and that you skipped over screen readers. Please actually answer the questions that have been asked of you, as I have done for you. —Justin (koavf)TCM 00:14, 14 November 2021 (UTC)[reply]
It seems to me that an unordered list (i.e. *) is a pretty natural representation for such a list (the other being a section per work, like the MC nominations, which is overkill here, since the sections don't need to be archived). I do not see why this deserved to be instantly reverted; to me it appears to be a perfectly fair AGF/BOLD edit.
On one hand, I don't find myself especially worried by the semantic correctness of such back-room pages (which is really my own laziness and chauvinism as someone who doesn't need a screen-reader, though my excuse is I don't have one and though I have tried to use Orca to see how Wikisource behaves in a screen-reader, I've never really gotten it to work). Also, if we actually made a concerted effort to make Wikisource more accessible (which we should be, but aren't really, doing) I wouldn't necessarily start there of all places. On the other hand, certainly there's nothing wrong in my book with such a change there. Inductiveloadtalk/contribs 00:29, 14 November 2021 (UTC)[reply]
^^^ This. Who gives a toss, fix it or leave it. They are work pages in the Wikisource: ns. Is this really sucking up our time? — billinghurst sDrewth 01:06, 14 November 2021 (UTC)[reply]
Consistency is good, whether as policy, or convention, or unwritten and new. This type of it aids in legibility, and reduces decipherment. Presenting a list as a list in the HTML document I find desirable. And I do not find long bullet-pointed lists bad. —Genesis Bustamante (talk) 01:16, 14 November 2021 (UTC)[reply]

@EncycloPetey: Please unprotect and revert yourself per the above. —Justin (koavf)TCM 06:52, 20 November 2021 (UTC)[reply]

Jungle Tales of Tarzan (1919) issue with chapter split[edit]

I noticed that some of the chapters of "Jungle Tales of Tarzan" (1919) that I have been working on have been replaced with the versions I have worked on. However, there seems to be a slight issue with the division between Chapters IX and X. Some of the pages from Chapter IX are in the Chapter X space. Chapter IX ends at https://en.wikisource.org/wiki/Page%3AJungle_Tales_of_Tarzan.djvu/245 The ending of Chapter X is correct and there do not appear to be any other issues with chapter splits. Thanks for all your help, and I'm glad we're using the updated version! SurprisedMewtwoFace (talk) 00:09, 15 November 2021 (UTC)[reply]

@SurprisedMewtwoFace: Thank you for working on this book. Sorry about that. Adjusted and fixed. In the future, you can fix errors by changing the to and from values in <pages index="Jungle Tales of Tarzan.djvu" (index name) from=246 (start of chapter) to=276 (end of chapter) header=1/>. Languageseeker (talk) 00:28, 15 November 2021 (UTC)[reply]
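
For anyone splitting many chapters at once, the tag described above can be generated rather than typed by hand. This is only a sketch in Python: the helper `pages_tag` is hypothetical, and it emits just the attributes mentioned in this thread (index, from, to, header).

```python
def pages_tag(index, start, end, header=True):
    """Return a <pages> transclusion tag for scan pages start..end."""
    if start > end:
        raise ValueError("chapter start page must not be after its end page")
    attrs = f'index="{index}" from={start} to={end}'
    if header:
        attrs += " header=1"
    return f"<pages {attrs} />"


# Chapter X of Jungle Tales of Tarzan, using the page range given above.
chapter_x = pages_tag("Jungle Tales of Tarzan.djvu", 246, 276)
```

Because Chapter IX ends on scan page 245, starting Chapter X at 246 keeps the two chapters from overlapping.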
@Languageseeker Thanks so much for your help! It is much appreciated. I plan on doing some proofreading on "Tarzan and the Golden Lion", which is a monthly challenge book, next. SurprisedMewtwoFace (talk) 00:32, 15 November 2021 (UTC)[reply]
@SurprisedMewtwoFace: Looking forwards to it! Always great to have more people helping out in the Monthly Challenge. Languageseeker (talk) 03:33, 15 November 2021 (UTC)[reply]
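
For anyone adjusting chapter splits as described above, here is a minimal sketch of how the transclusion tag is put together from an index name and a page range. The helper name pages_tag is hypothetical and not part of any Wikisource tool; the attribute names (index, from, to, header) and the page numbers are the ones quoted in the reply above.

```python
def pages_tag(index: str, start: int, end: int, header: bool = True) -> str:
    """Build a <pages/> transclusion tag for one chapter.

    index  -- name of the Index file, e.g. "Jungle Tales of Tarzan.djvu"
    start  -- first scan page of the chapter (the "from" value)
    end    -- last scan page of the chapter (the "to" value)
    header -- whether to ask for the automatic header (header=1)
    """
    if start > end:
        raise ValueError("the 'from' page must not come after the 'to' page")
    tag = f'<pages index="{index}" from={start} to={end}'
    if header:
        tag += " header=1"
    return tag + " />"


# Chapter X as fixed in the thread above: pages 246 to 276.
print(pages_tag("Jungle Tales of Tarzan.djvu", 246, 276))
```

If pages from one chapter show up in the next, the fix is usually to move the boundary: the previous chapter's "to" value and the next chapter's "from" value should be consecutive page numbers.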

Invisible change[edit]

Can anybody explain, what has been changed at [7] ?--Jan Kameníček (talk) 17:33, 15 November 2021 (UTC)[reply]

I think that's a change in how the data is stored internally in the database. Years ago, it was stored as an actual complete chunk of Wikitext. It is now serialised differently and only appears to be Wikitext because the server constructs the Wikitext live on demand (you can also have it served to you as JSON). There is a small amount of more technical detail here: mw:Extension:ProofreadPage/Index_data_API. Inductiveloadtalk/contribs 17:54, 15 November 2021 (UTC)[reply]
@Inductiveload: Well, if it is only such a technical issue, why was the edit attributed to a novice user who had founded their account only two minutes before? --Jan Kameníček (talk) 18:01, 15 November 2021 (UTC)[reply]
@Jan.Kamenicek It's probably just that they saved the page. Normally that would be a null edit that doesn't actually result in a revision, but in some very old ProofreadPage content model pages, it can result in a revision actually being saved. Inductiveloadtalk/contribs 18:05, 15 November 2021 (UTC)[reply]
@Inductiveload: Should we null-edit all index pages by a bot? Ankry (talk) 20:58, 15 November 2021 (UTC)[reply]
There's no need: this kind of update is specifically designed to be able to happen transparently on next save. If there was a technical need to migrate storage formats, it would be done directly with a database upgrade script, and should be done for all Wikisources. Inductiveloadtalk/contribs 21:03, 15 November 2021 (UTC)[reply]

Tech News: 2021-46[edit]

22:06, 15 November 2021 (UTC)

Relevant changes[edit]

There are two major changes of note for Wikisource. The first is mentioned above: large file uploads are hopefully fixed, thanks to the hard work of User:Legoktm and the eagle eyes of User:Xover.

The second one is one that's finally pushed through review after a few months' hiatus: OpenSeadragon (OSD) image zooming in the Page namespace editor, thanks to the hard work of Yash9265 (via a Google Summer of Code, AFAIK no enWS account) and our mate User:Samwilson. This should be much more robust and functional for image zooming and also, due to the OSD plugin system, allows a lot of extra features.

For example, it will soon be possible to OCR only portions of images. The patch for this may not make the cut this week, but you can see a sneak preview at this PatchDemo (note: only PDF files work there). There are more cool features in the pipeline for OSD, like a marker line widget to keep your place, but regional OCR's probably the most requested feature.

The interface to the zoom controls is basically the same, but hopefully the finicky click-to-enter zoom mode will be a thing of the past (not that it wasn't a brave attempt when written)! Welcome to the future! Inductiveloadtalk/contribs 23:13, 15 November 2021 (UTC)[reply]

@Inductiveload, @Yash4357, @Sohom data: Yes, thank you all for your hard work! It's exciting to see this happening. — Sam Wilson 23:20, 15 November 2021 (UTC)[reply]
I'm always in awe of the technical users making these changes on the backend that enable or make efficient thousands and millions of edits later. Great work, comrades! —Justin (koavf)TCM 23:26, 15 November 2021 (UTC)[reply]

Getting completed Wikisource transcriptions into local library catalogues?[edit]

Hi all. User:Giantflightlessbirds, User:Eothan and I have been throwing around ideas about how to show completed Wikisource works in local library catalogues. We were wondering if any related work has been done in this space? We have also made an initial approach to OCLC who are engaging with how open content might be made available on Worldcat. We'd love to hear if anyone else is interested in pursuing this, or if the community already has ideas. --99of9 (talk) 22:57, 15 November 2021 (UTC)[reply]

@99of9: Do you have in mind printed books on dead trees (I hope bamboo) sitting in a brick-and-mortar library or do you mean some kind of digital access to Wikisource editions that online users of the library could find? If it's the former, PediaPress is tangential to this and if it's the latter, maybe Kiwix is kinda/sorta related. I don't know of any examples of actual libraries either hosting books made of Wikimedia movement editions, nor do I know of any libraries whose digital card catalogue also includes entries for accessing our works digitally. —Justin (koavf)TCM 23:29, 15 November 2021 (UTC)[reply]
@Koavf: the latter. A regular library customer should search their catalogue, and see that WS items can be read/"borrowed" digitally. Perhaps only those selected by the librarians as locally-relevant. I'll look into Kiwix. --99of9 (talk) 23:33, 15 November 2021 (UTC)[reply]
If you're already doing outreach for a scheme like this, I would be very surprised if Internet Archive hadn't done something like this in the past three decades. —Justin (koavf)TCM 23:37, 15 November 2021 (UTC)[reply]
Here's an example. We proofread Old Westland and uploaded it manually as an EPUB to Overdrive in my library, so it shows up for loan (7 copies on loan at the moment): [10]. We've also listed it in our online catalogue next to the physical book, with a link to Wikisource. This is a bit tedious to do one at a time, so any way a library could streamline the process, perhaps via WorldCat, would be nice. —Giantflightlessbirds (talk) 00:16, 16 November 2021 (UTC)[reply]
@Koavf: Yes, it appears that the Internet Archive may have achieved part of this already. This example on Worldcat links directly to the Internet Archive scanned version. It would be nice if it also linked to our transcribed version. --99of9 (talk) 00:29, 16 November 2021 (UTC)[reply]
@99of9, @Giantflightlessbirds, @Eothan: This is a topic that I’ve thought intensely about and that I care deeply about. My basic response would be that “yes, this is possible, but it would require libraries to hire paid staff for it to work.” Wikisource has numerous advantages and several drawbacks that I would enumerate as follows:
  1. On a broad scale, Wikisource features two types of texts: scan-backed and non-scan-backed. Scan-backed texts have an Index in the ProofreadPage extension where the digital text can be produced in direct comparison with a digital, photographic reproduction of the original text. These are the gold standard for digital text because they allow for an easy comparison between the original and digital versions of a work, which can satisfy scholarly standards. Non-scan-backed texts are drawn from various sources across the internet. Some of them are quite good, but most are either incomplete or have dubious textual quality. As a broad category, non-scan-backed works must be considered as junk and no library will want to incorporate them into their catalog. Therefore, any library importer will need to separate scan-backed works from non-scan-backed works.
  2. Wikisource is committed to open-data access. Therefore, it’s possible to download the raw wikitext of a given Index. This will enable institutions to preserve a copy for themselves. Sites, such as Project Distributed Proofreaders or Literature Online (LION), either remove access from the original text or never provide it. By contrast, once a text is proofread on Wikisource, it will never need to be proofread again. Therefore, Wikisource is the best platform for long-term preservation.
  3. Wikisource is free and does not impose access restrictions. This makes it much easier and cheaper to provide access to works. To be blunt, digital content providers are bleeding institutions dry. They have figured out how to transform the public domain into a goldmine. Across the globe, institutions are paying billions of dollars annually to gain access to texts that are in the public domain. Access to Wikisource is free.
  4. Wikisource is modular which makes it easy to make rapid adjustments. A user can update any one part and all the rest of the pieces will be automatically updated. If a user wants to correct one scanno, the system will automatically retransclude the text and reexport the text.
  5. Wikisource is open-source. This makes it easy to add new features or deploy it in another setting.
These are the positives. Here is the other side:
  1. Integration with libraries will require the development of code to transform Wikidata into MARC records, automatically retrieve the data from Wikisource, and link it in local catalogs. This is something that libraries will need to pay for.
  2. While Wikisource does offer the possibility of replacing expensive paid databases, volunteers have limitations. The catalog of Wikisource is quite small and it’s impossible to tell volunteers that they must work on something. There are vast chasms in the offerings of Wikisource.
  3. Wikisource needs more accessible features. Once again this will require development.
  4. Librarians don’t necessarily make good Wikisource users. While the librarians of the National Library of Scotland have done an absolutely fantastic job of importing and mostly proofreading the Scottish Chapbooks, too many of them remain untranscluded or without images.
In the end, I believe that Wikisource has a lot to offer to libraries, but libraries also need to give back.
  1. Provide development funding. This does not necessarily mean hiring a full-time developer, but for certain projects, it would make sense to provide a bounty.
  2. Importing texts into libraries will require lots of grunt-work to fix metadata. This should be paid for.
  3. Pay proofreaders. It’s impossible to expect volunteers to digitize the entire world library in any reasonable amount of time or to have the same interests as libraries. For some rough economics, it takes about 15-20 hours to proofread a novel of around 300 pages. Therefore, the cost of proofreading a book will be this amount of time multiplied by the hourly wage. To make any economic sense, the hourly rate would have to be quite low. This leaves two possibilities: a country where workers speak English but have a low wage, or undergraduates. I believe that hiring workers from a low-salary English-speaking country would just lead to a ton of badly proofread texts that would wind up burdening the Wikisource community. Therefore, undergraduates stand out as the best possible labor source. In the US, undergraduates earn between $10-15 an hour. Therefore, to digitize one book will cost between $150-$300 USD. Federal law prohibits most undergraduates from working more than 20 hours a week. Therefore, even one full-time undergraduate can probably only digitize 30 books an academic year. This should give you some sense of scale. The only way for libraries to pull this off would be to form a consortium. If 10 libraries joined a consortium, the price per book would drop to $15 for a perpetual license to a work. With 100, it would decrease to $1.50 and so on.
I would love to see more input from libraries, but I wanted to lay out some of the benefits and challenges. It’s not going to be as easy as most would hope for, but I do believe that the rewards will be worth it. Wikisource is the only site that offers libraries a way to stop paying annual subscriptions while maintaining scholarly standards. It will save money, but this will be a transitional process that will cost money rather than a simple, free switchover. Languageseeker (talk) 03:08, 16 November 2021 (UTC)[reply]
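
The rough economics in the comment above can be reproduced with a few lines of arithmetic. This is only a sketch of the figures as stated there (15-20 hours per book, $10-15 an hour, at most 20 hours a week); the 30-week academic year is an assumption introduced here because it is the value that makes the "30 books" figure come out, and the function names are hypothetical.

```python
def cost_per_book(hours: float, hourly_wage: float) -> float:
    """Proofreading cost of one book: time multiplied by hourly wage."""
    return hours * hourly_wage


def share_per_library(book_cost: float, libraries: int) -> float:
    """Cost per book for each member when a consortium splits it evenly."""
    return book_cost / libraries


def books_per_year(hours_per_week: float, weeks: float, hours_per_book: float) -> float:
    """How many books one proofreader can finish in an academic year."""
    return hours_per_week * weeks / hours_per_book


low = cost_per_book(15, 10)    # fastest proofreading at the lowest wage
high = cost_per_book(20, 15)   # slowest proofreading at the highest wage
print(f"Cost per book: ${low:.0f} to ${high:.0f}")

for n in (10, 100):
    print(f"{n} libraries: ${share_per_library(low, n):.2f} "
          f"to ${share_per_library(high, n):.2f} per book each")

# ASSUMPTION: a 30-week academic year (not stated in the comment above).
print("Books per student per year:", books_per_year(20, 30, 20))
```

Note that the $15 and $1.50 consortium figures quoted above correspond to the low end of the cost range; at the high end of the range the per-library share is twice that.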
Thanks User:Languageseeker for your well considered reply. It sounds like your eventual end goal is very ambitious. Perhaps we should focus on a modest starting point: getting our best complete Wikisource items integrated, trying to impose minimal additional work on anyone. Your comments about scan-backing and metadata completion are crucial to filtering out which items are "ready". We agree with your intuition that this metadata work is best housed on Wikidata. From a simple query it's nice to see over 8000 texts that have both an author and a copyright status in Wikidata. I can't yet see an easy way of filtering for those backed by scans. I'm sure a typical MARC record would want more metadata. Are the required fields/properties well established? --99of9 (talk) 04:55, 16 November 2021 (UTC)[reply]
I wanted to lay out the big picture because I think that you're asking for the hardest bit right now which is the construction of a bridge between Wikisource and a library catalog. This is going to take careful planning and a clear vision of what the ultimate goal is. As I see it, there will probably need to be some sort of software that can take Wikidata and automatically convert it to a MARC record. The question becomes what do you want to import into your library catalog? Do you want to have links to Wikisource or compiled epub/mobi/pdf? Should they have generic covers like enWS or original covers like frWS? How will the library catalog update the metadata when it changes on Wikidata? What about mitigating vandalism? Should there be a time delay? What changes will need to happen on enWS or Wikidata? Who will write and maintain this software? Who will work on the metadata? I fully support this idea and I would love for it to happen, but I don't think it's right to ask Inductiveload or Xover to write a quite complex piece of software for libraries. Languageseeker (talk) 05:41, 16 November 2021 (UTC)[reply]
I really think that it would be useful to get some potential libraries to articulate what their needs are, to have an understanding of what is needed in terms of additional work on top of proofreading and transclusion (e.g. preparing for export onto devices, additional metadata needed, whether they want to merge back the proofread text as a text layer, etc.), areas which could use more people helping to drive them forward. Having outside organizations can be helpful here in providing more motivation and getting more volunteers who might be specially interested in doing that type of work. But as mentioned, there is in addition the area of the integration between the two systems (how to get the data over into their systems and keep it in sync, how to serve the actual ebooks if required). This basically cannot be done without libraries investing in it, as it is likely to vary by library anyway. Projects like "make more works on WS exportable" or "do more with wikidata" are more generic. Separately note that archive has a catalog of books https://openlibrary.org/, there might be possibilities to have WS ebooks on there as well. MarkLSteadman (talk) 10:26, 17 November 2021 (UTC)[reply]
There are large amounts of projects that the few people developing software here can work on, but one way to get more prioritization on something like automated generation of MARC is getting people excited about using it, volunteers excited about going through and backfilling the huge number of works with the required metadata and people interested in helping to build and maintain the code. While it is easy to talk about doing cool things with works once proofread, it takes a large amount of additional work when there is a large backlog of existing stuff to do on the proofreading infrastructure side. MarkLSteadman (talk) 10:40, 17 November 2021 (UTC)[reply]
Open Library has been accepting wikidata ids for a while now and lately, I see that it has a spot for wikisource as well. Open Library and Internet Archive are related enough that the same login works for both.--RaboKarbakian (talk) 16:26, 17 November 2021 (UTC)[reply]
The problem is that Open Library does not offer a MARC export either? Also, not all library software is the same. This is going to be a fairly challenging thing to do. I wish that it wasn't, but I don't see a simple way. Languageseeker (talk) 19:51, 18 November 2021 (UTC)[reply]
It may be that we don't need to work with the entire MARC record, as for most items in Wikisource there should already be a WorldCat MARC record. We will find out if this is the case, but we may only need to work on one, or a small subset, of fields. As you say, they are already ingesting links to Open Library, so some systems must already be in place. Eothan (talk) 21:42, 18 November 2021 (UTC)[reply]
Most of the MARC records are fairly poor. My honest suggestion would be to figure out how to do this manually first and document every step. How do you get from an Index to the transclusion to the Wikidata to the MARC record? What fields in Wikidata map to what in a MARC record? What needs to be there? What should the program log as errors? How do you handle complex cases where a single Index is transcluded to multiple areas? Document everything and ask for help for the small part. People are willing to help, but the questions have to be specific. In the end, you'll be asking to import something on the scale of over 20,000 items into local catalogs. So the value is there, but the system needs to be created and thought through. Also, Overdrive is not the answer. Instead, the link to the Wikisource should probably be recorded as 856 40 |y Online book |u . Languageseeker (talk) 01:45, 20 November 2021 (UTC)[reply]
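
As a small illustration of the 856 suggestion above: in MARC 21, field 856 holds the electronic location, first indicator 4 means HTTP access, second indicator 0 means the link points to the resource itself, subfield u carries the URI and subfield y the link text. The sketch below only formats that one field as a display string in the pipe notation used above; the function name marc_856 and the example URL are hypothetical, and a real catalogue import would use whatever record format the library system expects.

```python
def marc_856(url: str, link_text: str = "Online book") -> str:
    """Format a MARC 856 field (indicators 4 and 0) as a display string.

    url       -- address of the work, stored in subfield u
    link_text -- text shown for the link, stored in subfield y
    """
    if not url.strip():
        raise ValueError("an 856 field without a URL is not useful")
    return f"856 40 |y {link_text} |u {url.strip()}"


# ASSUMPTION: placeholder URL for illustration only, not a real work page.
print(marc_856("https://example.org/wiki/Some_Work"))
```

Even this single field raises the questions asked above, for example which page the URL should point to when one Index is transcluded to several places.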

Indexes created from transcriptions[edit]

Recently, I've seen a number of Indexes created that are not scans or print-outs of websites. Instead, they are transcriptions of scans that are converted into PDFs. For example, a PDF of a Project Gutenberg work. What is the policy on those? It seems to make no sense to have these as Indexes and it seriously damages the meaning of scan-backing. After all, if a scan can be a Project Gutenberg text, then what is the difference between scan-backing and non-scan-backing? Is there a specific policy on these? If not, should they be added to the sdelete criteria? Languageseeker (talk) 19:49, 18 November 2021 (UTC)[reply]

Scan backing with a PDF from PG is, IMO, just completely pointless, since it's not really any more useful than just the text dump, and introduces a lot more faffing about with pages that never actually existed. If someone must copy-and-paste, just do that. Such transcriptions are no more (or less) valuable than the copy-dump, they just take more effort for no gain. That said, I don't think they need to be specifically militated against other than to avoid people wasting their own time. I don't think it damages the meaning of scan backing in general, any more than painting a racing stripe on a Fiat Punto damages the meaning of a GT 350. People will just wonder what you're smoking. Inductiveloadtalk/contribs 17:17, 19 November 2021 (UTC)[reply]
I don't think we should be confusing the terminology here--if it's backed by a "scan" that's actually a PDF from PG, it's not really scan-backed. IMO, a better definition of "scan" is all that's needed, and then the policy will naturally follow. — Dcsohl (talk) (contribs) 21:53, 19 November 2021 (UTC)[reply]
i am not confused. project guttenberg, not scan backed, should not be deleted. --Slowking4Farmbrough's revenge 03:21, 20 November 2021 (UTC)[reply]
@Inductiveload: I think it's more of a case of someone ordering a GT 350 and finding out that it has a Ford Pinto engine. As Dcsohl said, I always felt that scan backing should mean that the transcription is based on a scan of the original text, not that the transcription is based on a transcription. If scan backing can be a transcription of a transcription, then what is the difference between transcluded and non-transcluded texts? Languageseeker (talk) 11:56, 20 November 2021 (UTC)[reply]
Whether the "scan" (loosely used) is "valid" is not our problem. We don't make any claim that a work is automatically "better" because it's "scan backed". It really depends on what it's scan backed by. I don't like these "scans", and I think they're completely pointless and wrong-headed, but there is no basis to forbid them specifically.
I don't personally think we are facing an issue here that needs solving. At worst, there may be a few new users who think that's somehow helpful, but the solution to that is engagement and advice, not pre-emptively forbidding things and deletions. If we focus on scan backing our existing PG works, as you have been doing, the problem will solve itself. PG imports as a process pre-date ProofreadPage: treat it like any other long-term cleanup task.
In general, spamming piecemeal reactive rules is not good for a cohesive policy platform. No "PG scans" is too specific. A better proposal, IMO, would be amending WS:WWI with "no new PG, or similar second-hand, texts of any sort", even if "scan" backed by a PDF export of the text. I'd support that. Inductiveloadtalk/contribs 12:11, 20 November 2021 (UTC)[reply]
i agree, we need a PG fyi, PG migration process, and newbie onboarding. we do not need more policy, deletion, and block threats. we will not get more quality by increasing the scrap rate, rather we will need a quality circle. Slowking4Farmbrough's revenge 01:43, 21 November 2021 (UTC)[reply]
I think that a policy is appropriate because it makes it seem less like an unwritten law that is imposed on new users rather than actual policy. I always believe that education and a gentle approach are the right way. This formal policy is to make it seem less arbitrary. In many ways, this is a codification of a common law principle. Languageseeker (talk) 03:31, 21 November 2021 (UTC)[reply]

Monthly challenge displaying error message[edit]

The Monthly Challenge for this month is displaying an error message ("The time allocated for running scripts has expired") where the texts should be. Why is this, and how can it be fixed? DraconicDark (talk) 16:19, 19 November 2021 (UTC)[reply]

There are too many indexes (actually, pages) in the MC and there's a performance issue. A fix is being worked on. Inductiveloadtalk/contribs 16:58, 19 November 2021 (UTC)[reply]
Done. The fix for this has been deployed this week and performance on those pages seems much better. Inductiveloadtalk/contribs 12:32, 24 November 2021 (UTC)[reply]

Tech News: 2021-47[edit]

20:02, 22 November 2021 (UTC)

uploading scans to IA -- are you experienced?[edit]

I followed the instructions, and yet there is no ocr there. Just my uploaded tar file.

Is anyone experienced with IA and knows the magic word or the right place to thump it?--RaboKarbakian (talk) 22:23, 23 November 2021 (UTC)[reply]

How long has it been since you uploaded the file to IA? It can take time for the other file types to be generated. --EncycloPetey (talk) 23:12, 23 November 2021 (UTC)[reply]
The marc.xml says: 23-Nov-2021 15:02. I pushed their "Derive" button this morning because the files seemed to be just sitting there, and from what I could glean from the "management", all of the files were in the purple bunch, which meant everything was done and fine. I am afraid of annoying them by pushing it a second time.--RaboKarbakian (talk) 01:16, 24 November 2021 (UTC)[reply]
Given that we are close to a US Holiday, and that the process can take several days anyway, I would be patient and check back next week. --EncycloPetey (talk) 01:20, 24 November 2021 (UTC)[reply]
I think it's supposed to be a zip file, not a tar. A link to the item would be helpful too. Inductiveloadtalk/contribs 07:54, 24 November 2021 (UTC)[reply]

AWB Approval[edit]

Is there a separate process for AWB approval for enWS or is it the same as enWP? Languageseeker (talk) 13:45, 25 November 2021 (UTC)[reply]

We haven't put restrictions on its use; we are more interested in the function and intensity of its use. This is related to the fact that, with works and the Page: namespace, we are different, and it hasn't been abused. Note that our designation of bots is a little different, and that AWB can be semi-automated. If you are looking to run AWB as a bot, see WS:Bots. If you are running it as a user, then it is about consideration of the RecentChanges, and then considering whether it should be run as a bot. For instance, I run it as an automated bot through my bot account, and semi-automated through this account where I have pattern/formulaic changes that I eyeball prior to saving, or where I need admin rights to process some of what I am doing. I would also note that if you are going to write regexes for changes, that can be undertaken. — billinghurst sDrewth 09:29, 26 November 2021 (UTC)[reply]

Talk to the Community Tech: The future of the Community Wishlist Survey[edit]

Hello!

We, the team working on the Community Wishlist Survey, would like to invite you to an online meeting with us. It will take place on 30 November (Tuesday), 17:00 UTC on Zoom, and will last an hour. Click here to join.

Agenda

  • Changes to the Community Wishlist Survey 2022. Help us decide.
  • Become a Community Wishlist Survey Ambassador. Help us spread the word about the CWS in your community.
  • Questions and answers

Format

The meeting will not be recorded or streamed. Notes without attribution will be taken and published on Meta-Wiki. The presentation (all points in the agenda except for the questions and answers) will be given in English.

We can answer questions asked in English, French, Polish, Spanish, German, and Italian. If you would like to ask questions in advance, add them on the Community Wishlist Survey talk page or send to sgrabarczuk@wikimedia.org.

Natalia Rodriguez (the Community Tech manager) will be hosting this meeting.

Invitation link

We hope to see you! SGrabarczuk (WMF) (talk) 20:03, 26 November 2021 (UTC)[reply]

Seeking opinion: Moves in Index: and Page: namespaces[edit]

Hi to all. At the moment, users/wikisourcers are able to move pages in the Index: and Page: namespaces (Special:ListGroupRights => Users) however it leaves redirects which require tidying up by an administrator. We have been tracking this with Special:AbuseFilter/36 ([11]). It seems to me that this half-pregnant approach is not particularly working. We either need to allow users to move without redirects in that namespace (if that is even possible to have it set to be namespace specific moves), or we stop the ability to move from these namespaces for general users.

Interested to hear users' thoughts. — billinghurst sDrewth 01:10, 28 November 2021 (UTC)[reply]

My mid-term plan is to introduce a back-end user right to move without redirects specifically in the Page/Index NS and then allow wikis to individually assign this as part of user groups (eg autoconfirmed or a dedicated group) as/if they wish on a wiki-by-wiki basis: phab:T293200. Inductiveloadtalk/contribs 10:41, 28 November 2021 (UTC)[reply]

Unprotect The Time Machine (Heinemann text)[edit]

I'm trying to transclude over the existing text, but the chapters seem to have been under sysop-level protection since 2006. Could this restriction be removed? Languageseeker (talk) 14:14, 28 November 2021 (UTC)[reply]

Done. At the same time I think the text should be defeatured (which imo applies to all non-scan backed texts) and renominated after all the work is done. --Jan Kameníček (talk) 16:48, 28 November 2021 (UTC)[reply]
@Jan Kameníček Thanks. I think some of the chapters are still under protection. See The Time Machine (Heinemann text)/Chapter III, The Time Machine (Heinemann text)/Chapter V, The Time Machine (Heinemann text)/Chapter IX. Languageseeker (talk) 17:01, 28 November 2021 (UTC)[reply]
Done --Jan Kameníček (talk) 17:40, 28 November 2021 (UTC)[reply]
@Jan Kameníček The Time Machine (Heinemann text)/Chapter IX still looks protected. Languageseeker (talk) 17:42, 28 November 2021 (UTC)[reply]
So now it should be finally OK :-) --Jan Kameníček (talk) 17:46, 28 November 2021 (UTC)[reply]
It is! Thank you! :) Languageseeker (talk) 17:49, 28 November 2021 (UTC)[reply]
@Jan Kameníček One more request. Could you delete The Time Machine (Heinemann text)/Epilogue, as it is not in the original book? (There's a different epilogue that is already transcluded.) Languageseeker (talk) 18:11, 28 November 2021 (UTC)[reply]
Done. It seems to be taken from some much later edition. The text should have never been featured in this state. --Jan Kameníček (talk) 18:38, 28 November 2021 (UTC)[reply]
When it was Featured, it was not the Heinemann text, but a later edition of The Time Machine. It became "the Heinemann text" as a result of this move in 2010. --EncycloPetey (talk) 19:26, 28 November 2021 (UTC)[reply]

I note that the Table of Contents from this new Heinemann text points to chapters in the Holt text. That can't be right. --EncycloPetey (talk) 16:57, 28 November 2021 (UTC)[reply]

Fixed my tab confusion. Languageseeker (talk) 17:01, 28 November 2021 (UTC)[reply]
It seems that the work was featured despite the fact that it was incomplete as some chapters were omitted… --Jan Kameníček (talk) 17:49, 28 November 2021 (UTC)[reply]
In 2006, when the Featured Text process was in its first year. Also see above. When it was featured it was not "the Heinemann text"; it should never have been declared so, since it did not include the final chapters from the Heinemann text. --EncycloPetey (talk) 19:24, 28 November 2021 (UTC)[reply]

Adventures List...[edit]

I've reinstated my Adventures list, with a view to the remaining unvalidated (or non-scan-backed) volumes being added to the Monthly Challenge at some point.

There are currently 20 or so works which are not scan backed on Wikisource. ( Only 15 or so of these have no located scans.)

Would other contributors please assist in matching the remaining works to suitable scans? Thanks. ShakespeareFan00 (talk) 18:24, 28 November 2021 (UTC)[reply]

@ShakespeareFan00 The list looks amazing. I'm deeply grateful to you for making it. Speaking from the perspective of the MC, I have a few tips/requests. First, only the original publication or a printing of the electrotyped/stereotyped text should be listed. I don't want to run into the situation where I ask users to proofread some derivative edition. Second, no match-and-splits. Many of these texts are extremely popular and have more editions than one can count; match-and-splitting usually creates an unmitigated mess. Third, for translated works, it would be nice to pick a specific translation.
In general, I'm going to add it to the nominations. Jules Verne already failed, so his works will probably be added to the bottom. Is there any particular order that you would like the texts to be featured in? Languageseeker (talk) 00:20, 29 November 2021 (UTC)[reply]
Thanks for considering these as nominations for the MC.
Not currently. Somewhere I have the publication order of the UK partwork that inspired the list; if certain works haven't been featured by the time I find it, I may consider using the ordering of that list. I added a few additional suggestions to the list of volumes the original partwork had (and had to remove a few for copyright reasons).
For translated works, a specific translation is indeed a good idea. Also, in respect of multiple editions, I would suggest concentrating on first editions (or, if we already have those, "popular" editions), or those with illustrations by regarded artists (such as editions with illustrations by Rackham, to give one example). ShakespeareFan00 (talk) 09:47, 29 November 2021 (UTC)[reply]
I'm open to suggestions on what could be added to it. (I already added The Four Feathers, for example.)
For example, Kim is another possibility for inclusion. It depends on what you count as the "adventure" genre and what is considered a work of literary or cultural interest.

ShakespeareFan00 (talk) 09:47, 29 November 2021 (UTC)

Adding Images to Index:Baum - The Wonderful Wizard of Oz.djvu

The Index for The Wizard of Oz has been proofread, but needs the illustrations added from a high-resolution LOC scan. Would anyone like to take this request? Languageseeker (talk) 18:41, 28 November 2021 (UTC)

Why a LOC scan? Is there any problem with the jp2 folder of the source site (1)? Moreover, images are already present at 2. Hrishikes (talk) 13:48, 30 November 2021 (UTC)
The LOC has the highest image quality available because they have the original TIFF available. JP2 files use lossy compression and incur further losses with image processing. Languageseeker (talk) 14:15, 30 November 2021 (UTC)

Category:Bulgarian authors

Hi, I'm wondering why there are so many French authors in the Category:Bulgarian authors. Is it an issue with Wikidata? --M-le-mot-dit (talk) 15:01, 29 November 2021 (UTC)

I'm not sure either. But I don't think it's a Wikidata issue, since an author like Author:Eugène Aubry-Vitet is categorized as Bulgarian and not French, despite there not being a single mention of Bulgaria on his Wikidata page. I haven't seen any Bulgarian authors in that category, so it might be a problem with the category? DoublePendulumAttractor (talk) 15:35, 29 November 2021 (UTC)
It was a typo in the auto-categorisation for authors. It's fixed now, thanks for the report. Inductiveloadtalk/contribs 15:54, 29 November 2021 (UTC)

Tech News: 2021-48

21:15, 29 November 2021 (UTC)

Relevant changes to Wikisource due in wmf.11

Some relevant changes to be released in wmf.11 include:

  • Some page viewer configurations, which you will be able to set at Special:Preferences#mw-prefsection-editing-proofread-pagenamespace:
    • You can now set the animation speed on the OpenSeadragon (OSD) viewer, which makes the panning and zoom feel smoother. The default is '0', which is how it's always been. The usual OSD default is 1.2. I personally like about 0.5.
    • You can now set the zoom step of the viewer. The default is 1.2, which is roughly similar to how it used to be when zoomed out, but the new viewer does not zoom asymptotically slower as you zoom in (this was arguably a bug or oversight in the old implementation). If you find the zoom too aggressive, you could try to lower it to, say, 1.1.
  • Page rotation is now supported in the viewer
  • OSD click-to-zoom is now set to the same as the scroll step
  • Various other small OSD fixes (e.g. phab:T296153, phab:T296260).
  • The CodeMirror syntax highlighting editor should now at least be the right size in the Page namespace (Gerrit 740821)
  • The default name for the Index page config data is now Mediawiki:proofreadpage_index_data_config.json, not Mediawiki:proofreadpage_index_data_config. The old name will continue to work as a fallback.
  • JS config variables for page number, index name, index fields and image URLs are now set in edit, view and submit modes (phab:T285218, phab:T255345, phab:T167200 and phab:T204384). This will make it easier to get useful information about a given page from a JS script or gadget.
  • There is now a formal API to get a list of the pages in an index: mw:Extension:ProofreadPage/Index pagination API.
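
To get a feel for the zoom step setting mentioned above, here is a rough sketch of how many zoom-in steps it takes to reach a given magnification. It assumes the zoom step is a multiplicative factor applied once per scroll or click step, which the list above does not state explicitly; the function name `steps_to_reach` is purely illustrative and is not part of the viewer.

```python
def steps_to_reach(target_magnification: float, zoom_step: float) -> int:
    """Count zoom-in steps needed to reach at least the target magnification.

    Assumption: each step multiplies the current magnification by zoom_step.
    """
    if zoom_step <= 1.0:
        raise ValueError("zoom_step must be greater than 1")
    steps = 0
    magnification = 1.0
    # Multiply step by step rather than using logarithms, to avoid
    # floating-point edge cases at exact powers of the zoom step.
    while magnification < target_magnification:
        magnification *= zoom_step
        steps += 1
    return steps


if __name__ == "__main__":
    for step in (1.2, 1.1):
        print(f"zoom step {step}: {steps_to_reach(4.0, step)} steps to reach 4x")
```

Under that assumption, lowering the step from 1.2 to 1.1 roughly doubles the number of steps needed to reach 4x, which is why the lower value should feel less aggressive.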

Before wmf.11 lands (tomorrow evening UTC, assuming no train blockers like 2 weeks ago), you can, as always, try out every change that's been merged so far at Beta Wikisource.

The following completed items are still in review and will again be deferred:

  • Image region OCR (e.g. for OCRing a single column) phab:T294903
  • Marker lines in the page NS (phab:T296160)
  • Position persistence after reload
  • Correctly toggling the indicators for horizontal/vertical layout

The following item needs community input:

  • Middle-click-to-toggle between zoom and scroll (also adds Ctrl/Shift scroll actions) phab:T296079. Input is still sought on the ideal user experience for scroll/panning at that Phabricator issue or here. No concrete positive suggestions (i.e. what people would like to see) have yet been made.

You can try all the above features, including the current "best-guess" about the scroll and pan here: https://patchdemo.wmflabs.org/wikis/2ae7e75fef/wiki/Main_Page.

Inductiveloadtalk/contribs 15:36, 30 November 2021 (UTC)

Category:Pages with illegal formatting in header fields

"Illegal" is a very strong word. Who decided that certain formatting is "illegal" in header fields? Was this a community decision, or unilateral? I suspect that someone took on the task of getting some consistency into our headers, and I'm sure this was helpful overall... but I would prefer if this category focussed on the fact that the standard template can't do some things; e.g. ":Category:Pages with header field formatting not yet supported by the standard header template" or ":Category:Pages with explicit formatting in header fields".

Headers on pages like A Passionate Pilgrim and Other Tales (Boston: James R. Osgood & Co., 1875)/Madame de Mauves/Part 5 have been crafted to express clearly that this page is part of a particular short story as published in a specific edition of a book. It concerns me that someone might come along and strip out this formatting, and return it to a bog-standard header that doesn't effectively communicate what the page is, just because the formatting has been declared "illegal".

Hesperian 22:40, 29 November 2021 (UTC)

Adventures list...

With a few additions/substitutions, the list covers most of the volumes published as part of the original partwork.

However, I'd like some suggestions on what could be added to cover certain 'adventure' novels that may be lacking.

There is currently no adventure novel in the list that is set in the period of Imperial Rome. My thought was perhaps to have Last Days of Pompeii, assuming we have a scan (or one can be located), but I'm open to other suggestions.

The other absent 'adventures' are those involving space exploration. Are there any pre-1964 (and not renewed) novellas from print science fiction's Golden era that would be appropriate? (Alternatively, First Men in the Moon by H.G. Wells would be a reasonable choice.)

Do other contributors on Wikisource have suggestions for specific 'adventures' that are worth including in the list and are suitable for a general audience, but belong to sub-genres not covered by those already in the list? ShakespeareFan00 (talk) 08:22, 30 November 2021 (UTC)

The scans for the first edition of Pompeii are here: (IA). Scott's Count Robert of Paris is set in Byzantium. We could also go back to the actual novel from Ancient Rome by Apuleius. Marius the Epicurean is another novel set in Rome. MarkLSteadman (talk) 10:36, 30 November 2021 (UTC)