Wikisource:Scriptorium/Archives/2021-04

Arabella Move

The following discussion is closed:

Moved, to the right volume.

Could someone move Index:The female Quixote, or, The adventures of Arabella (Second Edition V2).pdf to Index:Arabella (Second Edition - Volume 2).pdf? The source file for the first Index was deleted on Commons. 01:19, 5 April 2021 (UTC)

The deletion of the source file should not require a move. We simply upload a local version of that file with the same name as the original file, and everything is corrected. We have admins here with the ability to do this. No rename or move is required. --EncycloPetey (talk) 01:24, 5 April 2021 (UTC)
Never got moved or replaced. Languageseeker (talk) 23:39, 7 April 2021 (UTC)
Clarification. I think the pages had text in them that needs to be moved. Languageseeker (talk) 19:57, 8 April 2021 (UTC)
@Languageseeker: The pages associated with that index were from Vol. 1, not Vol. 2. Was that perhaps the reason the file was deleted at Commons? I'm guessing then that the moved Index: was also for Vol. 1, making even that move rather pointless. In any case, after a detour through Vol. 2, the pages have now been moved to the correct place under Index:Arabella (Second Edition - Volume 1).pdf. --Xover (talk) 15:44, 9 April 2021 (UTC)
@Xover: Oops, I confused myself with bad naming. Appreciate you fixing this and my mistake. Languageseeker (talk) 18:47, 9 April 2021 (UTC)
This section was archived on a request by: Xover (talk) 19:04, 9 April 2021 (UTC)

Move Oliver Twist to Oliver Twist (Boz Issue)

The following discussion is closed:

A community proposal that supersedes this discussion has now been opened at WS:S#Creation of Version Pages for the Individual works of Charles Dickens. If the outcome of the discussion about that proposal necessitates technical measures (such as moving large numbers of pages) new requests for those measures can be opened at that time.

This text is specifically the Boz Issue of Oliver Twist. I'm asking to move it to Oliver Twist (Boz Issue) to disambiguate it. Languageseeker (talk) 19:57, 8 April 2021 (UTC)

@Languageseeker: When you proofread another edition of Oliver Twist we'll move this one to disambiguate them and leave a versions page in its place. In the meantime it's just fine where it is. --Xover (talk) 19:06, 9 April 2021 (UTC)

@Xover, @Peteforsyth: Just to be clear, the current plan is this:

  1. Change main link on Author Page to Title Versions, e.g. Oliver Twist Versions
  2. Create Version Page.
  3. For the appropriate edition, create a redirect to Title
  4. Add version information to Title

In the future,

  1. Change main link on Author Page to Title
  2. Move Title to appropriate page
  3. Move Title Version to Title page
  4. Remove redirect

Am I understanding this correctly? Languageseeker (talk) 00:26, 10 April 2021 (UTC)

I am clearly stupid, as you are not making sense to me. Please do this:

  1. Transclude your new version, and add an {{other versions}} at the top pointing to "Oliver Twist"
  2. Put its information into a new WD item, listed as an edition of Oliver Twist (Q164974)
  3. Link to it from the author page
  4. Come back here and identify the existing work that needs to be moved, and identify the new version

We will fix up the rest either alone or with assistance. — billinghurst sDrewth 01:15, 10 April 2021 (UTC)

  • @Languageseeker, @Peteforsyth: I can't quite grasp where it was that you took a wrong turn, but you're out in the weeds here.
    Neither Oliver Twist versions nor Oliver Twist (Charles Dickens Edition) should exist (and will be deleted as soon as the information in them has been verified to be preserved on a suitable page). The information currently in them belongs on Author:Charles Dickens, or possibly on a Portal: of some stripe if the inclusion criteria or organisation conflicts with the goals of an Author: page. Iff there is ever a Proofread transcription of what you call the "Charles Dickens Edition" transcluded onto a suitable mainspace page, then we'll move what you call the "Boz Edition" (i.e. what is currently on Oliver Twist) to a suitable disambiguated name, and then change Oliver Twist into a versions page. That versions page will contain a listing of, and links to, any Proofread edition that we currently host. It will not contain a complete listing of editions that exist of a work, nor a selection of the most significant editions, or… Just the ones we actually have proofread; either fully or in rapid progress towards completion.
    But the bottom line here is that so far there is not another proofread edition of this work, or at least not one that you have identified. So right now you're just making a mess and creating extra work for others over a hypothetical future proofread edition that may or may not ever materialise. I suggest you put your energies towards proofreading that other edition instead. --Xover (talk) 13:53, 10 April 2021 (UTC)
@Xover: The goal is to create standard version pages that exist for other authors, such as All's Well That Ends Well (Shakespeare). Every version listed on this page is one that Charles Dickens directly contributed to. Versions pages seem to be fairly standard around here even when there are no other proofread editions. I'm trying to create the same for Charles Dickens. The standard practice seems to be to make the Title page the version page. For Charles Dickens, it seems that Wikisource is adopting a different approach. I'm trying to make Charles Dickens follow the same guidelines as other pages. What makes Charles Dickens different from someone like William Shakespeare? Languageseeker (talk) 14:13, 10 April 2021 (UTC)
@Languageseeker: Shakespeare is an exception in nearly every context; but mostly our pages for Shakespeare's works were the result of an effort to clean up all our Shakespeare pages and a discussion that landed on a compromise to deal with the different tradeoffs involved. In particular, All's Well That Ends Well (Shakespeare) exists and contains what it does mainly because it exists in the context of all the other {{versions}} and {{similar}} pages for these works. It is not an example for how versions pages should generally be constructed. --Xover (talk) 17:20, 10 April 2021 (UTC)
@Xover: I've seen these version pages all over Wikisource, such as Oedipus Rex and Wuthering Heights. Are these incorrect? If so, should they be deleted? I'm not trying to make anyone's life difficult, but I also don't want to flood the author's page and create future maintenance. The truth is, there is not one Oliver Twist or Hard Times or any other work. There are multiple different versions that Dickens wrote. If we put them on the Author's page, should it be an indented list or subheading? Languageseeker (talk) 19:07, 10 April 2021 (UTC)
This section was archived on a request by: Xover (talk) 19:07, 9 April 2021 (UTC)

Please remove wikilivres from header template

The following discussion is closed:

Parameter has been disabled in the template and a temporary tracking category created to track any existing uses.

The site has been dead for years and is not coming back (I no longer own the old domain anyway). —Justin (koavf)TCM 04:39, 10 April 2021 (UTC)

@Koavf: Done --Xover (talk) 13:16, 10 April 2021 (UTC)
This section was archived on a request by: Xover (talk) 06:25, 12 April 2021 (UTC)

Disambiguation pages for chapter numbers

I recently created the page Chapter 4, as a disambiguation page listing every fourth chapter in every book in existence here. I think we should create disambiguation pages for all examples of these, because 3 people on all of planet Earth might not know how to read a table of contents, or go to a subpage manually, or go to Special:WhatLinksHere, or any other DEFINITELY less useful methods than that for getting to the fourth chapter of a book.

It should be pretty easy for people to scroll through the entire page to find what they're looking for, even if they can't use CTRL F. People with slow connections definitely won't have to wait for the page to load for very long at all. And best of all, if pages or subpages are moved in a book, it will be EASY AS CAKE to deal with, with all the chapter disambiguation pages there'll be.

[[:File:Max Headroom broadcast intrusion.webm|Here's some past discussion of this very idea,]] right here in the Scriptorium, 2008. PseudoSkull (talk) 15:47, 1 April 2021 (UTC)

The page is not as described and is actually a link farm to several webm files. The link to the past discussion above is also false (hence my adjustment). I assume therefore that this is an April Fool's joke and have accordingly deleted the page as "not notable". If you were serious about the concept, then open a discussion. Beeswaxcandle (talk) 17:49, 1 April 2021 (UTC)
You young whippersnapper Inductiveloadtalk/contribs 21:59, 1 April 2021 (UTC)

Author description

Can someone show me the rule that demands that the Author description has to be just a few words like "American writer" and cannot be a few sentences telling where and when they were born and where and when they died? Is this a strict !Wikilaw or is someone imposing their personal preferences? If you look at all the authors listed in VIAF and LCCN and the list of authors at Project Gutenberg, wouldn't you want to know more information about them to properly disambiguate them? In some cases there are a dozen people with the same name, some are duplicates, but there is too little information to know. At Wikidata people with the same or similar names get conflated every day through bad merges, some of the entries have to be abandoned because there is no way to determine who-is-who anymore, then VIAF copies our bad data. See for example wikidata:Wikidata:VIAF/cluster/conflating_entities for people irreparably conflated. I think the "American writer" description is what you would add if the name is already recognizable to anyone with a high school education, so there is no chance of conflation or improper disambiguation. Most people writing newspaper articles for local papers would not fit this category of recognition. --RAN (talk) 02:15, 1 April 2021 (UTC)

Not everything at Wikisource is written down as "rules". Wikisource Author descriptions are not the place to provide an author's biography. Wikisource does not create encyclopedic content. That is what Wikipedia is for. Wikidata has an organizational project page where problematic VIAF IDs are listed and the issues noted. We do not need to duplicate that function here.
Here, we typically list nationality, field of writing/study/work, and additional author-relevant items such as a pen-name or a link to a spouse who is also a writer. If there are multiple people sharing the same name, then a Disambiguation page is created to disambiguate them. The description itself should be deftly written to allow distinction, but not verbose. For example, Author:George Van Santvoord (1891-1975) is "American literary scholar and professor at Yale", the minimum necessary to distinguish him from another author George Van Santvoord (1819–1863), who was an "American scholar of US government". The field of work and place of employment are enough to distinguish the two. For father and son, when they are both authors sharing the same name, we may state "son of [father]" or "father of [son]." Again a deft and simple description, not a biography. Biographic information can be stored at Wikidata, and if there is enough for an encyclopedic entry, that belongs on Wikipedia.
If you believe you have found sources for biographical details necessary for distinguishing two authors, and that information cannot be placed at Wikidata, that information can be (and often has been) placed on the Author_talk page associated with that Author. --EncycloPetey (talk) 02:48, 1 April 2021 (UTC)
smdh. or how i learned to stop worrying and love the wikidata. Slowking4Farmbrough's revenge 01:19, 4 April 2021 (UTC)

Edit the text in a PDF?

I've been trying to find a good way to edit the text in a PDF file prior to converting to DJVU and/or uploading to Commons, ideally with free software. For instance, to remove the words "Digitized by Google" from every page in the OCR text. Does anybody have any tips for this? -Pete (talk) 22:35, 1 April 2021 (UTC)

@Peteforsyth: Have you looked at Pdf Shaper free? — Ineuw (talk) 11:18, 2 April 2021 (UTC)
@Ineuw: Great suggestion, I had not encountered that one. I'm on my Linux computer right now, but I'll be sure to check this out when I'm back on a Windows box. -Pete (talk) 21:14, 2 April 2021 (UTC)
@Ineuw: Seems that the free version permits one to extract the text layer all as one text file, which is very useful, but not what I'm looking for. I don't see an ability to edit the text layer. I might give the free trial of the pro version a try, and see if it's in there. Thanks again for the suggestion. -Pete (talk) 06:39, 3 April 2021 (UTC)
@Peteforsyth: what exactly is the workflow here? Specifically, what is the input data (IA PDF, Google books PDF, raw images, ...)? If DjVu is the desired output, you can handle the text layer much more easily at that point, since DjVu has a set of tools for handling the text layer (djvutxt and djvused), and the text layer is a well-defined s-expression format.
Analogously to djvutxt, PDF text streams can be extracted in a similar format (including line and "block" data) with:
pdftotext -bbox-layout in.pdf out.xml
However, once you've figured out a heuristic for nixing the right words in the XML and made the edit, I do not know how to re-insert modified data like djvused can (though I'm sure with enough grobbling about in the PDF stream references you could manage it). Inductiveloadtalk/contribs 13:31, 3 April 2021 (UTC)
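As a rough illustration of the "heuristic for nixing the right words" step, here is a minimal Python sketch that drops any line consisting solely of a given phrase from XML shaped like the `pdftotext -bbox-layout` output. The sample document, the helper names and the exact element nesting (page, flow, block, line, word) are assumptions for illustration; it does not attempt to write the result back into the PDF, which remains the unsolved part.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, assumed to resemble `pdftotext -bbox-layout` output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><doc>
<page width="400" height="600">
<flow><block xMin="10" yMin="10" xMax="200" yMax="20">
<line xMin="10" yMin="10" xMax="200" yMax="20">
<word xMin="10" yMin="10" xMax="60" yMax="20">CHAPTER</word>
<word xMin="65" yMin="10" xMax="70" yMax="20">I</word>
</line></block></flow>
<flow><block xMin="10" yMin="580" xMax="200" yMax="590">
<line xMin="10" yMin="580" xMax="200" yMax="590">
<word xMin="10" yMin="580" xMax="60" yMax="590">Digitized</word>
<word xMin="65" yMin="580" xMax="80" yMax="590">by</word>
<word xMin="85" yMin="580" xMax="130" yMax="590">Google</word>
</line></block></flow>
</page></doc></body></html>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def strip_phrase_lines(xml_text, phrase):
    """Remove every <line> whose words are exactly `phrase`.

    Returns the parsed tree root and the number of lines removed.
    """
    root = ET.fromstring(xml_text)
    target = phrase.split()
    removed = 0
    # Snapshot the element list first, since we mutate the tree as we go.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) != "line":
                continue
            words = [w.text or "" for w in child if local_name(w.tag) == "word"]
            if words == target:
                parent.remove(child)
                removed += 1
    return root, removed


def remaining_words(root):
    """List the text of all <word> elements still in the tree."""
    return [w.text for w in root.iter() if local_name(w.tag) == "word"]


if __name__ == "__main__":
    root, removed = strip_phrase_lines(SAMPLE, "Digitized by Google")
    print(removed, remaining_words(root))
```

Matching whole lines rather than individual words is a deliberate choice here, so that an ordinary sentence containing "Google" in the body text would not be touched.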

Britannica's articles

There are two identical articles: 1911 Encyclopædia Britannica/Pausanias (general) and 1911 Encyclopædia Britannica/Pausanias (commander). Fix it please. -- Sergey kudryavtsev (talk) 07:59, 2 April 2021 (UTC)

Fixed, the two pages have been unified at 1911 Encyclopædia Britannica/Pausanias (Spartan commander). --Jan Kameníček (talk) 08:50, 2 April 2021 (UTC)
Thank you. -- Sergey kudryavtsev (talk) 09:06, 3 April 2021 (UTC)

Newcomers in Recent changes

One of the filters in Recent changes is "Newcomers" who are supposed to be "Registered editors who have fewer than 10 edits or 4 days of activity". However, it seems that this filter stops displaying edits of those who reached over 10 edits, no matter whether they reached also 4 days of activity. Is it possible to fix the filter locally? --Jan Kameníček (talk) 00:36, 4 April 2021 (UTC)

@Jan.Kamenicek: Huh? Not something that I am seeing in special:recentchanges; where are you seeing that? Noting that the page can be variable depending on one's preferences. — billinghurst sDrewth 00:55, 4 April 2021 (UTC)
Oh new editors' contribs https://en.wikisource.org/w/index.php?title=Special:RecentChanges&userExpLevel=newcomer;learner&hidebots=1&hidecategorization=1&hideWikibase=1 Hmm, not certain how much control we have there mw:Release_notes/1.34#New user-facing features in 1.34, definitely not something I have explored. — billinghurst sDrewth 00:59, 4 April 2021 (UTC)
@Billinghurst: Yes, that’s it (only you have added Learners there too and so it shows more). For example there is a new user (Milivojevsasa) who started to be active in en.ws on 3 April 2021, but when I had this filter on I saw his edits only until their number reached 10, then the filter stopped displaying them. At the moment I cannot see his edits when I have this filter on, although he has not reached 4 days of activity in en.ws. The reason might be that he has longer activity in other Wikimedia projects, but I would expect that the filter of recent changes in Wikisource takes into account only Wikisource activity. --Jan Kameníček (talk) 01:10, 4 April 2021 (UTC)
@Billinghurst: Ah, you have probably clicked "new editor’s contribs" at the top which filters out both Newcomers and Learners. But I switched on only Newcomers after clicking "Filter changes". --Jan Kameníček (talk) 01:19, 4 April 2021 (UTC)
(ec) That is the tag itself, I just copy and pasted. Mediawikiwiki is very slim on detail. Phabricator:T149637 seems to be the only place with detail, and it says that NEWCOMER is an AND statement. — billinghurst sDrewth 01:22, 4 April 2021 (UTC)
Hmm, I can see it… If it is so, then it is very confusing, because the legend in RC states "OR", which would in fact also make much more sense... --Jan Kameníček (talk) 01:29, 4 April 2021 (UTC)
NEWCOMER is the combination of edits and time, not "either" statement. Those falling outside of only one of those parameters, fit into LEARNER—which currently aligns with AUTOCONFIRMED. Please provide a url for what you wished reviewed, as I hate guessing. — billinghurst sDrewth 01:32, 4 April 2021 (UTC)
Settings are not locally configurable, though the surroundings may be. — billinghurst sDrewth 01:45, 4 April 2021 (UTC)
@Jan.Kamenicek: It is something you are seeing in the filter names and descriptions? phab:T149385, and probably worth looking at mw:Help:New filters for edit review/Filtering — billinghurst sDrewth 01:49, 4 April 2021 (UTC)
And all now explained as you are using the javascript interface through your preferences, which I am not, so I am not seeing half the cruft that you are discussing. — billinghurst sDrewth 02:15, 4 April 2021 (UTC)
@Billinghurst: I had a look at the mw help page you have linked to, and in the section Filters lists there is also written: Newcomers–Registered editors who have fewer than 10 edits OR 4 days of activity :-) So after your explanation I will try to remember that they write OR while they mean AND :-) I also understand that it is not possible to change it locally so I will not bother about it anymore and I will use this filter in combination with the Learners filter instead. Thanks very much for all the explanation. --Jan Kameníček (talk) 10:13, 4 April 2021 (UTC)
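For anyone wanting a bookmarkable link for the combined Newcomers and Learners filter mentioned here, the following Python sketch rebuilds the Recent changes URL quoted earlier in this thread. The parameter names are taken from that URL; the function name is made up for illustration, and whether other experience-level values are accepted is not checked here.

```python
from urllib.parse import urlencode

BASE = "https://en.wikisource.org/w/index.php"


def recent_changes_url(levels):
    """Build a Special:RecentChanges URL filtered by user experience level.

    `levels` is a list such as ["newcomer"] or ["newcomer", "learner"];
    multiple levels are joined with ';' as in the URL quoted in the thread.
    """
    params = {
        "title": "Special:RecentChanges",
        "userExpLevel": ";".join(levels),
        "hidebots": 1,
        "hidecategorization": 1,
        "hideWikibase": 1,
    }
    # Keep ':' and ';' unescaped so the result matches the quoted URL.
    return BASE + "?" + urlencode(params, safe=":;")


if __name__ == "__main__":
    print(recent_changes_url(["newcomer", "learner"]))
```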
@Jan.Kamenicek: This is set in Mediawiki like this:
/**
 * The following variables define 3 user experience levels:
 *
 *  - newcomer: has not yet reached the 'learner' level
 *
 *  - learner: has at least $wgLearnerEdits and has been
 *             a member for $wgLearnerMemberSince days
 *             but has not yet reached the 'experienced' level.
 *
 *  - experienced: has at least $wgExperiencedUserEdits edits and
 *                 has been a member for $wgExperiencedUserMemberSince days.
 */
$wgLearnerEdits = 10;
$wgLearnerMemberSince = 4; # days
$wgExperiencedUserEdits = 500;
$wgExperiencedUserMemberSince = 30; # days
and is implemented as follows:
		if ( $editCount < $wgLearnerEdits ||
		$registration > $learnerRegistration ) {
			return 'newcomer';
		}
so they should be a "newcomer" if they have either (less than 10 edits) OR (younger than 4 days) (or both). Which makes some sense, as you don't stop being a newcomer if you have 2 edits, 8 weeks apart. I think the confusion in wording is that the condition to not be a newcomer any more is, by application of De Morgan's theorem, an AND statement:
NOT ((less than 10 edits) OR (younger than 4 days)) = (10 edits or more) AND (older than 4 days)
The user account in question has existed but been dormant since 16 October 2014, so that's why they dropped off the newcomers' list after 10 edits - their account age condition has been satisfied long ago.
Regardless, we do technically have the ability to request that the English Wikisource values of wgLearnerEdits and wgLearnerMemberSince are adjusted if we (as a site) want (to do this, open a Phab ticket and add the Sites-Requests team). But it would apply globally to everyone's RC lists at enWS. There is no way to have a personal cut-off in the existing RC page that I know of, other than something hacky using the API or some kind of Toolforge tool with DB access. Inductiveloadtalk/contribs 10:48, 4 April 2021 (UTC)
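To make the OR/AND distinction concrete, here is a small Python restatement of the logic quoted above, using the four configuration values shown. The "newcomer" test mirrors the quoted PHP; the "experienced" branch is only inferred from the wording of the config comment and is an assumption, as are the function and constant names.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the configuration quoted in the thread.
LEARNER_EDITS = 10
LEARNER_MEMBER_SINCE_DAYS = 4
EXPERIENCED_USER_EDITS = 500
EXPERIENCED_USER_MEMBER_SINCE_DAYS = 30


def experience_level(edit_count, registration, now):
    """Classify an account as 'newcomer', 'learner' or 'experienced'."""
    learner_cutoff = now - timedelta(days=LEARNER_MEMBER_SINCE_DAYS)
    experienced_cutoff = now - timedelta(days=EXPERIENCED_USER_MEMBER_SINCE_DAYS)

    # Newcomer: too few edits OR registered too recently (or both).
    if edit_count < LEARNER_EDITS or registration > learner_cutoff:
        return "newcomer"
    # Assumed from the config comment: enough edits AND old enough.
    if edit_count >= EXPERIENCED_USER_EDITS and registration <= experienced_cutoff:
        return "experienced"
    return "learner"


if __name__ == "__main__":
    now = datetime(2021, 4, 4, tzinfo=timezone.utc)
    dormant = datetime(2014, 10, 16, tzinfo=timezone.utc)  # old, long-idle account
    fresh = datetime(2021, 4, 3, tzinfo=timezone.utc)      # registered yesterday
    print(experience_level(9, dormant, now))   # still under 10 edits
    print(experience_level(10, dormant, now))  # tenth edit, account age long satisfied
    print(experience_level(10, fresh, now))    # enough edits, but too young
```

The second call is the case discussed in this thread: an account registered years ago leaves the newcomer group on its tenth edit, because the age condition was met long before the first edit.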
Oh, thanks for the very detailed explanation! So the problem was that the activity is not counted from the first edit, but from creating the account no matter if the creation was intentional or just automatic. So I would suggest to change it so that it counted from the first edit if it is possible, but only if (as you have said) we want it as a site, it would not make sense to change for the whole ws only because of me. --Jan Kameníček (talk) 14:25, 4 April 2021 (UTC)
Counting from the first edit should be possible, but it'd be a change to the Mediawiki core code: change $registration = $this->getRegistration(); to $registration = $this->getFirstEditTimestamp(); it's actually not that concise, because the getFirstEditTimestamp() method is deprecated. I guess you could raise a Phab ticket to suggest calculating it that way.
Changing the values of wgLearnerEdits and wgLearnerMemberSince is just a config change. I kinda feel that 10 edits is on the low side for a learner, because you can blow through 10 edits in no time in the page NS at WS and then you'll drop off the "newcomer" radar (if you don't have a brand new account) and become a "learner" (which lumps you in with users right up to the 500 edit mark). Inductiveloadtalk/contribs 15:22, 4 April 2021 (UTC)
I filed the task T279258. As for raising the level of 10 edits, I agree, but not too high, let’s say 20–30. The aim of this filter is to filter out the very new accounts where there is quite a high probability to meet a single-use vandal account. Accounts surviving 4 days and a certain number of edits without being blocked can be usually considered serious users learning to contribute. --Jan Kameníček (talk) 17:50, 4 April 2021 (UTC)

Offer of a proofread text

There is an offer to use a proofread text at Index talk:The Commentaries of the Emperor Marcus Antoninus.pdf, which was written there by the same contributor who created the Index page. Would it be possible to employ a bot to add the proofread text to the individual pages of the book? --Jan Kameníček (talk) 17:57, 4 April 2021 (UTC)

At a glance, this doesn’t seem suitable for a merge-and-split because the pages are renumbered in a way that makes it challenging to reconstruct the original sequence. Languageseeker (talk) 18:56, 4 April 2021 (UTC)

Threshold for disabling Validation for a page

I was wondering if it makes sense to block validation for a page if the differences are too significant. For example, if there are more than 5 characters changed in the text, it would seem that it could benefit from another glance. Not as an indictment of any user, but more of a commitment to quality. Languageseeker (talk) 18:50, 4 April 2021 (UTC)

Not supported. If a template is amended or added, then it would flag under such a rule. There is no sensible way of distinguishing content from presentation in the text. Beeswaxcandle (talk) 19:11, 4 April 2021 (UTC)
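The objection can be seen with a quick Python sketch of what such a rule might look like. This is purely illustrative: no such check exists in the software, the 5-character threshold is the one floated above, and the function names and the example wrapped in {{c}} are assumptions.

```python
import difflib


def changed_chars(old, new):
    """Count characters inserted, deleted or replaced between two texts."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For a replacement, count the longer of the two sides.
            total += max(i2 - i1, j2 - j1)
    return total


def may_validate(old, new, threshold=5):
    """Hypothetical rule: allow validation only for small changes."""
    return changed_chars(old, new) <= threshold


if __name__ == "__main__":
    # A genuine text correction stays under the threshold.
    print(may_validate("teh cat sat", "the cat sat"))
    # Wrapping unchanged text in a template trips it, with no content change.
    print(may_validate("The quick brown fox", "{{c|The quick brown fox}}"))
```

A character count alone cannot tell that the second edit leaves the transcribed words untouched, which is the point made in the reply above.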

Wikisource Discord server

Reminder that there is an unofficial Discord server for the English Wikisource. If you have a Discord account and would like to join and chat with other editors there, please feel free to do so: https://discord.gg/cVv8hjbF (invite is permanent). The server's been around for a number of months now and a few of us have been talking there pretty regularly, but it'd always be nice to have more members. PseudoSkull (talk) 04:24, 5 April 2021 (UTC)

The Tragedy of Romeo and Juliet (Dowden)

I tested a download of this work (PDF) and found that the Notes did not work as they should. The "Prologue" will be sufficient to test this issue. Instead of getting the two groups of notes in the download, I get error messages for the ref tag.

Is this issue the result of a failing of some kind in the work itself (that I could not find), or a problem in the conversion to PDF that cannot handle the notes? --EncycloPetey (talk) 15:40, 5 April 2021 (UTC)

@EncycloPetey: It's a known limitation. See phab:T274654. The Parsoid team are working on it, but the ETA is unpredictable. --Xover (talk) 18:13, 5 April 2021 (UTC)
Thanks for letting me know! --EncycloPetey (talk) 20:07, 5 April 2021 (UTC)

Phabricator ticket: Selective means to exclude (sub)pages from Special:UnconnectedPages

Special:UnconnectedPages for the Wikisources is polluted and often unuseful due to a proliferation of subpages of works that would not typically get WD items, eg. chapters of novels. I have created a phabricator ticket to ask for a means for us to tag works where the subpages and identified pages should be excluded from that special page. — billinghurst sDrewth 02:01, 6 April 2021 (UTC)

Maintenance task : Creating WD items

We need to be more rigorous about linking to the created WD items when works are being created. We aren't that brilliant at it.

We should also be setting up some maintenance tasks to get onto some of the huge backlog where we have subpages of collected works (poems, short stories, ...). — billinghurst sDrewth 03:13, 6 April 2021 (UTC)

Removing br from Index:The torrent and The night before.djvu

For the current PoTM, many of the pages have a br appended to every line that needs to be removed. Is there any way to do that automatically? Languageseeker (talk) 18:54, 4 April 2021 (UTC)

Shouldn't be removed. They are a correct method of presenting poetry (in conjunction with block templates). Beeswaxcandle (talk) 19:14, 4 April 2021 (UTC)
On the proofread pages, it uses the poem tag without breaks. On the unproofread pages, it uses break. Isn't poem the correct tag here or are the proofread pages incorrect? Languageseeker (talk) 21:12, 4 April 2021 (UTC)
The poem tag is unpredictable and does not work well when applied across pages. As someone who regularly works on poetry and drama, I've switched to using br tags almost exclusively, because of the issues with using the poem tag across pages. --EncycloPetey (talk) 21:32, 4 April 2021 (UTC)
yeah - we might want to think about deprecating poem code, since we are going back to br. Slowking4Farmbrough's revenge 23:17, 6 April 2021 (UTC)
FYI: I (and @Xover:) am working on a new system: {{ppoem}}, intended to be a good replacement for poem. Specifically, it comes with hanging indents so it wraps better on small screens and uses spans for each line, as well as rendering out as a single div overall when transcluded. It's not quite ready for prime-time just yet, mostly because dropinitials are a major PITA to get right and the current approach slightly skews positioning on the page. And I do want it "right" before suggesting general use. It's not quite at the point where I'm canvassing general opinion on it (I will, but not just yet), but if anyone has smart CSS ideas, I'm all ears :-) Inductiveloadtalk/contribs 08:51, 7 April 2021 (UTC)

Rotated pages in PDF

Is there a fix for this PDF? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:30, 6 April 2021 (UTC)

Problem: Poems attributed to Poems (Botta)

We have the work Poems (Botta) which is an 1853 edition of the 1848 work by author:Anne Lynch Botta and we have the subpages special:prefixindex/Poems (Botta)/, however we have an issue that there are quite a few works claiming to be from the work that I cannot see in the work Special:WhatLinksHere/Poems (Botta) (flick to page 3) and look at search "from Poems (1848)".

I have done internal text searches and can definitely show only the one version of these works here and not in the transcluded form. Exact text and title searches sometimes can confirm works by Botta, sometimes we have the only copy of a text (searches Google, Bing, Duckduckgo). Poetry and its provenance is not my area of expertise, and hoping that some can confirm my searches and also hoping that some can provide guidance. — billinghurst sDrewth 23:50, 6 April 2021 (UTC)

@Billinghurst: These appear to have come from Index:Memoirs of Anne C. L. Botta - 1894.pdf (c.f. this bit of the TOC). Inductiveloadtalk/contribs 00:18, 7 April 2021 (UTC)
Good get, I knew that a couple were in there, and had checked the ToC, seems that some are embedded in the body without ToC or index. <thumbsup> <sigh>. I will relocate them. — billinghurst sDrewth 08:30, 7 April 2021 (UTC)
moved to subpages of the identified work. Someone will have fun at some point in time (or not). — billinghurst sDrewth 09:41, 7 April 2021 (UTC)

Tennyson author page &c

I am looking for recommendations for a strong, comprehensive PD collection of Tennyson's works that we could bring over to WS. His Author page could use some cleaning up, in my opinion, to include an Index of Titles subpage perhaps. Currently, the majority of poems listed are unindexed (many verified against "The Complete Poetical Works of Tennyson" ed. Frederick Page (pub. Oxford University Press, 1953)). Over at IA, I am seeing The Works of Alfred Lord Tennyson in ? vols. in several editions. I notice we have the Index:The works of Alfred Lord Tennyson (1899, v 1).djvu—hardly worked on—but is that a preferred source? Thanks for any input, Londonjackbooks (talk) 08:47, 7 April 2021 (UTC)

Comment, is the Page edition out of copyright? Otherwise, that's a copyright violation. Languageseeker (talk) 20:49, 7 April 2021 (UTC)
I see ten volumes at IA. Languageseeker (talk) 20:55, 7 April 2021 (UTC)

Public domain comics on DigitalComicMuseum

I just stumbled across digitalcomicmuseum.com, which has quite a few comic books whose rights weren't renewed. It might be worth checking out, especially since portal:Comics is in pretty poor shape. Mcrsftdog (talk) 20:45, 5 April 2021 (UTC)

Note: It might be a good idea to do a bulk upload of these comics to Wikimedia Commons to make doubly sure they are preserved. PseudoSkull (talk) 15:45, 6 April 2021 (UTC)
I've seen it, but comics take too much work for Wikisource and we add too little to them to really make them worth the time for me.--Prosfilaes (talk) 00:58, 7 April 2021 (UTC)
I just spent a couple of hours there, surprised at how much good stuff was available, eg Brain Bats of Venus. I agree with the other comments, and there is little benefit in adding transcript, but wonder if merely presenting the images might be desirable; I assume that is reasonably straightforward if they are at Commons. If Jack Cole had an Author page here I would be using it. CYGNIS INSIGNIS 07:00, 8 April 2021 (UTC)
I don't think the effort is worth the result in practice, but I'll still note that there is definitely some value to transcribing any text content. Much like transcribing old movies and posters etc., the transcription makes it searchable and possible to access for people with vision impairment. But I think we'd need better tooling to make the cost—benefit add up. Our current tooling doesn't make the comics format easy to work with, and won't let us present the results in a useful way without massive manual effort.
I could be persuaded that just presenting the images was a useful stop-gap, but it'd take some thought and style guidelines + supporting templates to make it more than just an image dump; and even for that I would prefer to see an active WikiProject coordinating. I don't want to see haphazard addition of a couple of comics, done up inconsistently, "just because we can". The form is specialised enough that someone needs to really care about it to really do it justice. (oh, and we'd need to adjust our formal scope to permit it, let's not forget)
But if someone were to throw resources at a VisualEditor-type editor for comics pages, and a dedicated comics viewer in MediaWiki, I would certainly be all for it. With sufficiently fancy tooling, comics would be easier to do up than books and periodicals, and we could have a sizeable library of them in relatively short order. --Xover (talk) 09:54, 8 April 2021 (UTC)
As well as searchable, the text can be easily translated by online translators, making them at least somewhat accessible to non-English speakers.--Prosfilaes (talk) 03:21, 9 April 2021 (UTC)

Maintenance Proposal: Remove PG imports and flag Incomplete Projects w/o scans for deletion

This site has many imports from Project Gutenberg from its earlier days that detract from the high quality of the scan-backed works that it currently produces. Even worse, these tend to be of the most important works of the English language. I propose mass deleting them all. It’s time to replace them with scan-backed versions.

In addition to Project Gutenberg imports, this site has many abandoned transcription projects without scans from its early days. I propose flagging them all for deletion and automatically removing them after three months if nobody has merged-and-split them to a scan. Languageseeker (talk) 12:36, 6 April 2021 (UTC)

@Languageseeker: I'm not sure I'm on board with deletion on any kind of short-in-WS-terms horizon (i.e. less than years), because with the resources we have, that's certain death for most of them, and they're not quite zero value IMO (at least not the complete ones). I would support a proposal to move {{Project Gutenberg}} from being hidden on Talk pages to being on main pages like {{incomplete}}. I'd also support the deletion of the PG version after a scan-backed alternative is transcluded and making a WS:D#Precedent entry for pro-forma nominations of them. Inductiveloadtalk/contribs 12:49, 6 April 2021 (UTC)
@Inductiveload: Do we need a formal proposal for moving {{Project Gutenberg}} from being hidden on Talk pages? Also, we should probably check to make sure that the work has not been transcluded because, on a quick glance, I found 3 that are flagged as PG but have been transcluded. Languageseeker (talk) 13:23, 6 April 2021 (UTC)
Addendum: there are also works such as Talk:Nightmare_Abbey that do not use {{Project Gutenberg}}. Is there any way to automatically add {{Project Gutenberg}}? Languageseeker (talk) 13:44, 6 April 2021 (UTC)
@Languageseeker: Category:Works possibly copied from Project Gutenberg for works that mention "Gutenberg" in the {{Textinfo}} source field. There will be lots of overlap with works that already use {{Project Gutenberg}} as well as works that are now transcluded. It'll take some time for the category to fill on the server side. Inductiveloadtalk/contribs 14:10, 6 April 2021 (UTC)
  • We don't even have a hard requirement that new works be scan-backed, so, no, deleting non-scan-backed works (including PG) is premature. Focus your efforts on proofreading high quality scan-backed versions of such works instead. --Xover (talk) 13:59, 6 April 2021 (UTC)
@Xover: Should we discuss making that a requirement? It would make sense from both a copyright and long term perspective. Languageseeker (talk) 19:14, 6 April 2021 (UTC)
The issue is that there is no way to verify the accuracy of these texts. Since Wikisource doesn't keep up with PG, the corrections made at PG remain unimported. To me it seems that the PG texts detract from the quality of the scan-backed texts. They also disincentivize the creation of scan-backed replacements. Wikisource's strongest selling point is scan-backed transcriptions, and PG texts undermine that. Languageseeker (talk) 02:29, 7 April 2021 (UTC)
the issue is we have a consensus to migrate non-scanned backed to scan-backed, and we have perpetual proposals to change that consensus without success. low quality work does not detract from high quality work. increasing the scrap rate does not improve the process. if you want to top-down dictate the process of transcription go over to German wikisource. that's how they act over there, and we are leaving them in the dust. we are increasing proofread pages at a higher rate. make a maintenance category of non-scanned back and we can work the backlog. delete the pages and we cannot work the backlog. Slowking4Farmbrough's revenge 00:40, 10 April 2021 (UTC)
@Slowking4: Yes, I heard the community and I accept that nobody is on board with a mass deletion. Would you be ok with a proposal that said no new PG imports without migration to original scan? This way the backlog can be cleared slowly over time? Languageseeker (talk) 01:10, 10 April 2021 (UTC)
We're here in part as a text archive for Wikipedia. Deleting all these works that Wikipedia links to hurts that. Gutting a large set of core works is going to offer theoretical gains at a huge practical cost, and could possibly even delay the creation of scan-backed replacements by making us less visible and less enticing to new users.--Prosfilaes (talk) 20:11, 7 April 2021 (UTC)
Agreed. We should be seeking quality scans of good provenance so that we can convert such texts into scan-backed editions. There is a lot of work being done right now to accomplish this for the plays of Shakespeare. We have people doing the same for notable novels and works of poetry. The best way to reduce the number of PG texts and works not backed by a scan is to find a high-quality scan, set up the scan, add full supporting info for the scan at Commons and Wikidata, set up and start the Index page, and link the Index page from the Author page and the current text. --EncycloPetey (talk) 01:31, 8 April 2021 (UTC)

Based on the feedback, it seems that the community opposes a mass deletion. What about this version?

  1. Allow for the sdelete of any non-scan-backed text when a scan-backed version exists.
  2. Allow for the sdelete of any non-scan-backed text that is incomplete and not potentially useful for a merge-and-split, e.g. a later reprint of an earlier edition.
  3. Disallow the creation of new projects that are not scan-backed.

Languageseeker (talk) 20:31, 7 April 2021 (UTC)

I don't understand your proposal. Why would we ever delete scan-backed works? --EncycloPetey (talk) 01:27, 8 April 2021 (UTC)
@EncycloPetey: Whoops, some major Errata. I meant non-scanned backed. Languageseeker (talk) 01:29, 8 April 2021 (UTC)
I can agree with number 1 provided that the non-scan-backed text is not another important edition. For some works, there are multiple valuable editions, even multiple valuable first editions. Editions of Shakespeare plays, even the early ones, can have huge differences, so deleting one because another edition exists is insufficient. Likewise, many classical works exist in multiple translations, and even one person's translation can have more than one important edition. I cannot agree with number 3, since there are texts where a scan is not feasible, but the text could be created. We have had hand-made transcriptions of historical documents, or hard-to-find public domain documents that were published within another work that is under copyright, so the work cannot be scanned. --EncycloPetey (talk) 01:39, 8 April 2021 (UTC)

New texts

What are the criteria for adding works to {{New texts}}, and thus to the main page? I added Boy Scouts and What They Do yesterday, but another editor has removed it, without asking me first. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:29, 8 April 2021 (UTC)

discussion prior to temporarily removing text CYGNIS INSIGNIS 16:09, 8 April 2021 (UTC)
In which you mentioned removing the work from the template (let me check...) zero times, before doing so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:08, 8 April 2021 (UTC)
It has always been the case that anyone with a concern about a work should remove it so that the issues can be raised and addressed. When addressed then please feel welcome to relist it. — billinghurst sDrewth 14:12, 10 April 2021 (UTC)
Are you able to answer the question: "What are the criteria for adding works to {{New texts}}"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:28, 10 April 2021 (UTC)
The usual minimum criteria (as I have listed them recently elsewhere):
(1) The work must be fully Proofread (in the Wikisource sense) with no pages unproofread or problematic and all images and major formatting included in this
(2) The work must be fully and correctly transcluded, so that the entire work is available in a correct and logical sequence
(3) A suitable license template must be placed on the work's primary Mainspace page.
It's also nice if the Author page exists (so we don't have redlinks on the Main page), and if categories have been added (so that it can be found through category searching). If you look at Boy Scouts and What They Do, it has not been fully proofread. --EncycloPetey (talk) 01:33, 12 April 2021 (UTC)

Help Cleaning up Author:Charles John Huffam Dickens

Poor Dickens has got a terrible author's page. Most of the editions there are from posthumous reprints that have little value and in general it's a bit of a mess. Could someone help me clean up this page? We'd probably want to create subpages for each one of his works with the periodical versions, published edition, cheap edition, 1858 Library edition, and the 1867 Complete Works of Charles Dickens. Anything after that is just reprints until the Clarendon editions that are in copyright. Charles_Dickens_bibliography is a help, but that could also use expanding. Languageseeker (talk) 05:26, 8 April 2021 (UTC)

We would not want to create Author: subpages for each work. Versions pages in Mainspace for those who search by name of work and a simple indented list on the Author: page for those who come in by searching for Dickens is sufficient. Beeswaxcandle (talk) 06:04, 8 April 2021 (UTC)
@Beeswaxcandle: Are you sure? Oliver Twist has 12 authors authoritative editions alone. Languageseeker (talk) 20:46, 8 April 2021 (UTC)
Yes, I'm sure. Author: subpages for a single work is not the intention of such pages. Beeswaxcandle (talk) 05:11, 9 April 2021 (UTC)
@Beeswaxcandle: So, is something like Wuthering Heights a mistake? Languageseeker (talk) 05:16, 9 April 2021 (UTC)
I think (?) this is just a simple misunderstanding based on the way each of you is using the jargon. If I'm understanding correctly, @Languageseeker: means that we should have pages for each work, to list the various editions of the work. If so, that seems sensible and common, and Wuthering Heights looks like as good an example as any of what's common practice around here. But we wouldn't call that a subpage -- I think what @Beeswaxcandle: was understanding you to mean was something of the form of [[Author:Charles John Huffam Dickens/Hard Times]], which would certainly be an anomaly. -Pete (talk) 05:30, 9 April 2021 (UTC)
Yes, I think that you’re exactly right. I want to make Hard Times the page that lists all the versions of the work instead of containing the text of one specific edition. Languageseeker (talk) 05:53, 9 April 2021 (UTC)
Then, why did you call it a "subpage"? You mean a versions page. Once we are hosting multiple versions (editions) of Hard Times, then we'll need a versions page. Until then, the single version can stay where it is and the list of other versions/editions should be on the Author: page. Generally, we prefer to avoid redlinks on the three types of disambiguation pages (disambiguation, versions, translations). [Knowing as I say this, that there are redlinks on some versions pages, but these do not set precedent.] Beeswaxcandle (talk) 06:49, 9 April 2021 (UTC)
Sorry for the poor terminology. Probably the result of a sleep addled brain. I'm still a little concerned about posting the lists on the Author Page because it's already starting to look like a mess because of the large number of versions that Dickens contributed to and the need to distinguish between those and other versions. Languageseeker (talk) 14:11, 9 April 2021 (UTC)
Uh, since when is using the incorrect terminology a good topic for public discussion? We're all learning here. We got past the misunderstanding, maybe we can look forward, not litigate minor irritations.
It seems to me that in the case of Oliver Twist, the challenge is that a specific edition is occupying the title that would be used for a versions page. So a page move would be required, which given all the subpages might be something to approach with caution. That's what I see Languageseeker doing here; in another section they have asked if the title could be changed, but they've gotten no response. I think if we can just establish what the best practice is for that kind of move, the issue would be resolved. (And, I'm happy to help out with this in the coming weeks, you're right, the page is not very useful in its current state.) -Pete (talk) 16:59, 9 April 2021 (UTC)
We don't disambiguate works until there is the need. When there is the need then we do it. There being multiple editions is not a need, use the author page, though for Dickens's works I would think that there are dozens of editions of many of his works, and not a lot of value in simply edition listing unless there is true value. — billinghurst sDrewth 14:20, 10 April 2021 (UTC)
I'm not listing every edition that has ever been printed. Only the editions that Charles Dickens directly contributed to as identified in the Clarendon Dickens. The names that I give to the editions are the standard ones used to identify them. The page needs work and I'm trying to fix it up. As an example, Great Expectations is not even a version done by Dickens. Others are missing the original illustrations. Languageseeker (talk) 14:26, 10 April 2021 (UTC)
We recently went through this process for the plays of William Shakespeare. Yes, there were existing editions at each title, but the pagenames were needed for disambiguation pages. The actual examples would read like spaghetti, so I'm going to describe a theoretical situation that's a bit simpler. Let's say there was a copy of the play Much Ado About Nothing at that title, but we need that title for the disambiguation. The play text was moved to Shakespeare - First Folio facsimile (1910)/Much adoe about Nothing, a specific edition location. The versions of Shakespeare's play are listed at Much Ado About Nothing (Shakespeare), and works with the title Much Ado About Nothing are disambiguated from each other at the main title.
All this points out something that isn't mentioned in the discussion above: There might be other works titled Oliver Twist, such as book reviews, encyclopedia articles, literary articles, dictionary entries, etc. For the plays of Shakespeare, there are retellings by Charles and Mary Lamb, and summaries of the plays by Hazlitt, and articles in the Encyclopedia Americana. The works of Charles Dickens are of a similar stature in the English language, so I urge checking around to see whether we need both disambiguation pages like Oliver Twist and versions pages like Oliver Twist (Dickens) before undertaking such a monumental revision. For Shakespeare we had the added headache that some of his plays, such as Julius Caesar and King Richard II were also the names of authors. So, I urge anyone planning to undertake revision of the works of Mr. Dickens to put in some planning first, or you may find that everything has to be changed twice.
That said, the works of Dickens are long overdue for cleanup, and I heartily welcome such cleanup. --EncycloPetey (talk) 23:37, 12 April 2021 (UTC)

Template: Periodical header, not-actually-orphaned pages, etc.

The {{header periodical}} template seems to link pages in a way that is not detected by the "what links here" function. This, in turn, has resulted in users tagging the articles as "orphaned" when they are not. An example is Oregonian/1915/February/28/Miss Hobbs' place goes to C. Abrams, which is linked from Oregonian, but I've seen a number of others recently. Ping CalendulaAsteraceae

Relatedly, I'm a little unsure about the proper use of {{Header periodical}} in general, and its docs do not go into much detail about how it is to be used. My perception is that it's trying to be two different things, but maybe I'm missing something. On the one hand, it seems like a sort of stopgap, intended to deal with periodicals where there has been no effort to date to manually index the Wikisource pages related to it; it auto-generates a list of subpages. On the other hand, it creates a header that is specific to periodicals, which seems like a nice alternative in these cases to the header generated by {{header}}, even for periodicals where the manual curation has been done; but if it's used in such a case, it results in redundant entries. How, for instance, should Oregonian be adjusted to best fit our standards? -Pete (talk) 15:10, 11 April 2021 (UTC)

Can we please unconflate the argument between unlinked and the template? Pages should not be unlinked, and that periodical header does not do it is probably misrepresenting its purpose, or is missing the point.
  • If one is working on a periodical systematically then you should be having a proper header and be curating the root page. The subpages get linked. Typically periodicals do have a ToC and a flow.
  • We had cases of ad hoc articles of newspapers which were transcribed for a purpose, eg. linked to from author pages—as author, or about author—or linked from other articles, or from portals. And newspapers don't have a ToC and don't flow. We also had people creating pages at root that should/could be at subpage level.
So we created the stop-gap/fallback of the template, and we kept the name generic matching other language we used. So yes we have a template there to assist, it is not meant to exclude the work of linking to pages.

So if you are working systematically on the Oregonian, then please curate it and that would include article linking. If you are working on specific articles for specific purposes then please provide suitable article linking. — billinghurst sDrewth 23:03, 11 April 2021 (UTC)

Ah, I understand now, thank you. I had indeed missed the point, my mind-reading skills are not very advanced :) I've added this info to the template docs. -Pete (talk) 02:00, 13 April 2021 (UTC)

Chapter Navigation at bottom of the page

I tried reading on my phone today and I noticed that there is no way to navigate to the next chapter when a user reaches the bottom of a page. Instead, a user needs to scroll back up. Would it be possible to automatically add a navigation footer when the header exists? Languageseeker (talk) 17:00, 12 April 2021 (UTC)

Can you reword your question? —Justin (koavf)TCM 21:16, 12 April 2021 (UTC)
@Languageseeker: Done It should be working now. Hopefully the gadget cache has cycled by now. Thanks for the heads up. Inductiveloadtalk/contribs 21:27, 12 April 2021 (UTC)
To phrase that a little differently: We noticed this problem a few weeks ago, and discovered it had been caused by a template update. The problem was corrected, but it will take time for the newly made template change to propagate through every work on the site. --EncycloPetey (talk) 23:07, 12 April 2021 (UTC)
Thanks! Languageseeker (talk) 23:15, 12 April 2021 (UTC)

Index:Works of Thomas Carlyle - Volume 17.djvu

Can the two missing pages from this index be inserted from [3] or the entire source replaced? No idea how to make DJVUs. Languageseeker (talk) 13:10, 23 April 2021 (UTC)

The link provided is to volume 19. Perhaps some research to find a nicely scanned set. CYGNIS INSIGNIS 13:43, 23 April 2021 (UTC)
Oops, link updated. Languageseeker (talk) 13:47, 23 April 2021 (UTC)
@Cygnis insignis: Are you saying that it would make sense to replace all 30 volumes with the full-color ones? Languageseeker (talk) 13:49, 23 April 2021 (UTC)
I would suggest replacing them with scans from the NY Public Library or IA collection:University of Toronto, texts in monochrome (eg. Google, and to a lesser degree Cornell) should be viewed with caution due to other compression and poor text layer. The effort in these works is transcribing, better to get the scans right for 30 volumes. I looked at this when setting up some Carlyle to transcribe, I don't remember what swayed me to the scans I ended up finishing. CYGNIS INSIGNIS 14:15, 23 April 2021 (UTC)
Agreed. I didn't create this collection, so I didn't pick the scans. But, I'm happy to create a list of volumes for replacement if someone will commit to creating the DJVUs and replacing the existing ones. Languageseeker (talk) 14:27, 23 April 2021 (UTC)
Check with the creator, it was @Ratte: if I recall, perhaps you could replace the files at commons (assuming that there is no work on the indices since yesterday). CYGNIS INSIGNIS 14:33, 23 April 2021 (UTC)
Done by Xover. Ratte (talk) 16:04, 26 April 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 12:48, 10 May 2021 (UTC)

Google often unable to find works at Wikisource

I have noticed many times that Google is not able to find works hosted in English Wikisource. E. g. today I tried searching for "zawis and kunigunde" wikisource and the result was that Google found a subpage of my userpage where I just mentioned the work, a talk page where I asked for something connected with the work, it found even a Wikidata item connected with the work, but did not find the work itself. Is there anything we could do to make works better discoverable? --Jan Kameníček (talk) 12:43, 3 April 2021 (UTC)

@Jan.Kamenicek: Google finds that as the first result for both "zawis and kunigunde" wikisource and, for that matter, just "zawis and kunigunde" (both on the mobile website, oddly). Since this is a recent page, it might just be that the Google crawler takes time to discover the page and add it to the index. The mobile website hit might be just that the mobile website has been indexed, but the main website hasn't yet. Inductiveloadtalk/contribs 13:38, 3 April 2021 (UTC)
It's now returning the main website (not mobile) page, so I guess the spider got there, and some magical algorithm decided (correctly) that main website was a better result. Inductiveloadtalk/contribs 19:22, 3 April 2021 (UTC)
if we linked at wikipedia, and wikidata, then works might be more findable. Slowking4Farmbrough's revenge 01:14, 4 April 2021 (UTC)
FWIW DuckDuckGo and Bing both found the pages without issue. — billinghurst sDrewth 06:26, 6 April 2021 (UTC)

@Whatamidoing (WMF): Do you have anybody who has an "in" with Google who could explain why this is happening? Or someone who can assist us with what is reasonably missing in our metadata? — billinghurst sDrewth 06:23, 6 April 2021 (UTC)

I don't. The last time I heard of anyone working on SEO stuff, it was @Deskana (who reclaimed his volunteer status a couple of years ago, so it's been a long time). Let me ask around. I'll let you know if I learn anything. Whatamidoing (WMF) (talk) 19:11, 6 April 2021 (UTC)

@Whatamidoing (WMF), @Billinghurst: Google populates its search engine in a variety of ways, most of which are Google secrets. The most likely explanation for why it took a little bit to show up in Google is that Google doesn't crawl Wikisource as much as it crawls sites like Wikipedia, because Wikisource doesn't get as much traffic as Wikipedia so it's not as critical if it's slightly out of date. The page was created on 31st March, so it's really not surprising that it took a few days for it to be picked up. The fact it picked up other pages first is also not surprising, as Google doesn't necessarily crawl websites linearly. It's now first in the results for me if I search for "zawis and kunigunde", which is excellent. All the metadata in the page HTML looks good and I don't really think there's much you could do to improve it. The most important thing is the link to the Wikidata item in the schema.org format in the HTML, which I can see is in there (search the page source HTML for "sameAs" and you'll see it). In fact, that good metadata is probably why Google switched over the link to the desktop version from the mobile version, as the "canonical" URL for the page is given as the desktop version. I don't think there's really anything to do to improve things, things are already pretty great, and it'll just take a few days to pick things up sometimes. --Deskana (talk) 10:05, 7 April 2021 (UTC)
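
For anyone who would rather check this programmatically than search the page source by hand, here is a minimal sketch in Python, using only the standard library. It assumes the page carries a `<link rel="canonical">` element and a schema.org JSON-LD `<script>` block containing a "sameAs" key, as described above. The function name `extract_metadata`, the sample HTML, and the example.org URLs are hypothetical illustrations, not the actual output of any Wikisource page.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the canonical URL and any JSON-LD blocks found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.json_ld_blocks = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld_blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore blocks that are not valid JSON


def extract_metadata(html):
    """Return the canonical URL and all "sameAs" links found in the HTML."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    same_as = []
    for block in parser.json_ld_blocks:
        if not isinstance(block, dict):
            continue
        value = block.get("sameAs")
        if isinstance(value, str):
            same_as.append(value)
        elif isinstance(value, list):
            same_as.extend(v for v in value if isinstance(v, str))
    return {"canonical": parser.canonical, "sameAs": same_as}


# Hypothetical page source, for illustration only.
SAMPLE_HTML = """
<html><head>
<link rel="canonical" href="https://example.org/wiki/Some_Work"/>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "sameAs": "https://example.org/entity/Q0"}
</script>
</head><body></body></html>
"""

print(extract_metadata(SAMPLE_HTML))
```

An empty "sameAs" list in the result would suggest that the page has no linked Wikidata item in its metadata, which is the situation the comment above identifies as worth avoiding.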

@Deskana: thanks so much for that fulsome explanation, it helps, and this bit
The most important thing is the link to the Wikidata item in the schema.org format in the HTML, which I can see is in there (search the page source HTML for "sameAs" and you'll see it). In fact, that good metadata is probably why Google switched over the link to the desktop version from the mobile version, as the "canonical" URL for the page is given as the desktop version.
indicates that we need to rouse up our transcribers to do a better job of adding decent Wikidata. We have not been rigorous in getting all users to do it, though it is a tricky beast which Wikidata does not particularly assist. Some of the bot operators create shell items, which may not be particularly better. — billinghurst sDrewth 11:11, 7 April 2021 (UTC)
I talked to a couple of folks, but they didn't have anything else to add. These are good ideas, and so is patiently waiting for Google to take notice of the page's existence. Whatamidoing (WMF) (talk) 22:56, 15 April 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 12:50, 10 May 2021 (UTC)

Help with Splitting Pages

Could somebody help me split the pages in commons:File:Paradise Lost 1674.djvu? Languageseeker (talk) 00:14, 8 April 2021 (UTC)

Nothing at that link, and I would want to know a lot more about what is being proposed prior to acting. We stopped a lot of splitting and matching years ago due to edition matching issues. — billinghurst sDrewth 14:14, 10 April 2021 (UTC)
@Billinghurst: Whoops. Link fixed. The scan does not have the pages split in half, i.e. page 1 and 2 are in the same image. Languageseeker (talk) 15:51, 10 April 2021 (UTC)
Why do you want to split them? Proofread them as they are. Just page number on the odds, or evens. Essentially no different from doing a work with columns. Done numbers of works like that, works with no issue, just a little work on the page numbering, and there are plenty of means to do that. — billinghurst sDrewth 16:22, 10 April 2021 (UTC)
@Billinghurst: Good to know. Thanks. Didn’t want to create a mess. Languageseeker (talk) 05:37, 12 April 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 12:50, 10 May 2021 (UTC)

FYI: Author:William Shakespeare to be disambiguated

FYI, due to the presence of multiple author pages for individuals named "William Shakespeare", it is necessary to move Author:William Shakespeare to Author:William Shakespeare (1564-1616) to make room for a disambiguation page. This process has already begun, and I am updating links to the page. Due to the sheer enormity of the pages that link to that page, I am going to make a few passes with AWB before requesting assistance.

Note: this is a routine operation in accordance with our policies and established practice. This is not the thread in which to discuss the pros and cons of our existing disambiguation practices, or whether we should make an exception due to Shakespeare's importance. If you feel that a change is worth proposing, I encourage you to submit a proposal above, under WS:S#Proposals. —Beleg Tâl (talk) 00:28, 11 April 2021 (UTC)

Ouch. Did you also check enWP? We probably need to look to regular maintenance on the disambiguation page as typically no one is going to check. — billinghurst sDrewth 00:05, 12 April 2021 (UTC)
@Billinghurst: thanks for reminding me about enWP, I'll have a look. Speaking of enWP, they have a bot that checks for links to DAB pages, which might be a good idea for us -- this is far from the only DAB page in which one item is notable and the others are nobodies. —Beleg Tâl (talk) 01:22, 13 April 2021 (UTC)
Looks like the worst offenders are case disambiguation pages. Beleg Tâl (talk) 03:13, 13 April 2021 (UTC)
If you are expecting there to be pros and cons, and a possible exception, then why did you start the process before making an announcement or seeking feedback? --EncycloPetey (talk) 01:19, 12 April 2021 (UTC)
@EncycloPetey: As you may have noticed, every time a notable work or author gets disambiguated, there is always a chorus of editors saying "this is notable, it should be an exception", after which we kindly remind them that no, we do not make exceptions based on notability, this is the way things are done around here. I was hoping to save them the bother but it seems I made it worse :S —Beleg Tâl (talk) 01:20, 13 April 2021 (UTC)
The issue isn't Shakespeare's importance (although that factor might also be relevant to a discussion of how to handle it); it's the more practical issue that in the world at large there are a massive (massive!) number of works by Shakespeare (due to all the editions) and works about Shakespeare and works that refer to Shakespeare; and the same applies for works that are in scope for Wikisource; and the disproportion only gets greater for works actually currently on Wikisource. That means that we currently have a massive number of links to his author page, both actual and potential, and are adding links to his author page at a far higher rate than other authors, and we will keep adding links to [[Author:William Shakespeare]] for the foreseeable future.
Meanwhile, John William Thomas Shakespeare has all of one published work that is in scope, and a small handful of references in other works (mostly biographical dictionaries); William Goodman Shakespeare has a small handful of works and slightly more references ditto; and w:William Shakespeare (inventor) (the only actually ambiguous name) wrote some patents and while they might theoretically have an author page here eventually, I very much doubt that will happen in our lifetime. In other words, from a practical perspective, moving Shakespeare to a disambiguated name and putting a dab at "William Shakespeare" makes about as much sense as preemptively disambiguating all author pages: it's theoretically pure but a practical mess.
All of which is to say, I agree with EP that this would have better been discussed before starting a huge AWB run. It's possible the outcome would have been the same, but it is by no means a given. --Xover (talk) 06:17, 12 April 2021 (UTC)
@Xover: As I said on my talk page, I really do understand this point of view, but we have had this discussion so many times and the result has always been that these points don't matter. Hence why I said, this notification of a significant but routine operation is not the place for this debate to resurface, this is a matter for a proposal to change our existing practices. —Beleg Tâl (talk) 01:25, 13 April 2021 (UTC)
@Xover: It has always been my habit, when addressing a discussion in which a matter seems to be in a gray area, to add content in order to move it into clear black and white. You will have seen me add scans to works nominated for deletion, to add works to authors nominated for deletion, etc. So even though I suspect that this might be taken the wrong way, please know that this is the only reason why I have created United States patent 446529 by Author:William Shakespeare (1869-1950) to make this discussion more concrete. —Beleg Tâl (talk) 01:39, 13 April 2021 (UTC)
@Beleg Tâl: You're entirely correct that this action does not come across in the way you here characterise the intent to be. Doubling down on unilateral action in the face of pushback from multiple members of the community does not tend to strengthen that impression either. In fact, using information I gave you, that you were previously unaware of, to artificially create a situation where you "win" the discussion does not particularly help convey an impression of respect for other points of view. You may also wish to consider that when you force a situation to be "clear black and white" you also preclude the possibility of nuance and compromise (i.e. it is effectively a scorched earth tactic). But, hey, congratulations on your win… --Xover (talk) 06:32, 13 April 2021 (UTC)

Style issue across the site

I have seen several instances of hyphens that are used in places where en dashes should be. Does the community here think that we should continue using hyphens for purposes other than hyphenation or should we replace them with en dashes for things like date ranges or separating asides in running text, etc.? (I am making an exception here for disambiguation of author pages as hyphens are easier to type: please ignore this particular instance of the misuse of hyphens in favor of en dashes, as I think it will just complicate the discussion. Note also that I am not suggesting changing the punctuation of an original work as it was published, just for documentation, etc.) —Justin (koavf)TCM 02:09, 12 April 2021 (UTC)

You need to be a bit more specific about where you think we ought to be using (or not using) hyphens. The thread on my Talk page concerned its use in Portal Headers, which concerns neither date ranges nor "separating asides in running text". --EncycloPetey (talk) 03:09, 12 April 2021 (UTC)
I am referring to any instances where a hyphen is used and it isn't performing the function of hyphenation: joining two surnames of a person, breaking up a word over a line wrap or for showing syllable stress, or showing how certain parts of words are prefixes or suffixes (e.g. "The prefix Sino- refers to things from China...") Rather than list every way that hyphens are misused (e.g. in date ranges), it's easier to list the three times that they should be used. —Justin (koavf)TCM 03:37, 12 April 2021 (UTC)
I think it will be quite difficult to push this through so generally. Current practice is that dashes are not used in page titles, although I personally would allow it. I am quite hesitant about replacing hyphens by dashes in the main namespace in cases when the hyphens were (though incorrectly) used in the original work. Otherwise I support using hyphens and dashes in the way described e. g. at Wikipedia:Hyphens and dashes. --Jan Kameníček (talk) 17:13, 12 April 2021 (UTC)
Thanks. Agreed that this is a good rough guide. Note again that "I am not suggesting changing the punctuation of an original work as it was published, just for documentation, etc." —Justin (koavf)TCM 21:17, 12 April 2021 (UTC)
"Documentation, etc." is a very vague description of where you think hyphen-policing should occur. I also point out that, just because a hyphen is used incorrectly, it does not follow that an en-dash should be used instead. That is a false dichotomy. --EncycloPetey (talk) 23:14, 12 April 2021 (UTC)
I mean all pages other than the transcription of works themselves, which may have different or otherwise inappropriate typography. I propose that we use the guidelines that Jan just appealed to but in which cases would there be an inappropriate hyphen usage that wouldn't be replaced with an en dash? —Justin (koavf)TCM 03:37, 13 April 2021 (UTC)
I'm not sure what you're asking for here. Are you a) asking for permission to fix punctuation on Help and other such documentation pages? b) requesting a global replace on such pages? c) an amendment to the style guide? d) a new policy page on punctuation? e) something else? Beeswaxcandle (talk) 05:31, 13 April 2021 (UTC)

Adding Disclaimer for all US-Gov docs that are not scan backed.

I think it’s important to add a disclaimer to all US-Gov documents that are not scan-backed, stating that the text may be inaccurate. Such a disclaimer is present on all US-Gov websites for non-pdf files. Also, all pages that link to White House.gov need an access date because the site is not durably archived. Languageseeker (talk) 15:16, 13 April 2021 (UTC)

Is this not covered by Wikisource:General_disclaimer#Accuracy? Inductiveloadtalk/contribs 15:27, 13 April 2021 (UTC)
To me, the disclaimer seems aimed at saying that Wikisource users might make a mistake, but, in this case, the Government Printing Office explicitly states that these texts are not guaranteed to be accurate. So, this GPO disclaimer should be part of the template. Also, all the non-pdf versions of Congress publications are incomplete at the source. Languageseeker (talk) 15:36, 13 April 2021 (UTC)
Don't see the necessity. We are working from editions, and we are accurate to the published edition, so if anything needs saying it belongs in the general disclaimer for all works and the sources. We could weave into the text ... "//that we have editions based on a date of publication and they are believed to be true to the date retrieved, and not reflect later updates to the source. Errors in the source at the date of retrieval will generally be reflected in the edition at Wikisource.//" — billinghurst sDrewth 13:37, 14 April 2021 (UTC)
The issue is that the Congressional Documents are not being uploaded from their published sources, but from the rough transcription provided by the GPO. I'm not against importing governmental documents, but they require more care than this. I think that there's a need for standards to make sure that the way they're being added does not create more work for the future. At minimum, this would require the importation of the GPO authenticated pdfs and standard templates for things such as line numbering, sponsors, etc. Languageseeker (talk) 15:07, 14 April 2021 (UTC)
Doesn't change what I said: we proofread to the source and cite the source. Also, what you are explaining would in no way justify a change of wording to the licence. The appropriate place to add a note is the talk page of the work in a {{textinfo}} box — billinghurst sDrewth 12:51, 10 May 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 12:51, 10 May 2021 (UTC)

Can our OCR software be tweaked?

The latest version of our OCR does an excellent job, except it interprets a double quote as two single quotes. Can this be remedied?— Ineuw (talk) 20:38, 14 April 2021 (UTC)

@Ineuw: I'm not sure of the state of play of the OCR project. Probably The Phabricator project is the place to ask. Otherwise, in the meantime, a double single quote is probably fairly easy to hit with a regex replacement script. Inductiveloadtalk/contribs 20:58, 14 April 2021 (UTC)
Thanks for the info. In any case they must also be checked individually, and evaluated as italics, single, or double quotes. But then, this is not new with OCR, but now the scan is about 90% wrong. I wasn't sure if this was a local undertaking. I will post it on Phabricator.— Ineuw (talk) 21:13, 14 April 2021 (UTC)
I'm not sure what exactly is happening with the OCR, i.e. if they've changed anything with the normal OCR tooling yet. It's possible something has changed, or maybe just the OCR gremlins are around today. Remember that the normal OCR tool doesn't actually do any OCR if the file has a text layer, so it might be embedded in your file. If it's something that trips up the normal OCR, the Google OCR might work better, since that genuinely does go and OCR the image? Inductiveloadtalk/contribs 21:24, 14 April 2021 (UTC)
I had both on the toolbar, but currently our version is better. Google OCR doesn't insert an empty row between paragraphs and merges two columns text line by line instead of adding the right column below the left. So, this is a lesser issue.— Ineuw (talk) 21:35, 14 April 2021 (UTC)
Now that e-book exporter has been unbroken, they say they are turning to OCR. you can also leave comments on meta. meta:Talk:Community Tech/OCR Improvements. Slowking4Farmbrough's revenge 21:47, 14 April 2021 (UTC)
@Slowking4: Much thanks for the link.— Ineuw (talk) 00:23, 15 April 2021 (UTC)
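The regex replacement suggested earlier in this thread could look something like the sketch below. It is only an illustration in Python, not the script anyone here actually uses; the function name and the sample sentence are made up. It assumes the OCR emits two straight apostrophes where a double quote belongs, and it deliberately skips runs of three or more apostrophes. As noted above, a pair of single quotes can also be italic markup, so each hit would still need checking by eye.

```python
import re

# Exactly two straight apostrophes in a row. Runs of three or more are
# left alone, since they may be bold or bold-italic wiki markup.
DOUBLED_QUOTE = re.compile(r"(?<!')''(?!')")


def fix_doubled_quotes(text: str) -> str:
    """Replace each isolated pair of single quotes with one double quote."""
    return DOUBLED_QUOTE.sub('"', text)


# Hypothetical OCR output, for illustration only.
sample = "He said, ''Come in,'' and left. '''Bold''' stays."
print(fix_doubled_quotes(sample))
```

A lone apostrophe, as in "it's", is not touched, because the pattern needs two in a row.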

Can our OCR software be tweaked? REVISITED

I posted an issue at Meta with which I am struggling. I just want to bring it to the attention of the community because we have experts here who may not be participating in the meta discussions.— Ineuw (talk) 07:46, 20 April 2021 (UTC)

@Ineuw: Judging by a superficial look at the issue, this is a problem with the local JavaScript here on enWS that's showing up now due to changes in MediaWiki. The way the script translates the data it is getting back from the OCR server into usable text in the text field is … not optimal, and may need to be rewritten. I'll try to take a closer look when time allows. --Xover (talk) 12:35, 20 April 2021 (UTC)
@Xover: Thanks for looking into it. I can't understand why the problem is intermittent. Could we not copy a better working script from another Wiki? — Ineuw (talk) 13:36, 20 April 2021 (UTC)
@Ineuw: This is the same script that has been copied between wikis. It hasn't really been actively maintained since 2015 (and the same goes for the server backend), so it has a lot of half-a-decade old technical assumptions. That it's starting to fail as web browsers and MediaWiki continue to change is not really unexpected.
Meanwhile, in debugging this it would be useful to know which pages you were trying it on when it was very slow and a couple of pages where the performance was more or less normal. I've found some obvious problems, which is what I was referring to above, but on reflection I'm not at all certain they would explain the symptoms you are describing.
BTW, how certain are you that it's something to do with the OCR tool and not something local to your computer? The reason I ask is because on meta you say the OCR text suddenly showed up after 5 minutes; but after 5 minutes all the network requests etc. would have long since timed out, so if the server was that slow you should have gotten nothing or an error message. If it was a local problem (slow computer, web browser hanging, that kind of thing) on the other hand, you might see effects like that. It'd be odd if that only affected the OCR button, so unless you're seeing similar problems with local applications or other web pages (or the Google OCR button for that matter) that probably isn't it. --Xover (talk) 14:41, 20 April 2021 (UTC)
Thanks for the reminder to check my browsers' setups again.— Ineuw (talk) 15:15, 20 April 2021 (UTC)
This has nothing to do with the browsers, or the OS. I tried Firefox and Vivaldi in Windows and Linux, and the OCR script behaviour is the same.— Ineuw (talk) 05:17, 21 April 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 12:52, 10 May 2021 (UTC)

DMCA for Mind, Character and Personality

Hello Wikisource - In compliance with the provisions of the US Digital Millennium Copyright Act (DMCA), and at the instruction of the Wikimedia Foundation's legal counsel, one or more pages have been deleted from Wikisource. Please note that this is an official action of the WMF office which should not be undone. If you have valid grounds for a counter-claim under the DMCA, please contact me. The takedown can be read here.

Affected file(s) are Mind, Character and Personality and all subpages, which are:

Subpages deleted

Thank you! JSutherland (WMF) (talk) 22:30, 14 April 2021 (UTC)

This act is something really unprecedented. Wikisource community has always paid thorough attention to copyright issues and never tolerated any copyvios, and a WMF employee with zero contributions here does not consider it necessary to discuss deleting our content with us and simply appears and deletes what he wants to delete, ignoring the community. Really unbelievable how the communities are valued by the WMF…
Could @JSutherland (WMF): at least additionally explain the grounds of the deletion in more detail? The provided link does not go anywhere. --Jan Kameníček (talk) 23:19, 14 April 2021 (UTC)
Link should go to wmf:Legal:DMCA_Mind_Character_and_Personality. That's the WMF's job; they protect the wikis from the potential consequences of copyright infringement by following the DMCA, which requires them to take down works when they receive a properly filed DMCA notice. I'm actually surprised at how lenient they are, instead of just mechanically taking down DMCAed works without concern.
In this case, it seems that Mind, Character and Personality is a 1977 work, an edited compilation that gets a new copyright.--Prosfilaes (talk) 23:26, 14 April 2021 (UTC)
If the things are as Prosfilaes has described, the community would not refuse to delete the work. Copyrighted 1977 works are habitually deleted here. But we do not deserve to be ignored. --Jan Kameníček (talk) 23:31, 14 April 2021 (UTC)
Legally speaking, I feel the hands-on approach the WMF gives us for these deletions is excessive; they'd be better off legally, as I understand it, deleting on DMCA and refusing to discuss it with us. They're bound by the law here, and I don't find it a problem.--Prosfilaes (talk) 02:56, 15 April 2021 (UTC)
Apologies for the broken link - our tool is designed for Commons-based deletions since, as you say, these are extremely rare on Wikisource. (Also I made a typo.) Unfortunately the DMCA Policy doesn't really allow for much wiggle room here. If however you do have valid grounds for a counter-claim under the DMCA (more info here), please do let me know. If it'd be helpful, I can ask an attorney to comment here, though I am not sure they would be able to speak much to individual cases. Thanks, JSutherland (WMF) (talk) 00:06, 15 April 2021 (UTC)
I'm not entirely surprised because many users have trouble understanding that a particular edition of an out-of-copyright work can be in copyright. The WMF has a duty to protect itself from lawsuits. This is one of the problems with non-scan backed works that this community is going to have to accept. Languageseeker (talk) 00:15, 15 April 2021 (UTC)
I have started a copyright discussion here. PseudoSkull (talk) 01:09, 15 April 2021 (UTC)
Of more immediate concern is to determine the status of the other works uploaded by User:Rcrowley7 at around that time. This one did not have sourcing or licensing information provided, and a quick glance at a couple of the others shows them to be in the same position. Education (White) has a source given on the Talk page, but it leads to a website that claims copyright and I suspect this site is the source for most of the other of White's work. There is also the side-question of the works' non-compliance with our formatting guidance. Beeswaxcandle (talk) 07:21, 15 April 2021 (UTC)
Agree, the foreword is signed by "The trustees of the Ellen G. White Publications" which suggests that this edition was published posthumously. --Jan Kameníček (talk) 08:44, 15 April 2021 (UTC)
@JSutherland (WMF): Does the information that you are not sure "they would be able to speak to individual cases" mean that they in fact do not know anything about it because they did not really check the copyright status of this particular work? I believe that the work could be a copyvio, but why did you not tell us which steps were made to check it and with what result? Why did you delete it without letting us know and instead of giving us proper information so that we could use our processes to get rid of a copyvio? --Jan Kameníček (talk) 08:30, 15 April 2021 (UTC)
@Jan.Kamenicek: because that is the process. WMF gets a DMCA notification and makes a determination per wmf:The Wikimedia Foundation Digital Millennium Copyright Act (DMCA) Policy and acts. We have the ability to challenge. In this case it would be reasonably obvious that we have not much of a leg to stand upon. You would need to find an archive version of that work that shows something that puts it into the public domain. — billinghurst sDrewth 16:08, 15 April 2021 (UTC)
time for a road trip to university of alabama. this copyrighting of the public domain needs to be opposed. it's bad enough when the nephews rent seek, but the great-grand-children? Slowking4Farmbrough's revenge 11:22, 16 April 2021 (UTC)
They have put the work online, and it is available to be read. How about you change the link on the author page so it points to the available copy of the work. — billinghurst sDrewth 12:20, 16 April 2021 (UTC)
i would just as soon route around their web1.0 text dump, and go back to the 1889 base text. [4] we need the free alternative text without all the revisionist prefaces, and copyright abuse. Slowking4Farmbrough's revenge 00:41, 17 April 2021 (UTC)
Great news, I look forward to your transcription. In the meantime, you can link to what is available so that those who are interested in the work can find it. — billinghurst sDrewth 01:07, 18 April 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 12:52, 10 May 2021 (UTC)

Purpose of Proofread of the Month

@Cygnis insignis, @Billinghurst, @Prosfilaes, @Xover, @Inductiveload, @ShakespeareFan00: I cannot seem to figure out why we have a POTM and how the works are chosen. On the talk page for POTM, the text seems to be selected by one user and then everyone else goes shrug, whatever. I don't think that POTM has ever exceeded 500 pages completed a month. When I try to raise discussions about it, nobody seems to want to discuss it over there. So, I'm asking here. Why do we have a POTM? How are the works selected? Is it working? How does Wikisource define success when it comes to POTM? Languageseeker (talk) 15:22, 23 April 2021 (UTC)

  • Not sure why, something about collaboration and getting new users involved. There is also a badge. The selection of interesting and important works would see more contributions, from me at least. CYGNIS INSIGNIS 16:37, 23 April 2021 (UTC)
  • There isn't a single goal. The activity raises awareness of what we do, brings in new proofreaders, provides opportunity for even long-time editors to learn new approaches, increases the diversity of our works. . . . Usually a success is the validation of a work, so we pick a work that we expect could be completed in the time of one month. But not always: we have picked more challenging works when there was a clear plan, such as the huge book of Scottish songs we did, where we tackled a complex work and taught advanced editing technique to participants. To my mind, the best benefit is that it attracts new editors interested in the subject of the work, and since the editors are often new, we need a work that will be simple enough for newcomers to contribute to. To make that happen, the scan must be carefully prepared and checked so no unexpected issues will stop progress of the work. There should be few or no tables, limited illustrations, and relatively simple formatting throughout. My response may not fully answer your questions, but my answers aren't the only answers to your question. Part of what PotM is differs from editor to editor, and that is a good thing. --EncycloPetey (talk) 16:48, 23 April 2021 (UTC)
The archives of this page tell us that PotM was set up in August 2008. "Each month the community will select one text to be proofread and hopefully we can completely proofread the text in that amount of time." In those early days, the works were randomly selected according to the interests of the proposers. When Billinghurst and then I co-ordinated the selections, we tried to put in some coherence around the domains of knowledge selected and awarded badges for involvement in any month's work. RL got in the way of my heavy involvement in PotM and awarding the badges fizzled out. We encourage new editors to do some pages in the PotM so that they can get a feel for our processes and many of our most experienced contributors have started there and still contribute from time to time to particular projects. I'm only here because the June 2011 PotM was interesting to me and I somehow stayed. The challenge in selecting a work (as Cygnis insignis says) is to find one that is interesting. The "shrug, whatever" reaction to a selection is usually because the work doesn't appeal. We've also learnt over the years to avoid works with complex formatting. They simply don't work as collaborations, and aren't useful for teaching new editors. Beeswaxcandle (talk) 20:22, 23 April 2021 (UTC)
  •  Comment In the beginning it was sitting beside a range of OF THE (TIMEFRAME) components, though that was pretty well prior to us having a truly functioning proofread page system. When I rejigged, because it wasn't working, we discussed the purpose of being a better gateway to newbies, and to use it as avenue to spread our subject coverage. It was meant to pique interest and not be scary, able to be accomplished within the month, hopefully through validation. Part of the means to ensure that not one person or subject dominated was to have rough themes planned ahead where people could make future nominations. I also had a November validation month where we took already proofread works and validated them. I did set up quirky rewards for participants. I did it for a while setting up templates and processes, then BWC joined me, and others joined so I moved onto other things that no one else was doing, I also think that coincided with me becoming a steward. One also needs some variety and changes. — billinghurst sDrewth 23:16, 23 April 2021 (UTC)
"I think the entire idea of a work a month does not work." what is there to discuss? Slowking4Farmbrough's revenge 23:44, 23 April 2021 (UTC)


  • These are great replies that clarify some things. For me, the single greatest drawback to selecting one text is that it will have limited appeal. French Wikisource has demonstrated that having a variety of texts attracts more users. Wouldn’t it make sense to not just limit to only one text? Languageseeker (talk) 23:56, 23 April 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 12:52, 10 May 2021 (UTC)

Clarification for WS:SHORT

Hi! So WS:SHORT's policy states that:

Reserved for Wikisource project reference pages (WS: namespace) only.

Intuitive. However, there is a note in the shortcut parameter of {{header}} stating that:

This is normally reserved for very large reference works (e.g. EB11)

Okay. So, the laws of the Philippines are often abbreviated (e.g. Republic Act No. 9003 -> RA 9003). The shortcut parameter is a perfect fit for this. However, I don't know if this is an accepted use for shortcuts. It would be awesome if this is acceptable.

Also, there are already some redirect pages created by others (RA 9188, RA 9189, RA 9190, RA 9191). If these shortcuts are not acceptable, then what to do with these already existing ones? If they are acceptable, then I can create shortcuts for all other Philippine law pages (Portal:Law of the Philippines).

TY! — 🍕 Yivan000 viewtalk 08:02, 27 April 2021 (UTC)

As long as no other works use the "RA" abbreviation, then creating redirects is fine. If the "RA" abbreviation is used for other works, then any that need to point at more than one work will need to be changed into disambiguation pages. The shortcut parameter in the {{header}} template is not appropriate for this purpose. Per the instructions at the Header template, the field should be very rarely used in the mainspace. Beeswaxcandle (talk) 08:14, 27 April 2021 (UTC)
Oh, okay then. I'll just add a separate box in the notes. TY! — 🍕 Yivan000 viewtalk 08:36, 27 April 2021 (UTC)
 Comment @Yivan000: I don't think that using RA is a good choice as a shortcut on its own, it is too ambiguous in my opinion, whether it is currently used or not. I can think of a number of general use for RA, outside of here. Please try something more distinct and universal, or think of a nomenclature that could describe the work. I would be more thinking PH-RA or something like that. In terms of RA nnnn redirects, that seems okay to me at this point of time. — billinghurst sDrewth 11:42, 27 April 2021 (UTC)
@Billinghurst: I think the redirects as is are already fine. For me, the PH-RA shortcut redirects are too much. Note that there are other abbreviations for other laws (like CA, PD, BP, BAA, PP, ...), and having two redirects for each is too much. Also, I searched all abbreviations in Special:PrefixIndex and there are no conflicts whatsoever.
All is good. — 🍕 Yivan000 viewtalk 13:49, 27 April 2021 (UTC)
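For anyone creating the remaining redirects in bulk, the pattern discussed above (Republic Act No. 9003 -> RA 9003) can be sketched as below. This is a hypothetical helper, not an existing bot or gadget: the function name is invented, it only handles titles of exactly the form "Republic Act No. nnnn", and it only builds the redirect title and wikitext without saving anything to the wiki.

```python
import re
from typing import Optional, Tuple

# Matches full page titles such as "Republic Act No. 9003".
RA_TITLE = re.compile(r"Republic Act No\. (\d+)")


def redirect_for(title: str) -> Optional[Tuple[str, str]]:
    """Return (redirect page title, redirect wikitext), or None if the
    title is not of the form 'Republic Act No. nnnn'."""
    match = RA_TITLE.fullmatch(title)
    if match is None:
        return None
    return f"RA {match.group(1)}", f"#REDIRECT [[{title}]]"


print(redirect_for("Republic Act No. 9003"))
```

Other abbreviations mentioned in this thread (CA, PD, BP, ...) would each need their own pattern, and a check that the short title does not already exist.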
This section was archived on a request by: — billinghurst sDrewth 12:53, 10 May 2021 (UTC)

McClure's Magazine

A note to say that I have been through the lists of volumes and moved any page that was linked and described as being from the magazine to a subpage per the existing schema. I have updated all the volume pages so we won't have issues of mislinks that existed. I am starting on a search of the main ns to see if there are pages from McClure's now sitting in main ns though outside of the magazine subpage hierarchy. [No promises of no errors, and happy to be pinged for anything that I broke and need to fix.] The whole magazine definitely is in need of scans to facilitate proofreading.

template:McClure's link now exists if you find any pages and move them and need to relink through from the author ns. — billinghurst sDrewth 05:58, 13 April 2021 (UTC)

Quite a lot of tedious organizational labor -- thanks for doing that. -Pete (talk) 06:09, 13 April 2021 (UTC)
Oh yeah (*′☉.̫☉) 3 days and that is with helpers in TemplateScript. Had to get done at some point. — billinghurst sDrewth 08:20, 13 April 2021 (UTC)
The volumes are largely incomplete (and most of the bluelinks are to author pages[!]), can there be more assistance to someone browsing the works. I assume that is the purpose of the page. CYGNIS INSIGNIS 09:09, 13 April 2021 (UTC)
@Cygnis insignis: The volume pages were essentially indiscriminately rootpage linked, and that led to other works, other versions, version pages or disambiguation pages. I was just starting the process of tidying what has been problematic for a while, and letting the community know. Multiple times I have maybe done one work, and then walked away. Others have done the same thing at other times. This week I just put the nose to the wheel. Nothing more, nothing less. — billinghurst sDrewth 13:44, 14 April 2021 (UTC)
\o/ Good going. BTW, I have (just) made a script that can move root pages to subpages. Might be of use for similar endeavours: User:Inductiveload/Scripts/Move to subpage. Obviously it's alpha-level and might have quirks and issues. I will one day get all this crap up in a Git repo and have it properly maintained. But until then, safety squints required! It worked decently enough for ~400 entries of The Complete Poems of Paul Laurence Dunbar.
I would offer to bulk upload McClures but 1) I'll need to get round to gathering the upload data (job for anyone who is bored (yeah, I know, me neither)) and 2) Commons has been erroring on large uploads for weeks now and I'm struggling to upload even one item, so even if I can organise a scan set, I can't actually offload them onto Commons and I'm now running out of local disk space. Inductiveloadtalk/contribs 13:48, 13 April 2021 (UTC)
Thanks for building a script. Having a break from McClure's for a while. Trying to not add any new major works and work on the existing maintenance tasks and tidying that are problematic. This task was not so neat and tidy to automate, each one had to be manually reviewed and sorted, and then the header data updated, prev/next found and added, etc. I will consider the script when I have more than 100 pages to move. — billinghurst sDrewth 13:50, 14 April 2021 (UTC)
@Billinghurst: No expectation that you should do more of anything! BTW, that script will (attempt to) do the next/prev links if it can. As long as you have at least one entry on each side of each moved page (whether it exists or not) the links will be generated. Of course, building the ordered list for ingest is still a timesink, especially when you need to figure out what does and doesn't exist on-wiki! It works for well under 100 page runs too, for example, it made moving the 16 pages of The_Heart_Of_Happy_Hollow out of the root level fairly easy. Inductiveloadtalk/contribs 16:27, 14 April 2021 (UTC)
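The next/prev behaviour described above (links are generated from the entries on either side of each moved page in the ordered list) can be illustrated with a short sketch. This is not the code of User:Inductiveload/Scripts/Move to subpage; it is a Python illustration of the idea only, and the function name and sample titles are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def neighbour_links(
    ordered_titles: List[str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each title to its (previous, next) neighbours in the list.

    The first entry has no previous neighbour and the last has no next
    neighbour; those slots are None.
    """
    links = {}
    for i, title in enumerate(ordered_titles):
        prev_title = ordered_titles[i - 1] if i > 0 else None
        next_title = ordered_titles[i + 1] if i + 1 < len(ordered_titles) else None
        links[title] = (prev_title, next_title)
    return links


# Hypothetical table of contents, in printed order.
contents = ["Poem One", "Poem Two", "Poem Three"]
for title, (prev_title, next_title) in neighbour_links(contents).items():
    print(title, "| previous:", prev_title, "| next:", next_title)
```

Whether a neighbour exists on-wiki does not matter for building the links, which matches the remark above that entries count "whether it exists or not".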


I created a template {{IAu}} to make using the IA upload tool a bit easier. For example, if you want to upload McClure 10, the format would be {{IAu| mccluresmagazine10newy|McClure - Volume 10|pdf}} Upload McClure - Volume 10. Even if it doesn’t work, the link will remain on the page. It also does the usual of warning if the file is already on commons or if the IA link is invalid. You can batch upload links, wait for all of them to load, and then press upload in each individual file to start the uploads. It takes about 5 minutes to load the metadata for about 70 files and probably around 5 minutes to hit all 70 buttons. Not saying anyone should do it, but it’s an option. Languageseeker (talk) 18:09, 14 April 2021 (UTC)
@Languageseeker: If you generate a file like this User:Inductiveload/Requests/Batch_uploads, I can do a batch for you. The problem with spamming straight uploads from the IA via IA-Upload is that their metadata is generally really, really bad and to actually fix it file by file takes a long time via the IA Upload interface, and even longer to fix retrospectively. For example File:The Atlantic Monthly, Volume 30.djvu vs File:The Atlantic Monthly Volume 41.djvu. Inductiveloadtalk/contribs 18:29, 14 April 2021 (UTC)
@Inductiveload: I agree that IA metadata is subpar, but you can still find the book. Moreover, I think it would be best to tag metadata fields while proofreading and then update the data rather than trying to capture all the metadata first. This way, a user would only need to type the metadata once and there could be more detailed metadata. For example, the TOC or articles could be included as part of the metadata. Languageseeker (talk) 20:41, 14 April 2021 (UTC)
@Languageseeker: If you get the metadata correct up front, the fill index gadget will nearly always import it correctly. If you just import the IA's dross, then you will need to edit it. For example: all 73 volumes of Index:The Jesuit relations and allied documents (Volume 73).pdf will now need to be manually fixed (or a bot run configured, which is fiddly and annoying). Dumping index pages with zero care over the metadata clogs up Category:Index - File to check (9) (Which was almost cleared a few weeks ago) and takes hours and hours of others' time to sort out after the fact. Whereas if you had prepared as a spreadsheet, you can copy most of the metadata to every row with "Ctrl+D" and it's done.
Yes, the fact that this has to be stored and replicated around the place is a bit clunky and I hope one day WD will become useful for us. However, it's not there yet. Inductiveloadtalk/contribs 20:50, 14 April 2021 (UTC)


@Inductiveload: I meant that it should be possible to fill the Index and metadata from the Pages by tagging the parts in the Pages ns with something like {{title|}}, {{subtitle|}}, etc. and then reading it back in. Instead of WD to Wikisource, it would be Wikisource to WD. Languageseeker (talk) 23:05, 14 April 2021 (UTC)

Why a Facsimile of First Folio than an actual First Folio?

Since Shakespeare is having a renaissance, I have to ask why we are proofreading a facsimile of the first folio instead of an actual copy of one? There are so many copies of it available on line that this makes no sense to me. Why not West 6 available on the Cambridge Digital Library or West 150 (external scan)? Languageseeker (talk) 01:41, 13 April 2021 (UTC)

What Wikisource content are you talking about -- could you provide a link? (And have you asked whoever started that transcription? Possible they were simply unaware of better sources..?) -Pete (talk) 01:48, 13 April 2021 (UTC)
This is the Index. Index:Shakespeare_-_First_Folio_Faithfully_Reproduced,_Methuen,_1910.djvu Languageseeker (talk) 01:55, 13 April 2021 (UTC)
there you go Index:Mr. William Shakespeare's Comedies, Histories, & Tragedies (1623).djvu have a nice time. Slowking4Farmbrough's revenge 02:59, 13 April 2021 (UTC)

Unless we are pushing a version through a program or project I think that it is inaccurate to say that we are preferring proofreading anything over anything else. We typically free proofread what we want to proofread, not a coordinated consensus on what work is next, and then why we have a consensus on our projects when we have them. — billinghurst sDrewth 06:03, 13 April 2021 (UTC)

@Billinghurst: It’s not about what users are working on, but the specific source file being worked on. There are numerous scans of the first folio available online while the one on Wikisource is a facsimile of an unknown printing of the first folio. Is there a way to propose moving the text from the facsimile to scan? Languageseeker (talk) 11:38, 13 April 2021 (UTC)
You are judging. We work from published editions of works, and that is what we have here, and as long as people source it and label it appropriately then that is all we ask. We will republish great works, crap works, sexy works, dirges, misogyny, murder notes, trials, enlightenment and so on. Our requirement is published. If you have a better version then start it. I don't understand lots of works that people work upon and that I do not value, and I don't put that down to their failure. — billinghurst sDrewth 06:03, 14 April 2021 (UTC)
Not sure if I'm understanding correctly, but doesn't the question boil down to the following?
  • Might one upload a superior scan, or a scan of a superior work (and the answer, presuming that somebody has such a scan and it's clearly out of copyright, is yes)
  • What would it take to begin transcribing that alongside the existing work (not sure whether or not that's part of the question, I'm happy to address it if there's a need)
  • Maybe there is a desire to move the pages from the existing one to a superior scan? Not sure about whether or not that is what's sought, or would make sense in this instance. If so, there are script writers who could probably be persuaded to do such things...
Is there more to it than that? If there's a judgmental bit going on, it's going over my head. (Personally I find it's really valuable to learn why other people have made certain choices here...sometimes I learn something from the answer, sometimes I'm able to share useful info. But just because I ask the question, doesn't mean I'm judging.) -Pete (talk) 06:14, 14 April 2021 (UTC)
Index:Shakespeare - First Folio Faithfully Reproduced, Methuen, 1910.djvu is not an 'unknown printing'. The possibility exists, as you are aware, for Match & Splitting for another edition, but that might as well be direct import (with blame for errors being attributable to another site, not compounded by appearing to be a transcript of the linked index). CYGNIS INSIGNIS 12:22, 13 April 2021 (UTC)
By unknown, I mean that the West number is not known. The printers copy-edited the First Folio as they printed it, so there are differences between different printings of the First Folio, and shifting from one copy to another would require reproofreading, which is less than desirable. Right now, I'm trying to get West 192 on Commons so that the text can be shifted over prior to validation. It seems a shame to proofread and fully validate a text only to have to redo it again. Also, the images and text in the facsimile are of lesser quality than the original ones. I think that Shakespeare's First Folio is important enough to warrant creating as good a copy as possible. Languageseeker (talk) 13:29, 13 April 2021 (UTC)
@Languageseeker: The word "facsimile" here is a description of the type of edition, much like a "diplomatic edition" or a "limited edition" or a "hardcover edition". I.e. it's just another edition. The Methuen facsimile is also a known and well-renowned edition (and has been a standard edition since it came out; you'll still find courses taught based on it). If we really needed to we could find out which copies it used (it used multiple as I recall), we'd just need to slog through the literature. But, bottom line, there is absolutely nothing wrong with the Methuen facsimile, and it isn't interchangeable with any other edition. We could (and ideally should) host all 235 extant copies of the First Folio (and at least one copy each of the Second, Third, Fourth Folio, and the False Folio), not to mention most of the Quartos, but that's an entirely separate issue. --Xover (talk) 17:28, 13 April 2021 (UTC)
Alright, good to know. Although, we might have to do some detective work to find the missing ones. Just kidding. Would there be an objection to adding West 192 to have a single edition First Folio? Languageseeker (talk) 17:31, 13 April 2021 (UTC)
BTW, I’m working on adding the quartos to Commons from the BL although the pages aren’t split. Languageseeker (talk) 17:37, 13 April 2021 (UTC)
@Languageseeker: West 192 is the NSW copy, I think? But any of the extant copies are fine for hosting. Some of them have missing pages and such, but surprisingly few due to the fact that owners have continually patched them over the centuries (some of them are real FrankenFolios). But if we're transcribing the First Folio rather than a later collected edition it is specifically for such unique features (like the page numbering snafu), so a couple of missing pages isn't really a problem.
If we're going to pick just one copy for a concerted proofreading effort we'd want to give it a lot more thought (and I'm not at all certain West 192 would be it), but that's a mainly hypothetical question until we actually have half a dozen or so volunteers ready to dig into the proofreading. The First Folio is really tough to proofread (properly).
BTW, I wouldn't recommend bulk uploading scans that have unsplit pages and similar problems. We can work around most such issues after the fact, but that's mostly applicable to when you run across an existing file and can't get an alternate scan. If your goal is just to prepare these for others to proofread, uploading lots of scans that need workarounds or doctoring has limited value (i.e. the odds are that someone else will have to fix them before they get proofread). --Xover (talk) 18:40, 13 April 2021 (UTC)
@Xover: Thanks for the great reply. I normally wouldn't upload unsplit scans, but they are such important works that I figured that it's worth having a copy on Commons. The British Library hosts the scans in unsplit format so I'm making uncompressed pdfs from them to preserve them for the future. Languageseeker (talk) 23:35, 13 April 2021 (UTC)
@Xover: It took 27 batch download jobs and over a day, but the second, third, and fourth folios now have indexes. Languageseeker (talk) 15:32, 15 April 2021 (UTC)

Please help correct my small error

I clicked and saved the wrong button on Page:A Wild-Goose Chase - Balmer - 1915.djvu/281 when I meant to validate the page. It appears that I proofed a page that was already proofed. If someone will validate this one page, this will complete the validation for the book. Thanks to anyone who can take care of this. Maile66 (talk) 19:22, 15 April 2021 (UTC)

@Maile66: Done Inductiveloadtalk/contribs 19:24, 15 April 2021 (UTC)
Thanks. That was quick. Maile66 (talk) 19:33, 15 April 2021 (UTC)
@Maile66: You're welcome and congrats on the completed validation! Inductiveloadtalk/contribs 19:34, 15 April 2021 (UTC)

5,000 works fully validated

A few hours ago, our five thousandth work was validated. Index:American Rescue Plan Fact Sheet - Impacts on Kentucky.pdf (a White House publication) was validated by User:Clay. For a complete list of our validation milestones see Portal:Proofreading milestones. Beeswaxcandle (talk) 23:13, 16 April 2021 (UTC)

Woot, looking at the statistics, it took 4 years to reach 1,000 and about 7 months to go from 4,000 to 5,000. Looks like we must be doing something right. A huge achievement. Languageseeker (talk) 02:47, 17 April 2021 (UTC)

Creation of Version Pages for the Individual works of Charles Dickens

In the discussion of the proposed move for Oliver Twist to its correct edition name and the creation of a version page on Oliver Twist, User:Xover informed me that such a proposal requires a formal vote prior to a second version of the text being proofread. Therefore, I’m submitting a formal proposal to the community to discuss the pros and cons of creating version pages for the individual works of Charles Dickens. It’s important to note that Dickens usually published 6-12 distinct editions with his editorial revisions and listing them on the author page occupies a significant amount of space. Languageseeker (talk) 15:45, 11 April 2021 (UTC)

I think this discussion (which is occurring in several places) is a bit stuck, and could benefit from a more explicit articulation of the basic problem you're trying to solve. Personally I think you've made it pretty clear, but that's just my perception…it also seems like others have had trouble seeing it. So I'll take a crack at that:
Up until recently, the author page for Charles Dickens dedicated a few lines to the novel Oliver Twist. (The same issue applies to many of his novels, but I'll use this one as an example.) These lines identified several editions of the novel; but they were not comprehensive, nor did they offer complete bibliographic information. More recently, this was fleshed out in much greater detail (see here). But this increased detail, which helps the reader readily learn about the various editions, has an undesirable side effect as well, i.e. it makes the Author page unwieldy and difficult to parse. One might reasonably assume that most readers are interested in an overview, like "which novels did Dickens write," and that only more specialized readers want to delve into the weeds of which editions are which, etc. A natural organizational principle for a website is to put the high-level information on the main page, and then move the more detailed information to the page linked, which was accomplished by creating Oliver Twist versions. This appears (to me) to be in keeping with common practice and with documented procedures at Wikisource.
Other experienced editors seem to disagree. I believe the principle is that a "Versions" page should only exist when there are multiple editions or versions transcribed on Wikisource, not merely published elsewhere.
I think the introduction of this last principle is the root of the present problems. As far as I can tell, this principle is not in Wikisource's documentation of versions pages, so an assumption that other editors would know it strikes me as potentially a bit rude. But more significantly, I think the principle itself is a bit myopic. A reader may well want to learn about multiple editions, and they may be glad to find bibliographic information that will permit them to find them in a traditional library, and/or links where they can find it at Gutenberg, Internet Archive, LibriVox, information at Wikipedia, etc. Does the reader really care, as a primary principle, whether or not they find a transcription specifically on English Wikisource? To me that seems like a principle that overvalues our specific work, at the expense of broadly respecting the work of scanning, transcribing, documenting, and otherwise preserving published works (which occurs at many websites and traditional institutions).
I would argue that we should freely move detailed information about a work that has had multiple editions published to a "versions" page, irrespective of whether or not any particular number of them has been transcribed here at Wikisource, whenever that action serves to make an "Author:" page more readable/useful (or for any number of other reasons). It doesn't strike me as particularly controversial, nor contrary to any policy I've found, to do so. -Pete (talk) 18:45, 11 April 2021 (UTC)
Thank you for your extremely detailed and cogent summary of the matter. You've said it better than I probably could. One of the biggest issues is that the works of Dickens have an extremely complicated publication history that requires access to either his letters or the Clarendon Dickens to sort out. In certain years, Dickens released multiple distinctive corrections of a particular work. Therefore, you can't even say the 1838 edition of Oliver Twist. Instead, you have to describe the title page to identify the edition. Does it say Charles Dickens or Boz? To add to the confusion, in at least one case, Dickens published two different revisions of one edition differing in one plate and several other corrections. Furthermore, Dickens corrected a number of errors in each revision and the printer made even more. Therefore, the scholarly consensus is that the texts got worse as time progressed. Not all of these versions are available as scans on IA. Therefore, to make sure that future users can be sure that they're tracking down the right editions, I'm recording the descriptions of the individual versions on Wikisource so that future editors know what to find. Languageseeker (talk) 19:39, 11 April 2021 (UTC)

Not convinced. Disambiguation pages are meant to be simple directing pages. They are not meant to be long explanatory documents, and definitely not the encyclopaedic articles. If we need to do something special then can we look to do it in the author: ns, probably a subpage where we can curate things in a more holistic sense. LBT has done that with quite a few detailed descriptive and explanatory subpages to her poetic authors. — billinghurst sDrewth 14:38, 12 April 2021 (UTC)

Why not? Either we're going to have these on author pages or author subpages or versions pages, and having them on author pages would make them huge, so put them on the versions page.--Prosfilaes (talk) 23:19, 12 April 2021 (UTC)

Why not? Because there is no version page and means unnecessarily disambiguating when we have no guarantee that the works will ever exist on-site. Get the works and we disambiguate. This is not the encyclopaedia, and encyclopaedic explanations do not belong on versions pages. — billinghurst sDrewth 00:14, 13 April 2021 (UTC)
The bigger question is: do works by an author belong on the author page even if there are no scans available? Each edition of Oliver Twist is a work by Charles Dickens distinct from his other works. Perhaps we can just group them by headings. Languageseeker (talk) 00:29, 13 April 2021 (UTC)
We have tons of lists of works we may never have, some of which won't be uploadable for decades. It seems entirely within our mission and practice to offer lists of important variants of one work. There might be a place where this gets too much, but a carefully selected set of works by Charles Dickens seems pretty far from that place.--Prosfilaes (talk) 00:57, 13 April 2021 (UTC)
(e/c)A list of works by an author belongs on the author's page, even if they are currently redlinked. At some time in the future a scan will appear. The purpose of a versions page is to list the versions we do currently host so as to provide a way of finding them easily via a search for the work's name. Most readers coming here to read a particular work just want to know if we have a copy and don't really care about a list of copies we don't yet have. In other words, I don't support the proposal to create version pages for each of Dickens' works in anticipation of hosting other editions at some undefinable time in the future. Beeswaxcandle (talk) 05:00, 13 April 2021 (UTC)
For many works, a scan may never appear, and if it does, it will be at some undefinable time in the future. There are a lot of works that I'd expect a scan for long after all the editions of Oliver Twist are complete. I'm not dead-set on this going on version pages, but we have works that people are actually interested in eventually doing, and putting them all on Dickens' page is going to make too much noise.--Prosfilaes (talk) 07:48, 13 April 2021 (UTC)

A scenario worth considering: How does a reader come to land on a Wikisource page about Oliver Twist, and does Wikisource serve up something that is appropriate? I imagine that most people who arrive here would be referred by another Wikimedia site, or by a Google search. If they land on a specific edition which provides no indication that there are other versions, or why they are looking at this version, is that a good outcome? If not, how should we address it? One approach would be the one Languageseeker has suggested. I have not yet seen an alternative proposed, but I'd like to. I suppose we could have an extensive "notes" section at Oliver Twist -- is that what those opposing the existence of a versions page would like? Or if not that...what? -Pete (talk) 04:52, 13 April 2021 (UTC)

I've kind of answered this in my previous post. However, we do not want an extensive note about other editions at the sole edition we currently host. However, a note could point to the Author and/or the Wikipedia page for detailed bibliographic information. The quickest way around this issue for Oliver Twist is to locate another edition, proofread it and then make it available. At that point create a versions page and put the {{other versions}} template on each of the hosted editions. The versions page would then have two editions on it. Beeswaxcandle (talk) 05:06, 13 April 2021 (UTC)
Poor Oliver Twist really did come into this world to make trouble. If this is the policy, it should be stated and all pages that violate it should be deleted. For now, it seems time to create a version page for The Pickwick Papers because there are two versions. May I suggest calling the present one The Pickwick Papers (Project Gutenberg). Languageseeker (talk) 05:36, 13 April 2021 (UTC)
@Beeswaxcandle: I appreciate your direct answers, and I think they demonstrate clearly where we disagree. (As a side note, I think it's unfortunate that these sort of basic philosophical questions about what we're doing here are so unresolved, and given that they are, it makes sense that we'd have strong disagreements about specific things like this. But I digress.)
Anyway, I disagree with this:
"The purpose of a versions page is to list the versions we do currently host..."
I believe that is a purpose of a versions page, but not the be-all, end-all reason. I believe that readers come here with expectations and needs that vary according to the user, and according to the kind of work they're seeking; but I doubt that very many of them are especially concerned with the distinction of whether a work lives here on Wikisource or not. I believe the reader is well served if our pages provide some context about the work they're viewing; and in some cases, there might be particularly important context to provide. From what Languageseeker has said, it seems that Dickens' works are just such a case: there are many editions with substantively different content, and varying provenance, and the Wikisource editors who happened to choose one of those editions may or may not have even been aware of the variety of choices available, much less made a well-informed decision about which to transcribe. In my view, the least we can do is prominently and concisely provide contextual information about what editions exist, before guiding them to the one we happen to have.
It's worthwhile to keep in mind that Wikimedia's structure beyond Wikisource may deepen the sometimes erroneous impression that Wikisource's transcription is of the most authoritative work. A Wikidata item about the work (as opposed to the edition) may link to the Wikisource transcription of a specific edition. That might be OK in a case where there are no major differences among editions; but where there are such differences, is it really a good thing for Wikisource to have no page for the Wikidata item to link to? The Wikidata item for Moby Dick links to a nice, concise versions page which contains only one transcribed version, and one version that appears not to be transcribed here. Is there a reader that is harmed by the existence of that versions page? Personally, I am pleased to know that there are different UK and US first editions, and I'm fine with clicking through that page to the one that exists on Wikisource. Is it really better that the Wikimedia universe convey the impression, in all the places that consult Wikidata, that Wikisource has nothing about Moby Dick?
In short, I think it's important that our policies around versions page leave some room for individual judgment by people who know something about the work. I'm not arguing that we need to create a versions page for every single work that has multiple editions, but rather that where an editor believes there is value in doing so, we should not generally interfere with that editor creating such a page, linking it appropriately on Wikisource and Wikidata, etc. I do not see the harm in taking this approach, and I'd been under the impression up until now that it's the approach we do take. -Pete (talk) 21:18, 15 April 2021 (UTC)
  • "this last principle is the root of the present problems": I disagree with this assessment. I think the root of the misunderstanding here is a desire for Wikisource to be something which it is not. In particular, Wikisource is not a bibliographic database. The purpose of Wikisource is not to collect all bibliographic data for a single edition, nor to collect bibliographic data for all editions. It is especially not for collecting extensive bibliographic data on an arbitrary subset of editions of a given work that someone finds particularly interesting.
    We do collect such data when it aids our primary work, but then we do it on WikiProject pages (if there is active ongoing work related to it) or Author talk pages (long term storage of research for the benefit of other contributors). If there are (semi-)objective criteria for whatever the subset is, it may also be appropriate to create a Portal: for that subset (a portal being a thematic collection of works or editions of works).
    This is sometimes sinned against: we list lots of editions on Author: pages that should ideally only contain works, and our versions pages tend to amass entries for editions that we do not yet have (for various reasons), and mostly this is fine and bothers no one. It's not something that really has to be enforced as a bright-line rule, and I think a lot of contributors find them useful or at least not in any way bothersome.
    But the problem comes with special pleading for one's pet project to override the wider good of the project. For example by insisting we move aside a proofread work (our primary purpose) for an arbitrary selection of bibliographic data (not at all our purpose), for editions we do not currently have and have no particular expectation of appearing any time soon. Such a plea from someone actually working on proofreading multiple editions of Dickens' works would get a very sympathetic hearing with me, because at that point there is a genuine need to structure pages to accommodate multiple editions. This current request, and all the words expended on it, is not that. --Xover (talk) 06:26, 16 April 2021 (UTC)
But we frequently are a bibliographic database. We have author pages, like Author:Isaac Asimov, stuffed with works that will not be PD in decades. A careful distinction between works and editions is purely bibliographic, artificial and not productive for us. I can see an argument that we should only link works/editions that are actually in the process of being worked on, but that's not what we do. If people on Wikisource want to list a set of works they want to work on, I don't think it matters whether they're "works" or "editions" by some definition.--Prosfilaes (talk) 08:22, 16 April 2021 (UTC)
I would actually support removing all the dashed redlinks for works in Author ns. Completely ugly and unnecessary. I would also comment that works under copyright should use {{copyright until}} and not be plain redlinked. Though having these arguments here will detract from finishing the pertinent argument which I still do not support. — billinghurst sDrewth 12:30, 16 April 2021 (UTC)
No, we frequently amass bibliographic data as a by-product of our main purpose, but that's fundamentally different from being a bibliographic database. Even ignoring the issue of our purpose, we do not have any tools that would set us up to be useful for creating such a database or for using us as such (one primary mode of use would be semantically rich querying: even the monstrosity that is WorldCat can distinguish queries for authors vs. titles). If your goal is a bibliographic database you and everyone else will be better served by working on either Wikidata or OpenLibrary, both of which have far far superior tools for the purpose than we do, and are set up along similar principles (crowd-sourced, openly licensed).
The distinction between works and editions is not artificial, but definitional; and it happens to be a good distinguisher for us in practice. Listing all works by an author is desirable and achievable, but listing all editions quickly becomes impossible. That's why this thread only proposes to collect an arbitrary subset of them. It also makes author pages entirely unwieldy and no longer fit for their purpose, which is why this thread proposes to move them to a separate page. In other words, ignoring the work—edition distinction and the difference between bibliographic data collected as necessary and incidental to our main purpose, and as a primary purpose, would lead to preemptively creating versions pages for every single work and producing—byte for byte, character for character, and page for page—more bibliographic data than actual content.
And you turn the issue on its head with "If people on Wikisource want to list a set of works they want to work on, I don't think it matters whether they're 'works' or 'editions' by some definition." It's always ok to only list the works you actually want to work on. We're a wiki, so someone else will hopefully add the rest at some point. It's preemptively listing editions that you have no intention of ever working on, and supplanting actual proofread content that someone else actually has worked on, that is the problem here. That's why it's necessary to talk of high-falutin' principle stuff like "the purpose of Wikisource": doing that only makes sense if our primary purpose is amassing bibliographic data. It is also only a problem when you start focussing on the editions as the primary information: so long as you stick to works there's no need to move mainspace content around (except possibly to correct titles), until you're actually proofreading an additional edition of it. --Xover (talk) 09:16, 16 April 2021 (UTC)
The distinction between works and editions can be both definitional and artificial; working on LibraryThing has shown me that whether two books are the same work can be a fraught and complex discussion. There are real Ship of Theseus issues in some works; are all the editions of the Encyclopedia Britannica the same work? If yes, you're combining volumes that have no text in common as the same work. Are all editions different works? Then you're designating as different works things with merely orthographic changes. Not only is the third option complex and fact dependent, the whole Ship of Theseus thing means that there are no clear lines.
No, listing all works by an author is often not achievable, and is frequently problematic in the same way that listing all editions is. A newspaper journalist may have at least one work in every daily newspaper for thirty years. A proper list of Dickens' works includes any number of articles from Household Words and other periodicals, plus 12 volumes of letters. I fail to see why adding select editions that people want on Wikisource is a problem.--Prosfilaes (talk) 07:46, 17 April 2021 (UTC)
Shrug. There are certainly edge cases; but the difficulty in making the determination does not affect the nature of the distinction. And as I said, amassing metadata about some additional editions isn't a problem in itself so long as it is treated as a by-product of our primary purpose, a secondary concern. It is when it is promoted to a primary concern it starts causing problems. For example by making Author:Charles Dickens so unwieldy that those "tidying" it (that is what a tidied author page looks like?!?) feel it imperative to move the one actually proofread edition of Oliver Twist we have aside in favour of their own subjective selection of "important" editions, and, when declined, to create a Oliver Twist versions and a subsidiary network of things like Oliver Twist (Charles Dickens Edition) (and disagreeing is apparently so rude that one becomes persona non grata with the proponents). It leads to pseudo-encyclopedic non-neutral link-farms like David Copperfield (Authoritative? egads! Why don't we just create an Amazon Affiliate account and be done with it?).
Meanwhile, not a single page of any Dickens work appears to have been proofread, beyond a single Match & Split that didn't even bother to clean up the obvious breakage afterwards. Why would we privilege being a poor man's LibraryThing (which is not our purpose, and for which we have no even remotely adequate tools) over actually proofreading works (the purpose for which the project does exist)? I'm more than averagely fond of bibliography too, and would like to do it within the Wikimedia movement, but the way to do that is advocating for improvement to and integration with Wikidata and the front ends / user interfaces to it, that the WikiCite folks have been having conferences about for going on a decade without any measurable progress.
It is entirely appropriate to discuss individual exceptions where merited. Dickens is a big (notable/important) author, and a prolific one. I am sure there is a coherent argument to be made related to the nature of the early editions of his works (it just hasn't been presented yet). As an exception I'd certainly be willing to entertain the notion, though I suspect it would lack some key factors to persuade me (part of the need can be met by one or more Portal: pages, and the other parts are moot unless a significant number of actually proofread works exist to create a practical, rather than theoretical, need). But as a principle I am vehemently opposed, both because it turns the nature of Wikisource on its head for no good reason, and because it just simply is not a good idea. --Xover (talk) 09:54, 17 April 2021 (UTC)
As I said, these are not some random editions that I happened to hear about. These are the editions that Charles Dickens revised himself. Such revisions range from correcting errata, to rewriting passages, to replacing plates. I'm planning on adding transcription projects as soon as Commons fixes its system. Honestly, from this conversation, I believe that we need an actual vote system and not just long discussion threads after which an administrator decides what to do. Languageseeker (talk) 12:44, 17 April 2021 (UTC)
Also, properly defining the edition allows the reader to know which edition they are reading. Imagine if the first version of Hamlet on Wikisource was Q1: would you vote against moving it to Hamlet (Quarto 1) and listing the other Quartos and Folios? This is an exact parallel case. For Oliver Twist, besides the errata, Dickens rewrote much of the book in 1846, so there are two very distinct texts. At this point, a few of the Dickens works have versions created prior to me and others do not. Languageseeker (talk) 12:52, 17 April 2021 (UTC)
@Languageseeker: We are not saying any of that isn't the case, you have told us over and over and we understand. When you or someone else has another edition we will disambiguate it. Until then, record the information that you want in the Author namespace as that is where we have typically done that work, and we have not prematurely moved editions just because there may be another edition. — billinghurst sDrewth 13:32, 17 April 2021 (UTC)
On fr. we do our best so that access to the latest version of the text corrected by the author is the privileged goal of the work of wikisourcists, but all editions are welcome too. --Zyephyrus (talk) 15:10, 17 April 2021 (UTC)
Great to hear from fr., you're the reason why I discovered the English Wikisource. I'm astonished by how many texts of major authors you have scan backed. Merci!
For Dickens, it's actually the opposite. The first editions are the best and the last edition is usually considered the worst. Languageseeker (talk) 15:19, 17 April 2021 (UTC)
The nature of the distinction is that it's an artificial one. For any two works, there is a set of works in Borges' library (or generatable by computer program) that are one letter away from each other, connecting the two works. It's not difficult to make the distinction; it's impossible to make in any non-arbitrary way.
My problem is not that we should collect bibliographic data; it's that we shouldn't. We shouldn't have long lists of works that no one is going to upload any time in the next decade on pages like Author:Isaac Asimov, but as long as we do, we should not use bibliographic rules to decide whether or not an item gets listed.--Prosfilaes (talk) 16:12, 17 April 2021 (UTC)

I recently saw a nicely constructed page of editions in the main-(ie reader-)space, that contained a link to a complete text here, being routinely edited to add a light blue (external) link to IA.org for the 1st ed. A solution to any objection might be to add the index to consolidate its inclusion in main-space, implying that someone [else] might want to labor on proofreading that choice as well; an emergent property of this example is moving that [more or less tolerated] bibliographic data from the Author namespace (our catalogue?). CYGNIS INSIGNIS 12:30, 17 April 2021 (UTC)

Allow the Shakespeare Quartos without splitting the Pages

This proposal has two components. First, expand the scope of Wikisource to allow all copies of the Shakespeare Quartos. Second, allow the transcription of these Quartos without splitting the individual pages. Languageseeker (talk) 01:03, 18 April 2021 (UTC)

(1) What is the advantage to Wikisource of transcribing multiple copies of the same printing? (2) How do you propose to preserve pagination if there are multiple pages to each scan page in the scan copy? --EncycloPetey (talk) 02:39, 18 April 2021 (UTC)
All 32 copies of Hamlet have proofread text that needs mainly formatting. There can be printer variants between the texts. Also having all 32 copies will attract visitors, especially since the original site has been taken offline because OUP decided not to upgrade the flash infrastructure to html. For transclusion, you can use Help:Transclusion#Adding_section_labels. Languageseeker (talk) 02:48, 18 April 2021 (UTC)
I am not understanding the proposal as written. How is adding versions of a work expanding a scope? If that is not the proposal, then stop speaking in jargon and have an expectation of an author's work or what is done at another site. Are you proposing no scans, etc.? If you are talking about 32 different sourced versions of Hamlet, +++, that is within scope. Versions are versions, versions are allowed. Otherwise, I simply don't understand the scope of the proposal. — billinghurst sDrewth 03:03, 18 April 2021 (UTC)
@Billinghurst: Because you said it was out of scope and removed the links.

Universal Code of Conduct – 2021 consultations

Universal Code of Conduct Phase 2

The Universal Code of Conduct (UCoC) provides a universal baseline of acceptable behavior for the entire Wikimedia movement and all its projects. The project is currently in Phase 2, outlining clear enforcement pathways. You can read more about the whole project on its project page.

Drafting Committee: Call for applications

The Wikimedia Foundation is recruiting volunteers to join a committee to draft how to make the code enforceable. Volunteers on the committee will commit between 2 and 6 hours per week from late April through July and again in October and November. It is important that the committee be diverse and inclusive, and have a range of experiences, including both experienced users and newcomers, and those who have received or responded to, as well as those who have been falsely accused of harassment.

To apply and learn more about the process, see Universal Code of Conduct/Drafting committee.

2021 community consultations: Notice and call for volunteers / translators

From 5 April – 5 May 2021 there will be conversations on many Wikimedia projects about how to enforce the UCoC. We are looking for volunteers to translate key material, as well as to help host consultations on their own languages or projects using suggested key questions. If you are interested in volunteering for either of these roles, please contact us in whatever language you are most comfortable.

To learn more about this work and other conversations taking place, see Universal Code of Conduct/2021 consultations.

-- Xeno (WMF) (talk)

20:45, 5 April 2021 (UTC)

I am interested in hearing the input of Wikisource users about the application of the Universal Code of Conduct, especially from the perspective of interactions on Wikisource. Xeno (WMF) (talk) 23:56, 17 April 2021 (UTC)

Should mdash be surrounded by space or not?

I have always used mdash without surrounding spaces, which I believe is/was the community standard. However, the "Clean up OCR" script surrounds it with spaces as it wraps the paragraph lines. For me, line wrapping is the last step in proofreading and final spell checking.

As shown in the above comparison, I have my own AutoHotkey text standardization and line wrapping script, which is identical in every respect to "Clean up OCR", but I can modify it to whatever the current community requirement is.

I needed an additional AutoHotkey script which does a partial cleanup of the text without line wrap, to help me identify the OCR errors between mdash and the hyphen, by surrounding the mdash with a space, and I left it so in final format. But some editors noted this and that is why this post. Which should it be?— Ineuw (talk) 23:53, 11 April 2021 (UTC)

We replicate the work. In the works that I have seen, typically not. I have no idea whether the grammarians and the editors of the world have a rule that is applied at WP. We also have not typically done the half spaces, and all the other unicode that can apply. — billinghurst sDrewth 23:58, 11 April 2021 (UTC)
If there is a space, make sure that it's a non-breaking space with &nbsp;. —Justin (koavf)TCM 02:18, 12 April 2021 (UTC)
IIRC, em dashes will always linebreak even with &nbsp;? They break when set close-up against letters or other punctuation. Pelagic (talk) 01:28, 18 April 2021 (UTC)
Not. CYGNIS INSIGNIS 04:26, 12 April 2021 (UTC)
The "rule of thumb" usually offered in the past is that there should not be spaces around em-dashes. Where spacing appears around such dashes in older works, it is usually half-spacing that flanks the em-dash rather than full spaces. Wikisource has opted not to use half-spacing, and so in most cases we collapse half-spacing around em-dashes. However, there are works where there is clearly a full space or even a double-space next to one (or both) sides of an em-dash, such as at the end of a line of dialogue, so there isn't a simple rule that would apply in every situation. Hence a "rule of thumb", although the Wikisource:Style guide (s.n. Formatting, 7. Punctuation) explicitly advocates for "Whichever dash is used, it should not be flanked by spaces." (emphasis mine) This statement applies in the majority of situations. --EncycloPetey (talk) 23:21, 12 April 2021 (UTC)
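As a rough illustration of the rule of thumb above, closing up em-dashes could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual "Clean up OCR" script or the AutoHotkey script mentioned earlier: the function name close_up_em_dashes is hypothetical, and it treats ordinary spaces, tabs and non-breaking spaces alike while leaving line breaks untouched. It would also collapse the full spaces that some works clearly print next to a dash, so those cases would still need checking by hand.

```python
import re

EM_DASH = "\u2014"

# Horizontal whitespace only (space, tab, non-breaking space), so that
# line breaks next to a dash are preserved.
_SPACED_DASH = re.compile("[ \t\u00a0]*" + EM_DASH + "[ \t\u00a0]*")


def close_up_em_dashes(text: str) -> str:
    """Remove horizontal whitespace on either side of every em-dash.

    Hypothetical helper for illustration only; it applies the rule of
    thumb blindly and does not detect intentional full spaces.
    """
    return _SPACED_DASH.sub(EM_DASH, text)


if __name__ == "__main__":
    sample = "He paused \u2014 then spoke."
    print(close_up_em_dashes(sample))
```

Running it on proofread text before line wrapping would keep hyphens and en-dashes as they are, since only the em-dash character itself is matched.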

Importing the Shakespeare Quarto Archive

So, the Shakespeare Quarto Archive just got taken down this week because of the end of Flash. However, the texts and images are available on [5]. Is there any easy way to import the texts and files of the proofread quartos into Wikisource? The encoding scheme is here [6]. Rescuing all 32 quartos from digital oblivion seems like a worthy project. Languageseeker (talk) 04:16, 17 April 2021 (UTC)

The Pickwick Papers

Move The Pickwick Papers to The Pickwick Papers (Gutenberg) because The Posthumous Papers of the Pickwick Club (Charles Dickens edition) also exists. Languageseeker (talk) 13:50, 17 April 2021 (UTC)

I have moved the work and left a redirect. I am sorry but The Posthumous Papers of the Pickwick Club (Charles Dickens edition) is simply not ready for display as it is basically a pointer to another set of pages, and a set of pages under a completely unwieldy name. The target is not a suitable target. Display of the works like that as two volumes is in my opinion an inefficient display methodology. Just because a work was published in two volumes due to limitations in publishing methodology is no reason for us to present it like that. — billinghurst sDrewth 03:15, 18 April 2021 (UTC)
I have converted the page The Posthumous Papers of the Pickwick Club (Charles Dickens edition) to a redirect as that is pretty much how it should be at this stage. — billinghurst sDrewth 03:20, 18 April 2021 (UTC)

IA-upload offline (error 503)

@Samwilson: If you are around, would you be able to have a look at IA-upload, it is telling me that the service is not available. Thanks if you can. — billinghurst sDrewth 12:55, 17 April 2021 (UTC)

@Billinghurst: Sorry about this! I've restarted the web service and it's back online now. It was down for 23 hours and 23 minutes. The error log was being filled with URL rewriting debug output (I've turned that off now) and I couldn't see at a glance what went wrong. Will keep an eye on it. The uptime log is here: https://stats.uptimerobot.com/BN16RUOP5/782616657 Sam Wilson 00:39, 18 April 2021 (UTC)
@Samwilson: Could you investigate the situation? The IA upload has been mostly down for the past few days. Languageseeker (talk) 02:00, 20 April 2021 (UTC)

The Tragicall Historie of Hamlet Prince of Denmarke

What exactly is this? --EncycloPetey (talk) 00:47, 18 April 2021 (UTC)

The first Quarto of Hamlet from the defunct Shakespeare Quarto Archive. Languageseeker (talk) 00:49, 18 April 2021 (UTC)
Then why are there two different redlinks with bad syntax and a transcription project for one of the redlinks? Is this the First Quarto of Hamlet or a mishmash? --EncycloPetey (talk) 00:52, 18 April 2021 (UTC)
There are two copies in the world. The shelfmark identifies the copy. There is one Index because Commons is not playing nice with Pattypan. Languageseeker (talk)
@Languageseeker, @EncycloPetey: it is out of scope and unnecessary. We don't have either work, and we don't create work pages like that in main namespace. — billinghurst sDrewth 00:58, 18 April 2021 (UTC)
Um, there was a transcription project. Languageseeker (talk) 01:00, 18 April 2021 (UTC)
(ec) I have moved it to user namespace. There is a whole section above about creating ugly stick pages. I keep hoping that I never see {{ext scan link}} in main namespace again. So very tempted to make it so that template does not display in main ns. — billinghurst sDrewth 01:03, 18 April 2021 (UTC)
I moved the links to the main version page of Hamlet and created a proposal to make them in scope. Languageseeker (talk) 02:13, 18 April 2021 (UTC)
And I have taken them out. Get your proposal through first; red links are typically not there unless they appear somewhere else onsite. — billinghurst sDrewth 02:53, 18 April 2021 (UTC)

 Comment I have made the above link as a redirect. We would never have that as a versions page, as we only ever have one versions page for a work. If/when there is a completed transcription, they will appear on Hamlet (Shakespeare) as per all other versions. — billinghurst sDrewth 02:56, 18 April 2021 (UTC)

And how are they supposed to be finished if the scan link cannot be posted? Languageseeker (talk) 03:31, 18 April 2021 (UTC)
It can be posted, but not in the mainspace. This template belongs in the Author: and Portal: namespaces. Beeswaxcandle (talk) 05:47, 18 April 2021 (UTC)
@Languageseeker: What makes you think that you are the only person with works that are unfinished and unlinked from the main namespace? These works are no different from any other work, so please use the existing system. If you want to set up a project please look at Wikisource:Wikiprojects just like the rest of us have had to do. — billinghurst sDrewth 06:07, 18 April 2021 (UTC)
@Beeswaxcandle: There is nothing in any of the documentation or pages about style and format that states the template belongs in the Author: and Portal: namespaces only. Nor in fact is there any documentation about its use in any particular namespace(s). But—supposing for the moment the link must be placed in the Author ns—where on Author:William Shakespeare (1564-1616) should the small scan link to the First Quarto of Hamlet be placed? --EncycloPetey (talk) 16:16, 18 April 2021 (UTC)
Templates are typically required for the purpose of the page, and the addition of this external link template to a "versions" page is changing the purpose of the page then to disambiguate our versions. The typical use of the template is to point to things from our curated author and portal namespaces. There may be a special case for its use in main ns, but it is not on version pages. So it is a false argument to say that it doesn't say it cannot be used in main ns, it is the use that you are undertaking. — billinghurst sDrewth 00:39, 19 April 2021 (UTC)
So maybe the argument that you would like to develop is how its use on a "versions" page fits with what we are doing, rather than trying to argue that the template doesn't have a rule that it cannot be used in another namespace. Context is far more important. — billinghurst sDrewth 00:43, 19 April 2021 (UTC)
[ec] But the Versions page comes into being when content from the Author namespace moves into the Main ns as a Versions page. That is, when we have only a single edition, that edition is listed on the Author page. When there are multiple editions to manage, that content moves from the Author namespace to the Main namespace. It remains the same content; only its location has changed. Why then would a template apply to that content when it occurs in one location but not in the other?
When I created all the disambiguation and versions pages for the plays of Shakespeare, I advertized that process here in the Scriptorium and asked for feedback on what I was doing. At no point did anyone raise objections about the use of the {{small scan link}} and {{ext scan link}}. I also point out that, prior to the creation of those pages, we had only copy-paste Gutenberg texts for most of Shakespeare's plays. Once the versions pages went up, we had (at least) eight people join in with the transcription of various editions of Shakespeare, and they found those editions because there were links to the Index pages of the scans on the Versions pages. There were Index pages, and some of them had languished for years, but making those pages known led to the completion of several texts and the cleanup of several others.
I still do not understand why there is such strong resistance to the use of {{small scan link}} next to the listed works they contain. They allow editors to see that a transcription has been started and find it. We repeatedly have cases where an unlisted scan is finished, only to find it is a duplicate of an existing work, and the person who spent all that time is usually (and rightly) furious that no one bothered to coordinate listings. Removing {{small scan link}} from versions pages will exacerbate rather than ameliorate that issue. And as I have pointed out, such listings have directly led to the completion of long-languishing works.
If the matter of the listings is the same, why does it matter which namespace it is located in? If we had the Versions pages in a separate "Work:" namespace like the Italian Wikisource has done with "Opera:" would that make it OK to use the template? If not, then why not? If so, then the argument against the use of {{small scan link}} in the Main ns is circular. --EncycloPetey (talk) 00:54, 19 April 2021 (UTC)
So this comes back to the basic questions
  • What is a versions page, our versions, some versions or all versions?
  • When should a versions page exist?
  • When would we make exceptions to either of the prior two questions?
At this stage we appear to be outside the scope of Wikisource:Versions and, as I attempted to address in the next section, we have got there by creep, not through clarity of our scope. If we are going to morph, then we need the central directional pages to move with them. We also need to be clear and explicit to the broader community that scope is being expanded, not implicit or having to guess. — billinghurst sDrewth 03:43, 19 April 2021 (UTC)

16:48, 19 April 2021 (UTC)

Creep of complexity and business of disambiguation pages, especially version pages, and other cruft into main namespace

Over time our simple listing disambiguation pages of works onwiki have been creeping to more and more complexity. They are becoming pseudo-stub pages and getting listings off-site, and pages of less relevance, definitely not transcluded works. We are getting into more and more disagreements about the purposes of these pages. And having to have the argument about what is appropriate on these pages is sucking up good time and simply becoming a tiring, repeated argument. We need to resolve the issue at a higher level so we don't have to have this continuing bickering over the detail.

Templates in use on the pages include

The past couple of months we are now getting people talking about future works and adding redlinks to works that we don't have, may never have. These pages don't rely on future promises, they should be used for works on site now, and possibly where we have had to disambiguate for them, eg. in a journal that we hold the scans.

Links of pertinence

In my opinion, we are not the encyclopaedia, we shouldn't be having discussions about all the versions of a work, or encyclopaedic argument about the differences or their histories in our main namespace. All that belongs at English Wikipedia, or what we have designated our curated spaces and we have author and portal namespaces for those sorts of pages. For many years we have been trying to tidy up the cruft, and keep our main namespace to be for presented work; it was to be our quality space, just for works. I feel we are drifting to cruft, so what type of product are we trying to put into our namespace? Otherwise what is the purpose of the main namespace? Up until now it has not been designated a find aid for things but works we have. — billinghurst sDrewth 06:03, 18 April 2021 (UTC)

I think that a lot of this creep has to do with a few factors. First, the difficulty of adding a text. Users find a scan on IA, but become too challenged by uploading the file to Commons and then decide to add an external scan link to preserve their work. Uploading to Commons should be performed by a bot to clear that backlog. We can even gather all the IA ids and ask Fæ to upload them. Second, this site lacks a clear direction in creating a central corpus of key works in English. Instead, users decide to work on whatever strikes their fancy. By contrast, French Wikisource has the Mission 7,500 that provides a set of works to work on every month. This has resulted in French Wikisource having transcribed texts. Because English Wikisource does not have a core set of key texts, there are more version pages with no scans or texts or indexes. Versions are a problem of important texts and not of minor texts.
For important texts, version pages can serve a critical function by designating which texts should be on Wikisource. Researching the publishing history of a book takes time and access to certain sources. In general, I believe that Wikisource should only include versions of texts that the author contributed to, those that have important scholarly value (critical editions), or those with new illustrations. A version page can serve as a space to detail these editions.
For now, the IA does not have scans of every book ever published. Therefore, there is work to be done by future generations. The version pages can serve as key guides for the future. Research today, proofread tomorrow.
As for the presence of small scan links, they allow the user to see how they can contribute to this project. This is a collective proofreading platform. It makes no sense to hide the proofreading from users. Nor does it make sense to ask an individual user to proofread an entire work by themselves and then post it to mainspace. Transcription projects belong in mainspace.
I would recommend the following changes to Wikisource.
  1. Only allow scan backed new texts unless the user provides a compelling reason why that is not possible.
  2. Restructure the POTM to follow the French model of presenting several texts and focus on building core texts.
  3. Allow for version pages when a text has a complicated publishing history and more than three editions.

Languageseeker (talk) 04:23, 19 April 2021 (UTC)

 Comment This issue is presented as something that has happened "The past couple of months", but the issues have been around as long as Wikisource has. Anyone working on large projects such as the 1911 EB knows that we have had long-standing redlinks in the Main namespace for years. With regard to versions and disambiguation pages, we had a case where the community opted in favor of red linked titles in 2015. Calling all of this "cruft" and appealing to abstract philosophy is not a practical approach. Saying that "it has not been designated a find aid for things but works we have" misses entirely the fact that disambiguation pages are find aids for locating other things; they are not themselves works, but are dynamic and changing entities by their very nature. The same is true of versions pages; these pages will and do change. Hiding works and their progress from listing on the basis of some abstract philosophy will not get the work of Wikisource done. --EncycloPetey (talk) 15:33, 19 April 2021 (UTC)
That is a different story with a different argument.
  • We all have texts that we want done, and our means for having these seen and listed has been through the Author and Portal namespaces. This selective means of putting additional editions as available is outside how we have been doing them and outside of our existing guidance per Wikisource:Versions.
  • Redlinks that I mentioned were specifically related to versions/disambig, not the general case. On version pages they would appear to be outside of our guidance at Wikisource:Red link guidelines#Main namespace, and not at this time our guidance on the Versions either. Citing EB1911 is interesting though not relevant to the case.
  • To cite the 2015 example as a community consensus should be seen as a misrepresentation that the community undertook a conversation to change our policy or our guidance, and it should not make it incapable of review and change. Or it could be that this is an example of some exceptions to the rules, rather than the new normal.
  • I am aware that matters evolve, though it should be through open conversation and agreement with the expectation that our guidance will do so too. It is also reviewable, which is what I am doing here. However, the basics are that these are a variant of disambiguation pages. There is a basic concept behind these types of pages.
Are we moving "FINDING AIDS" into the main namespace? Is it only going to be version pages? What makes them special? Why would they include links to off-site? What is the equity? What is the benefit? How does it work in the holistic sense? Should we start version pages for every work that has multiple editions? How will it work for biblical works? It is a can of worms once you open it without thinking about the broader consequences rather than your own particular desires. It is why we have the guidance and why we achieve consensus for these changes and their scope. — billinghurst sDrewth 00:25, 20 April 2021 (UTC)
As I said, I don't think that every work needs a version page. However, having a version page can help identify the specific subset of a work that should be on Wikisource. To give an example, we don't want every copy of Hamlet ever printed on this site. There are only a few specific editions that should be in scope. Posting an edition on a version page can be seen as an invitation to comment on the in-scope nature of the work prior to a user sinking a significant amount of work into it. I've seen many mere reprints of 18th century works that I've replaced with links to scans of original editions.
Second, by providing what you term a "finding aid," Wikisource can prevent users from adding editions that this site would not want to host. For example, A Child's History of England (1900) is the only scan backed copy of this text. However, this edition is a mere reprint with no unique illustrations or input from the author. It's a random edition pulled from the IA because the PG text was based on it. It basically has no value. As soon as another copy of this work appears, I would submit it for deletion.
Third, without knowing about other editions, it is more difficult to properly name a text, leading to future work. For example, imagine that the first version of Hamlet posted on this site was Q1. Would we call this Hamlet? If so, then in the future, an administrator would need to move it to Hamlet (Quarto 1). If we had a version page, we would first call it Hamlet (Quarto 1). Languageseeker (talk) 02:24, 20 April 2021 (UTC)
I think you're too prescriptive on versions. I will not support the deletion of any scan backed version, because it's not the "right" version. Every printed edition is in scope. Certainly, with Hamlet, every different bowdlerization and every different abridgment have an interest to someone. We don't generally prescribe the best of anything; we include what people want to add, provided it has been printed. I encourage people spending their time on good versions, but I don't support prescribing what is the good version; if someone insists on a version of Oliver Twist respelled and mildly abridged for an American audience, that's their right to work on that here.--Prosfilaes (talk) 03:11, 20 April 2021 (UTC)
For a specific example of where a random edition is important, if you want to know how The Pilgrim's Progress affected Carl Sandburg's writing, you don't want the best edition; you might want the version the American Tract Society published in the 1850s, because that was the copy in his library. Maybe he also read others, but you want to get the ones he had hands on, not the ones influenced by Bunyan.--Prosfilaes (talk) 05:29, 20 April 2021 (UTC)
This is not outside how we have been doing it; the Weird Tales pages have been up since 2009. Citing EB1911 is also a general example, quite relevant to your last statement. As for your last statement, I think it's pretty clear we who are advocating this want versions pages to point to all versions of works that users are interested in working on. There's some disagreement about details, but it seems that any place we're pointing to multiple works, we should have links to works that people are interested in doing. As for me, we shouldn't be making author pages, links on author pages, or version pages if nobody is realistically going to work on them; Author:Isaac Asimov should be trimmed to the bones. Allowing any authors to have an author page with all their works is a can of worms; what if someone imports the entire LoC authority list? In practice, however, it's not much of an issue.
Recently, you begged off writing guidelines. There doesn't seem to be anyone doing that job here, but it's hard to make official changes if no one is willing to clearly document them, and it's hard to debate when there are unwritten guidelines that are leaned on if and only if it's convenient.--Prosfilaes (talk) 03:02, 20 April 2021 (UTC)
Is this a helpful page: Panchatantra? Is that a versions page or is that an article? Which bit are we wanting for clear navigation at enWS? — billinghurst sDrewth 10:44, 20 April 2021 (UTC)
@Prosfilaes: I think that I am entitled to cry off some things around here, and I never said that they were unimportant. I think that I do enough in a range of areas where no one else is contributing, so please excuse me from one of the things which I find difficult. — billinghurst sDrewth 10:48, 20 April 2021 (UTC)
Meh. That's certainly not how I would write Panchatantra, I would definitely remove all the modern works and cut down on the header, but it's certainly useful to have other editions that aren't yet on Wikisource listed there.--Prosfilaes (talk) 14:33, 20 April 2021 (UTC)

Line numbering coming soon to all wikis

-- Johanna Strodt (WMDE) 15:08, 12 April 2021 (UTC)

Questions: (for those who understand the techno-speak) Will line numbering apply in all namespaces? Will it appear in the proofreading window of the Page namespace? And can it be deactivated by an editor if they find it distracting? --EncycloPetey (talk) 23:10, 12 April 2021 (UTC)
It was said "... you can enable line numbering ...". I would suggest that you ask over on the extension talk page whether it was tested and will function in the Page: namespace. — billinghurst sDrewth 06:07, 13 April 2021 (UTC)
@EncycloPetey, I think this only appears if/when you have the colored syntax highlighting turned on. Do you normally use that? If not, then you shouldn't see it. Whatamidoing (WMF) (talk) 05:30, 21 April 2021 (UTC)

Disabled music scores

Nine months have already passed since the music scores produced by the score extension were disabled, and looking at task T257066 it seems that nobody is really bothered by contributors’ complaints. At w:Help talk:Score#Can we please re-enable display of the score images? it was suggested to replace vorbis="1" with %vorbis="1"% or %sound="1"%, which at least makes it possible to see the image of the score. I tried it and it seems to work. What do others think about such a temporary workaround (especially as "temporary" can mean "really long" in the Wikimedia environment)? --Jan Kameníček (talk) 13:27, 19 April 2021 (UTC)

@Jan.Kamenicek: I ran a bot job to do that replacement in January (IIRC ~450 instances). I don't think any more have appeared since, because only cached scores can be shown until "they" fix it (i.e. the "fix" will re-enable existing scores, but not allow the creation of new ones, AFAIK). Inductiveloadtalk/contribs 13:35, 19 April 2021 (UTC)
@Inductiveload: I am afraid there are some which the bot probably did not catch, like Page:The music of Bohemia.djvu/32. --Jan Kameníček (talk) 13:39, 19 April 2021 (UTC)
Huh, quite right. Perhaps because it's not transcluded it didn't get caught in the dragnet. Please hold. ^_^ Inductiveloadtalk/contribs 13:45, 19 April 2021 (UTC)
@Jan.Kamenicek: OK, I hit another 40-ish. Let me know if you see any more. Inductiveloadtalk/contribs 14:19, 19 April 2021 (UTC)
It was transcluded, but now it is OK (i.e. at least the picture is visible), thanks! --Jan Kameníček (talk) 14:58, 19 April 2021 (UTC)
Huh, so it is. So now I have another bug in my random pile of JS junk to figure out! Anyway, thanks for the heads-up. Inductiveloadtalk/contribs 15:12, 19 April 2021 (UTC)

 Comment The way to get resources for such a fix is through the annual/biannual priorities lists, or finding a hacker. SCORE only came about after a long time as one of our participants, GrafZahl, showed a particular interest and even that took ages due to security issues. This will be a case of developing a case for a fix and lobbying. Sitting, waiting, hoping will not bring the solution. — billinghurst sDrewth 22:44, 19 April 2021 (UTC)

Contributors’ job is to contribute, tech team’s job is to provide technical support, WMF’s job is to provide funding. Any project can work well only when everybody does their job well. It is quite enough when contributors report bugs; why should they lobby for the bugs to be fixed, why should they lobby for something that should be just natural? What else should they do? Sorry, but I am definitely not "sitting". I want to be adding content above all, but also try to do some little maintenance too, report bugs, and in better times I also take part in or organize various wikiactivities in the real world. I do not want to waste more of my wikitime on never-ending pleading with the tech team for support. A system where volunteers have to lobby for support can result only in losing them (or at least in not gaining them). The bug was properly reported in the phabricator, which is the place determined for such reports. Now it is the tech team’s turn to fix it (and it has been their turn for 9 months already). --Jan Kameníček (talk) 23:51, 19 April 2021 (UTC)
I hear you, I am not saying this is the ideal, I am mentioning my view of the reality. Reality != perfect world. If we had a perfect world, I would not be manually adding information to WD, writing spam filters, deleting spam, undoing edits and telling people to read the guidance, ... — billinghurst sDrewth 00:35, 20 April 2021 (UTC)
yeah, scores was always unsupported, and a miracle it worked at all. one of these days a musical hacker will reverse engineer lilypond in open source, and we will have another mission to transcribe all that public domain sheet music. but until then, it will remain locked away behind paywalls and hard to find archive scans. Slowking4Farmbrough's revenge 00:19, 22 April 2021 (UTC)

Index transclusion status now in the Index page edit form

As many of you may have noticed in your watchlists, Index page transclusion status and validation dates are no longer recorded in a template, but are a proper part of the Index page edit form. All existing uses of the templates have been migrated over. The usage remains unchanged - transclusion status refers to how much of the work is transcluded, and is somewhat independent of the proofread status - it is possible for a work to be fully transcluded but not validated.

There was a brief period where some Indexes had multiple categories while the usage was changed over. These should naturally resolve as the categories update, or you can force it by purging or editing the page. As always, let me know if something is looking broken even after a purge. Inductiveloadtalk/contribs 13:00, 22 April 2021 (UTC)

Can pickwick19_20 0037a and pickwick19_20 0037b be inserted from [here] into File:Posthumous papers of the Pickwick Club (Serial Volume 19).pdf Languageseeker (talk) 13:42, 22 April 2021 (UTC)

A book is currently to be used on a contest

Dear Wikisource fellows. I would like to inform you that Index:Scented isles and coral gardens- Torres Straits, German New Guinea and the Dutch East Indies, by C.D. Mackellar, 1912.pdf is currently being used in a proofreading contest. You can expect new users editing the book. I will monitor the progress and provide feedback to participants if needed. In case I missed something, kindly let me know. Much appreciated. ··· 🌸 Rachmat04 · 02:49, 22 April 2021 (UTC)

@Rachmat04: Sounds exciting. Thanks for organizing this. Languageseeker (talk) 03:23, 22 April 2021 (UTC)

Suggested Values

Timur Vorkul (WMDE) 14:08, 22 April 2021 (UTC)

Formatting in header template

  1. I propose (again) that the title of the whole work be italicised.
  2. I propose (again) that the author of the work—or their 'contributed section'—not be italicised.
  •  Support I also think that, cheers to nom. CYGNIS INSIGNIS 12:44, 17 April 2021 (UTC)
  •  Oppose Italics are usually fine for native speakers, but having worked on a multi-lingual dictionary, I can attest that italicizing text makes it harder to read for many non-native readers, especially those whose native writing system is not based on the Latin alphabet. There are things we take for granted about italicized text that do not apply in languages like Russian or Japanese. Personally, I do not think the name of the author or translator should be italicized either. --EncycloPetey (talk) 02:44, 18 April 2021 (UTC)
    that's one support for item 2, which is what constantly reminds about item 1. It's an interesting point, interfering with access. CYGNIS INSIGNIS 14:26, 23 April 2021 (UTC)

Accessibility on this site

I have recently added some accessibility features to tables such as table captions and was encouraged to post about it. These are required by WCAG best practices and provide a very high impact on making the site useful for the blind with very low effort. Note that I have also ported over {{sronly}} so these won't display, so there are no concerns about styling. Is there any good reason to not include table captions on data tables? Should we implement best practices as decided by Web authorities and accessibility advocates or is there some reason why we shouldn't? —Justin (koavf)TCM 02:12, 12 April 2021 (UTC)

I asked you to support your assertion that table captions are required, and you failed to do so. I have also pointed out that the place where you applied them is not a data table, but is applied purely for layout and is temporary for the benefit of proofreaders. The so-called data are copyright statuses for the works we cannot yet import and links to Index pages of works that we have. As works on the list are validated, the so-called "data" is removed and will eventually disappear altogether, hopefully within the next two years. I also asked several times for you to start a discussion, and am glad to see that you have now done so. --EncycloPetey (talk) 03:02, 12 April 2021 (UTC)
Excuse me, but they are data tables. What is it you think that a data table is? I have provided citations for using captions as best practice for data tables. Please do not keep on asserting that data tables are for layout when they are not. I also don't think that the implication that information should only be accessible if it's going to last an indefinite amount of time withstands even the mildest scrutiny. Just because the Sun is going to swallow the Earth in a few billion years, that doesn't justify us not using best accessibility practices. My point is relevant to all data tables, not only the ones that you think someone will remove at some point. Rather than make this discussion about a single table that you keep on misidentifying as not being a data table, I am asking a broad question that refers to a site-wide culture of using best practices for persons with disabilities. Are you in favor of doing that or are you opposed to it? —Justin (koavf)TCM 03:33, 12 April 2021 (UTC)
Let’s try to keep this conversation civil. Accessibility is an important issue and should be a priority. Maybe a detailed proposal would be a good start. Proofreading is an important task that recognizes differences in ability and helps to make texts more accessible. However, it depends on certain visual abilities. There’s a need for a discussion of what areas need accessibility features and what kind. Languageseeker (talk) 05:41, 12 April 2021 (UTC)
I propose that data tables have captions. —Justin (koavf)TCM 06:05, 12 April 2021 (UTC)

 Comment Use of the term "data tables" is ambiguous; please give specific examples. Please consider using WS:Sandbox for some permanent examples. Thanks. — billinghurst sDrewth 14:01, 12 April 2021 (UTC)

"Data tables" are distinguished from "layout tables" by the former showing the sorts of things that actually belong in a table (e.g. Nations in a certain Olympics and how many medals they won or vendors who owe your company an invoice per month) and the latter is the misuse of a table to provide the layout of elements on a webpage instead of using CSS. Here is a data table of school lunches. Here is a fictional budget and items sold in a data table. All data tables should have captions and semantics for columns and rows. —Justin (koavf)TCM 21:15, 12 April 2021 (UTC)
We reproduce what was published. Are you advocating that we generate captions and add semantics that are not in the published works? — billinghurst sDrewth 06:13, 13 April 2021 (UTC)
To be fair, correct semantics like scope=row/column on header cells actually probably should be used wherever appropriate. Not all tables have a caption in the original, but technically speaking, when they do, we also should be using the |+ syntax rather than a direct styled thing like {{center|Caption}} (per-work CSS can help to target those captions for auto-styling anyway). Perhaps when they don't, a {{sronly}}-type affair could be the table equivalent of an image alt attribute (something else we should really be doing for accessibility). We should also (technically) use {{lang}} whenever encountering non-English text.
For fully-correct semantic table markup, we are a bit hamstrung by the ongoing failure of Mediawiki to provide <col>/<colgroup> elements, as well as the "direct" nature of our data (semantic markup is easier if you're a site generating a table from a database of numbers - you can just change the server to generate the table as such).
I think while there is a lot of work that can be done to improve accessibility, if we're going to get anywhere it needs to be a more structured effort, perhaps in concert with https://phabricator.wikimedia.org/tag/accessibility/ (maybe a column in https://phabricator.wikimedia.org/tag/wikisource/?) than just ad-hoc threads at WS:S with no clear end state or process to get there.
The biggest problem I have with accessibility is that it is, in general, extremely hard for any "abled" editor to know if they have just produced something accessible, or almost-entirely inaccessible - it looks the same to them. Screen-reader software is rare, finicky about platforms and can be expensive. I have (after not inconsiderable effort) managed to get Orca working, but it's very laborious to check that what is written comes out OK. It would be good if someone knew of a service that could directly translate hunks of HTML into "screenreader text" to see how what we currently have comes out. Perhaps some kind of interactive tool could be built around it. User:Koavf: any ideas? Also, if you are serious about improving accessibility, writing documentation about what is and is not good practice will help others (including myself) understand how we can improve. Inductiveloadtalk/contribs 13:10, 13 April 2021 (UTC)
we did do some meetups with DCPL braille library with editing on screen readers (adding alt text to images), and at Gallaudet, but it would take some grant resources to do some UX. but in the mean time, maybe an accessibility user group on meta would be a start. Slowking4Farmbrough's revenge 22:08, 13 April 2021 (UTC)
Barring Foundation-wide standards and lacking any particular local ones, would you be in favor of adopting en.wp's as a rule of thumb? —Justin (koavf)TCM 02:59, 14 April 2021 (UTC)
I have no idea of enWP's, and anyway they are generally free-creating tables rather than replicating a work. I also know that (some of) our tables can be very busy and complex, and the idea of adding further complications and complexity does not enchant me. I would want something easy and reproducible that does not pollute the work, can be done in wikitext, and does not make me have to work at undoing any system-imposed formatting. So at this stage I see that we need some developed guidance and priority on what can be done to improve the readability of a work. I would think that for a while we would look to voluntary compliance and encourage the WMF to consider it as part of their development. — billinghurst sDrewth 14:35, 14 April 2021 (UTC)
@Billinghurst: All you do is add "+Caption" at the beginning. It is done in wikitext and couldn't be much easier. —Justin (koavf)TCM 12:13, 24 April 2021 (UTC)
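
For readers following this thread, here is a minimal sketch of the markup being discussed, assuming a simple two-column data table. The caption text, the nation name and the figures are purely illustrative and are not taken from any work; only the `|+` caption line and the `scope` attributes on header cells are the point.

```wikitext
{| class="wikitable"
|+ Medals won by nation
! scope="col" | Nation
! scope="col" | Medals
|-
! scope="row" | Exampleland
| 3
|}
```

Where the original work prints no caption, the caption text could presumably be wrapped in {{sronly}} as suggested above so that it is announced by screen readers without being displayed; whether that is appropriate is the open question in this discussion.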

Firefox extension: TitleCase

Transform strings into Title Case, Proper Case, Start Case, Camel Case, Upper Case, and Lower Case. You have 2 ways to change text. Either by right clicking on the field and changing the case or by highlighting and only changing what you highlighted.

TitleCase

Just tripped over this Firefox extension that allows case manipulation of text through block, right click. I know that I regularly manipulate case when transcribing, though not as easily as this with the forms that I have. — billinghurst sDrewth 14:45, 20 April 2021 (UTC)

I've been using this for a while; it's a great time-saver. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:11, 24 April 2021 (UTC)

IA tool allowing duplicates

Yesterday, the IA tools allowed me to upload a number of duplicates leading to a large waste of time concluding with the duplicate indexes being deleted. Anyone else having the same issue or know what is going on? Languageseeker (talk) 18:25, 25 April 2021 (UTC)

Relatively recently (but I don't remember exactly when) some of us were complaining that the IA Upload tool wasn't permitting us to upload a DjVu file when Faebot had already uploaded a pdf of the same. It looks like this has now been fixed. Beeswaxcandle (talk) 18:47, 25 April 2021 (UTC)
Could it at least warn about duplicates? I don't want to create redundant indexes everywhere wasting everyone's time. Languageseeker (talk) 19:16, 25 April 2021 (UTC)
Often, you need to manually check for duplicates first. Sometimes the IA copy has to be edited to repair pages, remove duplicate pages, or strip Google notices. The IA tool won't always catch those as "duplicates". --EncycloPetey (talk) 19:18, 25 April 2021 (UTC)
The phabricator task T269518, which was closed as resolved a short time ago, should allow duplicates if they are of different formats (after a warning is shown) but should not allow exact duplicates; see the comment of @Samwilson: from 9 April there. So if some duplicates were uploaded in the way described above and without warning, it is a bug. --Jan Kameníček (talk) 21:52, 25 April 2021 (UTC)
They weren't exact duplicates—.djvu vs .pdf, which is correct behaviour. Also, some were a different printing of the same edition, which no tool is ever going to pick up. Beeswaxcandle (talk) 22:57, 25 April 2021 (UTC)
  •  Comment The responsibility to check Commons for the existence of a work does not lie with the IA-uploader tool, it lies with the uploader. There are many ways that copies of a work can occur at Commons. There can be multiple editions, there can be copies of the work from multiple sources, so there needs to be a more mature approach than relying on the upload tool, especially as the PDFs were uploaded by a person to cover their needs, though they generally have inferior text layers to the DjVus. There is not even the guarantee that the best copy of the work has been uploaded, nor that the copy uploaded is the best to proofread. If you are using a PDF layer, you are often making things harder for your proofreading and more likely to get errors in the produced work.

    To more fully comment on an identified issue it is more helpful to have some examples and processes followed rather than just react to general complaint. — billinghurst sDrewth 00:14, 26 April 2021 (UTC)

Looking for (ancient) Armenian speakers to help transcribe a text on Wikisource

Recently the French Wikisource community started to work on the transcription of Grammaire de Denys de Thrace. You might wonder what the relation with Armenian speakers is. Have a look at the full title instead:

GRAMMAIRE de DENIS DE THRACE, tirée de deux manuscrits arméniens de la bibliothèque du roi. Publiée en Grec, en Arménien et en Français, et précédée de considérations générales sur la formation progressive de la Science glossologique chez les anciens, et de quelques détails historiques sur Denis, sur son ouvrage et sur ses commentateurs ; PAR M. CIRBIED, membre de la société royale des antiquaires de france, professeur d’arménien à la bibliothèque du roi. extrait des mémoires de la dite société.

That's a bombastic title if you want my opinion. 😂 However it gives a good overview of what the book contains. In particular, it makes clear that the book includes a huge load of material in Armenian. The text is basically a translation of, and commentary on, a text in Armenian, itself a translation of an ancient Greek text, which is also provided in the book. As the French Wikisource community has limited skills in Armenian, the work of transcription is far more complicated. Thus this call.

We are looking for people able to transcribe text in the Armenian alphabet. It would be even better if we could find people who know ancient Armenian, since the text will likely contain oddities of the past. And if we can find someone who can moreover speak French or English to interact in discussions with the community, it would be perfect. Note that the main requirement is simply being able to read the Armenian alphabet and to write it in simple Unicode transcription. In particular, there is no expectation that the potential helpers would do any work on the formatting, which often requires dealing with local templates.

Please be bold in spreading the word wherever you think that could reach potentially interested Armenian speakers.

With all my warm love, Psychoslave (talk) 07:31, 26 April 2021 (UTC)

Call for Nomination of Texts

There are good signs that English Wikisource might try the French Wikisource approach of having a set of texts for the community to work on, with new texts added each month and incomplete texts rotated out after three months. Therefore, I'm calling upon the community to submit nominations for texts that they would like to have on this site. These texts should be important and have a broad appeal. Here is a list of some texts that I can think of and that others have suggested in various places.

  1. Masnavi I Ma'navi (transcription project) (Wikisource Islam)
  2. The Mahabharat (external scan) (Wikisource India)
  3. The Picture of Dorian Gray (start transcription) (Wikisource LGBTQ+)
  4. Paradise Lost (transcription project) (Seventeenth Century)
  5. Clarissa, or the history of a young lady (transcription project) (Eighteenth Century)
  6. Manhattan Transfer (transcription project) Etsu Inagaki Sugimoto, A Daughter of the Samurai (transcription project) (Celebrating the Public Domain)
  7. The Wanderer (Fanny Burney) (transcription project) (Women Writers)
  8. The new Negro: an interpretation (transcription project) (Black Writers)
  9. Uncle Tom's Cabin (1851 First Edition) (transcription project) (Slavery in the USA)
  10. Librarian's Copyright Companion (Legal Texts)
  11. Anna Karénin (start transcription) (Russian Fiction)
  12. Enquiry into plants (transcription project) (Classics of Science)
  13. The sidereal messenger of Galileo Galilei (transcription project) (Classics of Science)
  14. Remarks on prisons and prison discipline in the United States (external scan) (Reformists)
  15. Commentaries on the Laws of England (transcription project) (Legal Texts)
  16. Commentaries on the Constitution of the United States (transcription project) (Legal Texts)

Languageseeker (talk) 00:59, 24 April 2021 (UTC)

Manhattan Transfer is up for an upcoming PotM. It would be inappropriate to poach titles from that project. --EncycloPetey (talk) 01:10, 24 April 2021 (UTC)
Fair enough, I replaced it with a different work. Languageseeker (talk) 01:39, 24 April 2021 (UTC)
Define "important". All texts published through a non-vanity press are/were important to someone—otherwise they would not have been published. To go a step further, all 12,456 works in Category:Index Not-Proofread were important enough for someone to create an Index here for them. But you can't put all of those into your proposed rotation. Beeswaxcandle (talk) 01:51, 24 April 2021 (UTC)
The definition of important is left to the individual user. To prevent an overwhelming number of nominations, I’m asking each user not to nominate more than five texts.
For now, I’m planning on having 15 texts to start with. I know that important is a tricky and nebulous term and I know that I don’t know every text. That is why I’m asking the community for nomination. I’m looking for a diversity of texts. What text from your community or interest would you like in Wikisource? Which text do you think would others find interesting? Languageseeker (talk) 02:00, 24 April 2021 (UTC)
There are definitely some works that are more important than others. Quite a few historically-notable texts only have Project Gutenberg text needing match-and-split (if I had to suggest one: Uncle Tom's Cabin). As for one needing proofreading, there's the recently-declared-C.C. Librarian's Copyright Companion. Mcrsftdog (talk) 02:04, 24 April 2021 (UTC)
@Mcrsftdog: Added to the list. Thanks. Languageseeker (talk) 02:56, 24 April 2021 (UTC)
The Harvard Classics contains many significant works, with Index pages set up already. Among them are Don Quixote, Two Years Before the Mast, and Anna Karenina. We are very weak in world literature. In science, Galileo's The Sidereal (or Starry) Messenger is a short but seminal work which is not scan-backed, and Theophrastus' On the History of Plants is a work of monumental importance that we don't have at all. --EncycloPetey (talk) 04:13, 24 April 2021 (UTC)
I think that these are all good texts to have. My one concern is that Anna Karenina is a reprint. What do you think about the Leo Wiener translation (external scan). It also has the advantage of being part of a set of complete works. Languageseeker (talk) 04:41, 24 April 2021 (UTC)
I am not experienced enough with the translations to recommend one or another. I will, however make one more recommendation: any works by Dorothea Dix who pushed for reform in the care of the institutionalized. We have zero works from her, which is tragic. --EncycloPetey (talk) 04:49, 24 April 2021 (UTC)
I would like to suggest Commentaries on the Laws of England and Commentaries on the Constitution of the United States. We don't have classic jurisprudence at WS at all. Ratte (talk) 14:01, 24 April 2021 (UTC)
@Ratte: For the Blackstone Commentaries, do you have any strong feeling about featuring the original edition vs the 12th edition? Languageseeker (talk) 15:08, 24 April 2021 (UTC)
You mean the 3rd edition of Book I? We don't have the 12th edition here. When Blackstone published the first edition of Book III in 1768, he decided to republish the other books together with Book III. It's the one-time publication of all four books, which have cross references to each other's pages. Ratte (talk) 15:32, 24 April 2021 (UTC)
@Ratte: Wikipedia states that the 5th and 12th editions are considered the best editions, and HathiTrust has the 12th edition available. I'm wondering if it's better to feature the 3rd edition to finish it or go on to the 12th edition. Languageseeker (talk) 17:26, 24 April 2021 (UTC)
Now it is clear. Yes, I agree, it is better to go on to the 12th edition. Can you upload a scan of it? Ratte (talk) 17:47, 24 April 2021 (UTC)
My main reaction is to look at including humanities/philosophy/theology in the next set of texts to pair with the Classics of Science (probably more accurately STEM, as we may want math as well). I like the idea of having some categories highlighting geographic (India / Islam / Russian), temporal (18th, 17th), and author (LGBTQ+, Women, Black) diversity, and then a set around professional (legal), STEM, and the humanities as subject diversity as a breakdown. Then within some of those broad groups we can pick works (e.g. pick a century --> nominate an important work missing from that time period, pick an area --> pick a work, etc.). MarkLSteadman (talk) 22:48, 24 April 2021 (UTC)
@MarkLSteadman: Glad you like it. Is there any text from humanities/philosophy/theology that you would like to add to the list? Languageseeker (talk) 22:55, 24 April 2021 (UTC)
In philosophy we have no works by Moses Mendelssohn or Johann Joachim Winckelmann. We also don't have any works by Humboldt. Friedrich List was the founder of the historical school of economics and we have none of his. My background is in physics actually so I would defer to any humanists for gaps here... MarkLSteadman (talk) 23:33, 24 April 2021 (UTC)


@EncycloPetey, @Beeswaxcandle, @TE(æ)A,ea., @Mcrsftdog, @Ratte, @MarkLSteadman: I’ve begun to build the page for the Monthly Challenge for May and I would love some feedback. Still adding texts. See user:Languageseeker/MC Languageseeker (talk) 15:04, 25 April 2021 (UTC)

  • Languageseeker: It looks good, more or less, to me, from an aesthetic point. I would recommend putting a border around the flags, because the flags of Japan and England blend in with the white background. Also, the Librarian's copyright companion title stretches over two lines, which is unappealing; I think the title should be shortened (in the display) so that it remains on one line. As for the content, I agree (generally) with the idea of some longer-term works and some shorter-term works, but I believe more care should be taken as to the selection of categories. For new users, especially, some more fiction and less technical or scientific writing would be appreciated. TE(æ)A,ea. (talk) 21:08, 25 April 2021 (UTC)
Maybe an indicator of beginner-friendly or beginner-beware? Also, it is not clear why Islam / India have `Wikisource` before them while the others don't. MarkLSteadman (talk) 22:00, 25 April 2021 (UTC)
@TE(æ)A,ea.: Thank you for your feedback. I'll add the border for the flags to my to-do list. Are there any works of fiction that you think should be featured? I'm especially interested in key texts that have been proofread and need to be validated. Over the next few days, I'm going to add a few more works of fiction.
@MarkLSteadman: I'm trying to make sure that the majority of the texts will not be too difficult. There will also be a separate talk page so that users can ask questions. I see this project as a means of training new users, so I'm trying to strike a balance between providing easy material for users to learn the skills and also giving users a few challenges. Let me know if any text strikes you as being too difficult. Languageseeker (talk) 00:30, 26 April 2021 (UTC)
I thought one of the differences here was that it was supposed to allow collaboration on works with more difficult formatting (e.g. Plays, tables (Heller), or footnotes), characters (e.g. s as in the 17th Century works featured, diacritics, or the Greek characters in the Loeb), or content (e.g. images or equations). Not for all the works, but for some of these categories. Currently I can see at least some of these challenges with the Richardson, Milton, Shakespeare, Theophrastus, and Heller selections. My suggestion was maybe thinking about indicating that these are good ones to get started, while these might be a little more challenging? Maybe something to think about for later? Or looking to provide a little more guidance around some of those issues in the discussion page? MarkLSteadman (talk) 00:48, 26 April 2021 (UTC)

BTW, if anyone wants to add a text to the page the format is {{MC-Cover|Index|Cover Page number|Title to Display|Publication Date|Author|Subject (the green text)|N|Country for the Flag}}

  • page = replaces the author with the specific page
  • cover = use a specific image for the cover
  • author = Text to display in the Author Field


@EncycloPetey, @Beeswaxcandle, @TE(æ)A,ea., @Mcrsftdog, @Ratte, @MarkLSteadman, @Billinghurst, @Xover, @Inductiveload: The near-final version of the Monthly Challenge page is done: Monthly Challenge. Any and all feedback welcome as always. Languageseeker (talk) 22:40, 26 April 2021 (UTC)

  • By the way, pinging (by {{re}} or otherwise) only works if you sign your comment. This page looks good, if we are comparing it to the main page of WS:PotM; however, if it will be placed on the front page, there needs to be some small part that can be transcluded there. The Community collaboration is represented on Template:Collaboration, which you can use as reference. A short blurb, for lack of a better word, would be readily accepted on the front page, once you finished setting up this project. TE(æ)A,ea. (talk) 21:45, 26 April 2021 (UTC)
I think that the consensus was that this will remain separate from both PotM and Current Collaboration for now. To get this fully running will also require the importation of the Bookworm Bot from French Wikisource, which might take a while because the user who runs the bot has largely left, as far as I can tell. The Monthly Challenge is designed to help introduce new users and to improve the core collection of Wikisource. I need to translate/write up the FAQ from the French Wikisource. Most of my time was taken up in creating the template for adding the texts to the page. I'm trying hard to make it visually appealing.
The only consensus is that this should not replace PotM. The possibility of it coming in as a community collaboration has not been discussed. In the 12 years I've been contributing here, there have been a total of seven (7) collaborations in that box on the Mainpage. Billinghurst's suggestion is worthy of consideration. You must also consider how this proposed project will work on the Mainpage. There is limited space in the "Current Collaborations" box, which is where it needs to appear. Beeswaxcandle (talk) 01:15, 26 April 2021 (UTC)
@Beeswaxcandle: My comment is that if setup it can become the current community project, which then becomes a very simple decision of the community to migrate from the existing project to the next. It also then becomes a future decision for the community when we migrate to the next. The community has already had the discussion and the consensus for priority and flag projects, so the angst about having the project doesn't need to be had, we want these flag projects, WHEN someone is willing to run them. — billinghurst sDrewth 01:30, 26 April 2021 (UTC)
This idea is based on the French Wikisource Mission 7500 and the box should ideally be in the same location: Upper Right hand corner of the Front Page above the New Text section of a similar size to the Explore Wikisource box. It should be very visible so that new users can see it right away. The French Wikisource Mission 7500 project is yielding around 8,000 to 12,500 pages proofread and validated a month. So, this seems like a very worthy model to attempt to replicate. Languageseeker (talk) 01:46, 26 April 2021 (UTC)
Meh, you don't ask a lot. It is a community project and belongs in the community project space. Get your project up and sorted, with its requisite documentation and valid pages and system, and prove the concept. If we then want to have a conversation about the main page, and the various positions, then that is entirely a different conversation. — billinghurst sDrewth 04:08, 26 April 2021 (UTC)
I’m asking to give the experiment a fair chance. For me, the placement matters because it makes it extremely visible on the front page. I don’t want to get buried. Even if the Current Collaboration section is replaced with this Monthly Challenge, it would still require a reconfiguration of the template controlling that section. I’m not entirely sure how to do that. I’m happy to do the translation of the French if that is necessary. If this requires a community discussion, I’m happy to do that if you point the way to where. The Bookwormbot also requires the granting of a bot flag and importation that an administrator needs to do. However, it can be plugged into the current template easily as just another field. Languageseeker (talk) 05:13, 26 April 2021 (UTC)
Actually what you are asking for is for more than a fair trial. You are asking for more focus than any other project has had. You are asking for your project to have more focus than our completed works. I don't rate your proposal higher than completed works, especially as the completed works will appear in the completed list. You are suggesting proposed works are ahead of completed content. Anyway, that is getting way ahead of yourself as you don't even have a working project yet. Do the leg work, produce your system. When you have a project to which we can point people then we can progress. — billinghurst sDrewth 08:22, 26 April 2021 (UTC)
@Languageseeker: "Do not remove line breaks, but please remove any hyphens when they are used to break words across lines": I think this is a difference in en/fr process that shouldn't be imported. enWS common practice (rightly or wrongly) is to remove the line breaks. It might be better just to link to WS:MOS and Help:Formatting conventions than to call out a single rule? Inductiveloadtalk/contribs 22:03, 26 April 2021 (UTC)


@Inductiveload: I thought that it made more sense not to remove line breaks because it’s much harder to spot and correct mistakes when the line breaks are removed. However, if that’s a hard rule, I’m happy to remove that comment and start a separate discussion about whether removing line breaks makes sense. Languageseeker (talk) 22:40, 26 April 2021 (UTC)
I mean, I get why people leave them in (and we say to take them out, AFAIK for borked interactions between Mediawiki's almost indecent love of P-tags and our templates without consistent newlines in DIVs). But it doesn't make sense to me to have separate style guidelines for MC works. I'd say, just point at the existing guidelines and say "do that". When changing things, generally, change one thing at a time. Inductiveloadtalk/contribs 22:45, 26 April 2021 (UTC)
@Inductiveload: Ok. Removed the offending line. Didn't mean to break any major rules. Languageseeker (talk) 23:00, 26 April 2021 (UTC)
You should not be deviating from any of the guidance of the style guide. It makes things difficult if you start to have different rule sets. — billinghurst sDrewth 12:03, 27 April 2021 (UTC)
RE: Validation I see you're adding some texts to be validated. This will require some oversight, as we frequently have new editors who do not properly understand what "Validation" actually entails. Some think it's a quick check, without actually comparing against the scan copy. They therefore rush through the work to get it "done". Some new editors end up using spell-check, and also do not compare against the scan of the original. Any process that advocates (or seems to advocate) for speed or for completion without caution and training will need that oversight. --EncycloPetey (talk) 02:04, 27 April 2021 (UTC)
@EncycloPetey: I agree that validation can be a tricky thing to master, but I also think that it's important to train new users. To help address your concern, I added a note about the goal of validation and a link to the validation help page. Languageseeker (talk) 02:49, 27 April 2021 (UTC)
We also do have a Wikisource:Validation of the Month, but it is not advertised on the Main page. --EncycloPetey (talk) 04:14, 27 April 2021 (UTC)
@Languageseeker: is there a particular reason Mathnawí is using volume 2, even though volume 1 is virtually empty: Index:The Mesnevī (Volume 2).pdf?
Also, I think it should be a "thing" that the index pages have to be fixed up (i.e. status is "to be proofread") before they can be entered for MC. Inductiveloadtalk/contribs 11:14, 27 April 2021 (UTC)
@Inductiveload: Volume 1 of Mathnawí appears to try to establish the definitive Persian text and so is mostly in Persian as far as I could tell. To your second point, that makes sense. Languageseeker (talk) 13:23, 27 April 2021 (UTC)
@Languageseeker: Ah right, then that makes sense! Inductiveloadtalk/contribs 13:34, 27 April 2021 (UTC)
I was working on proofreading the 20 pages of Introduction and then planning on blanking the Persian part for exactly that reason, to prevent that confusion and get the index into proofread status quickly. MarkLSteadman (talk) 13:54, 27 April 2021 (UTC)

Mass move/rename of pages

We have a simple and effective SQL tool for mass deletions. I was wondering if we also have a mass move/rename script? — Ineuw (talk) 15:04, 27 April 2021 (UTC)
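As a rough illustration of what a mass move/rename script could look like, the sketch below only plans the moves: it builds one parameter set per page for the MediaWiki API's action=move module and sends nothing. The function name plan_moves and the example titles are hypothetical, and a real run would still need a logged-in session, a CSRF token added to each request, and the right to suppress redirects if noredirect is kept.

```python
def plan_moves(titles, old_prefix, new_prefix, reason):
    """Build MediaWiki API 'move' parameter sets for titles under old_prefix.

    Nothing is sent to the wiki here. The caller is assumed to add a CSRF
    token and POST each dict to api.php from an authenticated session.
    """
    moves = []
    for title in titles:
        if not title.startswith(old_prefix):
            continue  # leave pages outside the renamed tree alone
        moves.append({
            "action": "move",
            "from": title,
            "to": new_prefix + title[len(old_prefix):],
            "reason": reason,
            "movetalk": 1,    # move the associated talk page as well
            "noredirect": 1,  # assumption: the account may suppress redirects
            "format": "json",
        })
    return moves


if __name__ == "__main__":
    # Hypothetical page titles, for illustration only.
    pages = ["Page:Old name.pdf/1", "Page:Old name.pdf/2", "Page:Other.pdf/1"]
    for params in plan_moves(pages, "Page:Old name.pdf", "Page:New name.pdf",
                             "Rename to match the Index"):
        print(params["from"], "->", params["to"])
```

Reviewing the planned list before anything is posted keeps a bad prefix from renaming the wrong pages.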

IP Masking Engagement

Hello Wikisource community, this is about IP Masking engagement which the Anti-Harassment Tools team is carrying out.

The point of the engagement is to understand how the project will impact editors. Also, we want to know which other tools you will need to be able to effectively govern the projects in absence of IPs.

Please read more on the IP Masking project here.

Please add your comments on the talk page.

Best regards,
STei (WMF) (talk) 12:43, 28 April 2021 (UTC)

Diskussion:Projekte

I just discovered this at de.wikisource,

Rules For New Projects (English Translation)

Each new project with an extent of more than 50 pages must meet the following points:[1]

  • 1. The script meets our requirements. See text base.
  • 2. Scans must be uploaded to Commons and the quality of the scan has to be good enough for proofreading.
  • 3. To meet the requirement of the 4-eyes principle the point a) or b) must be fulfilled:
    • a) Before or during work on the project, a quid pro quo of equal extent is expected (e.g. proofreading).
      or
    • b) The project has found enough backers to be finished in a comprehensible timespan. To search for helpers this page can be used.
  • 4. To be clear, every project over 50 pages must be announced here before it starts. "Before it starts" means: no index or articles should be created before the project is approved. If no concern is raised within ten days, the project can start.[2]

It is strongly recommended to also announce small projects with large portions in non-Latin letters, like Greek, Hebrew or handwritten scripts.

Also look at:


Some interesting sentiments are expressed, although it is presumably a reaction as policy by community members to abandoned projects. I don't see a concern where this remains in 'work- or proof-reading space' (the indices and their pages) and it is done with restraint and forethought, but obviously there are many practices here that would not meet the same degree of explicit or tacit approval at that sister's community. CYGNIS INSIGNIS 06:15, 26 April 2021 (UTC)

References

  1. Decided in March 2010 (permalink to the SKR)
  2. See the discussion of July 2015
The German Wikisource has decided to do many things differently from the other Wikisources. For one thing, they never adopted the idea of an Author namespace. Their Wikipedia has a number of very different approaches from everyone else as well, such as permanent parallel duplication of categories. It does mean that they often attempt approaches that no one else has tried, because they are not simply doing what everyone else is doing. --EncycloPetey (talk) 17:37, 26 April 2021 (UTC)
I think having such a policy fairly obviously has the exact outcomes you would expect: extremely low proofreading rates (~20k proofread or validated pages/year, vs 230k at enWS and 400k at frWS) as well as low participation (125 vs 428 and 253 active users). A less expected outcome (for me at least) is that the overall deWS proofread:validated ratio is still only ~1:1. That beats the "worse" ratio elsewhere (roughly 3:1 at both enWS and frWS), but I'd have expected much better, since it should trend to 0. Also note the very low productivity compared to frWS, where the pages/active user is ~10 times higher (enWS is "only" 4 times higher). For more fun stats: https://phetools.toolforge.org/statistics.php
While I imagine the overall quality in mainspace is much better, at what cost? And could we not achieve the same outcome through being stricter on pre-emptive transclusion of unfinished works? For example, adding a date to {{incomplete}} and having them flag up after n months?
Also, I disagree with the underlying implication that a proofread-but-not-validated text is somehow worse than no text at all. Inductiveloadtalk/contribs 18:07, 26 April 2021 (UTC)
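The productivity comparison in the comment above can be checked with a few lines of arithmetic. This is only a sketch using the rounded yearly page counts and active-user numbers quoted there; the dictionary layout and function names are made up for the illustration.

```python
# Rounded figures as quoted in the comment above:
# (proofread or validated pages per year, active users)
STATS = {
    "deWS": (20_000, 125),
    "enWS": (230_000, 428),
    "frWS": (400_000, 253),
}


def pages_per_user(stats):
    """Return pages per active user for each wiki."""
    return {wiki: pages / users for wiki, (pages, users) in stats.items()}


def relative_to(stats, baseline):
    """Return each wiki's pages-per-user figure as a multiple of the baseline wiki's."""
    per_user = pages_per_user(stats)
    return {wiki: value / per_user[baseline] for wiki, value in per_user.items()}


if __name__ == "__main__":
    for wiki, factor in relative_to(STATS, "deWS").items():
        print(f"{wiki}: {factor:.1f}x the deWS pages per active user")
```

With these inputs frWS comes out at roughly ten times the deWS figure and enWS at a little over three times.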
With our Translation namespace it was expected that users would be using the Page: ns to do their translations, and this has not particularly been enforced, especially as we move over a number of old translations. deWS didn't/doesn't use ProofreadPage so they can have a different level of approach/tolerance to translations. If we had them in the PrP environment and not naked in Translation: ns, then we would have no ugliness there at all, and accordingly infinite patience. — billinghurst sDrewth 11:50, 27 April 2021 (UTC)
I think "(English Translation)" just means it is an English translation of the deWS rules for all works, not that it only applies to translations. Particularly, since deWS requires permission (and a 10 day delay) on even creating the index page (which is, I guess, why they only have 142 "unproofread" or "incomplete" indexes: petscan:18955661), they do not have infinite patience, even in the working spaces.
Over here, we have had at least one "textbook" translation recently: Translation:The Three Princes of Serendip. Inductiveloadtalk/contribs 12:11, 27 April 2021 (UTC)
Two ideas I recall seeing around are a section on the author page that separately lists active transcription projects and a search button for linked indexes. I would prefer just the latter to avoid so much self reference on the site, and it is a better practice for avoiding concerns like working on a new index that is already half done elsewhere (as mentioned by Encyclopetey somewhere). If links to indices are manually added wherever, then it becomes remiss not to add that to the list of things to do aside from proof reading. CYGNIS INSIGNIS 13:05, 28 April 2021 (UTC)
it is good to have german wikisource as an example of what not to do. at wikimanias, we ask each other what is up with the "sick man of europe" [hopelessly lagging] wikisource. if you think i am exaggerating, check out the statistics: back in 2010, en & de had the same proofread pages around 100,000. now, de is at 300,000 and en is at 1300000. [16] the unhealthy wikis like de wikisource and wikinews are a choice by the admins at those projects: there is no technical reason they cannot be as productive as the english, french, italians and polish. Slowking4Farmbrough's revenge 22:45, 28 April 2021 (UTC)
@Slowking4: please rephrase that second sentence. CYGNIS INSIGNIS 04:05, 29 April 2021 (UTC)
ok. describing chronic failure has historical antecedents. but the lesson remains, do the opposite - no quid pro quo, no preconditions to start an index, no precondition of announcements or support, no requirement of text base (scanned back) pages. Slowking4Farmbrough's revenge 15:28, 29 April 2021 (UTC)
there might be pertinent comment after that attempt at a rejoinder, but I lost interest … CYGNIS INSIGNIS 15:51, 29 April 2021 (UTC)

Help with Validating a Text

I was wondering if someone could validate Index:Paradise Lost Manuscript. It's part of the upcoming Monthly Challenge series and I think that it would make an extremely impactful first text to validate. The Index consists of the 34 pages that are book 1 of Paradise Lost and the only extant part of the manuscript. The text was written by an amanuensis. Languageseeker (talk) 00:43, 29 April 2021 (UTC)

Taking a quick look, there are two consistent issues I see. (1) Indentation of lines from the original is not replicated, and line indents are to be replicated for poetic works. (2) You've used the poem-tag throughout; this can cause huge headaches for multi-page works when they are transcluded. The poem tag does not always behave predictably, so for poems that span multiple pages, line breaks are preferred. --EncycloPetey (talk) 05:23, 29 April 2021 (UTC)
I read the indent point (1) as a new stanza, without bothering to check if that was what the printer did. What is the plural for 'amanuensis'? Because that is how this document was apparently compiled. Point (2) should be policy; the resources of this site have laboured to accommodate a tag that does little more than obviate the need to add breaks. CYGNIS INSIGNIS 13:23, 29 April 2021 (UTC)
@EncycloPetey, @Cygnis insignis: Thank you both for your feedback that helps me to clarify/understand what remains to be done on this manuscript. I think that I would like to reproduce the look of the manuscript as much as possible. There are four major tasks remaining when it comes to formatting 1) Replace poem with br tags 2) add in {{ls}} 3) add missing plines 4) add gaps. Languageseeker (talk) 13:43, 29 April 2021 (UTC)
In a printed text I have no hesitation in disposing of the indent and replacing that with an empty line, a la wiki, but in the case of a new transcript I am not so sure. Perhaps I should nominate for deletion as 'self- or un-published' to save me worrying. CYGNIS INSIGNIS 14:04, 29 April 2021 (UTC)
When the text is in prose, I agree with you about dispensing with line indents, but with poetry an indented line does not signify the start of a "paragraph", and may be an internal line of a stanza that has been indented. Deciding when indents are new stanzas and which are internally indented lines is an editorial decision, and choosing to make a break where there was an indent only, will affect the way it is read. --EncycloPetey (talk) 15:03, 29 April 2021 (UTC)
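To make the poem-to-br step discussed above more concrete, here is a minimal sketch of one possible conversion. It assumes simple, non-nested poem tags, ends each line with a br tag except before a blank line (so stanza breaks survive), and marks any line that was indented in the source with a single {{gap}}. That last choice is an assumption for the illustration, not an established convention, and the function name is hypothetical. Whether an indent is a new stanza or an internally indented line stays an editorial decision that no script can make.

```python
import re

# Matches one <poem>...</poem> block; nested or attribute-bearing tags are not handled.
POEM_RE = re.compile(r"<poem>\n?(.*?)\n?</poem>", re.DOTALL)


def poem_to_breaks(wikitext):
    """Replace <poem> blocks with explicit <br /> line breaks."""

    def convert(match):
        lines = match.group(1).split("\n")
        out = []
        for i, line in enumerate(lines):
            indented = line[:1] in (" ", "\t", ":")
            text = line.lstrip(" \t:")
            if indented and text:
                # Assumption: one {{gap}} per indented line, whatever its depth.
                text = "{{gap}}" + text
            is_last = i == len(lines) - 1
            # No break after the final line or before a blank (stanza) line.
            if text and not is_last and lines[i + 1].strip():
                text += "<br />"
            out.append(text)
        return "\n".join(out)

    return POEM_RE.sub(convert, wikitext)


if __name__ == "__main__":
    sample = "<poem>\nFirst line\n  Indented line\nThird line\n</poem>"
    print(poem_to_breaks(sample))
```

The output would still need checking against the scan page by page, since indentation depth is discarded.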

Score needed

Could somebody please oblige by transcribing the short score on Page:S.S. Bremen - G. Howell-Baker - music by E. Edgar Evans.jpg? It's beyond my skills. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:36, 29 April 2021 (UTC)

dentition renderin'

"the cat proceeded to its bowl of biscuits

CRUNCH CRUNCH CRUNCH!. [apol. to Brautigan] :(

I need to add a dental formula in a work, there is some coding for that at the big sister: this. That could be copied over to Template:DentalFormula for the convenience of those familiar with it, but any stable code that allows me to apply numerals and separators above and below a 'line' (or other demarcator) would be useful for the odd instances I have found. This is useful for odd fractions too, 16ths, but I couldn't find anything mentioned in the Help pages. CYGNIS INSIGNIS 13:12, 29 April 2021 (UTC)

Would it be possible to use Template:Sfrac? To produce, for example, 2.1.3.3/2.1.3.3 ? That template supports rendering all characters as far as I know in both the "numerator" and "denominator," and the Wikipedia Template:DentalFormula is based on it as far as I know. Mathmitch7 (talk) 14:45, 29 April 2021 (UTC)
[e/c] There's {{sfrac}}: 2.2.2.2/2.2.2.2. Export is a bit wonky on some less capable clients. I am unsure of the best practice for fractions in the general case, especially w.r.t. accessibility. Inductiveloadtalk/contribs 14:46, 29 April 2021 (UTC)
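For anyone generating several dental formulae, the wikitext can be built programmatically. The sketch below assumes the two-argument form of {{sfrac}} (numerator, then denominator); the helper name and the per-jaw tooth counts are only illustrative.

```python
def dental_formula(upper, lower):
    """Return {{sfrac}} wikitext with dot-separated tooth counts for each jaw.

    Assumes {{sfrac}} is called as {{sfrac|numerator|denominator}}.
    """
    numerator = ".".join(str(count) for count in upper)
    denominator = ".".join(str(count) for count in lower)
    return "{{sfrac|" + numerator + "|" + denominator + "}}"


if __name__ == "__main__":
    # Counts taken from the example given in the reply above.
    print(dental_formula([2, 1, 3, 3], [2, 1, 3, 3]))
```

The same helper would serve for odd fractions such as sixteenths by passing single-element lists.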

Importation of Bookworm Bot from French Wikisource

French Wikisource uses BookwormBot to generate useful statistical information about an Index, such as the number of pages proofread, validated, blank, etc. Would it be possible to import this bot into English Wikisource? Languageseeker (talk) 22:47, 20 April 2021 (UTC)

 Support  Neutral (see below) for the purposes of trying out the frWS model of a page-wise monthly goal, but it's less a matter of "importing" and more a matter of asking @User:Coren what we need to do to enable them to turn it on for us and do that, and also gain consensus to grant the bot flag (which is fine by me). Inductiveloadtalk/contribs 19:10, 23 April 2021 (UTC)
 Comment @Inductiveload: User:Coren does not seem to be extremely active anymore. Would we need him to set up the bot or could we do it ourselves? Languageseeker (talk) 03:51, 24 April 2021 (UTC)
Where is its code? [I will presume that it has been suitably licensed as Coren was good that way.] What are the prerequisites for running the bot? From where are we planning to run the bot? It says from the DB, rather than pulled from the API. Which db? Has all the data been added to WD, and pulled from there, or what? — billinghurst sDrewth 08:57, 26 April 2021 (UTC)
 Comment It might also be worth investigating if we could build the raw per-index "page-at-status-X counting" functionality into the ProofreadPage extension and then 1) we wouldn't need any bot and 2) all Wikisourcen can benefit. I do not have a handle of how murderous (or not) that might be on the server side with respect to page render times, so it may be a complete non-starter.
I have opened phab:T281195, but I probably don't have the time to dig into a PHP job any time soon. Inductiveloadtalk/contribs 21:56, 26 April 2021 (UTC)
The Wikisource:Monthly Challenge stats are now (I hope!) being automatically generated by a new script, which totally duplicates effort that already exists in BookwormBot, but seems to be at least functional for now. So I am now neutral - I'm not sure we need BwB any more. Inductiveloadtalk/contribs 22:29, 8 May 2021 (UTC)
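The "page-at-status-X counting" idea raised above comes down to a tally over the ProofreadPage quality level of each page in an index. The sketch below is not the Monthly Challenge script or BookwormBot; it assumes the quality levels have already been fetched by some other means and only shows the counting step, with a hypothetical function name.

```python
from collections import Counter

# ProofreadPage quality levels and their usual labels.
QUALITY_LABELS = {
    0: "without text",
    1: "not proofread",
    2: "problematic",
    3: "proofread",
    4: "validated",
}


def count_by_status(levels):
    """Tally pages per status from an iterable of quality levels (0 to 4)."""
    counts = Counter(levels)
    return {label: counts.get(level, 0) for level, label in QUALITY_LABELS.items()}


if __name__ == "__main__":
    # Made-up quality levels for a seven-page index.
    print(count_by_status([4, 4, 3, 1, 0, 3, 3]))
```

Statuses with no pages are reported as zero so that a monthly table always has the same columns.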
This section was archived on a request by: — billinghurst sDrewth 08:25, 28 May 2021 (UTC)

I'd like to use my bot (User:SLiuBot) to import texts of the book from The Integrated Information System on Modern and Contemporary Characters (a database curated by Institute of Modern History, Academia Sinica (Q10875101)). I have already uploaded 10 test entries on my user pages (see User:Stevenliuyi/Who's Who in the Far East (June) 1906-7). Since I am new here, any advice or suggestions are very welcome. Several other English biographical dictionaries in that database are also in the public domain, and I also plan to import those dictionaries after completing this one, but that will be a separate bot request. Right now I just want to get the first one right. --Stevenliuyi (talk) 04:20, 30 April 2021 (UTC)

I have fixed some minor issues in the test pages. If there is no objection, I plan to upload all texts in the book. --Stevenliuyi (talk) 18:04, 4 May 2021 (UTC)
I have finished uploading the texts (see Who's Who in the Far East (June) 1906-7). Please let me know if you have any suggestions. --Stevenliuyi (talk) 01:24, 6 May 2021 (UTC)

Translations and Reprints from the Original Sources of European History

Today I happened to come across the work Translations and Reprints from the Original Sources of European History, published by the University of Pennsylvania's History department from about 1897-1907. As far as I can tell, archive.org has full or nearly full coverage of the series, which has several volumes. I was originally looking for a PD English Translation of "fr:Qu’est-ce que le tiers état ?" by Author:Emmanuel Joseph Sieyès which is available in full on the French Wikisource but not yet validated, and I happened across this series, where excerpts of that text appear in the 6th volume. I guess my question is, would this source be something that others would be interested in having all volumes up on commons and a central page here with a table of contents? It seems quite wide-reaching and may help us expand English-language coverage of minor texts from European authors. I'm just curious if this is a work people think would be useful. Mathmitch7 (talk) 14:40, 29 April 2021 (UTC)

@Mathmitch7: I generally think it's a very good thing to set up this kind of collective work "on spec", because the set-up is quite a lot of faffing about for many users, but dipping in to proofread an article or two is easy once that groundwork is laid. But make sure it's well linked from authors pages and perhaps Portals so it can be found. If it can't be found, no-one will come, even if you build it.
In theory, if you don't proofread any articles, it can't have a mainspace page, however. C.f. Wikisource:Scriptorium/Archives/2021-02#No-content_mainspace_pages for a long but fizzled discussion to try to determine best practices for that. Inductiveloadtalk/contribs 14:56, 29 April 2021 (UTC)
I would be interested in doing some of that, though probably very little overall, however, I have a preference and see no harm in having sections of the volumes with red-linked parent titles that follow this sister's convention of series/vol/sect, eg, I care for about 10% or less of The Emu, linking the 'parent titles' from The Emu/volume 3/Extinct Tasmanian Emu would be, as someone here said, "guaranteed to engender disappointment". CYGNIS INSIGNIS 15:29, 29 April 2021 (UTC)
@Mathmitch7: That looks like an amazing find. I’m sure some of those translations are still probably the only ones ever made. I took a quick look and there appears to be a new series as well. Languageseeker (talk) 19:22, 29 April 2021 (UTC)
Alright, so I've now uploaded one scan of each volume available on the internet archive of this series, you can find them at commons:Category:Translations and Reprints from the Original Sources of European History (UPenn series). I'll try to get them up to wikisource soonish, but I have some other things I have to do today. I'll note that the "new series" mentioned (which I also uploaded) are more monographs than big edited volumes, but I think they will still be helpful. Also, I was unable to find a version of Volume 5 readily available -- there's a version on archive.org but as it's a reprint from 1971 it's not fully available for download, so that's something to look for in the future. Mathmitch7 (talk) 13:29, 10 May 2021 (UTC)

Moral disclaimers for certain works

There are certain works that have a core message or consistently incorporate certain themes that most people would find offensive and morally reprehensible. I'm thinking specifically about works that were made for the purpose of promoting white supremacy. Some notable examples of these are: Thomas Dixon's The Clansman and The Leopard's Spots; D.W. Griffith's films The Birth of a Nation (1915) and Intolerance (1916); Henry Ford's The International Jew (1920); Adolf Hitler's works; etc.

I think works such as these definitely need to be transcribed here, so that they can be viewed for historical purposes (as in, to understand what their arguments were and why they were made), and a transcription could for example make it easier for a user of our content to produce a rebuttal to said work. But the issue is that works like these are so bigoted in tone that their messages are simply indefensible, cruel, and morally reprehensible. I imagine many people who read our transcriptions of those works may get the idea that Wikisource's community, or the users who took the time and effort to work on the transcription, actually support the bigoted messages of these works, despite what Wikisource's project pages say about the project being NPOV.

So I propose that we create a disclaimer template, that we can put in the "Notes" section of the front matter page's header template. The template should say something to the effect of:

This text consistently promotes ideas that are particularly hateful or bigoted in nature. Please remember that Wikisource's community and its contributors do not necessarily endorse any opinions or ideas presented in any of its works, including this one. Works are presented as-is with no censorship involved, as transcription is done with a neutral point of view in mind, without bias for or against any particular ideology.

By the way, I think the disclaimer should only be included in works that have a consistently disreputable tone that may easily cause offense. I don't think that works such as Bobbie, General Manager or The Achievements of Luther Trant, which casually drop the n-word a few times but don't bring up racial issues much at all, should be given the template. However, a book focusing primarily on racial issues, taking a white supremacist stance, would qualify. PseudoSkull (talk) 19:12, 8 April 2021 (UTC)

I very much appreciate the underlying issue, but I'd be inclined not to do this. While it would have benefit for the most extreme and uncontroversial cases, such as those you list, there would be a tremendous number of works in a "grey area" where editors disagree, and/or where we lack the resources to even detect or evaluate subtle but reprehensible views.
Perhaps an alternative would be to put some careful work into a thorough essay along the lines you suggest, and link to it from somewhere prominent on the Wikisource main page. Rather than trying to attach it to every reprehensible work, simply express our position clearly in one central place.
I always think it's worthwhile to think about the precedent of traditional libraries. Would you expect to find your local library had inserted a position statement into its copy of Mein Kampf? It seems unlikely, though I could certainly see them having a general brochure available at the front desk explaining why they carry such works. -Pete (talk) 19:32, 8 April 2021 (UTC)
@PseudoSkull: Just to tie my comment a little more closely to your proposal, and focus on how things would play out in practice: How would you imagine things going if somebody strongly disagreed with you, and felt that Bobbie, General Manager was indeed reprehensible? (I have no familiarity with this particular work, just following your example.) How would we come to a decision? Would the process tend to deplete the time or emotional energy of various volunteers? Would the end result, regardless of what it is, bring much benefit to the reader? -Pete (talk) 20:12, 8 April 2021 (UTC)
Some editors have had similar discussions regarding the practice of using project disclaimers on works such as encyclopedias. I'm pretty sure no consensus was ever reached, and thus no action ever taken. In my personal opinion, a general disclaimer that covers all Wikisource works, perhaps placed prominently on the Main Page and in the footer, should suffice for both purposes. —Beleg Tâl (talk) 20:11, 10 April 2021 (UTC)
@PseudoSkull: I am not averse to such a template being added to the corresponding talk page, and the use of "edition = yes" in the header to put the pointer. I have a preference to keep commentary out of main namespace, and keeping it as clean as possible. — billinghurst sDrewth 13:04, 9 May 2021 (UTC)
@Billinghurst: I'm fine with having the disclaimer on the work's talk page, especially since that's where readers will probably go to (inappropriately for talk pages btw) complain about the work itself. Here's a talk page that already had this issue, and had to be autoconfirm-protected, for example. PseudoSkull (talk) 18:10, 6 June 2021 (UTC)

 Comment Noting that in the footer of every page that we produce there is a link to Wikisource:General disclaimer. The text there should be reviewed and suggestions made. — billinghurst sDrewth 00:48, 9 May 2021 (UTC)

 Support making a template to tag sensitive works. How will it be coded?--Jusjih (talk) 01:21, 8 June 2021 (UTC)

Done @PseudoSkull: {{moral disclaimer}} It takes no parameters, as nothing evident came to mind, and we can use what transcludes to find where it is used, so no tracking category at this time. — billinghurst sDrewth 11:59, 9 June 2021 (UTC)

  • reopen and oppose its usage on any text [as per Pete]. As pointed out by Beleg Tâl above, this is not the first time this has been conceived; their notion of a general disclaimer is preferable to the application of the newly created {{moral disclaimer}}. CYGNIS INSIGNIS 13:11, 9 June 2021 (UTC) [edit] 13:15, 9 June 2021 (UTC)
There was a change to the suggested means to handle this, and it is via the talk page, not in the comments field. Lots of things happen on the talk pages, especially the commentary about the works. This is an annotation only, and complaining about this template and saying that it should not be there, is akin to complaining about people complaining about works. — billinghurst sDrewth 13:39, 9 June 2021 (UTC)
"I am not averse to such a template being added to the corresponding talk page, and the use of "edition = yes" in the header to put the pointer." Compromise, implementation and closed without further discussion. I didn't oppose when I saw this earlier, because the reasons not to are well outlined. CYGNIS INSIGNIS 14:24, 9 June 2021 (UTC)
This section was archived on a request by: — billinghurst sDrewth 08:24, 18 June 2021 (UTC)

New Request for Comment on Wikilinking Policy is open

I have just opened Wikisource:Requests for comment/Wikilinking policy. You will find there a proposed complete overhaul/rewrite of the current policy, which is now ready for review by the wider Wikisource community. It is proposed that the RfC will be open for two weeks. Please make your comments there rather than here. Beeswaxcandle (talk) 08:33, 14 March 2021 (UTC)

@Beeswaxcandle: I think 2 weeks / 72 hours is a little bit too aggressive, even for a presumed uncontroversial policy proposal like this. I understand the reasoning, but I just don't think the community is able to move that fast. For example, we have several long-time contributors that are currently in a phase where they check in only every couple of weeks. And I know for my own part that the local Covid status could easily make me too busy to check in here for weeks on end. We could still have an accelerated timeline (just not quite as accelerated as 2/72) if we notify of the proposal in a site notice and maybe even a talk page message to any established contributor that has been active in the last three months (or similar).
PS. And let me repeat my previous private kudos in public: you took my ongoing whining about the old policy and turned it into a concrete proposal for a new policy. Great work, for which I am extremely grateful! --Xover (talk) 09:25, 14 March 2021 (UTC)