Wikisource:Scriptorium

From Wikisource
Scriptorium
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help. Project members can often be found in the #wikisource IRC channel (also reachable through a webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 440 active users here.

Announcements

5,000 works fully validated

A few hours ago, our five thousandth work was validated. Index:American Rescue Plan Fact Sheet - Impacts on Kentucky.pdf (a White House publication) was validated by User:Clay. For a complete list of our validation milestones see Portal:Proofreading milestones. Beeswaxcandle (talk) 23:13, 16 April 2021 (UTC)

Woot! Looking at the statistics, it took 4 years to reach 1,000 and about 7 months to go from 4,000 to 5,000. Looks like we must be doing something right. A huge achievement. Languageseeker (talk) 02:47, 17 April 2021 (UTC)

Index transclusion status now in the Index page edit form

As many of you may have noticed in your watchlists, Index page transclusion status and validation dates are no longer recorded in a template, but are a proper part of the Index page edit form. All existing uses of the templates have been migrated over. The usage remains unchanged - transclusion status refers to how much of the work is transcluded, and is somewhat independent of the proofread status - it is possible for a work to be fully transcluded but not validated.

There was a brief period where some Indexes had multiple categories while the usage was changed over. These should naturally resolve as the categories update, or you can force it by purging or editing the page. As always, let me know if something is looking broken even after a purge. Inductiveloadtalk/contribs 13:00, 22 April 2021 (UTC)

Proposals

New Request for Comment on Wikilinking Policy is open

I have just opened Wikisource:Requests for comment/Wikilinking policy. You will find there a proposed complete overhaul/rewrite of the current policy, which is now ready for review by the wider Wikisource community. It is proposed that the RfC will be open for two weeks. Please make your comments there rather than here. Beeswaxcandle (talk) 08:33, 14 March 2021 (UTC)

@Beeswaxcandle: I think 2 weeks / 72 hours is a little bit too aggressive, even for a presumed uncontroversial policy proposal like this. I understand the reasoning, but I just don't think the community is able to move that fast. For example, we have several long-time contributors who are currently in a phase where they check in only every couple of weeks. And I know for my own part that the local Covid status could easily make me too busy to check in here for weeks on end. We could still have an accelerated timeline (just not quite as accelerated as 2/72) if we announce the proposal in a site notice, and maybe even leave a talk page message for every established contributor who has been active in the last three months (or similar).
PS. And let me repeat my previous private kudos in public: you took my ongoing whining about the old policy and turned it into a concrete proposal for a new policy. Great work, for which I am extremely grateful! --Xover (talk) 09:25, 14 March 2021 (UTC)

Tweak archive settings for the Scriptorium

Currently the configuration for automatic archiving of the Scriptorium is set to archive threads in which there have been no new comments for 30 days, and to archive threads that are explicitly marked as resolved 31 days after the date they are marked as resolved. Because the 30-day inactivity rule always triggers first, this means that in practical effect nothing ever gets archived by being marked as resolved.

To be able to clean out this sometimes overwhelmingly long page, I propose we change the interval for resolved sections to something more reasonable, like a week, or possibly even 3 days. We rarely explicitly close threads here, and when we do it's the "Hey, how do I do this? / Here's how. / Ok, thanks."-type threads. Conversely, the threads that really need long-term visibility are either marked with the "do not archive until" tag (which lets you set an arbitrary future date before which the thread is ineligible for archiving) or, for things like the RFCs and proposals, once the discussion in the "Proposals" section is closed they should be posted as an announcement in the "Announcements" section, where they will stay for an additional minimum of 30 days (the ordinary auto-archive interval).

Absent indications to the contrary my expectation is that this proposal is uncontroversial, so if there are no comments on this proposal I will take that as tacit approval. If anyone has any concerns with it I would appreciate comments to that effect, or even just "Wait, I have to think about it first". --Xover (talk) 09:17, 29 March 2021 (UTC)

And since nobody objected or yelled "Wait!", I've now tweaked the settings accordingly. I'll leave this thread open for a bit for stragglers, and after that anyone that wants to tweak the settings further can just open a new thread. --Xover (talk) 19:27, 9 April 2021 (UTC)
@Xover: Meh. This proposal gets lost in the morass of other components at the top of the page. Personally, I think that we need to rethink the scope of how the page works, as it is not working efficiently. If we are going to have so many active proposals, I think they need to be subpages of this page and transcluded in, or linked out with some better means of notification through Mediawiki:Watchlist-announcements (as we used to do). A number of the proposals that have come forward are blue-sky thinking and more like WS:RfCs for long discussions, rather than simple proposals. — billinghurst sDrewth 00:40, 9 May 2021 (UTC)

I would happily see users who turned up yesterday (apparently) censored from kludging up the page, and this small community's time, with revived proposals. The page would also be a good deal shorter if nearly every thread was spared the often confounding and adversarial commentary of our resident bigot. With that aired, I think the Scriptorium provides a lot of information to silent readers wishing to improve their contributions; buried within the commentary are solutions that are difficult to find in the help pages. Three days to a week is a very short time frame for this site, and discussions often need more airing and thoughtful input. I think the proposal addresses some of these concerns, but hope that urgency does not override the development of guidelines through broad and considered opinions. CYGNIS INSIGNIS 11:07, 9 May 2021 (UTC)

@Cygnis insignis: My proposal in this thread (which has been implemented) was only in regard to threads that are manually closed (by adding {{section resolved}}), which we almost never do. All other threads are archived 30 days after the last comment. That's why I considered it appropriate to make the change so quickly and without a real !vote: it makes very little practical difference.
Billinghurst's subsequent comment (which echoes my own thoughts on some of the problems with this page) is directed at the broader issue of this page getting unwieldy, and will need both a broader set of changes to address, and broader discussion before any changes can be decided on. Xover (talk) 12:34, 9 May 2021 (UTC)
@Xover: Excuse the moaning then, I haven't seen any archiving here that was objectionable and what you've implemented sounds reasonable. CYGNIS INSIGNIS 12:41, 9 May 2021 (UTC)

Moral disclaimers for certain works

There are certain works that have a core message or consistently incorporate certain themes that most people would find offensive and morally reprehensible. I'm thinking specifically about works that were made for the purpose of promoting white supremacy. Some notable examples of these are: Thomas Dixon's The Clansman and The Leopard's Spots; D.W. Griffith's films The Birth of a Nation (1915) and Intolerance (1916); Henry Ford's The International Jew (1920); Adolf Hitler's works; etc.

I think works such as these definitely need to be transcribed here, so that they can be viewed for historical purposes (as in, to understand what their arguments were and why they were made), and a transcription could for example make it easier for a user of our content to produce a rebuttal to said work. But the issue is that works like these are so bigoted in tone that their messages are simply indefensible, cruel, and morally reprehensible. I imagine many people who read our transcriptions of those works may get the idea that Wikisource's community, or the users who took the time and effort to work on the transcription, actually support the bigoted messages of these works, despite what Wikisource's project pages say about the project being NPOV.

So I propose that we create a disclaimer template, that we can put in the "Notes" section of the front matter page's header template. The template should say something to the effect of:

This text consistently promotes ideas that are particularly hateful or bigoted in nature. Please remember that Wikisource's community and its contributors do not necessarily endorse any opinions or ideas presented in any of its works, including this one. Works are presented as-is with no censorship involved, as transcription is done with a neutral point of view in mind, without bias for or against any particular ideology.

By the way, I think the disclaimer should only be included in works that have a consistently disreputable tone that may easily cause offense. I don't think that works such as Bobbie, General Manager or The Achievements of Luther Trant, which casually drop the n-word a few times but don't bring up racial issues much at all, should be given the template. However, a book focusing primarily on racial issues, taking a white supremacist stance, would qualify. PseudoSkull (talk) 19:12, 8 April 2021 (UTC)

I very much appreciate the underlying issue, but I'd be inclined not to do this. While it would have benefit for the most extreme and uncontroversial cases, such as those you list, there would be a tremendous number of works in a "grey area" where editors disagree, and/or where we lack the resources to even detect or evaluate subtle but reprehensible views.
Perhaps an alternative would be to put some careful work into a thorough essay along the lines you suggest, and link to it from somewhere prominent on the Wikisource main page. Rather than trying to attach it to every reprehensible work, simply express our position clearly in one central place.
I always think it's worthwhile to think about the precedent of traditional libraries. Would you expect to find your local library had inserted a position statement into its copy of Mein Kampf? It seems unlikely, though I could certainly see them having a general brochure available at the front desk explaining why they carry such works. -Pete (talk) 19:32, 8 April 2021 (UTC)
@PseudoSkull: Just to tie my comment a little more closely to your proposal, and focus on how things would play out in practice: How would you imagine things going if somebody strongly disagreed with you, and felt that Bobbie, General Manager was indeed reprehensible? (I have no familiarity with this particular work, just following your example.) How would we come to a decision? Would the process tend to deplete the time or emotional energy of various volunteers? Would the end result, regardless of what it is, bring much benefit to the reader? -Pete (talk) 20:12, 8 April 2021 (UTC)
Some editors have had similar discussions regarding the practice of using project disclaimers on works such as encyclopedias. I'm pretty sure no consensus was ever reached, and thus no action ever taken. In my personal opinion, a general disclaimer that covers all Wikisource works, perhaps placed prominently on the Main Page and in the footer, should suffice for both purposes. —Beleg Tâl (talk) 20:11, 10 April 2021 (UTC)
@PseudoSkull: I am not averse to such a template being added to the corresponding talk page, and to the use of "edition = yes" in the header to add the pointer. I have a preference to keep commentary out of main namespace, and to keep it as clean as possible. — billinghurst sDrewth 13:04, 9 May 2021 (UTC)

Comment: Noting that the footer of every page we produce contains a link to Wikisource:General disclaimer. The text there should be reviewed and suggestions made. — billinghurst sDrewth 00:48, 9 May 2021 (UTC)

Creation of Version Pages for the Individual works of Charles Dickens

In the discussion of the proposed move of Oliver Twist to its correct edition name and the creation of a version page for Oliver Twist, User:Xover informed me that such a proposal requires a formal vote before a second version of the text is proofread. Therefore, I’m submitting a formal proposal to the community to discuss the pros and cons of creating version pages for the individual works of Charles Dickens. It’s important to note that Dickens usually published 6-12 distinct editions with his editorial revisions, and listing them on the author page occupies a significant amount of space. Languageseeker (talk) 15:45, 11 April 2021 (UTC)

I think this discussion (which is occurring in several places) is a bit stuck, and could benefit from a more explicit articulation of the basic problem you're trying to solve. Personally I think you've made it pretty clear, but that's just my perception…it also seems like others have had trouble seeing it. So I'll take a crack at that:
Up until recently, the author page for Charles Dickens dedicated a few lines to the novel Oliver Twist. (The same issue applies to many of his novels, but I'll use this one as an example.) These lines identified several editions of the novel; but they were not comprehensive, nor did they offer complete bibliographic information. More recently, this was fleshed out in much greater detail (see here). But this increased detail, which helps the reader readily learn about the various editions, has an undesirable side effect as well, i.e. it makes the Author page unwieldy and difficult to parse. One might reasonably assume that most readers are interested in an overview, like "which novels did Dickens write," and that only more specialized readers want to delve into the weeds of which editions are which, etc. A natural organizational principle for a website is to put the high-level information on the main page, and then move the more detailed information to the page linked, which was accomplished by creating Oliver Twist versions. This appears (to me) to be in keeping with common practice and with documented procedures at Wikisource.
Other experienced editors seem to disagree. I believe the principle is that a "Versions" page should only exist when there are multiple editions or versions transcribed on Wikisource, not merely published elsewhere.
I think the introduction of this last principle is the root of the present problems. As far as I can tell, this principle is not in Wikisource's documentation of versions pages, so an assumption that other editors would know it strikes me as potentially a bit rude. But more significantly, I think the principle itself is a bit myopic. A reader may well want to learn about multiple editions, and they may be glad to find bibliographic information that will permit them to find them in a traditional library, and/or links where they can find them at Gutenberg, Internet Archive, LibriVox, information at Wikipedia, etc. Does the reader really care, as a primary principle, whether or not they find a transcription specifically on English Wikisource? To me that seems like a principle that overvalues our specific work, at the expense of broadly respecting the work of scanning, transcribing, documenting, and otherwise preserving published works (which occurs at many websites and traditional institutions).
I would argue that we should freely move detailed information about a work that has had multiple editions published to a "versions" page, irrespective of whether or not any particular number of them has been transcribed here at Wikisource, whenever that action serves to make an "Author:" page more readable/useful (or for any number of other reasons). It doesn't strike me as particularly controversial, nor contrary to any policy I've found, to do so. -Pete (talk) 18:45, 11 April 2021 (UTC)
Thank you for your extremely detailed and cogent summary of the matter. You've said it better than I probably could. One of the biggest issues is that the works of Dickens have an extremely complicated publication history that requires access to either his letters or the Clarendon Dickens to sort out. In certain years, Dickens released multiple distinct corrected versions of a particular work. Therefore, you can't even simply refer to "the 1838 edition" of Oliver Twist. Instead, you have to describe the title page to identify the edition. Does it say Charles Dickens or Boz? To add to the confusion, in at least one case, Dickens published two different revisions of one edition, differing in one plate and several other corrections. Furthermore, Dickens corrected a number of errors in each revision and the printer introduced even more. Therefore, the scholarly consensus is that the texts got worse as time progressed. Not all of these versions are available as scans on IA. So, to make sure that future users can track down the right editions, I'm recording the descriptions of the individual versions on Wikisource so that future editors know what to look for. Languageseeker (talk) 19:39, 11 April 2021 (UTC)

Not convinced. Disambiguation pages are meant to be simple directing pages. They are not meant to be long explanatory documents, and definitely not encyclopaedic articles. If we need to do something special, then can we look to do it in the Author: namespace, probably on a subpage where we can curate things in a more holistic sense. LBT has done that with quite a few detailed descriptive and explanatory subpages for her poetic authors. — billinghurst sDrewth 14:38, 12 April 2021 (UTC)

Why not? Either we're going to have these on author pages or author subpages or versions pages, and having them on author pages would make them huge, so put them on the versions page.--Prosfilaes (talk) 23:19, 12 April 2021 (UTC)

Why not? Because there is no version page, and creating one means unnecessarily disambiguating when we have no guarantee that the works will ever exist on-site. Get the works and then we disambiguate. This is not the encyclopaedia, and encyclopaedic explanations do not belong on versions pages. — billinghurst sDrewth 00:14, 13 April 2021 (UTC)
The bigger question is: do works by an author belong on the author page even if there are no scans available? Each edition of Oliver Twist is a work by Charles Dickens distinct from his other works. Perhaps we can just group them under headings. Languageseeker (talk) 00:29, 13 April 2021 (UTC)
We have tons of lists of works we may never have, some of which won't be uploadable for decades. It seems entirely within our mission and practice to offer lists of important variants of one work. There might be a place where this gets too much, but a carefully selected set of works by Charles Dickens seems pretty far from that place.--Prosfilaes (talk) 00:57, 13 April 2021 (UTC)
(e/c)A list of works by an author belongs on the author's page, even if they are currently redlinked. At some time in the future a scan will appear. The purpose of a versions page is to list the versions we do currently host so as to provide a way of finding them easily via a search for the work's name. Most readers coming here to read a particular work just want to know if we have a copy and don't really care about a list of copies we don't yet have. In other words, I don't support the proposal to create version pages for each of Dickens' works in anticipation of hosting other editions at some undefinable time in the future. Beeswaxcandle (talk) 05:00, 13 April 2021 (UTC)
For many works, a scan may never appear, and if it does, it will be at some undefinable time in the future. There are a lot of works that I'd expect a scan for long after all the editions of Oliver Twist are complete. I'm not dead-set on this going on version pages, but we have works that people are actually interested in eventually doing, and putting them all on Dickens' page is going to make too much noise.--Prosfilaes (talk) 07:48, 13 April 2021 (UTC)

A scenario worth considering: How does a reader come to land on a Wikisource page about Oliver Twist, and does Wikisource serve up something that is appropriate? I imagine that most people who arrive here would be referred by another Wikimedia site, or by a Google search. If they land on a specific edition which provides no indication that there are other versions, or why they are looking at this version, is that a good outcome? If not, how should we address it? One approach would be the one Languageseeker has suggested. I have not yet seen an alternative proposed, but I'd like to. I suppose we could have an extensive "notes" section at Oliver Twist -- is that what those opposing the existence of a versions page would like? Or if not that...what? -Pete (talk) 04:52, 13 April 2021 (UTC)

I've kind of answered this in my previous post. However, we do not want an extensive note about other editions at the sole edition we currently host. However, a note could point to the Author and/or the Wikipedia page for detailed bibliographic information. The quickest way around this issue for Oliver Twist is to locate another edition, proofread it and then make it available. At that point create a versions page and put the {{other versions}} template on each of the hosted editions. The versions page would then have two editions on it. Beeswaxcandle (talk) 05:06, 13 April 2021 (UTC)
Poor Oliver Twist really did come into this world to make trouble. If this is the policy, it should be stated, and all pages that violate it should be speedily deleted. For now, it seems time to create a version page for The Pickwick Papers because there are two versions. May I suggest calling the present one The Pickwick Papers (Project Gutenberg). Languageseeker (talk) 05:36, 13 April 2021 (UTC)
@Beeswaxcandle: I appreciate your direct answers, and I think they demonstrate clearly where we disagree. (As a side note, I think it's unfortunate that these sort of basic philosophical questions about what we're doing here are so unresolved, and given that they are, it makes sense that we'd have strong disagreements about specific things like this. But I digress.)
Anyway, I disagree with this:
"The purpose of a versions page is to list the versions we do currently host..."
I believe that is a purpose of a versions page, but not the be-all, end-all reason. I believe that readers come here with expectations and needs that vary according to the user, and according to the kind of work they're seeking; but I doubt that very many of them are especially concerned with the distinction of whether a work lives here on Wikisource or not. I believe the reader is well served if our pages provide some context about the work they're viewing; and in some cases, there might be particularly important context to provide. From what Languageseeker has said, it seems that Dickens' works are just such a case: there are many editions with substantively different content, and varying provenance, and the Wikisource editors who happened to choose one of those editions may or may not have even been aware of the variety of choices available, much less made a well-informed decision about which to transcribe. In my view, the least we can do is prominently and concisely provide contextual information about what editions exist, before guiding them to the one we happen to have.
It's worthwhile to keep in mind that Wikimedia's structure beyond Wikisource may deepen the sometimes erroneous impression that Wikisource's transcription is of the most authoritative work. A Wikidata item about the work (as opposed to the edition) may link to the Wikisource transcription of a specific edition. That might be OK in a case where there are no major differences among editions; but where there are such differences, is it really a good thing for Wikisource to have no page for the Wikidata item to link to? The Wikidata item for Moby Dick links to a nice, concise versions page which contains only one transcribed version, and one version that appears not to be transcribed here. Is there a reader that is harmed by the existence of that versions page? Personally, I am pleased to know that there are different UK and US first editions, and I'm fine with clicking through that page to the one that exists on Wikisource. Is it really better that the Wikimedia universe convey the impression, in all the places that consult Wikidata, that Wikisource has nothing about Moby Dick?
In short, I think it's important that our policies around versions page leave some room for individual judgment by people who know something about the work. I'm not arguing that we need to create a versions page for every single work that has multiple editions, but rather that where an editor believes there is value in doing so, we should not generally interfere with that editor creating such a page, linking it appropriately on Wikisource and Wikidata, etc. I do not see the harm in taking this approach, and I'd been under the impression up until now that it's the approach we do take. -Pete (talk) 21:18, 15 April 2021 (UTC)
  • "this last principle is the root of the present problems": I disagree with this assessment. I think the root of the misunderstanding here is a desire for Wikisource to be something which it is not. In particular, Wikisource is not a bibliographic database. The purpose of Wikisource is not to collect all bibliographic data for a single edition, nor to collect bibliographic data for all editions. It is especially not for collecting extensive bibliographic data on an arbitrary subset of editions of a given work that someone finds particularly interesting.
    We do collect such data when it aids our primary work, but then we do it on WikiProject pages (if there is active ongoing work related to it) or Author talk pages (long term storage of research for the benefit of other contributors). If there are (semi-)objective criteria for whatever the subset is, it may also be appropriate to create a Portal: for that subset (a portal being a thematic collection of works or editions of works).
    This is sometimes sinned against: we list lots of editions on Author: pages that should ideally only contain works, and our versions pages tend to amass entries for editions that we do not yet have (for various reasons), and mostly this is fine and bothers no one. It's not something that really has to be enforced as a bright-line rule, and I think a lot of contributors find them useful or at least not in any way bothersome.
    But the problem comes with special pleading for one's pet project to override the wider good of the project. For example by insisting we move aside a proofread work (our primary purpose) for an arbitrary selection of bibliographic data (not at all our purpose), for editions we do not currently have and have no particular expectation of appearing any time soon. Such a plea from someone actually working on proofreading multiple editions of Dickens' works would get a very sympathetic hearing with me, because at that point there is a genuine need to structure pages to accommodate multiple editions. This current request, and all the words expended on it, is not that. --Xover (talk) 06:26, 16 April 2021 (UTC)
But we frequently are a bibliographic database. We have author pages, like Author:Isaac Asimov, stuffed with works that will not be PD in decades. A careful distinction between works and editions is purely bibliographic, artificial and not productive for us. I can see an argument that we should only link works/editions that are actually in the process of being worked on, but that's not what we do. If people on Wikisource want to list a set of works they want to work on, I don't think it matters whether they're "works" or "editions" by some definition.--Prosfilaes (talk) 08:22, 16 April 2021 (UTC)
I would actually support removing all the dashed redlinks for works in Author ns. Completely ugly and unnecessary. I would also comment that works under copyright should use {{copyright until}} and not be plain redlinked. Though having these arguments here will detract from finishing the pertinent argument which I still do not support. — billinghurst sDrewth 12:30, 16 April 2021 (UTC)
No, we frequently amass bibliographic data as a by-product of our main purpose, but that's fundamentally different from being a bibliographic database. Even ignoring the issue of our purpose, we do not have any tools that would set us up to be useful for creating such a database or for using us as such (one primary mode of use would be semantically rich querying: even the monstrosity that is WorldCat can distinguish queries for authors vs. titles). If your goal is a bibliographic database, you and everyone else will be better served by working on either Wikidata or OpenLibrary, both of which have far superior tools for the purpose than we do, and are set up along similar principles (crowd-sourced, openly licensed).
The distinction between works and editions is not artificial, but definitional; and it happens to be a good distinguisher for us in practice. Listing all works by an author is desirable and achievable, but listing all editions quickly becomes impossible. That's why this thread only proposes to collect an arbitrary subset of them. It also makes author pages entirely unwieldy and no longer fit for their purpose, which is why this thread proposes to move them to a separate page. In other words, ignoring the work—edition distinction and the difference between bibliographic data collected as necessary and incidental to our main purpose, and as a primary purpose, would lead to preemptively creating versions pages for every single work and producing—byte for byte, character for character, and page for page—more bibliographic data than actual content.
And you turn the issue on its head with If people on Wikisource want to list a set of works they want to work on, I don't think it matters whether they're "works" or "editions" by some definition. It's always ok to only list the works you actually want to work on. We're a wiki, so someone else will hopefully add the rest at some point. It's preemptively listing editions that you have no intention of ever working on, and supplanting actual proofread content that someone else actually has worked on, that is the problem here. That's why it's necessary to talk of high-falutin' principle stuff like "the purpose of Wikisource": doing that only makes sense if our primary purpose is amassing bibliographic data. It is also only a problem when you start focussing on the editions as the primary information: so long as you stick to works there's no need to move mainspace content around (except possibly to correct titles), until you're actually proofreading an additional edition of it. --Xover (talk) 09:16, 16 April 2021 (UTC)
The distinction between works and editions can be both definitional and artificial; working on LibraryThing has shown me that whether two books are the same work can be a fraught and complex question. There are real Ship of Theseus issues in some works; are all the editions of the Encyclopedia Britannica the same work? If yes, you're combining volumes that have no text in common as the same work. Are all editions different works? Then you're designating as different works things with merely orthographic changes. Not only is the third option complex and fact-dependent, the whole Ship of Theseus thing means that there are no clear lines.
No, listing all works by an author is often not achievable, and is frequently problematic in the same way that listing all editions is. A newspaper journalist may have at least one work in every daily newspaper for thirty years. A proper list of Dickens' works includes any number of articles from Household Words and other periodicals, plus 12 volumes of letters. I fail to see why adding select editions that people want on Wikisource is a problem.--Prosfilaes (talk) 07:46, 17 April 2021 (UTC)
Shrug. There are certainly edge cases, but the difficulty in making the determination does not affect the nature of the distinction. And as I said, amassing metadata about some additional editions isn't a problem in itself so long as it is treated as a by-product of our primary purpose, a secondary concern. It is when it is promoted to a primary concern that it starts causing problems: for example, by making Author:Charles Dickens so unwieldy that those "tidying" it (that is what a tidied author page looks like?!?) feel it imperative to move the one actually proofread edition of Oliver Twist we have aside in favour of their own subjective selection of "important" editions, and, when declined, to create an Oliver Twist versions page and a subsidiary network of things like Oliver Twist (Charles Dickens Edition) (and disagreeing is apparently so rude that one becomes persona non grata with the proponents). It leads to pseudo-encyclopedic non-neutral link-farms like David Copperfield (Authoritative? egads! Why don't we just create an Amazon Affiliate account and be done with it?).
Meanwhile, not a single page of any Dickens work appears to have been proofread, beyond a single Match & Split that didn't even bother to clean up the obvious breakage afterwards. Why would we privilege being a poor man's LibraryThing (which is not our purpose, and for which we have no even remotely adequate tools) over actually proofreading works (the purpose for which the project does exist)? I'm more than averagely fond of bibliography too, and would like to do it within the Wikimedia movement, but the way to do that is advocating for improvement to and integration with Wikidata and the front ends / user interfaces to it, that the WikiCite folks have been having conferences about for going on a decade without any measurable progress.
It is entirely appropriate to discuss individual exceptions where merited. Dickens is a big (notable/important) author, and a prolific one. I am sure there is a coherent argument to be made related to the nature of the early editions of his works (it just hasn't been presented yet). As an exception I'd certainly be willing to entertain the notion, though I suspect it would lack some key factors to persuade me (part of the need can be met by one or more Portal: pages, and the other parts are moot unless a significant number of actually proofread works exist to create a practical, rather than theoretical, need). But as a principle I am vehemently opposed, both because it turns the nature of Wikisource on its head for no good reason, and because it just simply is not a good idea. --Xover (talk) 09:54, 17 April 2021 (UTC)
As I said, these are not some random editions that I happened to hear about. These are the editions that Charles Dickens revised himself. Such revisions range from correcting errata, to rewriting passages, to replacing plates. I'm planning on adding transcription projects as soon as Commons fixes its system. Honestly, from this conversation, I believe that we need an actual vote system and not just long discussion threads after which an administrator decides what to do. Languageseeker (talk) 12:44, 17 April 2021 (UTC)
Also, properly defining the edition allows the reader to know which edition they are reading. Imagine if the first version of Hamlet on Wikisource were Q1: would you vote against moving it to Hamlet (Quarto 1) and listing the other Quartos and Folios? This is an exactly parallel case. For Oliver Twist, besides the errata, Dickens rewrote much of the book in 1846, so there are two very distinct texts. At this point, a few of the Dickens works have versions pages created before I arrived, and others do not. Languageseeker (talk) 12:52, 17 April 2021 (UTC)
@Languageseeker: We are not saying any of that isn't the case, you have told us over and over and we understand. When you or someone else has another edition we will disambiguate it. Until then, record the information that you want in the Author namespace as that is where we have typically done that work, and we have not prematurely moved editions just because there may be another edition. — billinghurst sDrewth 13:32, 17 April 2021 (UTC)
On fr., we do our best to make access to the latest version of the text corrected by the author the privileged goal of wikisourcists' work, but all editions are welcome too. --Zyephyrus (talk) 15:10, 17 April 2021 (UTC)
Great to hear from fr., you're the reason why I discovered the English Wikisource. I'm astonished by how many texts of major authors you have scan backed. Merci!
For Dickens, it's actually the opposite. The first editions are the best and the last edition is usually considered the worst. Languageseeker (talk) 15:19, 17 April 2021 (UTC)
The nature of the distinction is that it's an artificial one. For any two works, there is a chain of texts in Borges' library (or generatable by a computer program), each one letter away from the next, connecting the two works. It's not that the distinction is difficult to make; it's that it's impossible to make in any non-arbitrary way.
My point is not that we should collect bibliographic data; it's that we shouldn't. We shouldn't have long lists of works that no one is going to upload any time in the next decade on pages like Author:Isaac Asimov, but as long as we do, we should not use bibliographic rules to decide whether or not an item gets listed.--Prosfilaes (talk) 16:12, 17 April 2021 (UTC)

I recently saw a nicely constructed page of editions in mainspace (i.e. reader space), containing a link to a complete text here, being routinely edited to add a light-blue (external) link to IA.org for the 1st edition. A solution to any objection might be to add the Index, consolidating its inclusion in mainspace and implying that someone [else] might want to labor on proofreading that choice as well; an emergent property of this example is moving that [more or less tolerated] bibliographic data out of the Author namespace (our catalogue?). CYGNIS INSIGNIS 12:30, 17 April 2021 (UTC)

Formatting in header template

  1. I propose (again) that the title of the whole work be italicised.
  2. I propose (again) that the author of the work—or their 'contributed section'—not be italicised.
  • <emphasis>Support</emphasis> I also think that; cheers to the nom. CYGNIS INSIGNIS 12:44, 17 April 2021 (UTC)
  • Oppose: Italics are usually fine for native speakers, but having worked on a multilingual dictionary, I can attest that italicizing text makes it harder to read for many non-native readers, especially those whose native writing system is not based on the Latin alphabet. There are things we take for granted about italicized text that do not apply in languages like Russian or Japanese. Personally, I do not think the name of the author or translator should be italicized either. --EncycloPetey (talk) 02:44, 18 April 2021 (UTC)
    That's one support for item 2, which is what constantly reminds me about item 1. It's an interesting point that italics can interfere with access. CYGNIS INSIGNIS 14:26, 23 April 2021 (UTC)

Allow the Shakespeare Quartos without splitting the pages

This proposal has two components. First, expand the scope of Wikisource to allow all copies of the Shakespeare Quartos. Second, allow the transcription of these Quartos without splitting the individual pages. Languageseeker (talk) 01:03, 18 April 2021 (UTC)

(1) What is the advantage to Wikisource of transcribing multiple copies of the same printing? (2) How do you propose to preserve pagination if there are multiple pages to each scan page in the scan copy? --EncycloPetey (talk) 02:39, 18 April 2021 (UTC)
All 32 copies of Hamlet have proofread text that mainly needs formatting. There can be printer variants between the texts. Also, having all 32 copies will attract visitors, especially since the original site has been taken offline because OUP decided not to migrate it from Flash to HTML. For transclusion, you can use Help:Transclusion#Adding_section_labels. Languageseeker (talk) 02:48, 18 April 2021 (UTC)
I am not understanding the proposal as written. How is adding versions of a work expanding the scope? If that is not the proposal, then stop speaking in jargon and have an expectation of an author's work or what is done at another site. Are you proposing no scans, etc.? If you are talking about 32 differently sourced versions of Hamlet, +++, that is within scope. Versions are versions; versions are allowed. Otherwise, I simply don't understand the scope of the proposal. — billinghurst sDrewth 03:03, 18 April 2021 (UTC)
@Billinghurst: Because you said it was out of scope and removed the links.

Allow global sysops to work on this wiki

Hi, I propose allowing global sysops to work on this wiki. It is currently not enabled because the community has more than 10 admins/3 active sysops, but I strongly recommend that the community opt-in because they often help in combating spam and vandalism (eg GRP). As an en.wikibooks admin, I can attest to the work they do and have no issues with them at all. Thanks in advance, and please ping me if you need further input, since I don't watch this page.
P.S: Global sysops won't interfere with normal Wikisource matters (for instance they do not have access to Special:UserRights) - their role is codified in the policy page and is more or less handling spam or vandalism. This wiki can enact a global rights policy if needed (couldn't find one here). They'll only help you. Leaderboard (talk) 08:54, 4 May 2021 (UTC)

I don't think that it is necessary at this time. We have a good time spread for our 25 admins and patrollers, and more admins than there are global sysops. — billinghurst sDrewth 10:51, 4 May 2021 (UTC)
For reference, see the list of global sysops and Special:GlobalGroupPermissions; their rights are quite broad, and many of them we would not expect to be used here. — billinghurst sDrewth 10:55, 4 May 2021 (UTC)
Hi @Billinghurst:, that's true, however, I don't see how it would be a problem to allow a small group of highly trusted users to remove spam or vandalism. I've seen even bigger wikis benefit from having global sysops. Leaderboard (talk) 11:39, 4 May 2021 (UTC)
Always present a full picture of the situation, not one favourable to your argument. How else can the community make an informed risk-based decision? — billinghurst sDrewth 11:44, 4 May 2021 (UTC)
And for openness, I am a global sysop (among other rights I hold here, at other wikis, and globally; see m:User:Billinghurst/matrix). — billinghurst sDrewth 11:49, 4 May 2021 (UTC)
@Billinghurst: I'm not sure what the risks are, as you say. In theory it could mean global sysops doing things they shouldn't (in which case just file a request at Meta, you know that GS have been removed that way), or some things being sensitive that you don't want global sysops touching at any cost (in which case simply have your wiki enact a global rights policy, as I linked before).
This, to me, is a proposal that has only benefits and no drawbacks (otherwise I would have mentioned them). If there are indeed risks that I'm not aware of, do let me know and I'll try to address them. Leaderboard (talk) 18:24, 4 May 2021 (UTC)
Please don't come and tell me that the way to manage our risks is to open up our wiki to more administrators, and then complain about them to get them removed. That is a horrid argument. Nor tell us that we would need to create a document to limit the scope of their activity. Does that picture look wrong to you? We weigh the upsides and the downsides.

Of course there are risks in giving broader permissions to people. We would essentially be creating new interface administrators on site, giving them rights that the community has limited so far; the community may find benefit there, or could be horrified. We would also have those users have their edits marked as autopatrolled, and we manage that right based on competence of editors at our site. Pointing to the rights that would be inherited is simple awareness; talking about the ups and downs should be part of a deeper conversation. This is not a flutter of pixie dust. — billinghurst sDrewth 02:27, 5 May 2021 (UTC)

@Billinghurst: Three things:
  • just because they have interface-admin does not mean that they will use it - in practice they will not use it in a wiki as large as Wikisource.
  • "We would also have those users have their edits marked as autopatrolled" - that's the case for global rollback as well
  • What is the problem with, as you say, "create a document to limit the scope of their activity"? It's good practice to have one even if you don't allow GS, as enwiki has done.
And for the record, I did link to the Meta page when I made this proposal, which (if I'm not wrong) clearly gives the rights available to the group. And yes, I'm confused by your arguments: as I said earlier, just because they have admin-level powers does not mean that they're going to invade and take the role of local admins like you. I meant it in good faith when I said that there is practically no downside to allowing GS to work here.
If you see downsides (like the interface-admin part), tell me so that I can address them. I'm not going to ask the stewards to enact this change without sufficient consensus and discussion, and will not do anything if there isn't any. Worst case, it'll stay as the status quo (that is, no GS allowed here). Leaderboard (talk) 07:38, 5 May 2021 (UTC)
Umm, I just pointed the community to the facts, and some light interpretation, I neither said it was right nor wrong. That is called aiding a discussion to have full information without every person needing to dig it out. — billinghurst sDrewth 16:44, 5 May 2021 (UTC)
Rights of administrators, interface administrators and global sysops
Administrators:
  • Block a user from sending email (blockemail)
  • Block other users from editing (block)
  • Bypass IP blocks, auto-blocks and range blocks (ipblock-exempt)
  • Change page language (pagelang)
  • Change protection levels and edit cascade-protected pages (protect)
  • Create and (de)activate tags (managechangetags)
  • Create new user accounts (createaccount)
  • Create or modify abuse filters (abusefilter-modify)
  • Create or modify global abuse filters (abusefilter-modify-global)
  • Delete tags from the database (deletechangetags)
  • Delete and undelete specific log entries (deletelogentry)
  • Delete and undelete specific revisions of pages (deleterevision)
  • Delete pages (delete)
  • Disable global blocks locally (globalblock-whitelist)
  • Edit other users' JSON files (edituserjson)
  • Edit pages protected as "Allow only administrators" (editprotected)
  • Edit pages protected as "Allow only autoconfirmed users" (editsemiprotected)
  • Edit sitewide JSON (editsitejson)
  • Edit the content model of a page (editcontentmodel)
  • Edit the user interface (editinterface)
  • Enable two-factor authentication (oathauth-enable)
  • Forcibly create a local account for a global account (centralauth-createlocal)
  • Have one's own edits automatically marked as patrolled (autopatrol)
  • Import pages from other wikis (import)
  • Mark others' edits as patrolled (patrol)
  • Mark rolled-back edits as bot edits (markbotedits)
  • Mass delete pages (nuke)
  • Merge the history of pages (mergehistory)
  • Modify abuse filters with restricted actions (abusefilter-modify-restricted)
  • Move category pages (move-categorypages)
  • Move files (movefile)
  • Move pages (move)
  • Move pages with their subpages (move-subpages)
  • Move root user pages (move-rootuserpages)
  • Not be affected by IP-based rate limits (autoconfirmed)
  • Not be affected by rate limits (noratelimit)
  • Not create redirects from source pages when moving pages (suppressredirect)
  • Override files on the shared media repository locally (reupload-shared)
  • Override the spoofing checks (override-antispoof)
  • Override the title or username blacklist (tboverride)
  • Overwrite existing files (reupload)
  • Overwrite existing files uploaded by oneself (reupload-own)
  • Perform CAPTCHA-triggering actions without having to go through the CAPTCHA (skipcaptcha)
  • Quickly rollback the edits of the last user who edited a particular page (rollback)
  • Reset failed or transcoded videos so they are inserted into the job queue again (transcode-reset)
  • Revert all changes by a given abuse filter (abusefilter-revert)
  • Search deleted pages (browsearchive)
  • Send a message to multiple users at once (massmessage)
  • Undelete a page (undelete)
  • Upload files (upload)
  • Use higher limits in API queries (apihighlimits)
  • View information about the current transcode activity (transcode-status)
  • View a list of unwatched pages (unwatchedpages)
  • View abuse filters marked as private (abusefilter-view-private)
  • View deleted history entries, without their associated text (deletedhistory)
  • View deleted text and changes between deleted revisions (deletedtext)
  • View detailed abuse log entries (abusefilter-log-detail)
  • View log entries of abuse filters marked as private (abusefilter-log-private)
  • View title blacklist log (titleblacklistlog)
  • Add groups: Autopatrollers, MassMessage senders, Patrollers and IP block exemptions
  • Remove groups: Autopatrollers, MassMessage senders, Patrollers and IP block exemptions
  • Add groups to own account: Flood flag and Translation administrators
  • Remove groups from own account: Flood flag and Translation administrators
Interface administrators:
  • Edit other users' CSS files (editusercss)
  • Edit other users' JSON files (edituserjson)
  • Edit other users' JavaScript files (edituserjs)
  • Edit sitewide CSS (editsitecss)
  • Edit sitewide JSON (editsitejson)
  • Edit sitewide JavaScript (editsitejs)
  • Edit the user interface (editinterface)
  • Enable two-factor authentication (oathauth-enable)
Global sysops:
  • View detailed abuse log entries (abusefilter-log-detail)
  • Create or modify abuse filters (abusefilter-modify)
  • Use higher limits in API queries (apihighlimits)
  • Not be affected by IP-based rate limits (autoconfirmed)
  • Have one's own edits automatically marked as patrolled (autopatrol)
  • Auto-review on rollback (autoreviewrestore)
  • Block other users from editing (block)
  • Block a user from sending email (blockemail)
  • Search deleted pages (browsearchive)
  • Delete pages (delete)
  • View deleted history entries, without their associated text (deletedhistory)
  • View deleted text and changes between deleted revisions (deletedtext)
  • Delete and undelete specific log entries (deletelogentry)
  • Delete and undelete specific revisions of pages (deleterevision)
  • Edit the content model of a page (editcontentmodel)
  • Edit the user interface (editinterface)
  • Edit pages protected as "Allow only administrators" (editprotected)
  • Edit pages protected as "Allow only autoconfirmed users" (editsemiprotected)
  • Edit sitewide CSS (editsitecss)
  • Edit sitewide JavaScript (editsitejs)
  • Edit sitewide JSON (editsitejson)
  • Edit other users' CSS files (editusercss)
  • Edit other users' JavaScript files (edituserjs)
  • Edit other users' JSON files (edituserjson)
  • Delete Structured Discussions topics and posts (flow-delete)
  • Edit Structured Discussions posts by other users (flow-edit-post)
  • Hide Structured Discussions topics and posts (flow-hide)
  • Import pages from other wikis (import)
  • Bypass IP blocks, auto-blocks and range blocks (ipblock-exempt)
  • Mark rolled-back edits as bot edits (markbotedits)
  • Merge the history of pages (mergehistory)
  • Move pages (move)
  • Move category pages (move-categorypages)
  • Move root user pages (move-rootuserpages)
  • Move pages with their subpages (move-subpages)
  • Move files (movefile)
  • Move pages with stable versions (movestable)
  • Not be affected by rate limits (noratelimit)
  • Mass delete pages (nuke)
  • Enable two-factor authentication (oathauth-enable)
  • Override the spoofing checks (override-antispoof)
  • Mark others' edits as patrolled (patrol)
  • Change protection levels and edit cascade-protected pages (protect)
  • Overwrite existing files (reupload)
  • Overwrite existing files uploaded by oneself (reupload-own)
  • Override files on the shared media repository locally (reupload-shared)
  • Quickly rollback the edits of the last user who edited a particular page (rollback)
  • Perform CAPTCHA-triggering actions without having to go through the CAPTCHA (skipcaptcha)
  • View the spam blacklist log (spamblacklistlog)
  • Not create redirects from source pages when moving pages (suppressredirect)
  • Override the title or username blacklist (tboverride)
  • Edit protected templates (templateeditor)
  • View title blacklist log (titleblacklistlog)
  • Undelete a page (undelete)
  • View a list of unwatched pages (unwatchedpages)
  • Upload files (upload)
  • Oppose: I don’t think vandalism is enough of a problem here to justify adding in more administrators. If emergency access is needed, it can be requested. TE(æ)A,ea. (talk) 17:44, 4 May 2021 (UTC)
    I'm not sure what you mean by that? Stewards do have the ability to perform admin-level actions, but they will do so very rarely in practice. Leaderboard (talk) 18:17, 4 May 2021 (UTC)
(e/c) I also don't think it necessary at present. Our RC patrolling processes are working well and it's rare that vandalism and spam escapes our notice. I wonder what the motivation for this suggestion is. Beeswaxcandle (talk) 17:46, 4 May 2021 (UTC)
@Beeswaxcandle: A combination of
  • my experience having global sysops work in both wikis where I'm a sysop - they have been very helpful and I have no issues with them
  • seeing cross-wiki vandalism (eg GRP) and the inability of GS to act in these situations
  • "larger wikis" (those with >=15 admins), like this, benefiting from GS in practice (the most recent being English Wikivoyage with 48 sysops that I successfully convinced to opt-in just about a week back, and the effects are already there)
  • some GS hence asking for local adminship solely to handle vandalism (not sure if there are any such users here; I've seen them elsewhere)
Personally, I felt Wikisource could benefit from allowing GS and stewards to perform routine spam/vandalism removal, and hence presented this proposal. Leaderboard (talk) 18:20, 4 May 2021 (UTC)
This somewhat evangelistic approach is all very well, but what is the quid pro quo for you appearing here and attempting to persuade us? Also, please give an example (or two) of something that has happened here at enWS that would have benefited from the GS being involved. (And, what is GRP? enWP offers Glass Reinforced Plastic or Gross Rating Point, neither of which make sense in the context. Throwing obscure abbreviations around is unhelpful.) Beeswaxcandle (talk) 17:42, 5 May 2021 (UTC)
@Beeswaxcandle: I apologise if "GRP" was not clear to you; it is a long-term abuser (LTA) (wikipedia:Wikipedia:LTA/GRP) that has affected various projects already (including Wikisource from what I understand, though apparently others disagree). I mentioned them because they attack cross-wiki, and that is something global sysops can help a lot with. I've seen cases of the other type; my user contributions would provide examples. I don't know what "quid pro quo" means. For what it's worth, I'm not doing this only to Wikisource; there are other wikis of this nature that can benefit from having GS around as well. Again, if this community does not like GS, this proposal will not pass and it'll remain status quo. Leaderboard (talk) 19:21, 5 May 2021 (UTC)
@Leaderboard: you now have my attention. Would you like some candid views on LTAs and wikisource? CYGNIS INSIGNIS 21:29, 5 May 2021 (UTC)
@Cygnis insignis: I am not sure if I should answer anything else to that question other than "ok". Leaderboard (talk) 07:19, 6 May 2021 (UTC)
@Leaderboard: [fixing pings doesn't work [1]; I imagine there is a good reason for that.] Pardon, I was considering my reply. This is a nice place (and the best sister site, imo) and I don't want to make noise (this is a library :). I see that en.wikipedia has more centralised and open discussion on some of the concerns I would air. Could I discuss that elsewhere, where there may be others with similar concerns or answers to allay my own? CYGNIS INSIGNIS 17:37, 7 May 2021 (UTC)
@Cygnis insignis: I do not know why you're adding en.wikipedia to the mix, but if there's anything that you want to say that should not be said in this thread, you may post in my talk page. Leaderboard (talk) 07:28, 8 May 2021 (UTC)
Neutral: I am neither for nor against the proposal. I don't mistrust the GS group and I'm not about to get jealous over a missed chance to block a sock myself. But there's not a whole lot of vandalism here that would benefit, unless I misunderstand something. Generally most vandals are one-hit wonders and wander off almost immediately in search of better sport, or if not, they hop IPs and don't edit more than a few times from a single IP, so having a GS on hand to step in a few minutes faster won't change much? Inductiveloadtalk/contribs 18:31, 4 May 2021 (UTC)
@Inductiveload: Maybe Wikisource is different, but here at Wikibooks, we've got cases where the LTA would continuously abuse/revert edits, at which point a GS (or a local admin) would be needed to stop it. Yes, IP hopping is a thing, but GS can still make the life of the LTA considerably harder. For reference, I've been at the receiving end myself in other non-GS wikis. And even if that isn't the case, as you inferred, global sysops are highly trusted and there is frankly no drawback to allowing them (and by extension, stewards) to act when it would help the wiki. Leaderboard (talk) 18:35, 4 May 2021 (UTC)
  • Oppose: Sounds like a solution in search of a problem.--Prosfilaes (talk) 16:34, 5 May 2021 (UTC)
  • Oppose. And, frankly, for a user with a grand total of two non-revert edits outside of this thread to make this proposal raises serious concerns about their motivation. --Xover (talk) 06:34, 8 May 2021 (UTC)
    @Xover: That is not surprising, because I don't contribute to Wikisource (and hence asked for a ping). But for what it's worth, I stand to gain nothing from this proposal directly, because I am not a global sysop or global rollbacker. I have been filing similar proposals in other non-GS wikis that I think can benefit as well. Leaderboard (talk) 07:27, 8 May 2021 (UTC)
    You don't contribute to the project and yet you feel it appropriate to make policy proposals here, and appear quite confident you know what the project needs and what its challenges are without having participated in it. That you are making such unsolicited proposals to other projects to which you do not contribute does not particularly suggest good judgement either. Xover (talk) 07:47, 8 May 2021 (UTC)
    @Xover: I am not sure why you think I'm making a mistake by doing this. The reason I'm making such proposals is literally because I believe that this is something that every project can benefit from. And some may approve (like en.wikivoyage and en.wikiquote), some may not (like this wiki). It is not a fair assumption to say that I'm not showing good judgement by making such proposals; by your line of thought, stewards shouldn't exist, because they don't contribute to the projects they have power on, do they?
    I am not making this proposal blinded either. I do have some understanding of the challenges cross-wiki patrollers face, as someone who has assisted other wikis in handling LTAs (long-term abusers), and hence have seen the difficulty of global sysops and stewards (even though I'm neither) when they try to handle LTAs that hop across wikis. The latter can apply global locks/blocks, the former cannot. Similarly, cross-wiki doxxing is a thing, and GS cannot help if wikis like these stay opted-out.
    Let me make this clear - even though I made this proposal in good faith, the decision rests with the community, and if the community does not approve it, the proposal will not pass. Simple as that. I have no authority to override consensus. In 3 days' time, unless I see a significant rise in support, I'll consider this as a failed proposal. Leaderboard (talk) 12:36, 8 May 2021 (UTC)
    Because you're not a global sysop, but are presuming to speak for them and making assertions about what they need, and because you're not a member of the Wikisource community but you presume to tell us what we need. If the community here needs help from the global sysops then we will ask them for it, and if the global sysops need our help they will ask us for it. And you appear to be equally confused about the role of the Stewards. Xover (talk) 13:36, 8 May 2021 (UTC)

Update Inclusion Guidelines to prohibit future non-scan-backed works

I’m proposing to update the inclusion guidelines to exclude all works that are not transcluded from scans. The veracity and accuracy of non-scan-backed texts cannot be verified without significant effort, while those of scan-backed works can. Currently, there are efforts to replace non-scan-backed versions with scan-backed versions to improve the overall quality of this site, but this is continually made more difficult by the addition of new non-scan-backed works. If this proposal passes, all new non-scan-backed works will be deemed out of scope and speedily moved to the user's namespace. For works that are born digital, a Wikisource edition will require the creation of a PDF or other suitable digital reproduction. Languageseeker (talk) 19:44, 7 May 2021 (UTC)

So, you are proposing that we delete ca. 208,277 pages of content (i.e. 41% of our content). This includes works that are digitally native and therefore do not have printed texts to be scanned. Your proposal would do away with the match&split process and make it unavailable for the purpose you recently used it for in bringing in PG texts. Your proposal precludes the possibility of bringing in works for which there are no scans available. Beeswaxcandle (talk) 23:09, 7 May 2021 (UTC)
Question: Are you proposing deleting all current non-scan-backed works, or are you proposing a ban-hammer on future non-scan-backed works? Either way, I’m opposed, but it’d be good to at least have that clarification. — Dcsohl (talk) 00:30, 8 May 2021 (UTC)
I am not proposing to delete past content, but to place a moratorium on future content. For works born digital, a PDF can be created to preserve the content. If the goal of this site is to create accurate copies of textual sources, then there must be an original to compare it to. Otherwise, there is absolutely no way to guarantee accuracy. Placing a moratorium will allow a slow transition process to begin. If there is some overriding compelling reason why a particular work cannot be scan-backed, then that can be treated as an exception.
Match-and-split is a problematic and difficult process. While it can be used to salvage past works, there is no reason to create additional work for the future.
@Dcsohl: Is there any particular reason why you are opposed? Languageseeker (talk) 03:41, 8 May 2021 (UTC)
Your description doesn't match "not proposing to delete past content". You also need to provide an explanation of how this will be policed. How will the exception process work? In terms of the "born digital", what's to stop me uploading a text then creating a pdf of the same content I just uploaded, and voila it matches? There's still no verification of the source or its public domain status. How are we better off? I recommend you withdraw this version of the proposal and rewrite it. A single paragraph of why does not explain the how. Beeswaxcandle (talk) 07:37, 8 May 2021 (UTC)
"all non-scan backed works will be deemed out-of-scope and speedily deleted." is that meant to mean all new non-scanned works or all new and current works? Maybe you forgot to include new in there? MarkLSteadman (talk) 22:02, 8 May 2021 (UTC)
I just think there are other valid sources for works besides scan-backed, the most notable of which would be PG. Are we going to forbid the import of all PG works henceforth? It should definitely be harder to bring in non-scan-backed works, but I’m simply not in favor of making it impossible. — Dcsohl (talk) 02:07, 9 May 2021 (UTC)
I would forbid the import of all PG works henceforth. They're well archived; WP can durably link to them. There's no value in copying them here.--Prosfilaes (talk) 15:03, 9 May 2021 (UTC)
Support, non-scan-backed works are bad for the following reasons:
  • It's hard to determine what the source of our transcription is, if no source is given.
  • It's hard to determine version information: is it the original, is it from a more modern reprint, a revised edition? In the cases of smaller works that are usually published in collections such as songs or poems, which collection was this particular version published in? Another common question: what version contained this introduction by Bob Bobberson, and is that version still under copyright? This information can unfortunately remain unknown without a scan.
  • It's hard to determine copyright status of the work, or our particular version of the work, when necessary.
  • It's harder if not impossible to verify for accuracy. Gutenberg texts have no guarantee of validity (and are sometimes wrong), they often arbitrarily correct typos or perceived typos (which is against Wikisource policy), and are by no means comparable to a scan of an original printed work.
  • It's easy to give an incorrect source link by mistake (see Talk:Winesburg, Ohio as an example), especially if you copy-paste large amounts of text from other sites within a short time span.
Quality over quantity. Editors and readers of our content alike need something original to compare our text to on the fly. Furthermore, it is very obvious that modern Wikisource pages with scan backing are of far better quality than those from the past, or the present, that are copy-pasted from other sources with no scan. We have too large a backlog of pages from the 2000s or early 2010s that were inserted with no scans to back them, simply because the technology wasn't there yet for it or that technology was in its infancy. We certainly don't need any more! PseudoSkull (talk) 04:35, 8 May 2021 (UTC)
  • Multiple many times oh-so-much opposed to this proposal. Requiring scan-backing for new texts is in itself a no-brainer; the hard part is defining what to do with existing non-scan-backed texts, and how to deal with born-digital texts and other new texts for which, for whatever exceptional reason, it is not sensible to require scans. These are the things a modified policy needs to address, and this proposal doesn't even attempt to do so.
    In particular, a dumb requirement for scans without addressing the details will just lead to a proliferation of arbitrary self-created PDFs that correspond to no published edition, hiding the problem in a "scan" instead of making it obvious. Does a Gutenberg text get any less problematic because someone printed it to a PDF file before cut&pasting the text here? Xover (talk) 06:10, 8 May 2021 (UTC)
  • no I have no issues with encouraging, even strongly encouraging, scans for old works, but as a requirement NO. There are some things that are not available as a scan, or not available as a scan of suitable quality. Also, modern works are electronic and do not require scans, and do not require proofreading, and making a scan only to then transclude it to allow the text is straight daft. — billinghurst sDrewth 09:00, 8 May 2021 (UTC)
  • This type of moratorium on future works is a very good concept in my opinion; it will bring many benefits and will raise the value of the project. I wrote more about it in this too-long thread. Could we delete old pages? Absolutely not. We can't delete anyone's work. I propose placing the main emphasis on texts with scans and replacing old non-scanned texts with new texts with scans. From my experience on one much smaller project, I know how many mistakes there are in non-scanned texts: mistakes from the copy-paste procedure, distorted words and whole sentences, disastrous punctuation. We also can't prohibit creating versions without scans, especially if scans are not available at the moment. In my opinion, data based on non-scanned texts are worth as much as texts and articles with no source. From my experience, I know that the more texts there are with scans, the more users (even newbies) add texts with scans, and the less often non-scanned texts appear. And, lastly, a small digression: @Billinghurst:, I hear your experiences and I assure you I appreciate them very much. I think that we need to oblige users to update Wikidata, preferably right after overwriting old non-scanned texts, if adequate items exist on Wikidata. Maybe it is impossible and I'm just an idealistic dreamer. I think that if a scan exists or appears, we should place the most emphasis on replacing old, unreliable texts with no source with new, reliable and verifiable ones. Exactly as in Wikipedia articles, where a user adding new information based on sources can change old unreliable information. Tommy J. (talk) 13:57, 8 May 2021 (UTC)
  • Oppose. This is a terrible proposal, with obvious, and obviously disastrous, consequences. I oppose any scan-backing requirements, and especially one so draconian as this. While you may find the existence of non-scan-backed works reprehensible, they are nonetheless an integral part of English Wikisource. Certainly, and especially for larger and/or more important works, the encouragement of scan-backing is acceptable; but a requirement of the sort you propose will greatly damage the project. TE(æ)A,ea. (talk) 15:02, 8 May 2021 (UTC)
  • Oppose. I can certainly see where this proposal comes from. However, our backlog of non-scan-backed works that should be scan-backed ("Shouldies") has two origins: 1) very old works that predate modern WS norms, and, indeed, the ProofreadPage extension ("Oldies") and 2) new works that should be scan-backed from the outset ("Newbies").
"Oldies" are uniquely annoying, because they are usually not without value and are often "core" texts, often via PG (which probably made sense in the mid-noughties, when there were far, far fewer scans online). I certainly don't think that deletion-without-replacement is a good idea. While these works probably have a chilling effect on building out the core scan-backed corpus, because scan-backing an existing work isn't as "sexy" as a whole new work, deleting them outright with no replacement plan will just be blowing holes in the library for ideology's sake. I think an unsourced or non-scan-backed work is (just slightly) better than no work at all.
"Newbies" are easier to deal with, since they can be addressed at the time by engagement with the contributor.
I don't have a good handle on this, but "Newbies" don't seem to be an overwhelming proportion of "Shouldies". Most "Shouldies" that I come across are from the mid-noughties and are often complete, but unloved.
At the risk of talking the talk without walking the walk (since I am not personally volunteering to take point), I think what we actually need here is something like:
  • As I have previously suggested, figure out a more collegial and less adversarial improvement venue than WS:PD. This will allow us to put eyes on "Newbies" and hopefully get them sorted before they get buried. This might be a hypothetical WS:SCANS (or a whimsical name like The Bindery, whatever), possibly repurposed from the obsolete WS:OCR.
  • For "Oldies", management is broadly similar, but their contributors are generally long gone. My proposal, which is contingent on the Monthly Challenge becoming a "thing", is to reserve a slot or two for "Wikisource-internal" works, which could be an "Oldie" in need of scan backing. This would provide a slow, but hopefully steady, impetus to chip away at the backlog. If we can bring more proofreading firepower to bear on it than the MC can contain, we can set up a separate WikiProject, but, honestly, I can't see that happening.
What I don't really want to see here is writing new policy that will strengthen the (IMO undesirable) tendency to view policy-backed deletion as a quality control measure rather than a last resort for unsalvageable works. I would like to see it made clear to new users that if a scan is available and sensible, a work should be scan-backed. But it is already pretty clear, e.g. at Help:Adding texts.
@Languageseeker: my suggestion to you personally is that if you'd like to see progress made on the issue, firstly you should attempt to quantify it. We have 283 works tagged with {{migrate to}}, and 24 tagged {{scans available}}. That's not an epidemic of "Shouldies", but I certainly do not think that count is anywhere near the true count. Secondly, I suggest that you continue to work on the Monthly Challenge as that is, IMO, a very interesting and engaging way to get eyes on specific "important" works (either due to "Shouldie" status, or as a "core" text, FSVO "core"). Finally, if you still feel MC is not enough, I suggest taking your quantified statement of the issue from step one and using that to set up a scan-backing workshop where problematic works can be laid out for review and improvement. Inductiveloadtalk/contribs 22:12, 8 May 2021 (UTC)
... AND users can link to PG from author pages, there is no need for their versions to be here. — billinghurst sDrewth 12:44, 10 May 2021 (UTC)
My thoughts on this in general are: 1) We should be very strict on {{no source}} when new works are created without a source, and if those are not resolved by engagement with the contributors, at that point consider deletion. 2) It would be good to easily track the size of that category over time and see how we can resolve many of the works in it. 3) We as a community should use things like POTM / Monthly Challenge / Community Collaboration / WikiProjects etc. to include some component to identify and encourage migration of the backlog, so that we can raise the proportion of scan-backed content significantly and bring this https://phetools.toolforge.org/graphs/Wikisource_-_texts_en.svg down rather than keeping it flat. MarkLSteadman (talk) 22:40, 8 May 2021 (UTC)
  • Oppose. Filling a void here with proofread content from PG means it can be linked from other texts, a virtue of this site. Adding scan-based texts is all I do, and I've gone to the bother of adding an alternative to a PG text, but I value having a link to anything usable. The minutes taken to convert a text for use here versus the hours of one or more users is not such a delicate balance in evaluating their worth. CYGNIS INSIGNIS 18:12, 9 May 2021 (UTC)

Bot approval requests

Importation of Bookworm Bot from French Wikisource

French Wikisource uses BookwormBot to generate useful statistical information about Indexes, such as the number of pages proofread, validated, blank, etc. Would it be possible to import this bot into English Wikisource? Languageseeker (talk) 22:47, 20 April 2021 (UTC)

Support (now changed to Neutral, see below) for the purposes of trying out the frWS model of a page-wise monthly goal, but it's less a matter of "importing" and more a matter of asking @User:Coren what we need to do to enable them to turn it on for us and doing that, and also gaining consensus to grant the bot flag (which is fine by me). Inductiveloadtalk/contribs 19:10, 23 April 2021 (UTC)
Comment @Inductiveload: User:Coren does not seem to be extremely active anymore. Would we need him to set up the bot or could we do it ourselves? Languageseeker (talk) 03:51, 24 April 2021 (UTC)
Where is its code? [I will presume that it has been suitably licensed as Coren was good that way.] What are the prerequisites for running the bot? From where are we planning to run the bot? It says from the DB, rather than pulled from the API. Which db? Has all the data been added to WD, and pulled from there, or what? — billinghurst sDrewth 08:57, 26 April 2021 (UTC)
Comment It might also be worth investigating if we could build the raw per-index "page-at-status-X counting" functionality into the ProofreadPage extension, and then 1) we wouldn't need any bot and 2) all Wikisourcen could benefit. I do not have a handle on how murderous (or not) that might be on the server side with respect to page render times, so it may be a complete non-starter.
I have opened phab:T281195, but I probably don't have the time to dig into a PHP job any time soon. Inductiveloadtalk/contribs 21:56, 26 April 2021 (UTC)
The Wikisource:Monthly Challenge stats are now (I hope!) being automatically generated by a new script, which totally duplicates functionality that exists in BookwormBot, but seems to be at least functional for now. So I am now neutral; I'm not sure we need BwB any more. Inductiveloadtalk/contribs 22:29, 8 May 2021 (UTC)

Index:Who's who in the Far East, 1906-7, June (IA whoswhoinfareast00hongrich).pdf

I'd like to use my bot (User:SLiuBot) to import texts of the book from The Integrated Information System on Modern and Contemporary Characters (a database curated by (Q10875101)). I have already uploaded 10 test entries on my user pages (see User:Stevenliuyi/Who's Who in the Far East (June) 1906-7). Since I am new here, any advice/suggestions are very welcome. Several other English biographical dictionaries in that database are also in the public domain, and I also plan to import those dictionaries after completing this one, but that will be a separate bot request. Right now I just want to get the first one right. --Stevenliuyi (talk) 04:20, 30 April 2021 (UTC)

I have fixed some minor issues in the test pages. If there is no objection, I plan to upload all texts in the book. --Stevenliuyi (talk) 18:04, 4 May 2021 (UTC)
I have finished uploading the texts (see Who's Who in the Far East (June) 1906-7). Please let me know if you have any suggestions. --Stevenliuyi (talk) 01:24, 6 May 2021 (UTC)

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

The Pickwick Papers

Move The Pickwick Papers to The Pickwick Papers (Gutenberg) because The Posthumous Papers of the Pickwick Club (Charles Dickens edition) also exists. Languageseeker (talk) 13:50, 17 April 2021 (UTC)

I have moved the work and left a redirect. I am sorry, but The Posthumous Papers of the Pickwick Club (Charles Dickens edition) is simply not ready for display, as it is basically a pointer to another set of pages, and a set of pages under a completely unwieldy name. The target is not a suitable target. Displaying the work like that, as two volumes, is in my opinion an inefficient display methodology. Just because a work was published in two volumes due to limitations in publishing methodology is no reason for us to present it like that. — billinghurst sDrewth 03:15, 18 April 2021 (UTC)
I have converted the page The Posthumous Papers of the Pickwick Club (Charles Dickens edition) to a redirect as that is pretty much how it should be at this stage. — billinghurst sDrewth 03:20, 18 April 2021 (UTC)

Index:Posthumous papers of the Pickwick Club (Serial Volume 19).pdf

Can pickwick19_20 0037a and pickwick19_20 0037b be inserted from [here] into File:Posthumous papers of the Pickwick Club (Serial Volume 19).pdf? Languageseeker (talk) 13:42, 22 April 2021 (UTC)

Index:Works of Thomas Carlyle - Volume 17.djvu

Can the two missing pages from this index be inserted from [2] or the entire source replaced? No idea how to make DJVUs. Languageseeker (talk) 13:10, 23 April 2021 (UTC)

The link provided is to volume 19. Perhaps some research to find a nicely scanned set. CYGNIS INSIGNIS 13:43, 23 April 2021 (UTC)
Oops, link updated. Languageseeker (talk) 13:47, 23 April 2021 (UTC)
@Cygnis insignis: Are you saying that it would make sense to replace all 30 volumes with the full-color ones? Languageseeker (talk) 13:49, 23 April 2021 (UTC)
I would suggest replacing them with scans from the NY Public Library or IA collection:University of Toronto; texts in monochrome (e.g. Google, and to a lesser degree Cornell) should be viewed with caution due to over-compression and poor text layers. The effort in these works is transcribing, so it is better to get the scans right for 30 volumes. I looked at this when setting up some Carlyle to transcribe; I don't remember what swayed me to the scans I ended up finishing. CYGNIS INSIGNIS 14:15, 23 April 2021 (UTC)
Agreed. I didn't create this collection, so I didn't pick the scans. But, I'm happy to create a list of volumes for replacement if someone will commit to creating the DJVUs and replacing the existing ones. Languageseeker (talk) 14:27, 23 April 2021 (UTC)
Check with the creator, it was @Ratte: if I recall, perhaps you could replace the files at commons (assuming that there is no work on the indices since yesterday). CYGNIS INSIGNIS 14:33, 23 April 2021 (UTC)
Done by Xover. Ratte (talk) 16:04, 26 April 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:48, 10 May 2021 (UTC)

Index:Rudimentary Treatise on the Construction of Locks (1853).djvu

The text layer in this is off by one page. The original DJVU (from before pages being removed) also had this problem although the cover's text layer didn't contain anything. -Einstein95 (talk) 14:49, 10 May 2021 (UTC)

@Einstein95: Done. I had to compress it quite aggressively due to phab:T278104, so please let me know if any pages have become unreadable and I'll try to tweak it. Xover (talk) 06:24, 11 May 2021 (UTC)
@Xover: FYI: After an unreasonable amount of experimentation and being told that >100MB files are basically user error (rest of rant omitted), I have found that bypassing the upload pages altogether, creating a File: page at Commons with the description, and then uploading a file directly into that with Rillke's chunked upload JS works for me.
Also, Pywikibot should now be fixed (pending next release) to allow async uploads, and has been flawless for me since. My hypothesis is that the PHP library used by IA-Upload has the same issue. Inductiveloadtalk/contribs 07:01, 11 May 2021 (UTC)
There is no way I am buying that this is a client-side issue. The exceptions getting thrown are down in the DB layer. If manually pre-creating a file description page works then that must mean it is the metadata part of the overall file upload process that is triggering it (not, perhaps, surprisingly), but it's still a server-side problem for which this is merely a workaround.
Who told you it was user error? The Pywikibot folks? Chunked upload, which exists specifically to upload >100MB files, is definitely supported (fsvo) functionality. --Xover (talk) 07:09, 11 May 2021 (UTC)
Rant continued on your talk page for avoidance of derailing this thread ^_^. Inductiveloadtalk/contribs 07:46, 11 May 2021 (UTC)

Other discussions

Policy on substantially empty works

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way for a very, very long time. I think there might perhaps be a case to host a mainspace page for this work, even though there is zero, or almost zero, actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. It also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • No mainspace page until there's some content. Only host this in terms of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion right now in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds it does the groundwork, so I think we could have a think about what we really want to see here. Inductiveloadtalk/contribs 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) are not very helpful, and are entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveloadtalk/contribs 20:55, 3 July 2020 (UTC)
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4Rama's revenge 01:55, 5 July 2020 (UTC)
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4Rama's revenge 00:15, 6 July 2020 (UTC)
The entire point of moving this here, and not staying at WS:PD, is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head all those future deletion proposals off at the pass, because deletion proposals often cause friction. Inductiveloadtalk/contribs 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4Rama's revenge 11:53, 9 July 2020 (UTC)
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality in the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion; the second half is. Inductiveloadtalk/contribs 14:18, 9 July 2020 (UTC)
deletion threat has been an habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual, following the gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan-backed works, but the wishlist was not supported. Slowking4Rama's revenge 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC) I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
  4. EITHER
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes, etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it is redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than being handled within these other general policies.
    For the sake of example, applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked on: "active" being the operative word, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)
  • Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveloadtalk/contribs 14:40, 6 July 2020 (UTC)
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice work and a good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sorts of tricky bits like tables of stocks or weather tables, and writing into policy that those must be proofread in order to get the "real" articles into mainspace would be a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc.) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite its containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)

Proposal

Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)

Since the proposal has now slipped off the main page (to here), with vague support for the first part (collective work inclusion criteria) and a fairly consistent opposition to the second (no-content pages), my plan is to transfer the first part, as guidelines rather than policy, to Wikisource:Periodical guidelines. As non-binding guidelines, they can then be worked on further in situ. Sound OK? Inductiveloadtalk/contribs 08:10, 16 April 2021 (UTC)
The example given in Wikisource:Periodical guidelines might be improved; PSM is and was an exercise that has gone its own way (no offense to @Ineuw:, this is a site under development and that is only one example). CYGNIS INSIGNIS 13:05, 17 April 2021 (UTC)
@Cygnis insignis: You would be wrong to think that I am offended. Remember that when I started, I knew everything. By now, so much of that knowledge is lost that I am happy to listen. Would you elaborate please? — Ineuw (talk) 19:50, 17 April 2021 (UTC)

I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)

Non-ASCII characters in article titles

The Dictionary of National Biography uses date ranges in disambiguation extensions to article titles. There was pressure by some Wikipedia editors to change from dash to ndash to fit in with the Wikipedia policy on this issue. Wikipedia handles the issue by having ndash in the article title and a redirect with a dash.

The use of dashes was justified for Wikisource because it simplifies the URL and means that no redirects are necessary.

I am currently working on Wikipedia, linking articles to Wikisource articles in the Royal Naval Biography, and I have just come across these two articles:

There are no ASCII redirects.

So should the articles be moved, or should there be redirects, or are the names fine as they are and there is no need for redirects? -- PBS (talk) 14:59, 25 March 2021 (UTC)

@PBS: They should be moved. For punctuation like that (and dashes, quote marks, etc.) we use plain ascii in page names (nb. in page names: article titles are a different story). --Xover (talk) 15:24, 25 March 2021 (UTC)
I am using the term "article titles" to be the part of the URL that is used to make the URL unique. I think you are using the term page name to mean the same thing. -- PBS (talk) 15:29, 25 March 2021 (UTC)
Oh, yes, sorry; I should have been clearer. I just meant it as an aside in case there was confusion, but assumed you knew that so didn't want to over-explain. I should have just left it out. But, yes, the page name, which is what ends up in the URL, uses ascii punctuation. I'm just hedging around "article" since mainspace wikipages on Wikisource can contain zero or more "articles" (with or without non-ascii titles)—from a newspaper, for example—unlike Wikipedia where a mainspace wikipage by definition is "an article". --Xover (talk) 15:56, 25 March 2021 (UTC)

<-- There could be a lot of these; how do I go about requesting a bot to run through the titles? -- PBS (talk) 12:24, 27 March 2021 (UTC)

Make a request at WS:BOTR. Beeswaxcandle (talk) 17:32, 27 March 2021 (UTC)

Small note: the turned comma in M‘Farland represents a superscript c (McFarland), not an apostrophe. So "McFarland" would be the better ASCII equivalent, perhaps with "M'Farland" as another redirect. (And Mc is just an abbreviation for Mac, as Geo is for George, but that's a separate discussion.) Pelagic (talk) 23:04, 17 April 2021 (UTC)

Google often unable to find works at Wikisource

I have noticed many times that Google is not able to find works hosted in English Wikisource. E. g. today I tried searching for "zawis and kunigunde" wikisource and the result was that Google found a subpage of my userpage where I just mentioned the work, a talk page where I asked for something connected with the work, it found even a Wikidata item connected with the work, but did not find the work itself. Is there anything we could do to make works better discoverable? --Jan Kameníček (talk) 12:43, 3 April 2021 (UTC)

@Jan.Kamenicek: Google finds that as the first result both for "zawis and kunigunde" wikisource and, for that matter, for just "zawis and kunigunde" (both on the mobile website, oddly). Since this is a recent page, it might just be that the Google crawler takes time to discover the page and add it to the index. The mobile website hit might be just that the mobile website has been indexed, but the main website hasn't yet. Inductiveloadtalk/contribs 13:38, 3 April 2021 (UTC)
It's now returning the main website (not mobile) page, so I guess the spider got there, and some magical algorithm decided (correctly) that main website was a better result. Inductiveloadtalk/contribs 19:22, 3 April 2021 (UTC)
If we linked from Wikipedia and Wikidata, then works might be more findable. Slowking4Farmbrough's revenge 01:14, 4 April 2021 (UTC)
FWIW DuckDuckGo and Bing both found the pages without issue. — billinghurst sDrewth 06:26, 6 April 2021 (UTC)

@Whatamidoing (WMF): Do you have anybody who has an "in" with Google who could explain why this is happening? Or someone who can assist us with what is reasonably missing in our metadata? — billinghurst sDrewth 06:23, 6 April 2021 (UTC)

I don't. The last time I heard of anyone working on SEO stuff, it was @Deskana (who reclaimed his volunteer status a couple of years ago, so it's been a long time). Let me ask around. I'll let you know if I learn anything. Whatamidoing (WMF) (talk) 19:11, 6 April 2021 (UTC)

@Whatamidoing (WMF), @Billinghurst: Google populates its search engine in a variety of ways, most of which are Google secrets. The most likely explanation for why it took a little bit to show up in Google is that Google doesn't crawl Wikisource as much as it crawls sites like Wikipedia, because Wikisource doesn't get as much traffic as Wikipedia so it's not as critical if it's slightly out of date. The page was created on 31st March, so it's really not surprising that it took a few days for it to be picked up. The fact it picked up other pages first is also not surprising, as Google doesn't necessarily crawl websites linearly. It's now first in the results for me if I search for "zawis and kunigunde", which is excellent. All the metadata in the page HTML looks good and I don't really think there's much you could do to improve it. The most important thing is the link to the item Wikidata in the schema.org format in the HTML, which I can see is in there (search the page source HTML for "sameAs" and you'll see it). In fact, that good metadata is probably why Google switched over the link to the desktop version from the mobile version, as the "canonical" URL for the page is given as the desktop version. I don't think there's really anything to do to improve things; things are already pretty great, and it'll just take a few days to pick things up sometimes. --Deskana (talk) 10:05, 7 April 2021 (UTC)

@Deskana: thanks so much for that fulsome explanation, it helps, and this bit
The most important thing is the link to the item Wikidata in the schema.org format in the HTML, which I can see is in there (search the page source HTML for "sameAs" and you'll see it). In fact, that good metadata is probably why Google switched over the link to the desktop version from the mobile version, as the "canonical" URL for the page is given as the desktop version.
indicates that we need to rouse up our transcribers to do a better job of adding decent Wikidata. We have not been rigorous in getting all users to do it, though it is a tricky beast which Wikidata does not particularly assist. Some of the bot operators create shell items, which may not be particularly better. — billinghurst sDrewth 11:11, 7 April 2021 (UTC)
I talked to a couple of folks, but they didn't have anything else to add. These are good ideas, and so is patiently waiting for Google to take notice of the page's existence. Whatamidoing (WMF) (talk) 22:56, 15 April 2021 (UTC)
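Deskana's tip about searching the page source for "sameAs" can be scripted for spot-checking many pages. Below is a minimal Python sketch, assuming the schema.org metadata is embedded as JSON-LD inside a script element of type "application/ld+json" (the common way such data is embedded; not verified for every Wikisource page or skin). The function name same_as_links is hypothetical, and fetching the page is left out: pass in HTML you have already saved.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents are passed through unparsed, so this is the raw JSON text.
        if self._in_jsonld:
            self.blocks[-1] += data


def same_as_links(page_html: str) -> list:
    """Return every schema.org "sameAs" value found in the page's JSON-LD blocks."""
    parser = JsonLdCollector()
    parser.feed(page_html)
    parser.close()
    links = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing outright
        value = data.get("sameAs") if isinstance(data, dict) else None
        if isinstance(value, str):
            links.append(value)
        elif isinstance(value, list):
            links.extend(v for v in value if isinstance(v, str))
    return links
```

An empty result would be a hint (not proof) that a page is not linked to a Wikidata item, which is the gap billinghurst raises about transcribers adding decent Wikidata.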
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:50, 10 May 2021 (UTC)

Universal Code of Conduct – 2021 consultations

Universal Code of Conduct Phase 2

The Universal Code of Conduct (UCoC) provides a universal baseline of acceptable behavior for the entire Wikimedia movement and all its projects. The project is currently in Phase 2, outlining clear enforcement pathways. You can read more about the whole project on its project page.

Drafting Committee: Call for applications

The Wikimedia Foundation is recruiting volunteers to join a committee to draft how to make the code enforceable. Volunteers on the committee will commit between 2 and 6 hours per week from late April through July and again in October and November. It is important that the committee be diverse and inclusive, and have a range of experiences, including both experienced users and newcomers, those who have experienced or responded to harassment, and those who have been falsely accused of harassment.

To apply and learn more about the process, see Universal Code of Conduct/Drafting committee.

2021 community consultations: Notice and call for volunteers / translators

From 5 April to 5 May 2021 there will be conversations on many Wikimedia projects about how to enforce the UCoC. We are looking for volunteers to translate key material, as well as to help host consultations in their own languages or on their own projects using suggested key questions. If you are interested in volunteering for either of these roles, please contact us in whatever language you are most comfortable with.

To learn more about this work and other conversations taking place, see Universal Code of Conduct/2021 consultations.

-- Xeno (WMF) (talk) 20:45, 5 April 2021 (UTC)

Invitation to m:Talk:Universal Code of Conduct/2021 consultations/Discussion

I am interested in hearing the input of Wikisource users about the application of the Universal Code of Conduct, especially from the perspective of interactions on Wikisource. Xeno (WMF) (talk) 23:56, 17 April 2021 (UTC)

Help with Splitting Pages

Could somebody help me split the pages in commons:File:Paradise Lost 1674.djvu? Languageseeker (talk) 00:14, 8 April 2021 (UTC)

Nothing at that link, and I would want to know a lot more about what is being proposed before acting. We stopped a lot of splitting and matching years ago due to edition-matching issues. — billinghurst sDrewth 14:14, 10 April 2021 (UTC)
@Billinghurst: Whoops. Link fixed. The scan does not have the pages split in half, i.e. pages 1 and 2 are in the same image. Languageseeker (talk) 15:51, 10 April 2021 (UTC)
Why do you want to split them? Proofread them as they are, and just put the page numbers on the odds or the evens. It is essentially no different from doing a work with columns. I have done a number of works like that with no issue; it just takes a little work on the page numbering, and there are plenty of means to do that. — billinghurst sDrewth 16:22, 10 April 2021 (UTC)
@Billinghurst: Good to know, thanks. I didn’t want to create a mess. Languageseeker (talk) 05:37, 12 April 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:50, 10 May 2021 (UTC)

Help Cleaning up Author:Charles John Huffam Dickens

Poor Dickens has got a terrible author's page. Most of the editions there are from posthumous reprints that have little value, and in general it's a bit of a mess. Could someone help me clean up this page? We'd probably want to create subpages for each one of his works with the periodical versions, published edition, cheap edition, 1858 Library edition, and the 1867 Complete Works of Charles Dickens. Anything after that is just reprints until the Clarendon editions that are in copyright. Charles_Dickens_bibliography is a help, but that could also use expanding. Languageseeker (talk) 05:26, 8 April 2021 (UTC)

We would not want to create Author: subpages for each work. Versions pages in Mainspace for those who search by name of work and a simple indented list on the Author: page for those who come in by searching for Dickens is sufficient. Beeswaxcandle (talk) 06:04, 8 April 2021 (UTC)
@Beeswaxcandle: Are you sure? Oliver Twist alone has 12 authoritative editions that Dickens himself was involved in. Languageseeker (talk) 20:46, 8 April 2021 (UTC)
Yes, I'm sure. Author: subpages for a single work is not the intention of such pages. Beeswaxcandle (talk) 05:11, 9 April 2021 (UTC)
@Beeswaxcandle: So, is something like Wuthering Heights a mistake? Languageseeker (talk) 05:16, 9 April 2021 (UTC)
I think (?) this is just a simple misunderstanding based on the way each of you is using the jargon. If I'm understanding correctly, @Languageseeker: means that we should create pages for each work, to list the various editions of the work. If so, that seems sensible and common, and Wuthering Heights looks like as good an example as any of what's common practice around here. But we wouldn't call that a subpage -- I think what @Beeswaxcandle: was understanding you to mean was something of the form of [[Author:Charles John Huffam Dickens/Hard Times]], which would certainly be an anomaly. -Pete (talk) 05:30, 9 April 2021 (UTC)
Yes, I think that you’re exactly right. I want to make Hard Times the page that lists all the versions of the work instead of containing the text of one specific edition. Languageseeker (talk) 05:53, 9 April 2021 (UTC)
Then, why did you call it a "subpage"? You mean a versions page. Once we are hosting multiple versions (editions) of Hard Times, then we'll need a versions page. Until then, the single version can stay where it is and the list of other versions/editions should be on the Author: page. Generally, we prefer to avoid redlinks on the three types of disambiguation pages (disambiguation, versions, translations). [Knowing as I say this, that there are redlinks on some versions pages, but these do not set precedent.] Beeswaxcandle (talk) 06:49, 9 April 2021 (UTC)
Sorry for the poor terminology. Probably the result of a sleep-addled brain. I'm still a little concerned about posting the lists on the Author page because it's already starting to look like a mess, because of the large number of versions that Dickens contributed to and the need to distinguish between those and other versions. Languageseeker (talk) 14:11, 9 April 2021 (UTC)
Uh, since when is using the incorrect terminology a good topic for public discussion? We're all learning here. We got past the misunderstanding, maybe we can look forward, not litigate minor irritations.
It seems to me that in the case of Oliver Twist, the challenge is that a specific edition is occupying the title that would be used for a versions page. So a page move would be required, which given all the subpages might be something to approach with caution. That's what I see Languageseeker doing here; in another section they have asked if the title could be changed, but they've gotten no response. I think if we can just establish what the best practice is for that kind of move, the issue would be resolved. (And, I'm happy to help out with this in the coming weeks, you're right, the page is not very useful in its current state.) -Pete (talk) 16:59, 9 April 2021 (UTC)
We don't disambiguate works until there is a need. When there is a need, then we do it. Having multiple editions is not in itself a need; use the author page. For Dickens's works I would think that there are dozens of editions of many of them, and there is not a lot of value in simply listing editions unless the listing adds real value. — billinghurst sDrewth 14:20, 10 April 2021 (UTC)
I'm not listing every edition that has ever been printed. Only the editions that Charles Dickens directly contributed to as identified in the Clarendon Dickens. The names that I give to the editions are the standard ones used to identify them. The page needs work and I'm trying to fix it up. As an example, Great Expectations is not even a version done by Dickens. Others are missing the original illustrations. Languageseeker (talk) 14:26, 10 April 2021 (UTC)
We recently went through this process for the plays of William Shakespeare. Yes, there were existing editions at each title, but the pagenames were needed for disambiguation pages. The actual examples would read like spaghetti, so I'm going to describe a theoretical situation that's a bit simpler. Let's say there was a copy of the play Much Ado About Nothing at that title, but we need that title for the disambiguation. The play text was moved to Shakespeare - First Folio facsimile (1910)/Much adoe about Nothing, a specific edition location. The versions of Shakespeare's play are listed at Much Ado About Nothing (Shakespeare), and works with the title Much Ado About Nothing are disambiguated from each other at the main title.
All this points out something that isn't mentioned in the discussion above: There might be other works titled Oliver Twist, such as book reviews, encyclopedia articles, literary articles, dictionary entries, etc. For the plays of Shakespeare, there are retellings by Charles and Mary Lamb, and summaries of the plays by Hazlitt, and articles in the Encyclopedia Americana. The works of Charles Dickens are of a similar stature in the English language, so I urge checking around to see whether we need both disambiguation pages like Oliver Twist and versions pages like Oliver Twist (Dickens) before undertaking such a monumental revision. For Shakespeare we had the added headache that some of his plays, such as Julius Caesar and King Richard II were also the names of authors. So, I urge anyone planning to undertake revision of the works of Mr. Dickens to put in some planning first, or you may find that everything has to be changed twice.
That said, the works of Dickens are long overdue for cleanup, and I heartily welcome such cleanup. --EncycloPetey (talk) 23:37, 12 April 2021 (UTC)

New texts

What are the criteria for adding works to {{New texts}}, and thus to the main page? I added Boy Scouts and What They Do yesterday, but another editor has removed it, without asking me first. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:29, 8 April 2021 (UTC)

discussion prior to temporarily removing text CYGNIS INSIGNIS 16:09, 8 April 2021 (UTC)
In which you mentioned removing the work from the template (let me check...) zero times, before doing so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:08, 8 April 2021 (UTC)
It has always been the case that anyone with a concern about a work should remove it so that the issues can be raised and addressed. When addressed then please feel welcome to relist it. — billinghurst sDrewth 14:12, 10 April 2021 (UTC)
Are you able to answer the question: "What are the criteria for adding works to {{New texts}}"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:28, 10 April 2021 (UTC)
The usual minimum criteria (as I have listed them recently elsewhere):
(1) The work must be fully Proofread (in the Wikisource sense), with no pages unproofread or problematic, and with all images and major formatting included
(2) The work must be fully and correctly transcluded, so that the entire work is available in a correct and logical sequence
(3) A suitable license template must be placed on the work's primary Mainspace page.
It's also nice if the Author page exists (so we don't have redlinks on the Main page), and if categories have been added (so that it can be found through category searching). If you look at Boy Scouts and What They Do, it has not been fully proofread. --EncycloPetey (talk) 01:33, 12 April 2021 (UTC)

FYI: Author:William Shakespeare to be disambiguated

FYI, due to the presence of multiple author pages for individuals named "William Shakespeare", it is necessary to move Author:William Shakespeare to Author:William Shakespeare (1564-1616) to make room for a disambiguation page. This process has already begun, and I am updating links to the page. Due to the sheer number of pages that link to it, I am going to make a few passes with AWB before requesting assistance.

Note: this is a routine operation in accordance with our policies and established practice. This is not the thread in which to discuss the pros and cons of our existing disambiguation practices, or whether we should make an exception due to Shakespeare's importance. If you feel that a change is worth proposing, I encourage you to submit a proposal above, under WS:S#Proposals. —Beleg Tâl (talk) 00:28, 11 April 2021 (UTC)

Ouch. Did you also check enWP? We probably need to look to regular maintenance on the disambiguation page as typically no one is going to check. — billinghurst sDrewth 00:05, 12 April 2021 (UTC)
@Billinghurst: thanks for reminding me about enWP, I'll have a look. Speaking of enWP, they have a bot that checks for links to DAB pages, which might be a good idea for us -- this is far from the only DAB page in which one item is notable and the others are nobodies. —Beleg Tâl (talk) 01:22, 13 April 2021 (UTC)
Looks like the worst offenders are case disambiguation pages. —Beleg Tâl (talk) 03:13, 13 April 2021 (UTC)
If you are expecting there to be pros and cons, and a possible exception, then why did you start the process before making an announcement or seeking feedback? --EncycloPetey (talk) 01:19, 12 April 2021 (UTC)
@EncycloPetey: As you may have noticed, every time a notable work or author gets disambiguated, there is always a chorus of editors saying "this is notable, it should be an exception", after which we kindly remind them that no, we do not make exceptions based on notability, this is the way things are done around here. I was hoping to save them the bother but it seems I made it worse :S —Beleg Tâl (talk) 01:20, 13 April 2021 (UTC)
The issue isn't Shakespeare's importance (although that factor might also be relevant to a discussion of how to handle it); it's the more practical issue that in the world at large there are a massive (massive!) number of works by Shakespeare (due to all the editions) and works about Shakespeare and works that refer to Shakespeare; and the same applies for works that are in scope for Wikisource; and the disproportion only gets greater for works actually currently on Wikisource. That means that we currently have a massive number of links to his author page, both actual and potential, and are adding links to his author page at a far higher rate than other authors, and we will keep adding links to [[Author:William Shakespeare]] for the foreseeable future.
Meanwhile, John William Thomas Shakespeare has all of one published work that is in scope, and a small handful of references in other works (mostly biographical dictionaries); William Goodman Shakespeare has a small handful of works and slightly more references ditto; and w:William Shakespeare (inventor) (the only actually ambiguous name) wrote some patents and while they might theoretically have an author page here eventually, I very much doubt that will happen in our lifetime. In other words, from a practical perspective, moving Shakespeare to a disambiguated name and putting a dab at "William Shakespeare" makes about as much sense as preemptively disambiguating all author pages: it's theoretically pure but a practical mess.
All of which is to say, I agree with EP that this would have better been discussed before starting a huge AWB run. It's possible the outcome would have been the same, but it is by no means a given. --Xover (talk) 06:17, 12 April 2021 (UTC)
@Xover: As I said on my talk page, I really do understand this point of view, but we have had this discussion so many times and the result has always been that these points don't matter. Hence why I said, this notification of a significant but routine operation is not the place for this debate to resurface, this is a matter for a proposal to change our existing practices. —Beleg Tâl (talk) 01:25, 13 April 2021 (UTC)
@Xover: It has always been my habit, when addressing a discussion in which a matter seems to be in a gray area, to add content in order to move it into clear black and white. You will have seen me add scans to works nominated for deletion, to add works to authors nominated for deletion, etc. So even though I suspect that this might be taken the wrong way, please know that this is the only reason why I have created United States patent 446529 by Author:William Shakespeare (1869-1950) to make this discussion more concrete. —Beleg Tâl (talk) 01:39, 13 April 2021 (UTC)
@Beleg Tâl: You're entirely correct that this action does not come across in the way you here characterise the intent to be. Doubling down on unilateral action in the face of pushback from multiple members of the community does not tend to strengthen that impression either. In fact, using information I gave you, that you were previously unaware of, to artificially create a situation where you "win" the discussion does not particularly help convey an impression of respect for other points of view. You may also wish to consider that when you force a situation to be "clear black and white" you also preclude the possibility of nuance and compromise (i.e. it is effectively a scorched earth tactic). But, hey, congratulations on your win… --Xover (talk) 06:32, 13 April 2021 (UTC)
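On Beleg Tâl's suggestion of a bot that checks for links to disambiguation pages, as enWP has: the first step of any such check is listing the pages that link to a given title. A minimal Python sketch using the MediaWiki Action API's list=backlinks query, following its standard "continue" parameters, could look like this. The endpoint constant, the User-Agent string and the function names are assumptions for illustration; deciding which of the listed links should be retargeted would still need human review.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint for English Wikisource's Action API.
API = "https://en.wikisource.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "DabLinkCheckSketch/0.1 (replace with your contact details)"


def backlinks_url(title: str, cont: Optional[dict] = None) -> str:
    """Build a list=backlinks query URL, merging in any continuation parameters."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "bllimit": "max",
        "format": "json",
    }
    if cont:
        params.update(cont)
    return API + "?" + urlencode(params)


def parse_backlinks(payload: dict) -> tuple:
    """Return (titles, continuation dict or None) from one API response."""
    pages = payload.get("query", {}).get("backlinks", [])
    return [page["title"] for page in pages], payload.get("continue")


def all_backlinks(title: str) -> list:
    """Follow continuation until every linking page has been listed (live requests)."""
    titles, cont = [], None
    while True:
        request = Request(backlinks_url(title, cont), headers={"User-Agent": USER_AGENT})
        with urlopen(request) as response:
            payload = json.load(response)
        batch, cont = parse_backlinks(payload)
        titles.extend(batch)
        if not cont:
            return titles
```

Calling all_backlinks on a disambiguation page title would give the raw list; a periodic run comparing that list against pages that are themselves lists or indexes is one way the "regular maintenance" billinghurst mentions might be approached.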

Template: Periodical header, not-actually-orphaned pages, etc.

The {{header periodical}} template seems to link pages in a way that is not detected by the "what links here" function. This, in turn, has resulted in users tagging the articles as "orphaned" when they are not. An example is Oregonian/1915/February/28/Miss Hobbs' place goes to C. Abrams, which is linked from Oregonian, but I've seen a number of others recently. Ping CalendulaAsteraceae

Relatedly, I'm a little unsure about the proper use of {{Header periodical}} in general, and its docs do not go into much detail about how it is to be used. My perception is that it's trying to be two different things, but maybe I'm missing something. On the one hand, it seems like a sort of stopgap, intended to deal with periodicals where there has been no effort to date to manually index the Wikisource pages related to it; it auto-generates a list of subpages. On the other hand, it creates a header that is specific to periodicals, which seems like a nice alternative in these cases to the header generated by {{header}}, even for periodicals where the manual curation has been done; but if it's used in such a case, it results in redundant entries. How, for instance, should Oregonian be adjusted to best fit our standards? -Pete (talk) 15:10, 11 April 2021 (UTC)

Can we please unconflate the issue of unlinked pages from the issue of the template? Pages should not be unlinked, and saying that the periodical header does not link them probably misrepresents its purpose, or misses the point.
  • If one is working on a periodical systematically then you should be having a proper header and be curating the root page. The subpages get linked. Typically periodicals do have a ToC and a flow.
  • We had cases of ad hoc articles of newspapers which were transcribed for a purpose, eg. linked to from author pages—as author, or about author—or linked from other articles, or from portals. And newspapers don't have a ToC and don't flow. We also had people creating pages at root that should/could be at subpage level.
So we created the stop-gap/fallback of the template, and we kept the name generic, matching other language we used. So yes, we have a template there to assist, but it is not meant to replace the work of linking to pages.

So if you are working systematically on the Oregonian, then please curate it and that would include article linking. If you are working on specific article for specific purposes then please provide suitable article linking. — billinghurst sDrewth 23:03, 11 April 2021 (UTC)

Ah, I understand now, thank you. I had indeed missed the point, my mind-reading skills are not very advanced :) I've added this info to the template docs. -Pete (talk) 02:00, 13 April 2021 (UTC)

Should mdash be surrounded by space or not?

I have always used mdash without surrounding spaces, which I believe is/was the community standard. However, the "Clean up OCR" script surrounds it with spaces as it wraps the paragraph lines. For me, line wrapping is the last step in proofreading and final spell checking.

As shown in the above comparison, I have my own AutoHotkey text standardization and line wrapping script, which is identical in every respect to "Clean up OCR", but I can modify it to whatever the current community requirement is.

I needed an additional AutoHotkey script which does a partial cleanup of the text without line wrapping, to help me identify OCR errors between the mdash and the hyphen by surrounding the mdash with a space, and I left it that way in the final format. But some editors noticed this, and that is the reason for this post. Which should it be?— Ineuw (talk) 23:53, 11 April 2021 (UTC)

We replicate the work. In the works that I have seen, typically not. I have no idea what rule the grammarians and editors of the world would have us apply at WP. We also have not typically done the half spaces, and all the other Unicode that can apply. — billinghurst sDrewth 23:58, 11 April 2021 (UTC)
If there is a space, make sure that it's a non-breaking space with &nbsp;. —Justin (koavf)TCM 02:18, 12 April 2021 (UTC)
IIRC, em dashes will always linebreak even with &nbsp;? They break when set close-up against letters or other punctuation. Pelagic (talk) 01:28, 18 April 2021 (UTC)
Not. CYGNIS INSIGNIS 04:26, 12 April 2021 (UTC)
The "rule of thumb" usually offered in the past is that there should not be spaces around em-dashes. Where spacing appears around such dashes in older works, it is usually half-spacing that flanks the em-dash rather than full spaces. Wikisource has opted not to use half-spacing, and so in most cases we collapse half-spacing around em-dashes. However, there are works where there is clearly a full space or even a double-space next to one (or both) sides of an em-dash, such as at the end of a line of dialogue, so there isn't a simple rule that would apply in every situation. Hence a "rule of thumb", although the Wikisource:Style guide (s.n. Formatting, 7. Punctuation) explicitly advocates for "Whichever dash is used, it should not be flanked by spaces." (emphasis mine) This statement applies in the majority of situations. --EncycloPetey (talk) 23:21, 12 April 2021 (UTC)
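Taking the style guide's default ("it should not be flanked by spaces") literally, a cleanup step that closes up em dashes could be sketched in Python as below. This is a hypothetical helper, not the "Clean up OCR" script or Ineuw's AutoHotkey script; it treats ordinary spaces, tabs, literal non-breaking spaces and &nbsp; entities alike, and leaves line breaks alone. Because some works have a deliberate full space beside a dash (e.g. at the end of a line of dialogue), the output still needs checking against the scan.

```python
import re

EM_DASH = "\u2014"

# Spaces, tabs, literal no-break spaces, or &nbsp; entities on either side of
# an em dash. Newlines are deliberately not matched, so line structure survives.
_SPACED_EM_DASH = re.compile(r"(?:[ \t\u00a0]|&nbsp;)*\u2014(?:[ \t\u00a0]|&nbsp;)*")


def close_up_em_dashes(text: str) -> str:
    """Remove horizontal spacing around each em dash (style guide default)."""
    return _SPACED_EM_DASH.sub(EM_DASH, text)
```

Running it as a separate final pass, after any OCR-checking step that temporarily spaces the dashes out, would keep both workflows Ineuw describes.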

Style issue across the site

I have seen several instances of hyphens that are used in places where en dashes should be. Does the community here think that we should continue using hyphens for purposes other than hyphenation or should we replace them with en dashes for things like date ranges or separating asides in running text, etc.? (I am making an exception here for disambiguation of author pages as hyphens are easier to type: please ignore this particular instance of the misuse of hyphens in favor of en dashes, as I think it will just complicate the discussion. Note also that I am not suggesting changing the punctuation of an original work as it was published, just for documentation, etc.) —Justin (koavf)TCM 02:09, 12 April 2021 (UTC)

You need to be a bit more specific about where you think we ought to be using (or not using) hyphens. The thread on my Talk page concerned its use in Portal Headers, which concerns neither date ranges nor "separating asides in running text". --EncycloPetey (talk) 03:09, 12 April 2021 (UTC)
I am referring to any instances where a hyphen is used and it isn't performing the function of hyphenation: joining two surnames of a person, breaking up a word over a line wrap or for showing syllable stress, or showing how certain parts of words are prefixes or suffixes (e.g. "The prefix Sino- refers to things from China...") Rather than list every way that hyphens are misused (e.g. in date ranges), it's easier to list the three times that they should be used. —Justin (koavf)TCM 03:37, 12 April 2021 (UTC)
I think it will be quite difficult to push this through so generally. Current practice is that dashes are not used in page titles, although I personally would allow it. I am quite hesitant about replacing hyphens by dashes in the main namespace in cases when the hyphens were (though incorrectly) used in the original work. Otherwise I support using hyphens and dashes in the way described e. g. at Wikipedia:Hyphens and dashes. --Jan Kameníček (talk) 17:13, 12 April 2021 (UTC)
Thanks. Agreed that this is a good rough guide. Note again that "I am not suggesting changing the punctuation of an original work as it was published, just for documentation, etc." —Justin (koavf)TCM 21:17, 12 April 2021 (UTC)
"Documentation, etc." is a very vague description of where you think hyphen-policing should occur. I also point out that, just because a hyphen is used incorrectly, it does not follow that an en-dash should be used instead. That is a false dichotomy. --EncycloPetey (talk) 23:14, 12 April 2021 (UTC)
I mean all pages other than the transcription of works themselves, which may have different or otherwise inappropriate typography. I propose that we use the guidelines that Jan just appealed to but in which cases would there be an inappropriate hyphen usage that wouldn't be replaced with an en dash? —Justin (koavf)TCM 03:37, 13 April 2021 (UTC)
I'm not sure what you're asking for here. Are you a) asking for permission to fix punctuation on Help and other such documentation pages? b) requesting a global replace on such pages? c) an amendment to the style guide? d) a new policy page on punctuation? e) something else? Beeswaxcandle (talk) 05:31, 13 April 2021 (UTC)
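If option (b), a replace on documentation pages, were chosen, a cautious starting point would be to handle only the least ambiguous case: a hyphen between two numbers, as in a date range. The Python sketch below is hypothetical and deliberately narrow. It skips wikilinks and (non-nested) templates, because page titles such as Author:William Shakespeare (1564-1616) currently use hyphens, and it leaves every other hyphen alone, since a misused hyphen is not automatically an en dash. It is meant for non-transcription pages only.

```python
import re

EN_DASH = "\u2013"

# Wikilinks and simple (non-nested) templates are protected, because page
# titles and template parameters may legitimately contain hyphens.
_PROTECTED = re.compile(r"(\[\[.*?\]\]|\{\{.*?\}\})")
# A hyphen between two 2-4 digit numbers, e.g. "1564-1616" or "1914-18".
_NUMERIC_RANGE = re.compile(r"\b(\d{2,4})-(\d{2,4})\b")


def dash_numeric_ranges(text: str) -> str:
    """Replace hyphens in numeric ranges with en dashes, outside links/templates."""
    # re.split with a capturing group puts the protected spans at odd indices.
    parts = _PROTECTED.split(text)
    for i in range(0, len(parts), 2):
        parts[i] = _NUMERIC_RANGE.sub(
            lambda m: m.group(1) + EN_DASH + m.group(2), parts[i]
        )
    return "".join(parts)
```

Any actual run would still want a reviewed diff per page, given the disagreement above about scope.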

Accessibility on this site

I have recently added some accessibility features to tables such as table captions and was encouraged to post about it. These are required by WCAG best practices and provide a very high impact on making the site useful for the blind with very low effort. Note that I have also ported over {{sronly}} so these won't display, and there are no concerns about styling. Is there any good reason to not include table captions on data tables? Should we implement best practices as decided by Web authorities and accessibility advocates or is there some reason why we shouldn't? —Justin (koavf)TCM 02:12, 12 April 2021 (UTC)

I asked you to support your assertion that table captions are required, and you failed to do so. I have also pointed out that the place where you applied them is not a data table, but is applied purely for layout and is temporary for the benefit of proofreaders. The so-called data are copyright statuses for the works we cannot yet import and links to Index pages of works that we have. As works on the list are validated, the so-called "data" is removed and will eventually disappear altogether, hopefully within the next two years. I also asked several times for you to start a discussion, and am glad to see that you have now done so. --EncycloPetey (talk) 03:02, 12 April 2021 (UTC)
Excuse me, but they are data tables. What is it you think that a data table is? I have provided citations for using captions as best practice for data tables. Please do not keep on asserting that data tables are for layout when they are not. I also don't think that the implication that information should only be accessible if it's going to last an indefinite amount of time withstands even the mildest scrutiny. Just because the Sun is going to swallow the Earth in a few billion years, that doesn't justify us not using best accessibility practices. My point is relevant to all data tables, not only the ones that you think someone will remove at some point. Rather than make this discussion about a single table that you keep on misidentifying as not being a data table, I am asking a broad question that refers to a site-wide culture of using best practices for persons with disabilities. Are you in favor of doing that or are you opposed to it?Justin (koavf)TCM 03:33, 12 April 2021 (UTC)
Let’s try to keep this conversation civil. Accessibility is an important issue and should be a priority. Maybe a detailed proposal would be a good start. Proofreading is an important task that recognizes differences in ability and helps to make texts more accessible. However, it depends on certain visual abilities. There’s a need for a discussion of what areas need accessibility features and what kind. Languageseeker (talk) 05:41, 12 April 2021 (UTC)
I propose that data tables have captions. —Justin (koavf)TCM 06:05, 12 April 2021 (UTC)

Comment: Use of the term "data tables" is ambiguous; please give specific examples. Please consider using WS:Sandbox for some permanent examples. Thanks. — billinghurst sDrewth 14:01, 12 April 2021 (UTC)

"Data tables" are distinguished from "layout tables" by the former showing the sorts of things that actually belong in a table (e.g. Nations in a certain Olympics and how many medals they won or vendors who owe your company an invoice per month) and the latter is the misuse of a table to provide the layout of elements on a webpage instead of using CSS. Here is a data table of school lunches. Here is a fictional budget and items sold in a data table. All data tables should have captions and semantics for columns and rows. —Justin (koavf)TCM 21:15, 12 April 2021 (UTC)
We reproduce what was published. Are you advocating that we generate captions and add semantics that are not in the published works? — billinghurst sDrewth 06:13, 13 April 2021 (UTC)
To be fair, correct semantics like scope=row/column on header cells actually probably should be used wherever appropriate. Not all tables have a caption in the original, but technically speaking, when they do, we also should be using the |+ syntax rather than a direct styled thing like {{center|Caption}} (per-work CSS can help to target those captions for auto-styling anyway). Perhaps when they don't, a {{sronly}}-type affair could be the table equivalent of an image alt attribute (something else we should really be doing for accessibility). We should also (technically) use {{lang}} whenever encountering non-English text.
For fully-correct semantic table markup, we are a bit hamstrung by the ongoing failure of Mediawiki to provide <col>/<colgroup> elements, as well as the "direct" nature of our data (semantic markup is easier if you're a site generating a table from a database of numbers - you can just change the server to generate the table as such).
I think while there is a lot of work that can be done to improve accessibility, if we're going to get anywhere it needs to be a more structured effort, perhaps in concert with https://phabricator.wikimedia.org/tag/accessibility/ (maybe a column in https://phabricator.wikimedia.org/tag/wikisource/?) than just ad-hoc threads at WS:S with no clear end state or process to get there.
The biggest problem I have with accessibility is that it is, in general, extremely hard for any "abled" editor to know if they have just produced something accessible, or almost-entirely inaccessible - it looks the same to them. Screen-reader software is rare, finicky about platforms and can be expensive. I have (after non-inconsiderable effort) managed to get Orca working, but it's very laborious to check that what is written comes out OK. It would be good if someone knew of a service that could directly translate hunks of HTML into "screenreader text" to see how what we currently have comes out. Perhaps some kind of interactive tool could be built around it. User:Koavf: any ideas? Also, if you are serious about improving accessibility, writing documentation about what is and is not good practice will help others (including myself) understand how we can improve. Inductiveloadtalk/contribs 13:10, 13 April 2021 (UTC)
we did do some meetups with DCPL braille library with editing on screen readers (adding alt text to images), and at Gallaudet, but it would take some grant resources to do some UX. but in the meantime, maybe an accessibility user group on meta would be a start. Slowking4Farmbrough's revenge 22:08, 13 April 2021 (UTC)
Barring Foundation-wide standards and lacking any particular local ones, would you be in favor of adopting en.wp's as a rule of thumb? —Justin (koavf)TCM 02:59, 14 April 2021 (UTC)
I have no idea of enWP's, and anyway they generally are free-creating tables rather than replicating a work. I also know that (some of) our tables can be very busy and complex and the idea of adding further complications and complexity does not enchant me. I would want something easy and reproducible that does not pollute the work, can be done in wikitext, and does not make me have to work at undoing any system-imposed formatting. So at this stage I see that we need some developed guidance and priority on what can be done to improve the readability of a work. I would think that for a while we would look to voluntary compliance and encourage the WMF to consider it as part of their development. — billinghurst sDrewth 14:35, 14 April 2021 (UTC)
@Billinghurst: All you do is add "|+Caption" at the beginning. It is done in wikitext and couldn't be much easier. —Justin (koavf)TCM 12:13, 24 April 2021 (UTC)
  • As proposed, this seems apt only to bludgeon other contributors with one's own preference. Accessibility is a large and complex issue that deserves a wider perspective than this. --Xover (talk) 06:18, 14 April 2021 (UTC)
    • The wider perspective is WCAG, the international standard for web accessibility. What Koavf says about table captions is correct. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:52, 16 April 2021 (UTC)
    • @Xover: "this seems apt only to bludgeon other contributors with one's own preference" What? How is this any different from alt text? Why shouldn't we have accessibility on this site? —Justin (koavf)TCM 12:13, 24 April 2021 (UTC)
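For reference, the markup discussed in this thread (a |+ caption line, and scope="col" / scope="row" on ! header cells) can be illustrated with a small generator. This is a hypothetical Python helper, wikitable, written only for this note and not part of any Wikisource tool; in practice the same lines would be typed directly in the Page namespace, and whether a caption should be added when the original has none is the open question above.

```python
def wikitable(caption, col_headers, rows):
    """Build MediaWiki table markup with a caption and scoped header cells.

    Hypothetical helper for illustration only.
    rows: list of (row_header, [cell, ...]) pairs.
    """
    lines = ['{| class="wikitable"']
    # A caption line starts with "|+" and must appear before the first row.
    if caption:
        lines.append(f"|+ {caption}")
    # Column headers: "!" cells marked scope="col".
    lines.append("|-")
    for header in col_headers:
        lines.append(f'! scope="col" | {header}')
    # Each data row opens with a "!" row header marked scope="row".
    for row_header, cells in rows:
        lines.append("|-")
        lines.append(f'! scope="row" | {row_header}')
        for cell in cells:
            lines.append(f"| {cell}")
    lines.append("|}")
    return "\n".join(lines)
```

For a table with no caption in the original, passing an empty caption simply omits the |+ line, so nothing is added that was not published.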

Line numbering coming soon to all wikis

-- Johanna Strodt (WMDE) 15:08, 12 April 2021 (UTC)

Questions: (for those who understand the techno-speak) Will line numbering apply in all namespaces? Will it appear in the proofreading window of the Page namespace? And can it be deactivated by an editor if they find it distracting? --EncycloPetey (talk) 23:10, 12 April 2021 (UTC)
It was said "... you can enable line numbering ...". I would suggest that you ask over on the extension talk page whether it was tested and will function in the Page: namespace. — billinghurst sDrewth 06:07, 13 April 2021 (UTC)
@EncycloPetey, I think this only appears if/when you have the colored syntax highlighting turned on. Do you normally use that? If not, then you shouldn't see it. Whatamidoing (WMF) (talk) 05:30, 21 April 2021 (UTC)

Chapter Navigation at bottom of the page

I tried reading on my phone today and I noticed that there is no way to navigate to the next chapter when a user reaches the bottom of a page. Instead, a user needs to scroll back up. Would it be possible to automatically add a navigation footer when the header exists? Languageseeker (talk) 17:00, 12 April 2021 (UTC)

Can you reword your question? —Justin (koavf)TCM 21:16, 12 April 2021 (UTC)
@Languageseeker: Done. It should be working now. Hopefully the gadget cache has cycled by now. Thanks for the heads up. Inductiveloadtalk/contribs 21:27, 12 April 2021 (UTC)
To phrase that a little differently: We noticed this problem a few weeks ago, and discovered it had been caused by a template update. The problem was corrected, but it will take time for the newly made template change to propagate through every work on the site. --EncycloPetey (talk) 23:07, 12 April 2021 (UTC)
Thanks! Languageseeker (talk) 23:15, 12 April 2021 (UTC)

Why a Facsimile of First Folio than an actual First Folio?

Since Shakespeare is having a renaissance, I have to ask why we are proofreading a facsimile of the first folio instead of an actual copy of one? There are so many copies of it available on line that this makes no sense to me. Why not West 6 available on the Cambridge Digital Library or West 150 (external scan)? Languageseeker (talk) 01:41, 13 April 2021 (UTC)

What Wikisource content are you talking about -- could you provide a link? (And have you asked whoever started that transcription? Possible they were simply unaware of better sources..?) -Pete (talk) 01:48, 13 April 2021 (UTC)
This is the Index. Index:Shakespeare_-_First_Folio_Faithfully_Reproduced,_Methuen,_1910.djvu Languageseeker (talk) 01:55, 13 April 2021 (UTC)
there you go Index:Mr. William Shakespeare's Comedies, Histories, & Tragedies (1623).djvu have a nice time. Slowking4Farmbrough's revenge 02:59, 13 April 2021 (UTC)

Unless we are pushing a version through a program or project, I think that it is inaccurate to say that we are preferring to proofread anything over anything else. We typically proofread freely whatever we want to proofread, rather than following a coordinated consensus on what work is next, and that is why we seek consensus on our projects when we have them. — billinghurst sDrewth 06:03, 13 April 2021 (UTC)

@Billinghurst: It’s not about what users are working on, but the specific source file being worked on. There are numerous scans of the first folio available online while the one on Wikisource is a facsimile of an unknown printing of the first folio. Is there a way to propose moving the text from the facsimile to a scan? Languageseeker (talk) 11:38, 13 April 2021 (UTC)
You are judging. We work from published editions of works, and that is what we have here, and as long as people source it and label it appropriately then that is all we ask. We will republish great works, crap works, sexy works, dirges, misogyny, murder notes, trials, enlightenment and so on. Our requirement is that it is published. If you have a better version then start it. I don't understand lots of works that people work upon and that I do not value, and I don't put that down to their failure. — billinghurst sDrewth 06:03, 14 April 2021 (UTC)
Not sure if I'm understanding correctly, but doesn't the question boil down to the following?
  • Might one upload a superior scan, or a scan of a superior work (and the answer, presuming that somebody has such a scan and it's clearly out of copyright, is yes)
  • What would it take to begin transcribing that alongside the existing work (not sure whether or not that's part of the question, I'm happy to address it if there's a need)
  • Maybe there is a desire to move the pages from the existing one to a superior scan? Not sure about whether or not that is what's sought, or would make sense in this instance. If so, there are script writers who could probably be persuaded to do such things...
Is there more to it than that? If there's a judgmental bit going on, it's going over my head. (Personally I find it's really valuable to learn why other people have made certain choices here...sometimes I learn something from the answer, sometimes I'm able to share useful info. But just because I ask the question, doesn't mean I'm judging.) -Pete (talk) 06:14, 14 April 2021 (UTC)
Index:Shakespeare - First Folio Faithfully Reproduced, Methuen, 1910.djvu is not an 'unknown printing'. The possibility exists, as you are aware, for Match & Splitting for another edition, but that might as well be direct import (with blame for errors being attributable to another site, not compounded by appearing to be a transcript of the linked index). CYGNIS INSIGNIS 12:22, 13 April 2021 (UTC)
By unknown, I mean that the West number is not known. The printers copy-edited the w:First Folio as they printed it, so there are differences between different printings of the First Folio, and shifting from one copy to another would require reproofreading, which is less than desirable. Right now, I'm trying to get West 192 on Commons so that the text can be shifted over prior to validation. It seems a shame to proofread and fully validate a text only to have to redo it again. Also, the images and text in the facsimile are of lesser quality than the original. I think that Shakespeare's First Folio is important enough to warrant creating the best copy possible. Languageseeker (talk) 13:29, 13 April 2021 (UTC)
@Languageseeker: The word "facsimile" here is a description of the type of edition, much like a "diplomatic edition" or a "limited edition" or a "hardcover edition". I.e. it's just another edition. The Methuen facsimile is also a known and well-renowned edition (and has been a standard edition since it came out; you'll still find courses taught based on it). If we really needed to we could find out which copies it used (it used multiple as I recall), we'd just need to slog through the literature. But, bottom line, there is absolutely nothing wrong with the Methuen facsimile, and it isn't interchangeable with any other edition. We could (and ideally should) host all 235 extant copies of the First Folio (and at least one copy each of the Second, Third, Fourth Folio, and the False Folio), not to mention most of the Quartos, but that's an entirely separate issue. --Xover (talk) 17:28, 13 April 2021 (UTC)
Alright, good to know. Although, we might have to do some detective work to find the missing ones. Just kidding. Would there be an objection to adding West 192 to have a single edition First Folio? Languageseeker (talk) 17:31, 13 April 2021 (UTC)
BTW, I’m working on adding the quartos to Commons from the BL although the pages aren’t split. Languageseeker (talk) 17:37, 13 April 2021 (UTC)
@Languageseeker: West 192 is the NSW copy, I think? But any of the extant copies are fine for hosting. Some of them have missing pages and such, but surprisingly few due to the fact that owners have continually patched them over the centuries (some of them are real FrankenFolios). But if we're transcribing the First Folio rather than a later collected edition it is specifically for such unique features (like the page numbering snafu), so a couple of missing pages isn't really a problem.
If we're going to pick just one copy for a concerted proofreading effort we'd want to give it a lot more thought (and I'm not at all certain West 192 would be it), but that's a mainly hypothetical question until we actually have half a dozen or so volunteers ready to dig into the proofreading. The First Folio is really tough to proofread (properly).
BTW, I wouldn't recommend bulk uploading scans that have unsplit pages and similar problems. We can work around most such issues after the fact, but that's mostly applicable to when you run across an existing file and can't get an alternate scan. If your goal is just to prepare these for others to proofread, uploading lots of scans that need workarounds or doctoring has limited value (i.e. the odds are that someone else will have to fix them before they get proofread). --Xover (talk) 18:40, 13 April 2021 (UTC)
@Xover: Thanks for the great reply. I normally wouldn't upload unsplit scans, but they are such important works that I figured that it's worth having a copy on Commons. The British Library hosts the scans in unsplit format so I'm making uncompressed pdfs from them to preserve them for the future. Languageseeker (talk) 23:35, 13 April 2021 (UTC)
@Xover: It took 27 batch download jobs and over a day, but the second, third, and fourth folios now have indexes. Languageseeker (talk) 15:32, 15 April 2021 (UTC)

McClure's Magazine

A note to say that I have been through the lists of volumes and moved any page that was linked and described as being from the magazine to a subpage per the existing schema. I have updated all the volume pages so we won't have the mislinks that existed. I am starting on a search of the main ns to see if there are pages from McClure's now sitting in main ns though outside of the magazine subpage hierarchy. (No promises of no errors, and happy to be pinged for anything that I broke and need to fix.) The whole magazine definitely is in need of scans to facilitate proofreading.

template:McClure's link now exists if you find any pages and move them and need to relink through from the author ns. — billinghurst sDrewth 05:58, 13 April 2021 (UTC)

Quite a lot of tedious organizational labor -- thanks for doing that. -Pete (talk) 06:09, 13 April 2021 (UTC)
Oh yeah (*′☉.̫☉) 3 days and that is with helpers in TemplateScript. Had to get done at some point. — billinghurst sDrewth 08:20, 13 April 2021 (UTC)
The volumes are largely incomplete (and most of the bluelinks are to author pages[!]); can there be more assistance for someone browsing the works? I assume that is the purpose of the page. CYGNIS INSIGNIS 09:09, 13 April 2021 (UTC)
@Cygnis insignis: The volume pages were essentially indiscriminately rootpage linked, and that led to other works, other versions, version pages or disambiguation pages. I was just starting the process of tidying what has been problematic for a while, and letting the community know. Multiple times I have maybe done one work, and then walked away. Others have done the same thing at other times. This week I just put the nose to the wheel. Nothing more, nothing less. — billinghurst sDrewth 13:44, 14 April 2021 (UTC)
\o/ Good going. BTW, I have (just) made a script that can move root pages to subpages. Might be of use for similar endeavours: User:Inductiveload/Scripts/Move to subpage. Obviously it's alpha-level and might have quirks and issues. I will one day get all this crap up in a Git repo and have it properly maintained. But until then, safety squints required! It worked decently enough for ~400 entries of The Complete Poems of Paul Laurence Dunbar.
I would offer to bulk upload McClure's but 1) I'll need to get round to gathering the upload data (a job for anyone who is bored; yeah, I know, me neither) and 2) Commons has been erroring on large uploads for weeks now and I'm struggling to upload even one item, so even if I can organise a scan set, I can't actually offload them onto Commons and I'm now running out of local disk space. Inductiveloadtalk/contribs 13:48, 13 April 2021 (UTC)
Thanks for building a script. Having a break from McClure's for a while. Trying to not add any new major works and work on the existing maintenance tasks and tidying that are problematic. This task was not so neat and tidy to automate, each one had to be manually reviewed and sorted, and then the header data updated, prev/next found and added, etc. I will consider the script when I have more than 100 pages to move. — billinghurst sDrewth 13:50, 14 April 2021 (UTC)
@Billinghurst: No expectation that you should do more of anything! BTW, that script will (attempt to) do the next/prev links if it can. As long as you have at least one entry on each side of each moved page (whether it exists or not) the links will be generated. Of course, building the ordered list for ingest is still a timesink, especially when you need to figure out what does and doesn't exist on-wiki! It works for well under 100 page runs too, for example, it made moving the 16 pages of The_Heart_Of_Happy_Hollow out of the root level fairly easy. Inductiveloadtalk/contribs 16:27, 14 April 2021 (UTC)
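The prev/next behaviour described above can be sketched as follows. This is not the code of User:Inductiveload/Scripts/Move to subpage (which is not reproduced here); it is a hypothetical Python illustration of the idea that a moved page gets both links only when it has a neighbour on each side of the ordered list.

```python
def subpage_title(parent, title):
    """Target title when a root-level page is moved under its parent work."""
    return f"{parent}/{title}"


def prev_next_links(titles):
    """Map each title, in reading order, to its (previous, next) neighbours.

    The first and last entries get None on one side, which is why an
    ordered list needs one extra entry at each end (existing or not) for
    every moved page to receive both links.
    """
    links = {}
    for i, title in enumerate(titles):
        prev_title = titles[i - 1] if i > 0 else None
        next_title = titles[i + 1] if i + 1 < len(titles) else None
        links[title] = (prev_title, next_title)
    return links
```

In this model only the interior entries are moved; the end entries exist just to supply the outer links.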


I created a template {{IAu}} to make using the IA upload tool a bit easier. For example, if you want to upload McClure 10, the format would be {{IAu| mccluresmagazine10newy|McClure - Volume 10|pdf}} Upload McClure - Volume 10. Even if it doesn’t work, the link will remain on the page. It also does the usual warning if the file is already on Commons or if the IA link is invalid. You can batch upload links, wait for all of them to load, and then press upload in each individual file to start the uploads. It takes about 5 minutes to load the metadata for about 70 files and probably around 5 minutes to hit all 70 buttons. Not saying anyone should do it, but it’s an option. Languageseeker (talk) 18:09, 14 April 2021 (UTC)
@Languageseeker: If you generate a file like this User:Inductiveload/Requests/Batch_uploads, I can do a batch for you. The problem with spamming straight uploads from the IA via IA-Upload is that their metadata is generally really, really bad and to actually fix it file by file takes a long time via the IA Upload interface, and even longer to fix retrospectively. For example File:The Atlantic Monthly, Volume 30.djvu vs File:The Atlantic Monthly Volume 41.djvu. Inductiveloadtalk/contribs 18:29, 14 April 2021 (UTC)
@Inductiveload: I agree that IA metadata is subpar, but you can still find the book. Moreover, I think it would be best to tag metadata fields while proofreading and then update the data rather than trying to capture all the metadata first. This way, a user would only need to type the metadata once and there could be more detailed metadata. For example, the TOC or articles could be included as part of the metadata. Languageseeker (talk) 20:41, 14 April 2021 (UTC)
@Languageseeker: If you get the metadata correct up front, the fill index gadget will nearly always import it correctly. If you just import the IA's dross, then you will need to edit it. For example: all 73 volumes of Index:The Jesuit relations and allied documents (Volume 73).pdf will now need to be manually fixed (or a bot run configured, which is fiddly and annoying). Dumping index pages with zero care over the metadata clogs up Category:Index - File to check (186) (which was almost cleared a few weeks ago) and takes hours and hours of others' time to sort out after the fact. Whereas if you prepare it as a spreadsheet, you can copy most of the metadata to every row with "Ctrl+D" and it's done.
Yes, the fact that this has to be stored and replicated around the place is a bit clunky and I hope one day WD will become useful for us. However, it's not there yet. Inductiveloadtalk/contribs 20:50, 14 April 2021 (UTC)
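The spreadsheet approach above (shared fields filled down every row, per-volume fields filled in individually) can be sketched in Python. The column names, the filename pattern and the shared values are hypothetical and do not reproduce the format of User:Inductiveload/Requests/Batch_uploads; only mccluresmagazine10newy is an identifier taken from this thread.

```python
import csv
import io

# Fields shared by every volume. Column names and values are hypothetical,
# not the format of any existing batch-upload request page.
SHARED = {"title": "McClure's Magazine", "language": "en"}


def metadata_rows(volumes):
    """Yield one metadata row per volume.

    volumes: iterable of (volume_number, ia_identifier) pairs.
    The shared fields are copied into every row, the code equivalent of
    filling a spreadsheet column down with Ctrl+D.
    """
    for number, ia_id in volumes:
        row = dict(SHARED)
        row.update(
            volume=number,
            filename=f"McClure's Magazine, Volume {number}",  # hypothetical naming
            source=ia_id,
        )
        yield row


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    rows = list(rows)
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The per-volume metadata is typed once, and any correction to a shared field is made in one place rather than on every Index page afterwards.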


@Inductiveload: I meant that it should be possible to fill the Index and metadata from the Pages by tagging the parts in the Pages ns with something like {{title|}}, {{subtitle|}}, etc. and then reading it back in. Instead of WD to Wikisource, it would be Wikisource to WD. Languageseeker (talk) 23:05, 14 April 2021 (UTC)
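The tagging idea above could be sketched as a harvesting pass over Page-namespace wikitext. The template names title, subtitle and author are taken loosely from the proposal and treated here as hypothetical; this Python sketch only collects the tagged values, and pushing them to an Index page or to Wikidata is not shown.

```python
import re

# Hypothetical tagging templates, as proposed above; not existing templates.
FIELDS = ("title", "subtitle", "author")
# Matches {{name|value}} where the value contains no braces or pipes.
PATTERN = re.compile(r"\{\{\s*(%s)\s*\|([^{}|]*)\}\}" % "|".join(FIELDS))


def harvest_metadata(page_texts):
    """Collect the first value of each tagging template across pages.

    page_texts: wikitext of the Page: namespace pages, in page order.
    """
    found = {}
    for text in page_texts:
        for name, value in PATTERN.findall(text):
            # Keep the first occurrence, e.g. the title page over running heads.
            found.setdefault(name, value.strip())
    return found
```

Keeping only the first occurrence is one design choice among several; a real tool would need a rule for conflicting values.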

Adding Disclaimer for all US-Gov docs that are not scan backed.

I think it’s important to add a disclaimer to all US-Gov documents that are not scan-backed that the text may be inaccurate. Such a disclaimer is present on all US-Gov websites for non-pdf files. Also, all pages that link to White House.gov need an access date because the site is not durably archived. Languageseeker (talk) 15:16, 13 April 2021 (UTC)

Is this not covered by Wikisource:General_disclaimer#Accuracy? Inductiveloadtalk/contribs 15:27, 13 April 2021 (UTC)
To me, the disclaimer seems aimed at saying that Wikisource users might make a mistake, but, in this case, the Government Printing Office explicitly states that these texts are not guaranteed to be accurate. So, this GPO disclaimer should be part of the template. Also, all the non-pdf versions of Congress publications are incomplete at the source. Languageseeker (talk) 15:36, 13 April 2021 (UTC)
Don't see the necessity. We are working from editions, and we are accurate to the published edition, so if anything needs saying it belongs in the general disclaimer for all works and the sources. We could weave into the text ... "//that we have editions based on a date of publication and they are believed to be true to the date retrieved, and may not reflect later updates to the source. Errors in the source at the date of retrieval will generally be reflected in the edition at Wikisource.//" — billinghurst sDrewth 13:37, 14 April 2021 (UTC)
The issue is that the Congressional Documents are not being uploaded from their published sources, but from the rough transcription provided by the GPO. I'm not against importing governmental documents, but they require more care than this. I think that there's a need for standards to make sure that the way they're being added does not create more work for the future. At minimum, this would require the importation of the GPO authenticated pdfs and standard templates for things such as line numbering, sponsors, etc. Languageseeker (talk) 15:07, 14 April 2021 (UTC)
Doesn't change what I said: we proofread to the source and cite the source. Also, what you are explaining would in no way justify a change of wording to the licence. The appropriate place to add a note is the talk page of the work in a {{textinfo}} box — billinghurst sDrewth 12:51, 10 May 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:51, 10 May 2021 (UTC)

Can our OCR software be tweaked?

The latest version of our OCR does an excellent job, except it interprets a double quote as two single quotes. Can this be remedied?— Ineuw (talk) 20:38, 14 April 2021 (UTC)

@Ineuw: I'm not sure of the state of play of the OCR project. Probably the Phabricator project is the place to ask. Otherwise, in the meantime, a double single quote is probably fairly easy to hit with a regex replacement script. Inductiveloadtalk/contribs 20:58, 14 April 2021 (UTC)
Thanks for the info. In any case they must also be checked individually, and evaluated as italics, single, or double quotes. But then, this is not new with OCR, but now the scan is about 90% wrong. I wasn't sure if this was a local undertaking. I will post it on Phabricator.— Ineuw (talk) 21:13, 14 April 2021 (UTC)
I'm not sure what exactly is happening with the OCR, i.e. if they've changed anything with the normal OCR tooling yet. It's possible something has changed, or maybe just the OCR gremlins are around today. Remember that the normal OCR tool doesn't actually do any OCR if the file has a text layer, so it might be embedded in your file. If it's something that trips up the normal OCR, the Google OCR might work better, since that genuinely does go and OCR the image? Inductiveloadtalk/contribs 21:24, 14 April 2021 (UTC)
I had both on the toolbar, but currently our version is better. Google OCR doesn't insert an empty row between paragraphs and merges two columns of text line by line instead of adding the right column below the left. So, this is a lesser issue.— Ineuw (talk) 21:35, 14 April 2021 (UTC)
Now that e-book exporter has been unbroken, they say they are turning to OCR. you can also leave comments on meta. meta:Talk:Community Tech/OCR Improvements. Slowking4Farmbrough's revenge 21:47, 14 April 2021 (UTC)
@Slowking4: Much thanks for the link.— Ineuw (talk) 00:23, 15 April 2021 (UTC)
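The regex suggestion above, together with the caveat that every hit must still be checked by hand, might look like this in Python. It is a sketch only: '' is also MediaWiki's italic markup, so the pattern skips runs of three or more apostrophes (bold) but cannot tell an OCR-mangled double quote from intended italics.

```python
import re

# Exactly two apostrophes, not touching a third. This is how the OCR renders
# a double quote, but it is also MediaWiki italic markup, so review every hit.
DOUBLED_APOSTROPHE = re.compile(r"(?<!')''(?!')")


def fix_doubled_quotes(text):
    """Replace isolated '' pairs with a straight double quote."""
    return DOUBLED_APOSTROPHE.sub('"', text)


def candidates(text):
    """Start offsets of isolated '' pairs, for reviewing hits one by one."""
    return [m.start() for m in DOUBLED_APOSTROPHE.finditer(text)]
```

Running candidates first and replacing only the confirmed positions keeps the individual check described above.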

Can our OCR software be tweaked? REVISITED

I posted an issue at Meta which I am struggling with. I just want to bring it to the attention of the community because we have experts here who may not be participating in the meta discussions.— Ineuw (talk) 07:46, 20 April 2021 (UTC)

@Ineuw: Judging by a superficial look at the issue, this is a problem with the local JavaScript here on enWS that's showing up now due to changes in MediaWiki. The way the script translates the data it is getting back from the OCR server into usable text in the text field is … not optimal, and may need to be rewritten. I'll try to take a closer look when time allows. --Xover (talk) 12:35, 20 April 2021 (UTC)
@Xover: Thanks for looking into it. I can't understand why the problem is intermittent. Could we not copy a better working script from another Wiki? — Ineuw (talk) 13:36, 20 April 2021 (UTC)
@Ineuw: This is the same script that has been copied between wikis. It hasn't really been actively maintained since 2015 (and the same goes for the server backend), so it has a lot of half-a-decade old technical assumptions. That it's starting to fail as web browsers and MediaWiki continue to change is not really unexpected.
Meanwhile, in debugging this it would be useful to know which pages you were trying it on when it was very slow and a couple of pages where the performance was more or less normal. I've found some obvious problems, which is what I was referring to above, but on reflection I'm not at all certain they would explain the symptoms you are describing.
BTW, how certain are you that it's something to do with the OCR tool and not something local to your computer? The reason I ask is because on meta you say the OCR text suddenly showed up after 5 minutes; but after 5 minutes all the network requests etc. would have long since timed out, so if the server was that slow you should have gotten nothing or an error message. If it was a local problem (slow computer, web browser hanging, that kind of thing) on the other hand, you might see effects like that. It'd be odd if that only affected the OCR button, so unless you're seeing similar problems with local applications or other web pages (or the Google OCR button for that matter) that probably isn't it. --Xover (talk) 14:41, 20 April 2021 (UTC)
Thanks for the reminder to check my browsers' setups again.— Ineuw (talk) 15:15, 20 April 2021 (UTC)
This has nothing to do with the browsers, or the OS. I tried Firefox and Vivaldi in Windows and Linux, and the OCR script behaviour is the same.— Ineuw (talk) 05:17, 21 April 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:52, 10 May 2021 (UTC)

DMCA for Mind, Character and Personality

Hello Wikisource - In compliance with the provisions of the US Digital Millennium Copyright Act (DMCA), and at the instruction of the Wikimedia Foundation's legal counsel, one or more pages have been deleted from Wikisource. Please note that this is an official action of the WMF office which should not be undone. If you have valid grounds for a counter-claim under the DMCA, please contact me. The takedown can be read here.

Affected file(s) are Mind, Character and Personality and all subpages, which are:

Subpages deleted

Thank you! JSutherland (WMF) (talk) 22:30, 14 April 2021 (UTC)

This act is something really unprecedented. Wikisource community has always paid thorough attention to copyright issues and never tolerated any copyvios, and a WMF employee with zero contributions here does not consider it necessary to discuss deleting our content with us and simply appears and deletes what he wants to delete, ignoring the community. Really unbelievable how the communities are valued by the WMF…
Could @JSutherland (WMF): at least additionally explain the grounds of the deletion in more detail? The provided link does not go anywhere. --Jan Kameníček (talk) 23:19, 14 April 2021 (UTC)
Link should go to wmf:Legal:DMCA_Mind_Character_and_Personality. That's the WMF's job; they protect the wikis from the potential consequences of copyright infringement by following the DMCA, which requires them to take down works when they receive a properly filed DMCA notice. I'm actually surprised at how lenient they are, instead of just mechanically taking down DMCAed works without concern.
In this case, it seems that Mind, Character and Personality is a 1977 work, an edited compilation that gets a new copyright.--Prosfilaes (talk) 23:26, 14 April 2021 (UTC)
If the things are as Prosfilaes has described, the community would not refuse to delete the work. Copyrighted 1977 works are habitually deleted here. But we do not deserve to be ignored. --Jan Kameníček (talk) 23:31, 14 April 2021 (UTC)
Legally speaking, I feel the hands-on approach the WMF gives us for these deletions is excessive; they'd be better off legally, as I understand it, deleting on DMCA and refusing to discuss it with us. They're bound by the law here, and I don't find it a problem.--Prosfilaes (talk) 02:56, 15 April 2021 (UTC)
Apologies for the broken link - our tool is designed for Commons-based deletions since, as you say, these are extremely rare on Wikisource. (Also I made a typo.) Unfortunately the DMCA Policy doesn't really allow for much wiggle room here. If however you do have valid grounds for a counter-claim under the DMCA (more info here), please do let me know. If it'd be helpful, I can ask an attorney to comment here, though I am not sure they would be able to speak much to individual cases. Thanks, JSutherland (WMF) (talk) 00:06, 15 April 2021 (UTC)
I'm not entirely surprised because many users have trouble understanding that a particular edition of an out-of-copyright work can be in copyright. The WMF has a duty to protect itself from lawsuits. This is one of the problems with non-scan-backed works that this community is going to have to accept. Languageseeker (talk) 00:15, 15 April 2021 (UTC)
I have started a copyright discussion here. PseudoSkull (talk) 01:09, 15 April 2021 (UTC)
Of more immediate concern is to determine the status of the other works uploaded by User:Rcrowley7 at around that time. This one did not have sourcing or licensing information provided, and a quick glance at a couple of the others shows them to be in the same position. Education (White) has a source given on the Talk page, but it leads to a website that claims copyright and I suspect this site is the source for most of White's other works. There is also the side-question of the works' non-compliance with our formatting guidance. Beeswaxcandle (talk) 07:21, 15 April 2021 (UTC)
Agree, the foreword is signed by "The trustees of the Ellen G. White Publications" which suggests that this edition was published posthumously. --Jan Kameníček (talk) 08:44, 15 April 2021 (UTC)
@JSutherland (WMF): Does the information that you are not sure "they would be able to speak to individual cases" mean that they in fact do not know anything about it because they did not really check the copyright status of this particular work? I believe that the work could be a copyvio, but why did you not tell us which steps were taken to check it and with what result? Why did you delete it without letting us know, instead of giving us proper information so that we could use our processes to get rid of a copyvio? --Jan Kameníček (talk) 08:30, 15 April 2021 (UTC)
@Jan.Kamenicek: because that is the process. WMF gets a DMCA notification and makes a determination per wmf:The Wikimedia Foundation Digital Millennium Copyright Act (DMCA) Policy and acts. We have the ability to challenge. In this case it would be reasonably obvious that we have not much of a leg to stand upon. You would need to find an archived version of that work that shows something that puts it into the public domain. — billinghurst sDrewth 16:08, 15 April 2021 (UTC)
time for a road trip to university of alabama. this copyrighting of the public domain needs to be opposed. it's bad enough when the nephews rent seek, but the great-grand-children? Slowking4Farmbrough's revenge 11:22, 16 April 2021 (UTC)
They have put the work online, and it is available to be read. How about you change the link on the author page so it points to the available copy of the work. — billinghurst sDrewth 12:20, 16 April 2021 (UTC)
i would just as soon route around their web1.0 text dump, and go back to the 1889 base text. [3] we need the free alternative text without all the revisionist prefaces, and copyright abuse. Slowking4Farmbrough's revenge 00:41, 17 April 2021 (UTC)
Great news, I look forward to your transcription. In the meantime, you can link to what is available so that those who are interested in the work can find it. — billinghurst sDrewth 01:07, 18 April 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:52, 10 May 2021 (UTC)

Please help correct my small error

I clicked and saved the wrong button on Page:A Wild-Goose Chase - Balmer - 1915.djvu/281 when I meant to validate the page. It appears that I proofed a page that was already proofed. If someone will validate this one page, this will complete the validation for the book. Thanks to anyone who can take care of this. Maile66 (talk) 19:22, 15 April 2021 (UTC)

@Maile66: Done Inductiveloadtalk/contribs 19:24, 15 April 2021 (UTC)
Thanks. That was quick. Maile66 (talk) 19:33, 15 April 2021 (UTC)
@Maile66: You're welcome and congrats on the completed validation! Inductiveloadtalk/contribs 19:34, 15 April 2021 (UTC)

Importing the Shakespeare Quarto Archive

So, the Shakespeare Quarto Archive just got taken down this week because of the end of Flash. However, the texts and images are available at [4]. Is there any easy way to import the texts and files of the proofread quartos into Wikisource? The encoding scheme is here [5]. Rescuing all 32 quartos from digital oblivion seems like a worthy project. Languageseeker (talk) 04:16, 17 April 2021 (UTC)

IA-upload offline (error 503)

@Samwilson: If you are around, would you be able to have a look at IA-upload, it is telling me that the service is not available. Thanks if you can. — billinghurst sDrewth 12:55, 17 April 2021 (UTC)

@Billinghurst: Sorry about this! I've restarted the web service and it's back online now. It was down for 23 hours and 23 minutes. The error log was being filled with URL rewriting debug output (I've turned that off now) and I couldn't see at a glance what went wrong. Will keep an eye on it. The uptime log is here: https://stats.uptimerobot.com/BN16RUOP5/782616657 Sam Wilson 00:39, 18 April 2021 (UTC)
@Samwilson: Could you investigate the situation, the IA upload has been mostly down for the past few days. Languageseeker (talk) 02:00, 20 April 2021 (UTC)

The Tragicall Historie of Hamlet Prince of Denmarke

What exactly is this? --EncycloPetey (talk) 00:47, 18 April 2021 (UTC)

The first Quarto of Hamlet from the defunct Shakespeare Quarto Archive. Languageseeker (talk) 00:49, 18 April 2021 (UTC)
Then why are there two different redlinks with bad syntax and a transcription project for one of the redlinks? Is this the First Quarto of Hamlet or a mishmash? --EncycloPetey (talk) 00:52, 18 April 2021 (UTC)
There are two copies in the world. The shelfmark identifies the copy. There is one Index because Commons is not playing nice with Pattypan. Languageseeker (talk)
@Languageseeker, @EncycloPetey: it is out of scope and unnecessary. We don't have either work, and we don't create work pages like that in main namespace. — billinghurst sDrewth 00:58, 18 April 2021 (UTC)
Um, there was a transcription project. Languageseeker (talk) 01:00, 18 April 2021 (UTC)
(ec) I have moved it to user namespace. There is a whole section above about creating ugly stick pages. I keep hoping that I never see {{ext scan link}} in main namespace again. So very tempted to make it so that template does not display in main ns. — billinghurst sDrewth 01:03, 18 April 2021 (UTC)
I moved the links to the main version page of Hamlet and created a proposal to make them in scope. Languageseeker (talk) 02:13, 18 April 2021 (UTC)
And I have taken them out. Get your proposal through first; red links are typically not there unless they also appear somewhere else onsite. — billinghurst sDrewth 02:53, 18 April 2021 (UTC)

Comment I have made the above link a redirect. We would never have that as a versions page, as we only ever have one versions page for a work. If/when there is a completed transcription, they will appear on Hamlet (Shakespeare) as per all other versions. — billinghurst sDrewth 02:56, 18 April 2021 (UTC)

And how are they supposed to be finished if the scan link cannot be posted? Languageseeker (talk) 03:31, 18 April 2021 (UTC)
It can be posted, but not in the mainspace. This template belongs in the Author: and Portal: namespaces. Beeswaxcandle (talk) 05:47, 18 April 2021 (UTC)
@Languageseeker: What makes you think that you are the only person with works that are unfinished and unlinked from the main namespace? These works are no different from any other work, so please use the existing system. If you want to set up a project please look at Wikisource:Wikiprojects just like the rest of us have had to do. — billinghurst sDrewth 06:07, 18 April 2021 (UTC)
@Beeswaxcandle: There is nothing in any of the documentation or pages about style and format that states the template belongs in the Author: and Portal: namespaces only. Nor in fact is there any documentation about its use in any particular namespace(s). But—supposing for the moment the link must be placed in the Author ns—where on Author:William Shakespeare (1564-1616) should the small scan link to the First Quarto of Hamlet be placed? --EncycloPetey (talk) 16:16, 18 April 2021 (UTC)
Templates are typically required for the purpose of the page, and adding this external link template to a "versions" page changes the purpose of the page away from disambiguating our versions. The typical use of the template is to point to things from our curated author and portal namespaces. There may be a special case for its use in main ns, but it is not on version pages. So it is a false argument to say that the documentation doesn't say it cannot be used in main ns; it is the use that you are undertaking that is the issue. — billinghurst sDrewth 00:39, 19 April 2021 (UTC)
So maybe the argument that you would like to develop is how its use on a "versions" page fits with what we are doing, rather than trying to argue that the template doesn't have a rule saying it cannot be used in another namespace. Context is far more important. — billinghurst sDrewth 00:43, 19 April 2021 (UTC)
[ec] But the Versions pages come into being when content from the Author namespace moves into the Main ns as a Versions page. That is, when we have only a single edition, that edition is listed on the Author page. When there are multiple editions to manage, that content moves from the Author namespace to the Main namespace. It remains the same content; only its location has changed. Why then would a template apply to that content when it occurs in one location but not in the other?
When I created all the disambiguation and versions pages for the plays of Shakespeare, I advertized that process here in the Scriptorium and asked for feedback on what I was doing. At no point did anyone raise objections about the use of the {{small scan link}} and {{ext scan link}}. I also point out that, prior to the creation of those pages, we had only copy-paste Gutenberg texts for most of Shakespeare's plays. Once the versions pages went up, we had (at least) eight people join in with the transcription of various editions of Shakespeare, and they found those editions because there were links to the Index pages of the scans on the Versions pages. There were Index pages, and some of them had languished for years, but making those pages known led to the completion of several texts and the cleanup of several others.
I still do not understand why there is such strong resistance to the use of {{small scan link}} next to the works listed on those pages. Such links allow editors to see that a transcription has been started and to find it. We repeatedly have cases where an unlisted scan is finished, only to find it is a duplicate of an existing work, and the person who spent all that time is usually (and rightly) furious that no one bothered to coordinate listings. Removing {{small scan link}} from versions pages will exacerbate rather than ameliorate that issue. And as I have pointed out, such listings have directly led to the completion of long-languishing works.
If the matter of the listings is the same, why does it matter which namespace it is located in? If we had the Versions pages in a separate "Work:" namespace like the Italian Wikisource has done with "Opera:" would that make it OK to use the template? If not, then why not? If so, then the argument against the use of {{small scan link}} in the Main ns is circular. --EncycloPetey (talk) 00:54, 19 April 2021 (UTC)
So this comes back to the basic questions
  • What is a versions page, our versions, some versions or all versions?
  • When should a versions page exist?
  • When would we make exceptions to either of the prior two questions?
At this stage we appear to be outside the scope of Wikisource:Versions, and as I attempted to address in the next section, we have got there by creep, not through clarity of our scope. If we are going to morph, then we need the central directional pages to move with it. We also need to be clear and explicit to the broader community that scope is being expanded, not implicit or leaving them to guess. — billinghurst sDrewth 03:43, 19 April 2021 (UTC)

Creep of complexity and business of disambiguation pages, especially version pages, and other cruft into main namespace

Over time our simple listing disambiguation pages of works onwiki have been creeping to more and more complexity. They are becoming pseudo-stub pages and getting listings off-site, and pages of less relevance, definitely not transcluded works. We are getting into more and more disagreements about the purposes of these pages. And having to have the argument about what is appropriate on these pages is sucking up good time and simply becoming tiring repeated argument. We need to resolve the issue at a higher level so we don't have to have this continuing bickering of the detail.

Templates in use on the pages include

The past couple of months we are now getting people talking about future works and adding redlinks to works that we don't have and may never have. These pages don't rely on future promises; they should be used for works on site now, and possibly where we have had to disambiguate for them, e.g. in a journal for which we hold the scans.

Links of pertinence

In my opinion, we are not the encyclopaedia; we shouldn't be having discussions about all the versions of a work, or encyclopaedic argument about the differences or their histories, in our main namespace. All that belongs at English Wikipedia, or in what we have designated our curated spaces, and we have author and portal namespaces for those sorts of pages. For many years we have been trying to tidy up the cruft and keep our main namespace for presented work; it was to be our quality space, just for works. I feel we are drifting to cruft, so what type of product are we trying to put into our namespace? Otherwise what is the purpose of the main namespace? Up until now it has not been designated a finding aid for things, but a place for works we have. — billinghurst sDrewth 06:03, 18 April 2021 (UTC)

I think that a lot of this creep has to do with a few factors. First, the difficulty of adding a text. Users find a scan on IA, but become too challenged by uploading the file to Commons and then decide to add an external scan link to preserve their work. Uploading to Commons should be performed by a bot to clear that backlog. We can even gather all the IA ids and ask Fæ to upload them. Second, this site lacks a clear direction in creating a central corpus of key works in English. Instead, users decide to work on whatever strikes their fancy. By contrast, French Wikisource has the Mission 7,500 that provides a set of works to work on every month. This has helped French Wikisource get those texts transcribed. Because English Wikisource does not have a core set of key texts, there are more version pages with no scans, texts or indexes. Versions are a problem of important texts and not of minor texts.
For important texts, version pages can serve a critical function by designating which texts should be on Wikisource. Researching the publishing history of a book takes time and access to certain sources. In general, I believe that Wikisource should only include versions of texts that the author contributed to, those that have important scholarly value (critical editions), or those with new illustrations. A version page can serve as a space to detail these editions.
For now, the IA does not have scans of every book ever published. Therefore, there is work to be done by future generations. The version pages can serve as key guides for the future. Research today, proofread tomorrow.
As for the presence of small scan links, they allow the user to see how they can contribute to this project. This is a collective proofreading platform. It makes no sense to hide the proofreading from users. Nor does it make sense to ask an individual user to proofread an entire work by themselves and then post it to mainspace. Transcription projects belong in mainspace.
I would recommend the following changes to Wikisource.
  1. Only allow scan backed new texts unless the user provides a compelling reason why that is not possible.
  2. Restructure the POTM to follow the French model of presenting several texts and focus on building core texts.
  3. Allow for version pages when a text has a complicated publishing history and more than three editions.

Languageseeker (talk) 04:23, 19 April 2021 (UTC)

Comment This issue is presented as something that has happened "The past couple of months", but the issues have been around as long as Wikisource has. Anyone working on large projects such as the 1911 EB knows that we have had long-standing redlinks in the Main namespace for years. With regard to versions and disambiguation pages, we had a case where the community opted in favor of red linked titles in 2015. Calling all of this "cruft" and appealing to abstract philosophy is not a practical approach. Saying that "it has not been designated a find aid for things but works we have" misses entirely the fact that disambiguation pages are find aids for locating other things; they are not themselves works, but are dynamic and changing entities by their very nature. The same is true of versions pages; these pages will and do change. Hiding works and their progress from listing on the basis of some abstract philosophy will not get the work of Wikisource done. --EncycloPetey (talk) 15:33, 19 April 2021 (UTC)
That is a different story with a different argument.
  • We all have texts that we want done, and our means for having these seen and listed has been through the Author and Portal namespaces. This selective means of putting additional editions as available is outside how we have been doing them and outside of our existing guidance per Wikisource:Versions.
  • Redlinks that I mentioned were specifically related to versions/disambig, not the general case. On version pages they would appear to be outside of our guidance at Wikisource:Red link guidelines#Main namespace, and not at this time our guidance on the Versions either. Citing EB1911 is interesting though not relevant to the case.
  • To cite the 2015 example as a community consensus should be seen as a misrepresentation that the community undertook a conversation to change our policy or our guidance, and it should not make it incapable of review and change. Or it could be that this is an example of some exceptions to the rules, rather than the new normal.
  • I am aware that matters evolve, though it should be through open conversation and agreement, with the expectation that our guidance will do so too. It is also reviewable, which is what I am doing here. However, the basics are that these are a variant of disambiguation pages. There is a basic concept behind these types of pages.
If we are moving "FINDING AIDS" into the main namespace, is it only going to be version pages? What makes them special? Why would they include links off-site? What is the equity? What is the benefit? How does it work in the holistic sense? Should we start version pages for every work that has multiple editions? How will it work for biblical works? It is a can of worms if you open it without thinking about the broader consequences rather than your own particular desires. That is why we have the guidance and why we achieve consensus for these changes and their scope. — billinghurst sDrewth 00:25, 20 April 2021 (UTC)
As I said, I don't think that every work needs a version page. However, having a version page can help identify the specific subset of a work that should be on Wikisource. To give an example, we don't want every copy of Hamlet ever printed on this site. There are only a few specific editions that should be in scope. Posting an edition on a version page can be seen as an invitation to comment on the in-scope nature of the work prior to a user sinking a significant amount of work into it. I've seen many mere reprints of 18th century works that I've replaced with links to scans of original editions.
Second, by providing what you term a "finding aid," Wikisource can prevent users from adding editions that this site would not want to host. For example, A Child's History of England (1900) is the only scan backed copy of this text. However, this edition is a mere reprint with no unique illustrations or input from the author. It's a random edition pulled from the IA because the PG text was based on it. It basically has no value. As soon as another copy of this work appears, I would submit it for deletion.
Third, without knowing about other editions, it is more difficult to properly name a text, which leads to future work. For example, imagine that the first version of Hamlet posted on this site was Q1. Would we call this Hamlet? If so, then in the future, an administrator would need to move it to Hamlet (Quarto 1). If we had a version page, we would first call it Hamlet (Quarto 1). Languageseeker (talk) 02:24, 20 April 2021 (UTC)
I think you're too prescriptive on versions. I will not support the deletion of any scan backed version just because it's not the "right" version. Every printed edition is in scope. Certainly, with Hamlet, every different bowdlerization and every different abridgment has an interest to someone. We don't generally proscribe the best of anything; we include what people want to add, provided it has been printed. I encourage people spending their time on good versions, but I don't support proscribing what is the good version; if someone insists on a version of Oliver Twist respelled and mildly abridged for an American audience, it's their right to work on that here.--Prosfilaes (talk) 03:11, 20 April 2021 (UTC)
For a specific example of where a random edition is important, if you want to know how The Pilgrim's Progress affected Carl Sandburg's writing, you don't want the best edition; you might want the version the American Tract Society published in the 1850s, because that was the copy in his library. Maybe he also read others, but you want to get the ones he had hands on, not the one's influence by Bunyan.--Prosfilaes (talk) 05:29, 20 April 2021 (UTC)
This is not outside how we have been doing it; the Weird Tales pages have been up since 2009. Citing EB1911 is also a general example, quite relevant to your last statement. As for your last statement, I think it's pretty clear we who are advocating this want versions pages to point to all versions of works that users are interested in working on. There's some disagreement about details, but it seems that any place we're pointing to multiple works, we should have links to works that people are interested in doing. As for me, we shouldn't be making author pages, links on author pages, or version pages if nobody is realistically going to work on them; Author:Isaac Asimov should be trimmed to the bones. Allowing any authors to have an author page with all their works is a can of worms; what if someone imports the entire LoC authority list? In practice, however, it's not much of an issue.
Recently, you begged off writing guidelines. There doesn't seem to be anyone doing that job here, but it's hard to make official changes if no one is willing to clearly document them, and it's hard to debate when there are unwritten guidelines that are leaned on if and only if it's convenient.--Prosfilaes (talk) 03:02, 20 April 2021 (UTC)
Is this a helpful page: Panchatantra? Is that a versions page or is that an article? Which bit do we want for clear navigation at enWS? — billinghurst sDrewth 10:44, 20 April 2021 (UTC)
@Prosfilaes: I think that I am entitled to cry off some things around here, and I never said that they were unimportant. I think that I do enough in a range of areas where no one else is contributing, so please excuse me from one of the things which I find difficult. — billinghurst sDrewth 10:48, 20 April 2021 (UTC)
Meh. That's certainly not how I would write Panchatantra, I would definitely remove all the modern works and cut down on the header, but it's certainly useful to have other editions that aren't yet on Wikisource listed there.--Prosfilaes (talk) 14:33, 20 April 2021 (UTC)

Disabled music scores

Nine months have already passed since the music scores produced by the score extension were disabled, and looking at task T257066 it seems that nobody really bothers about contributors’ complaints. At w:Help talk:Score#Can we please re-enable display of the score images? it was suggested to replace vorbis="1" with %vorbis="1"% or %sound="1"%, which at least makes the image of the score visible. I tried it and it seems to work this way. What do others think about such a temporary workaround (especially as "temporary" can mean "really long" in the Wikimedia environment)? --Jan Kameníček (talk) 13:27, 19 April 2021 (UTC)

@Jan.Kamenicek: I ran a bot job to do that replacement in January (IIRC ~450 instances). I don't think any more have appeared since, because only cached scores can be shown until "they" fix it (i.e. the "fix" will re-enable existing scores, but not allow the creation of new ones, AFAIK). Inductiveloadtalk/contribs 13:35, 19 April 2021 (UTC)
@Inductiveload: I am afraid there are some which the bot probably did not catch, like Page:The music of Bohemia.djvu/32. --Jan Kameníček (talk) 13:39, 19 April 2021 (UTC)
Huh, quite right. Perhaps because it's not transcluded it didn't get caught in the dragnet. Please hold. ^_^ Inductiveloadtalk/contribs 13:45, 19 April 2021 (UTC)
@Jan.Kamenicek: OK, I hit another 40-ish. Let me know if you see any more. Inductiveloadtalk/contribs 14:19, 19 April 2021 (UTC)
It was transcluded, but now it is OK (i.e. at least the picture is visible), thanks! --Jan Kameníček (talk) 14:58, 19 April 2021 (UTC)
Huh, so it is. So now I have another bug in my random pile of JS junk to figure out! Anyway, thanks for the heads-up. Inductiveloadtalk/contribs 15:12, 19 April 2021 (UTC)

Comment The way to get resources for such a fix is through the annual/biannual priorities lists, or finding a hacker. SCORE only came about after a long time as one of our participants, GrafZahl, showed a particular interest and even that took ages due to security issues. This will be a case of developing a case for a fix and lobbying. Sitting, waiting, hoping will not bring the solution. — billinghurst sDrewth 22:44, 19 April 2021 (UTC)

Contributors’ job is to contribute, tech team’s job is to provide technical support, WMF’s job is to provide funding. Any project can work well only when everybody does their job well. It is quite enough when contributors report bugs, why should they lobby for the bugs to be fixed, why should they lobby for something that should be just natural? What else should they do? Sorry, but I am definitely not "sitting". I want to be adding content above all, but also try to do some little maintenance too, report bugs, and in better times I also take part in or organize various wikiactivites in the real world. I do not want to waste more of my wikitime on neverending pleading the tech team for support. System where volunteers have to lobby for support can result only in losing them (or at least in not gaining them). The bug was properly reported in the phabricator which is the place determined for such reports. Now it is the tech team’s turn to fix it (and it has been their turn for 9 months already). --Jan Kameníček (talk) 23:51, 19 April 2021 (UTC)
I hear you, I am not saying this is the ideal, I am mentioning my view of the reality. Reality != perfect world. If we had a perfect world, I would not be manually adding information to WD, writing spam filters, deleting spam, undoing edits and telling people to read the guidance, ... — billinghurst sDrewth 00:35, 20 April 2021 (UTC)
yeah, scores was always unsupported, and a miracle it worked at all. one of these days a musical hacker will reverse engineer lilypond in open source, and we will have another mission to transcribe all that public domain sheet music. but until then, it will remain locked away behind paywalls and hard to find archive scans. Slowking4Farmbrough's revenge 00:19, 22 April 2021 (UTC)
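For anyone wanting to repeat the workaround described above on pages the bot missed, here is a minimal Python sketch of the text replacement. It assumes the attribute appears literally as vorbis="1" inside a &lt;score&gt; opening tag, and it only rewrites it to the %vorbis="1"% form quoted in this thread; the function name is hypothetical, and this is not the code of the bot job mentioned here.

```python
import re

# An opening <score ...> tag; group 1 holds its attribute list.
SCORE_OPEN_TAG = re.compile(r"<score\b([^>]*)>", re.IGNORECASE)

# vorbis="1" not already wrapped in % signs (keeps the rewrite idempotent).
VORBIS_ATTR = re.compile(r'(?<!%)\bvorbis="1"(?!%)')


def wrap_vorbis_attribute(wikitext: str) -> str:
    """Rewrite vorbis="1" as %vorbis="1"% inside <score> opening tags only."""

    def fix_tag(match: re.Match) -> str:
        attrs = VORBIS_ATTR.sub('%vorbis="1"%', match.group(1))
        return f"<score{attrs}>"

    return SCORE_OPEN_TAG.sub(fix_tag, wikitext)


if __name__ == "__main__":
    page = '<score vorbis="1">{ c d e f }</score>'
    print(wrap_vorbis_attribute(page))  # <score %vorbis="1"%>{ c d e f }</score>
```

Only the opening tag is touched, so the score body and any vorbis="1" in ordinary prose are left alone; the %sound="1"% variant mentioned above would need its own pattern.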

Tech News: 2021-16

16:48, 19 April 2021 (UTC)

Firefox extension: TitleCase

Transform strings into Title Case, Proper Case, Start Case, Camel Case, Upper Case, and Lower Case. You have 2 ways to change text. Either by right clicking on the field and changing the case or by highlighting and only changing what you highlighted.

TitleCase

Just tripped over this Firefox extension that allows case manipulation of text through block, right click. I know that I regularly manipulate case when transcribing, though not as easily as this with the forms that I have. — billinghurst sDrewth 14:45, 20 April 2021 (UTC)

I've been using this for a while; it's a great time-saver. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:11, 24 April 2021 (UTC)
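The extension's exact rules are not described here, so the following is only a hypothetical Python sketch of three of the listed conversions, applied to an all-caps heading of the kind met when transcribing. The set of small words kept lowercase in title case is an assumption (style guides differ), and none of this is the extension's code.

```python
# Words kept lowercase in title case unless first or last. This list is an
# assumption for illustration; style guides (and the extension) may differ.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "nor", "of", "on", "or", "the", "to", "with"}


def capitalise(word: str) -> str:
    """Uppercase the first character only, leaving the rest unchanged."""
    return word[:1].upper() + word[1:]


def title_case(text: str) -> str:
    """Capitalise each word except small words in the middle of the phrase."""
    words = text.lower().split()  # split() also collapses runs of whitespace
    last = len(words) - 1
    return " ".join(
        word if 0 < i < last and word in SMALL_WORDS else capitalise(word)
        for i, word in enumerate(words)
    )


def start_case(text: str) -> str:
    """Capitalise every word, with no small-word exceptions."""
    return " ".join(capitalise(word) for word in text.lower().split())


def camel_case(text: str) -> str:
    """Join words with no spaces: first word lowercase, the rest capitalised."""
    words = text.lower().split()
    if not words:
        return ""
    return words[0] + "".join(capitalise(word) for word in words[1:])


if __name__ == "__main__":
    heading = "THE MUSIC OF BOHEMIA"
    print(title_case(heading))  # The Music of Bohemia
    print(start_case(heading))  # The Music Of Bohemia
    print(camel_case(heading))  # theMusicOfBohemia
```

Upper Case and Lower Case need no helper: Python's built-in str.upper() and str.lower() already cover them.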

A book is currently to be used on a contest

Dear Wikisource fellows. I would like to inform you that Index:Scented isles and coral gardens- Torres Straits, German New Guinea and the Dutch East Indies, by C.D. Mackellar, 1912.pdf is currently being used in a proofreading contest. You can expect new users editing the book. I will monitor the progress and provide feedback to participants if needed. In case I missed something, kindly let me know. Much appreciated. ··· 🌸 Rachmat04 · 02:49, 22 April 2021 (UTC)

@Rachmat04: Sounds exciting. Thanks for organizing this. Languageseeker (talk) 03:23, 22 April 2021 (UTC)

Suggested Values

Timur Vorkul (WMDE) 14:08, 22 April 2021 (UTC)

Purpose of Proofread of the Month

@Cygnis insignis, @Billinghurst, @Prosfilaes, @Xover, @Inductiveload, @ShakespeareFan00: I cannot seem to figure out why we have a POTM and how the works are chosen. On the talk page for POTM, the text seems to be selected by one user and then everyone else goes shrug, whatever. I don't think that POTM has ever exceeded 500 pages completed a month. When I try to raise discussions about it, nobody seems to want to discuss it over there. So, I'm asking here. Why do we have a POTM? How are the works selected? Is it working? How does Wikisource define success when it comes to POTM? Languageseeker (talk) 15:22, 23 April 2021 (UTC)

  • Not sure why, something about collaboration and getting new users involved. There is also a badge. The selection of interesting and important works would see more contributions, from me at least. CYGNIS INSIGNIS 16:37, 23 April 2021 (UTC)
  • There isn't a single goal. The activity raises awareness of what we do, brings in new proofreaders, provides opportunity for even long-time editors to learn new approaches, increases the diversity of our works. . . . Usually a success is the validation of a work, so we pick a work that we expect could be completed in the time of one month. But not always: we have picked more challenging works when there was a clear plan, such as the huge book of Scottish songs we did, where we tackled a complex work and taught advanced editing technique to participants. To my mind, the best benefit is that it attracts new editors interested in the subject of the work, and since the editors are often new, we need a work that will be simple enough for newcomers to contribute to. To make that happen, the scan must be carefully prepared and checked so no unexpected issues will stop progress of the work. There should be few or no tables, limited illustrations, and relatively simple formatting throughout. My response may not fully answer your questions, but my answers aren't the only answers to your question. Part of what PotM is differs from editor to editor, and that is a good thing. --EncycloPetey (talk) 16:48, 23 April 2021 (UTC)
The archives of this page tell us that PotM was set up in August 2008. "Each month the community will select one text to be proofread and hopefully we can completely proofread the text in that amount of time." In those early days, the works were randomly selected according to the interests of the proposers. When Billinghurst and then I co-ordinated the selections, we tried to put in some coherence around the domains of knowledge selected and awarded badges for involvement in any month's work. RL got in the way of my heavy involvement in PotM and awarding the badges fizzled out. We encourage new editors to do some pages in the PotM so that they can get a feel for our processes and many of our most experienced contributors have started there and still contribute from time to time to particular projects. I'm only here because the June 2011 PotM was interesting to me and I somehow stayed. The challenge in selecting a work (as Cygnis insignis says) is to find one that is interesting. The "shrug, whatever" reaction to a selection is usually because the work doesn't appeal. We've also learnt over the years to avoid works with complex formatting. They simply don't work as collaborations, and aren't useful for teaching new editors. Beeswaxcandle (talk) 20:22, 23 April 2021 (UTC)
  • Comment In the beginning it was sitting beside a range of OF THE (TIMEFRAME) components, though that was pretty well prior to us having a truly functioning proofread page system. When I rejigged it, because it wasn't working, we discussed its purpose of being a better gateway for newbies, and of using it as an avenue to spread our subject coverage. It was meant to pique interest and not be scary, and to be able to be accomplished within the month, hopefully through validation. Part of the means to ensure that no one person or subject dominated was to have rough themes planned ahead where people could make future nominations. I also had a November validation month where we took already proofread works and validated them. I did set up quirky rewards for participants. I did it for a while, setting up templates and processes; then BWC joined me, and others joined, so I moved on to other things that no one else was doing. I also think that coincided with me becoming a steward. One also needs some variety and changes. — billinghurst sDrewth 23:16, 23 April 2021 (UTC)
"I think the entire idea of a work a month does not work." what is there to discuss? Slowking4Farmbrough's revenge 23:44, 23 April 2021 (UTC)


  • These are great replies that clarify some things. For me, the single greatest drawback to selecting one text is that it will have limited appeal. French Wikisource has demonstrated that having a variety of texts attracts more users. Wouldn’t it make sense to not just limit to only one text? Languageseeker (talk) 23:56, 23 April 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:52, 10 May 2021 (UTC)

Call for Nomination of Texts

There are good signs that English Wikisource might try the French Wikisource approach: a set of texts for the community to work on, with new texts added each month and incomplete texts rotated out after three months. I'm therefore calling on the community to submit nominations for texts they would like to see on this site. These texts should be important and have broad appeal. Here is a list of some texts that I can think of and that others have suggested in various places.

  1. Masnavi I Ma'navi (transcription project) (Wikisource Islam)
  2. The Mahabharat (external scan) (Wikisource India)
  3. The Picture of Dorian Gray (start transcription) (Wikisource LGBTQ+)
  4. Paradise Lost (transcription project) (Seventeenth Century)
  5. Clarissa, or the history of a young lady (transcription project) (Eighteenth Century)
  6. Manhattan Transfer (transcription project) Etsu Inagaki Sugimoto, A Daughter of the Samurai (transcription project) (Celebrating the Public Domain)
  7. The Wanderer (Fanny Burney) (transcription project) (Women Writers)
  8. The new Negro: an interpretation (transcription project) (Black Writers)
  9. Uncle Tom's Cabin (1851 First Edition) (transcription project) (Slavery in the USA)
  10. Librarian's Copyright Companion (Legal Texts)
  11. Anna Karénin (transcription project) (Russian Fiction)
  12. Enquiry into plants (transcription project) (Classics of Science)
  13. The sidereal messenger of Galileo Galilei (transcription project) (Classics of Science)
  14. Remarks on prisons and prison discipline in the United States (external scan) (Reformists)
  15. Commentaries on the Laws of England (transcription project) (Legal Texts)
  16. Commentaries on the Constitution of the United States (transcription project) (Legal Texts)

Languageseeker (talk) 00:59, 24 April 2021 (UTC)

Manhattan Transfer is up for an upcoming PotM. It would be inappropriate to poach titles from that project. --EncycloPetey (talk) 01:10, 24 April 2021 (UTC)
Fair enough, I replaced it with a different work. Languageseeker (talk) 01:39, 24 April 2021 (UTC)
Define "important". All texts published through a non-vanity press are/were important to someone—otherwise they would not have been published. To go a step further, all 12,456 works in Category:Index Not-Proofread were important enough for someone to create an Index here for them. But you can't put all of those into your proposed rotation. Beeswaxcandle (talk) 01:51, 24 April 2021 (UTC)
The definition of important is left to the individual user. To prevent an overwhelming number of nominations, I’m asking each user not to nominate more than five texts.
For now, I’m planning on starting with 15 texts. I know that "important" is a tricky and nebulous term, and I know that I don’t know every text. That is why I’m asking the community for nominations. I’m looking for a diversity of texts. What text from your community or area of interest would you like to see in Wikisource? Which texts do you think others would find interesting? Languageseeker (talk) 02:00, 24 April 2021 (UTC)
There are definitely some works that are more important than others. Quite a few historically notable texts only have a Project Gutenberg text needing match-and-split (if I had to suggest one: Uncle Tom's Cabin). As for one needing proofreading, there's the recently-declared-C.C. Librarian's Copyright Companion. Mcrsftdog (talk) 02:04, 24 April 2021 (UTC)
@Mcrsftdog: Added to the list. Thanks. Languageseeker (talk) 02:56, 24 April 2021 (UTC)
  • This work looks very nice; I would like to proofread it. However, the text layer is offset in several places; could someone (Xover?) please shift it? (The first is at p. 3; the p. 3 text layer doesn’t exist, and the p. 4 text layer is put there instead. This happens on a number of other pages, as well.) TE(æ)A,ea. (talk) 12:24, 25 April 2021 (UTC)
    @TE(æ)A,ea.: I can probably fix it, but it's not clear to me which File: and Index: we're talking about. Link? Xover (talk) 10:20, 26 April 2021 (UTC)
The Harvard Classics contains many significant works, with Index pages set up already. Among them are Don Quixote, Two Years Before the Mast, and Anna Karenina. We are very weak in world literature. In science, Galileo's The Sidereal (or Starry) Messenger is a short but seminal work which is not scan-backed, and Theophrastus' On the History of Plants is a work of monumental importance that we don't have at all. --EncycloPetey (talk) 04:13, 24 April 2021 (UTC)
I think that these are all good texts to have. My one concern is that Anna Karenina is a reprint. What do you think about the Leo Wiener translation (external scan: https://archive.org/details/completeworksofc09tols/page/n19/mode/2up)? It also has the advantage of being part of a set of complete works. Languageseeker (talk) 04:41, 24 April 2021 (UTC)
I am not experienced enough with the translations to recommend one or another. I will, however, make one more recommendation: any works by Dorothea Dix, who pushed for reform in the care of the institutionalized. We have zero works by her, which is tragic. --EncycloPetey (talk) 04:49, 24 April 2021 (UTC)
I would like to suggest Commentaries on the Laws of England and Commentaries on the Constitution of the United States. We don't have classic jurisprudence at WS at all. Ratte (talk) 14:01, 24 April 2021 (UTC)
@Ratte: For the Blackstone Commentaries, do you have any strong feeling about featuring the original edition vs the 12th edition? Languageseeker (talk) 15:08, 24 April 2021 (UTC)
Do you mean the 3rd edition of Book I? We don't have the 12th edition here. When Blackstone published the first edition of Book III in 1768, he decided to republish the other books together with Book III. It was a one-time publication of all four books, which have cross-references to each other's pages. Ratte (talk) 15:32, 24 April 2021 (UTC)
@Ratte: Wikipedia states that the 5th and 12th editions are considered the best editions, and HathiTrust has the 12th edition available. I'm wondering if it's better to feature the 3rd edition to finish it or to go on to the 12th edition. Languageseeker (talk) 17:26, 24 April 2021 (UTC)
Now it is clear. Yes, I agree, it is better to go on to the 12th edition. Can you upload a scan of it? Ratte (talk) 17:47, 24 April 2021 (UTC)
My main reaction is to look at including humanities/philosophy/theology in the next set of texts, to pair with the Classics of Science (probably more accurately STEM, as we may want math as well). I like the idea of having some categories highlighting geographic (India / Islam / Russian), temporal (18th, 17th), and author (LGBTQ+, Women, Black) diversity, and then a set covering professional (legal), STEM, and the humanities for subject diversity. Then within some of those broad groups we can pick works (e.g. pick a century --> nominate an important work missing from that time period, pick an area --> pick a work, etc.). MarkLSteadman (talk) 22:48, 24 April 2021 (UTC)
@MarkLSteadman: Glad you like it. Is there any text from humanities/philosophy/theology that you would like to add to the list? Languageseeker (talk) 22:55, 24 April 2021 (UTC)
In philosophy we have no works by Moses Mendelssohn or Johann Joachim Winckelmann. We also don't have any works by Humboldt. Friedrich List was the founder of the historical school of economics and we have none of his. My background is in physics actually so I would defer to any humanists for gaps here... MarkLSteadman (talk) 23:33, 24 April 2021 (UTC)


@EncycloPetey, @Beeswaxcandle, @TE(æ)A,ea., @Mcrsftdog, @Ratte, @MarkLSteadman: I’ve begun to build the page for the Monthly Challenge for May and I would love some feedback. Still adding texts. See user:Languageseeker/MC Languageseeker (talk) 15:04, 25 April 2021 (UTC)

  • Languageseeker: It looks good, more or less, to me, from an aesthetic point of view. I would recommend putting a border around the flags, because the flags of Japan and England blend in with the white background. Also, the Librarian's copyright companion title stretches over two lines, which is unappealing; I think the title should be shortened (in the display) so that it remains on one line. As for the content, I agree (generally) with the idea of some longer-term works and some shorter-term works, but I believe more care should be taken in the selection of categories. For new users especially, more fiction and less technical or scientific writing would be appreciated. TE(æ)A,ea. (talk) 21:08, 25 April 2021 (UTC)
Maybe an indicator of beginner-friendly or beginner-beware? Also, it is not clear why Islam / India have "Wikisource" before them while the others don't. MarkLSteadman (talk) 22:00, 25 April 2021 (UTC)
@TE(æ)A,ea.: Thank you for your feedback. I'll add the border for the flags to my to-do list. Are there any works of fiction that you think should be featured? I'm especially interested in key texts that have been proofread and need to be validated. Over the next few days, I'm going to add a few more works of fiction.
@MarkLSteadman: I'm trying to make sure that the majority of the texts will not be too difficult. There will also be a separate talk page so that users can ask questions. I see this project as a means of training new users, so I'm trying to strike a balance between giving users easy material to learn the skills and also giving them a few challenges. Let me know if any text strikes you as being too difficult. Languageseeker (talk) 00:30, 26 April 2021 (UTC)
I thought one of the differences here was that it was supposed to allow collaboration on works with more difficult formatting (e.g. plays, tables (Heller), or footnotes), characters (e.g. the long s (ſ) in the 17th-century works featured, diacritics, or the Greek characters in the Loeb), or content (e.g. images or equations). Not for all the works, but for some of these categories. Currently I can see at least some of these challenges in the Richardson, Milton, Shakespeare, Theophrastus, and Heller selections. My suggestion was maybe to indicate that these are good ones to get started with, while those might be a little more challenging? Maybe something to think about for later? Or provide a little more guidance on some of those issues on the discussion page? MarkLSteadman (talk) 00:48, 26 April 2021 (UTC)

BTW, if anyone wants to add a text to the page the format is {{MC-Cover|Index|Cover Page number|Title to Display|Publication Date|Author|Subject (the green text)|N|Country for the Flag}}

  • page = replaces the author with the specific page
  • cover = use a specific image for the cover
  • author = Text to display in the Author Field


@EncycloPetey, @Beeswaxcandle, @TE(æ)A,ea., @Mcrsftdog, @Ratte, @MarkLSteadman, @Billinghurst, @Xover, @Inductiveload: The near-final version of the Monthly Challenge page is done: Monthly Challenge. Any and all feedback welcome as always. Languageseeker (talk) 22:40, 26 April 2021 (UTC)

  • By the way, pinging (by {{re}} or otherwise) only works if you sign your comment. This page looks good, if we are comparing it to the main page of WS:PotM; however, if it will be placed on the front page, there needs to be some small part that can be transcluded there. The Community collaboration is represented on Template:Collaboration, which you can use as a reference. A short blurb, for lack of a better word, would be readily accepted on the front page once you have finished setting up this project. TE(æ)A,ea. (talk) 21:45, 26 April 2021 (UTC)
  • Comment Can I ask that this be converted into a project, as we have done for all other Wikisource:WikiProjects? This subject matter is probably a replacement project for Wikisource:Community collaboration, which is what we have done on such occasions, and is part of the current schema and means of addressing extended projects. It should not be seen as a replacement for PotM, which has its particular purpose. — billinghurst sDrewth 23:48, 25 April 2021 (UTC)
I think that the consensus was that this will remain separate from both PotM and Current Collaboration for now. Getting this fully running will also require importing the Bookworm Bot from French Wikisource, which might take a while because the bot's operator has largely left, as far as I can tell. The Monthly Challenge is designed to help introduce new users and to improve the core collection of Wikisource. I need to translate/write up the FAQ from the French Wikisource. Most of my time was taken up creating the template for adding the texts to the page. I'm trying hard to make it visually appealing.
The only consensus is that this should not replace PotM. The possibility of it coming in as a community collaboration has not been discussed. In the 12 years I've been contributing here, there have been a total of seven (7) collaborations in that box on the Mainpage. Billinghurst's suggestion is worthy of consideration. You must also consider how this proposed project will work on the Mainpage. There is limited space in the "Current Collaborations" box, which is where it needs to appear. Beeswaxcandle (talk) 01:15, 26 April 2021 (UTC)
@Beeswaxcandle: My comment is that if set up, it can become the current community project, which then makes migrating from the existing project to the next a very simple community decision. It also then becomes a future decision for the community when we migrate to the next. The community has already had the discussion and reached consensus on priority and flag projects, so the angst about having the project doesn't need to be repeated; we want these flag projects, WHEN someone is willing to run them. — billinghurst sDrewth 01:30, 26 April 2021 (UTC)
This idea is based on the French Wikisource Mission 7500 and the box should ideally be in the same location: Upper Right hand corner of the Front Page above the New Text section of a similar size to the Explore Wikisource box. It should be very visible so that new users can see it right away. The French Wikisource Mission 7500 project is yielding around 8,000 to 12,500 pages proofread and validated a month. So, this seems like a very worthy model to attempt to replicate. Languageseeker (talk) 01:46, 26 April 2021 (UTC)
Meh, you don't ask a lot. It is a community project and belongs in the community project space. Get your project up and sorted, with its requisite documentation and valid pages and system, and prove the concept. If we then want to have a conversation about the main page, and the various positions, then that is entirely a different conversation. — billinghurst sDrewth 04:08, 26 April 2021 (UTC)
I’m asking to give the experiment a fair chance. For me, the placement matters because it makes it extremely visible on the front page. I don’t want to get buried. Even if the Current Collaboration section is replaced with this Monthly Challenge, it would still require a reconfiguration of the template controlling that section. I’m not entirely sure how to do that. I’m happy to do the translation of the French if that is necessary. If this requires a community discussion, I’m happy to do that if you point the way to where. The Bookwormbot also requires the granting of a bot flag and importation that an administrator needs to do. However, it can be plugged into the current template easily as just another field. Languageseeker (talk) 05:13, 26 April 2021 (UTC)
Actually what you are asking for is for more than a fair trial. You are asking for more focus than any other project has had. You are asking for your project to have more focus than our completed works. I don't rate your proposal higher than completed works, especially as the completed works will appear in the completed list. You are suggesting proposed works are ahead of completed content. Anyway, that is getting way ahead of yourself as you don't even have a working project yet. Do the leg work, produce your system. When you have a project to which we can point people then we can progress. — billinghurst sDrewth 08:22, 26 April 2021 (UTC)
@Languageseeker: "Do not remove line breaks, but please remove any hyphens when they are used to break words across lines": I think this is a difference in en/fr process that shouldn't be imported. enWS common practice (rightly or wrongly) is to remove the line breaks. It might be better just to link to WS:MOS and Help:Formatting conventions than to call out a single rule? Inductiveloadtalk/contribs 22:03, 26 April 2021 (UTC)


@Inductiveload: I thought that it made more sense not to remove line breaks, because it’s much harder to spot and correct mistakes when the line breaks are removed. However, if that’s a hard rule, I’m happy to remove that comment and start a separate discussion about whether removing line breaks makes sense. Languageseeker (talk) 22:40, 26 April 2021 (UTC)
I mean, I get why people leave them in (and we say to take them out, AFAIK for borked interactions between Mediawiki's almost indecent love of P-tags and our templates without consistent newlines in DIVs). But it doesn't make sense to me to have separate style guidelines for MC works. I'd say, just point at the existing guidelines and say "do that". When changing things, generally, change one thing at a time. Inductiveloadtalk/contribs 22:45, 26 April 2021 (UTC)
@Inductiveload: Ok. Removed the offending line. Didn't mean to break any major rules. Languageseeker (talk) 23:00, 26 April 2021 (UTC)
You should not be differentiating from any of the guidance of the style guide. It makes things difficult if you start to have different rule sets. — billinghurst sDrewth 12:03, 27 April 2021 (UTC)
RE: Validation I see you're adding some texts to be validated. This will require some oversight, as we frequently have new editors who do not properly understand what "Validation" actually entails. Some think it's a quick check, without actually comparing against the scan. They therefore rush through the work to get it "done". Some new editors end up using spell-check, and also do not compare against the scan of the original. Any process that advocates (or seems to advocate) speed or completion without caution and training is a concern. --EncycloPetey (talk) 02:04, 27 April 2021 (UTC)
@EncycloPetey: I agree that validation can be a tricky thing to master, but I also think that it's important to train new users. To help address your concern, I added a note about the goal of validation and a link to the validation help page. Languageseeker (talk) 02:49, 27 April 2021 (UTC)
We also do have a Wikisource:Validation of the Month, but it is not advertised on the Main Page. --EncycloPetey (talk) 04:14, 27 April 2021 (UTC)
@Languageseeker: is there a particular reason Mathnawí is using volume 2, even though volume 1 is virtually empty: Index:The Mesnevī (Volume 2).pdf?
Also, I think it should be a "thing" that the index pages have to be fixed up (i.e. status is "to be proofread") before they can be entered for MC. Inductiveloadtalk/contribs 11:14, 27 April 2021 (UTC)
@Inductiveload: Volume 1 of Mathnawí appears to try to establish the definitive Persian text, and so is mostly in Persian as far as I could tell. To your second point, that makes sense. Languageseeker (talk) 13:23, 27 April 2021 (UTC)
@Languageseeker: Ah right, then that makes sense! Inductiveloadtalk/contribs 13:34, 27 April 2021 (UTC)
I was working on proofreading the 20 pages of the Introduction and then planning on blanking the Persian part for exactly that reason: to prevent that confusion and get it into proofread status quickly. MarkLSteadman (talk) 13:54, 27 April 2021 (UTC)

IA tool allowing duplicates

Yesterday, the IA tools allowed me to upload a number of duplicates leading to a large waste of time concluding with the duplicate indexes being deleted. Anyone else having the same issue or know what is going on? Languageseeker (talk) 18:25, 25 April 2021 (UTC)

Relatively recently (but I don't remember exactly when) some of us were complaining that the IA Upload tool wasn't permitting us to upload a DjVu file when Faebot had already uploaded a pdf of the same. It looks like this has now been fixed. Beeswaxcandle (talk) 18:47, 25 April 2021 (UTC)
Could it at least warn about duplicates? I don't want to create redundant indexes everywhere wasting everyone's time. Languageseeker (talk) 19:16, 25 April 2021 (UTC)
Often, you need to manually check for duplicates first. Sometimes the IA copy has to be edited to repair pages, remove duplicate pages, or strip Google notices. The IA tool won't always catch those as "duplicates". --EncycloPetey (talk) 19:18, 25 April 2021 (UTC)
The phabricator task T269518, which was closed as resolved a short time ago, should allow duplicates if they are of different formats (after a warning is shown) but should not allow exact duplicates; see the comment by @Samwilson: from 9 April there. So if some duplicates were uploaded in the way described above and without a warning, it is a bug. --Jan Kameníček (talk) 21:52, 25 April 2021 (UTC)
They weren't exact duplicates—.djvu vs .pdf, which is correct behaviour. Also, some were a different printing of the same edition, which no tool is ever going to pick up. Beeswaxcandle (talk) 22:57, 25 April 2021 (UTC)
  • Comment The responsibility to check Commons for the existence of a work does not lie with the IA-uploader tool; it lies with the uploader. There are many ways copies of a work can end up at Commons. There can be multiple editions, and there can be copies of the work from multiple sources, so there needs to be a more mature approach than relying on the upload tool, especially as the PDFs were uploaded by a person to cover their needs, though they generally have inferior text layers to the DjVus. There is not even a guarantee that the best copy of the work has been uploaded, nor that the copy uploaded is the best one to proofread. If you are using a PDF text layer, you are often making proofreading harder and are more likely to get errors in the produced work.

    To more fully comment on an identified issue it is more helpful to have some examples and processes followed rather than just react to general complaint. — billinghurst sDrewth 00:14, 26 April 2021 (UTC)

Diskussion:Projekte

I just discovered this at de.wikisource,

Rules For New Projects (English Translation)

Each new project of more than 50 pages must meet the following points:[1]

  • 1. The script meets our requirements. See text base.
  • 2. Scans must be uploaded to Commons and the quality of the scan has to be good enough for proofreading.
  • 3. To meet the requirement of the four-eyes principle, either point a) or point b) must be fulfilled:
    • a) Before or during work on the project, a quid pro quo of equal extent is expected (e.g. proofreading).
      or
      • b) The project has found enough backers to be finished in a reasonable timespan. This page can be used to search for helpers.
  • 4. To be clear, every project over 50 pages must be announced here before it starts. "Before it starts" means: no index or articles should be created before the project is approved. If no concern is raised within ten days, the project can start.[2]

It is strongly recommended to also announce small projects with large parts in non-Latin scripts, such as Greek, Hebrew, or handwritten scripts.

Also look at:


Some interesting sentiments are expressed, although it is presumably a reaction as policy by community members to abandoned projects. I don't see a concern where this remains in 'work- or proof-reading space' (the indices and their pages) and it is done with restraint and forethought, but obviously there are many practices here that would not meet the same degree of explicit or tacit approval at that sister's community. CYGNIS INSIGNIS 06:15, 26 April 2021 (UTC)

References

  1. Decided in March 2010 (permalink to the SKR)
  2. See discussion, July 2015
The German Wikisource has decided to do many things differently from the other Wikisources. For one thing, they never adopted the idea of an Author namespace. Their Wikipedia has a number of very different approaches from everyone else as well, such as permanent parallel duplication of categories. It does mean that they often attempt approaches that no one else has tried, because they are not simply doing what everyone else is doing. --EncycloPetey (talk) 17:37, 26 April 2021 (UTC)
I think having such a policy fairly obviously has the exact outcomes you would expect: extremely low proofreading rates (~20k proofread or validated pages/year, vs 230k at enWS and 400k at frWS) as well as low participation (125 vs 428 and 253 active users). A less expected outcome (for me at least) is that the overall deWS proofread:validated ratio is still only ~1:1. That is better than enWS and frWS, where it is roughly 3:1 at both, but I'd have expected much better, since it should trend to 0. Also unexpected is the very low productivity compared to frWS, where the pages per active user figure is ~10 times higher (enWS is "only" 4 times higher). For more fun stats: https://phetools.toolforge.org/statistics.php
While I imagine the overall quality in mainspace is much better, at what cost? And could we not achieve the same outcome through being stricter on pre-emptive transclusion of unfinished works? For example, adding a date to {{incomplete}} and having them flag up after n months?
Also, I disagree with the underlying implication that a proofread-but-not-validated text is somehow worse than no text at all. Inductiveloadtalk/contribs 18:07, 26 April 2021 (UTC)
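The per-user comparison above can be reproduced from the rounded figures quoted in that comment. The following is a minimal sketch, assuming "pages/year" means proofread or validated pages per year and using the quoted active-user counts as the denominator; the numbers are only the approximations given in the comment, not fresh statistics.

```python
# Rounded figures as quoted in the comment above (approximations only).
pages_per_year = {"deWS": 20_000, "enWS": 230_000, "frWS": 400_000}
active_users = {"deWS": 125, "enWS": 428, "frWS": 253}


def pages_per_user(wiki: str) -> float:
    """Proofread or validated pages per year, per active user."""
    return pages_per_year[wiki] / active_users[wiki]


baseline = pages_per_user("deWS")  # 20,000 / 125 = 160 pages per user
for wiki in ("enWS", "frWS"):
    ratio = pages_per_user(wiki) / baseline
    print(f"{wiki}: {pages_per_user(wiki):.0f} pages/user, {ratio:.1f}x deWS")
```

On these rounded numbers, frWS comes out at roughly ten times deWS per active user and enWS at a little over three times.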
With our Translation namespace it was expected that users would use the Page: namespace to do their translations, and this has not particularly been enforced, especially as we moved over a number of old translations. deWS didn't/doesn't use ProofreadPage, so they can have a different level of approach/tolerance to translations. If we had them in the PrP environment and not naked in the Translation: namespace, then we would have no ugliness there at all, and accordingly infinite patience. — billinghurst sDrewth 11:50, 27 April 2021 (UTC)
I think "(English Translation)" just means it is an English translation of the deWS rules for all works, not that it only applies to translations. Particularly, since deWS requires permission (and a 10 day delay) on even creating the index page (which is, I guess, why they only have 142 "unproofread" or "incomplete" indexes: petscan:18955661), they do not have infinite patience, even in the working spaces.
Over here, we have had at least one "textbook" translation recently: Translation:The Three Princes of Serendip. Inductiveloadtalk/contribs 12:11, 27 April 2021 (UTC)
Two ideas I recall seeing around are a section on the author page that separately lists active transcription projects, and a search button for linked indexes. I would prefer just the latter to avoid so much self-reference on the site, and it is better practice for avoiding problems like starting a new index for a work that is already half done elsewhere (as mentioned by EncycloPetey somewhere). If links to indices are manually added wherever, then it becomes remiss not to add that to the list of things to do aside from proofreading. CYGNIS INSIGNIS 13:05, 28 April 2021 (UTC)
it is good to have german wikisource as an example of what not to do. at wikimanias, we ask each other what is up with the "sick man of europe" [hopelessly lagging] wikisource. if you think i am exaggerating, check out the statistics: back in 2010, en & de had the same proofread pages around 100,000. now, de is at 300,000 and en is at 1,300,000. [12] the unhealthy wikis like de wikisource and wikinews are a choice by the admins at those projects: there is no technical reason they cannot be as productive as the english, french, italians and polish. Slowking4Farmbrough's revenge 22:45, 28 April 2021 (UTC)
@Slowking4: please rephrase that second sentence. CYGNIS INSIGNIS 04:05, 29 April 2021 (UTC)
ok. describing chronic failure has historical antecedents. but the lesson remains, do the opposite - no quid pro quo, no preconditions to start an index, no precondition of announcements or support, no requirement of text base (scanned back) pages. Slowking4Farmbrough's revenge 15:28, 29 April 2021 (UTC)
there might be pertinent comment after that attempt at a rejoinder, but I lost interest … CYGNIS INSIGNIS 15:51, 29 April 2021 (UTC)

Looking for (ancient) Armenian speakers to help transcribe a text on Wikisource

Recently the French Wikisource community started work on the transcription of Grammaire de Denys de Thrace. You might wonder what this has to do with Armenian speakers. Have a look at the full title:

GRAMMAIRE de DENIS DE THRACE, tirée de deux manuscrits arméniens de la bibliothèque du roi. Publiée en Grec, en Arménien et en Français, et précédée de considérations générales sur la formation progressive de la Science glossologique chez les anciens, et de quelques détails historiques sur Denis, sur son ouvrage et sur ses commentateurs ; PAR M. CIRBIED, membre de la société royale des antiquaires de france, professeur d’arménien à la bibliothèque du roi. extrait des mémoires de la dite société.

That's a bombastic title, if you want my opinion. 😂 However, it gives a good overview of what the book contains. In particular, it makes clear that the book includes a huge amount of material in Armenian. The text is basically a translation of, and commentary on, an Armenian text, which is itself a translation of an ancient Greek text (also provided in the book). As the French Wikisource community has limited skills in Armenian, this makes the work of transcription far more complicated. Hence this call.

We are looking for people able to transcribe text in the Armenian alphabet. It would be even better if we could find people who know ancient Armenian, since the text is likely to contain archaic forms. And if we can find someone who can also speak French or English to take part in discussions with the community, it would be perfect. Note that the main requirement is simply being able to read the Armenian alphabet and to write it in plain Unicode. In particular, there is no expectation that helpers will do any of the formatting work, which often requires dealing with local templates.

Please be bold in spreading the word wherever you think that could reach potentially interested Armenian speakers.

With all my warm love, Psychoslave (talk) 07:31, 26 April 2021 (UTC)

IP Masking Engagement

Hello Wikisource community, this is about IP Masking engagement which the Anti-Harassment Tools team is carrying out.

The point of the engagement is to understand how the project will impact editors. Also, we want to know which other tools you will need to be able to effectively govern the projects in absence of IPs.

Please read more on the IP Masking project here.

Please add your comments on the talk page.

Best regards,
STei (WMF) (talk) 12:43, 28 April 2021 (UTC)

Tech News: 2021-17

21:24, 26 April 2021 (UTC)

Clarification for WS:SHORT

Hi! So WS:SHORT's policy states that:

Reserved for Wikisource project reference pages (WS: namespace) only.

Intuitive. However, there is a note in the shortcut parameter of {{header}} stating that:

This is normally reserved for very large reference works (e.g. EB11)

Okay. So, the laws of the Philippines are often abbreviated (e.g. Republic Act No. 9003 -> RA 9003). The shortcut parameter is a perfect fit for this. However, I don't know if this is an accepted use for shortcuts. It would be awesome if this is acceptable.

Also, there are already some redirect pages created by others (RA 9188, RA 9189, RA 9190, RA 9191). If these shortcuts are not acceptable, then what to do with these already existing ones? If they are acceptable, then I can create shortcuts for all other Philippine law pages (Portal:Law of the Philippines).

TY! — 🍕 Yivan000 viewtalk 08:02, 27 April 2021 (UTC)

As long as no other works use the "RA" abbreviation, then creating redirects is fine. If the "RA" abbreviation is used for other works, then any that need to point at more than one work will need to be changed into disambiguation pages. The shortcut parameter in the {{header}} template is not appropriate for this purpose. Per the instructions at the Header template, the field should be very rarely used in the mainspace. Beeswaxcandle (talk) 08:14, 27 April 2021 (UTC)
Oh, okay then. I'll just add a separate box in the notes. TY! — 🍕 Yivan000 viewtalk 08:36, 27 April 2021 (UTC)
Comment @Yivan000: I don't think that RA on its own is a good choice of shortcut; in my opinion it is too ambiguous, whether or not it is currently used here. I can think of a number of general uses for RA outside of here. Please try something more distinct and universal, or think of a nomenclature that could describe the work. I would be thinking more along the lines of PH-RA or something like that. In terms of the RA nnnn redirects, that seems okay to me at this point in time. — billinghurst sDrewth 11:42, 27 April 2021 (UTC)
@Billinghurst: I think the redirects as is are already fine. For me, the PH-RA shortcut redirects are too much. Note that there are other abbreviations for other laws (like CA, PD, BP, BAA, PP, ...), and having two redirects for each is too much. Also, I searched all abbreviations in Special:PrefixIndex and there are no conflicts whatsoever.
All is good. — 🍕 Yivan000 viewtalk 13:49, 27 April 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:53, 10 May 2021 (UTC)

Mass move/rename of pages

We have a simple and effective SQL tool for mass deletions; I was wondering whether we also have a mass move/rename script? — Ineuw (talk) 15:04, 27 April 2021 (UTC)

Help with Validating a Text

I was wondering if someone could validate Index:Paradise Lost Manuscript. It's part of the upcoming Monthly Challenge series and I think that it would make an extremely impactful first text to validate. The Index consists of the 34 pages that are book 1 of Paradise Lost and the only extant part of the manuscript. The text was written by an amanuensis. Languageseeker (talk) 00:43, 29 April 2021 (UTC)

Taking a quick look, I see two consistent issues. (1) Indentation of lines in the original is not replicated, and line indents are to be replicated for poetic works. (2) You've used the poem tag throughout; this can cause huge headaches for multi-page works when they are transcluded. The poem tag does not always behave predictably, so for poems that span multiple pages, line breaks are preferred. --EncycloPetey (talk) 05:23, 29 April 2021 (UTC)
I read the indent in point (1) as a new stanza, without bothering to check whether that was what the printer did. What is the plural of 'amanuensis'? Because that is how this document was apparently compiled. Point (2) should be policy; the resources of this site have laboured to accommodate a tag that does little more than obviate the need to add breaks. CYGNIS INSIGNIS 13:23, 29 April 2021 (UTC)
@EncycloPetey, @Cygnis insignis: Thank you both for your feedback that helps me to clarify/understand what remains to be done on this manuscript. I think that I would like to reproduce the look of the manuscript as much as possible. There are four major tasks remaining when it comes to formatting 1) Replace poem with br tags 2) add in {{ls}} 3) add missing plines 4) add gaps. Languageseeker (talk) 13:43, 29 April 2021 (UTC)
In a printed text I have no hesitation in disposing of the indent and replacing it with an empty line, à la wiki, but in the case of a new transcript I am not so sure. Perhaps I should nominate it for deletion as 'self- or un-published' to save me worrying. CYGNIS INSIGNIS 14:04, 29 April 2021 (UTC)
When the text is in prose, I agree with you about dispensing with line indents, but with poetry an indented line does not signify the start of a "paragraph", and may be an internal line of a stanza that has been indented. Deciding when indents are new stanzas and which are internally indented lines is an editorial decision, and choosing to make a break where there was an indent only, will affect the way it is read. --EncycloPetey (talk) 15:03, 29 April 2021 (UTC)

Score needed

Could somebody please oblige by transcribing the short score on Page:S.S. Bremen - G. Howell-Baker - music by E. Edgar Evans.jpg? It's beyond my skills. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:36, 29 April 2021 (UTC)

  • Andy Mabbett: I have done this, although, because the score feature is still disabled, nothing renders. TE(æ)A,ea. (talk) 19:08, 29 April 2021 (UTC)

dentition renderin'

"the cat proceeded to its bowl of biscuits

CRUNCH CRUNCH CRUNCH!" [apol. to Brautigan] :(

I need to add a dental formula in a work; there is some coding for that at the big sister: this. That could be copied over to Template:DentalFormula for the convenience of those familiar with it, but any stable code that allows me to place numerals and separators above and below a 'line' (or other demarcator) would be useful for the odd instances I have found. This would also be useful for odd fractions such as 16ths, but I couldn't find anything about it in the Help pages. CYGNIS INSIGNIS 13:12, 29 April 2021 (UTC)

Would it be possible to use Template:Sfrac to produce, for example, 2.1.3.3/2.1.3.3? As far as I know, that template supports rendering any characters in both the "numerator" and the "denominator", and the Wikipedia Template:DentalFormula is based on it. Mathmitch7 (talk) 14:45, 29 April 2021 (UTC)
[e/c] There's {{sfrac}}: 2.2.2.2/2.2.2.2. Export is a bit wonky on some less capable clients. I am unsure of the best practice for fractions in the general case, especially w.r.t. accessibility. Inductiveloadtalk/contribs 14:46, 29 April 2021 (UTC)
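A small Python sketch of the arithmetic behind such a formula, assuming {{sfrac}} takes the upper and lower rows as its first two positional parameters (as the Wikipedia template of the same name does; the Wikisource template's parameters are not confirmed here). The helper name dental_formula is hypothetical. Each number counts the teeth in one quadrant, in the order incisors, canines, premolars, molars, so the total for the whole mouth is twice the sum of both rows.

```python
def dental_formula(upper: tuple[int, ...], lower: tuple[int, ...]) -> tuple[str, int]:
    """Return a hypothetical {{sfrac}} call for a dental formula and the tooth total.

    Each tuple lists the teeth in one quadrant (one side of one jaw) in the
    order incisors, canines, premolars, molars.
    """
    top = ".".join(str(n) for n in upper)
    bottom = ".".join(str(n) for n in lower)
    # The formula describes one side of the mouth only, so double the sum.
    total = 2 * (sum(upper) + sum(lower))
    # Doubled braces in the f-string produce the literal "{{" and "}}".
    return f"{{{{sfrac|{top}|{bottom}}}}}", total
```

For example, dental_formula((2, 1, 2, 3), (2, 1, 2, 3)) returns ("{{sfrac|2.1.2.3|2.1.2.3}}", 32), the adult human count.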

Translations and Reprints from the Original Sources of European History

Today I happened to come across the work Translations and Reprints from the Original Sources of European History, published by the University of Pennsylvania's History department from about 1897 to 1907. As far as I can tell, archive.org has full or nearly full coverage of the series, which has several volumes. I was originally looking for a PD English translation of "fr:Qu’est-ce que le tiers état ?" by Author:Emmanuel Joseph Sieyès, which is available in full on the French Wikisource but not yet validated, and I happened across this series, where excerpts of that text appear in the 6th volume. I guess my question is: would others be interested in having all volumes uploaded to Commons, with a central page here listing the table of contents? It seems quite wide-reaching and may help us expand English-language coverage of minor texts from European authors. I'm just curious whether people think this work would be useful. Mathmitch7 (talk) 14:40, 29 April 2021 (UTC)

@Mathmitch7: I generally think it's a very good thing to set up this kind of collective work "on spec", because the set-up is quite a lot of faffing about for many users, but dipping in to proofread an article or two is easy once that groundwork is laid. But make sure it's well linked from author pages and perhaps Portals so it can be found. If it can't be found, no-one will come, even if you build it.
In theory, if you don't proofread any articles, it can't have a mainspace page, however. C.f. Wikisource:Scriptorium/Archives/2021-02#No-content_mainspace_pages for a long but fizzled discussion to try to determine best practices for that. Inductiveloadtalk/contribs 14:56, 29 April 2021 (UTC)
I would be interested in doing some of that, though probably very little overall. However, I have a preference for, and see no harm in, having sections of the volumes with red-linked parent titles that follow this sister's convention of series/vol/sect. For example, I care for about 10% or less of The Emu; linking the 'parent titles' from The Emu/volume 3/Extinct Tasmanian Emu would be, as someone here said, "guaranteed to engender disappointment". CYGNIS INSIGNIS 15:29, 29 April 2021 (UTC)
@Mathmitch7: That looks like an amazing find. I'm sure some of those translations are probably still the only ones ever made. I took a quick look and there appears to be a new series as well. Languageseeker (talk) 19:22, 29 April 2021 (UTC)
Alright, so I've now uploaded one scan of each volume of this series available on the Internet Archive; you can find them at commons:Category:Translations and Reprints from the Original Sources of European History (UPenn series). I'll try to get them up on Wikisource soonish, but I have some other things I have to do today. I'll note that the "new series" mentioned (which I also uploaded) consists more of monographs than big edited volumes, but I think they will still be helpful. Also, I was unable to find a readily available version of Volume 5: there's a version on archive.org, but as it's a 1971 reprint it's not fully available for download, so that's something to look for in the future. Mathmitch7 (talk) 13:29, 10 May 2021 (UTC)

Is this an acceptable Index page name?

Index:Narrative of Henry Box Brown - who escaped from slavery enclosed in a box three feet long and two wide and two and a half high (IA narrativeofhenry00brow).pdf

The document is always referred to as "Narrative of Henry Box Brown"; am I permitted to shorten it? This is the only copy on IA. I have it as a .djvu file, but it is identical to this PDF. — Ineuw (talk) 04:48, 2 May 2021 (UTC)

My first question would be "but why?". There is no direct relationship between the transcluded work's name and the Index: page name. Renames at Commons generally relate to their criteria, so it is more a matter for that space than ours, as we have no such requirement. I have shortened filenames at Commons, though only where they have been problematic/wrong. — billinghurst sDrewth 10:57, 2 May 2021 (UTC)
Now that you have moved the file on Commons, the text had to be moved by an admin. I know that the file name wasn't ideal, but it's really creating more work for very little reward. Languageseeker (talk) 11:46, 2 May 2021 (UTC)
Ineuw is an admin so essentially they are just creating work for themself. — billinghurst sDrewth
The 19th-century practice of putting all the metadata in the title is a little tiresome, but index title length or sense does not matter, as we can name the work whatever Wikisource wants. And they tend to have machine-generated Internet Archive artifacts, so all PDF names should be acceptable, including "qwerty123". Slowking4Farmbrough's revenge 16:01, 3 May 2021 (UTC)

Thanks for all the comments, and apologies for these belated clarifications. I was distracted by unrelated issues.

My primary concern in renaming the index was not to offend Languageseeker, the original uploader. The reasons for doing it were aesthetic: the new name simplifies web searches, and it's short, clear, and unambiguous. It is also uncluttered when proofreading. As long as this does not breach WS rules, the critiques are a non sequitur. — Ineuw (talk) 11:38, 6 May 2021 (UTC)

Sections

Please can someone explain to me in simple terms (or point to page which does so) how sections can be transcluded?

I tried on Showell's Dictionary of Birmingham/A, but that failed. What did I do wrong?

I am confused that we appear to have two types of markup: ## A ## and <section end="A" />. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:43, 2 May 2021 (UTC)

  • You needed to add the sections on the final page. On the final page, there are two sections: “A” and “B.” You should put ##A## at the top of the page (where the “A” section begins on that page) and ##B## between the “A” and “B” sections. I have done this, so the page now transcludes properly. TE(æ)A,ea. (talk) 13:19, 2 May 2021 (UTC)
In terms of the two types of markup: the # style is "Easy LST", while the <section /> is the standard style. Which you see/use depends on whether "Easy LST" is switched on in your Preferences. It's on by default. Beeswaxcandle (talk) 17:12, 2 May 2021 (UTC)
  • Essentially they are the same: Easy LST is simply a JavaScript that converts the section tags to ##. At one point it was considered easier to require marking only the start of a section, letting the next section marker close the previous section and open the next. — billinghurst sDrewth 08:31, 3 May 2021 (UTC)
<section end="A" /> is more commonly used, familiar, self-evident code that is available with a click. CYGNIS INSIGNIS 11:46, 3 May 2021 (UTC)
Note that the section tag is in the edit list's wiki markup drop-down, and I know that I have made buttons for my toolbar to make my life easier. — billinghurst sDrewth 13:43, 3 May 2021 (UTC)
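To illustrate the difference between the two markups, here is a rough Python sketch of how start-only markers could be expanded into paired standard tags. It assumes an Easy LST marker is a line of the form ## name ## on its own, and it is not the gadget's actual code (which is JavaScript); the function name easy_lst_to_sections is hypothetical. Each marker opens a section, and the next marker, or the end of the page, closes it.

```python
import re

# Assumption: an Easy LST marker is a line such as "## A ##" on its own.
MARKER = re.compile(r"^##\s*(.+?)\s*##\s*$")


def easy_lst_to_sections(text: str) -> str:
    """Expand start-only "## name ##" markers into paired section tags.

    Each marker opens a section; the next marker (or the end of the page)
    closes it. Text before the first marker is left outside any section.
    """
    out = []
    current = None  # name of the section that is currently open, if any
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            if current is not None:
                out.append(f'<section end="{current}" />')
            current = match.group(1)
            out.append(f'<section begin="{current}" />')
        else:
            out.append(line)
    if current is not None:
        out.append(f'<section end="{current}" />')
    return "\n".join(out)
```

On a page whose last entries of "A" are followed by the start of "B", a single ## B ## line both closes "A" and opens "B", which is why Easy LST only asks for one marker per section.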

[Maintenance] Editions marked as "literary work" at wikidata

Looking at Wikidata, we have about 1,400 editions of works that are labelled "literary work" rather than "version, edition, or translation" (Q3331189).

Only a rough query, but it gives a good example of fixes that we need to make.

I would guess that we have some other variations of works that we will need to fix. We cannot just relabel them, as some have had editions merged into works. :-(

billinghurst sDrewth 03:04, 3 May 2021 (UTC)

Need to stop overwriting old versions, need to make a new version and disambiguate

Can I ask that, as a rule, we typically never overwrite an edition of a literary work, especially a long-held work. When we have a new edition of a work, we should name and disambiguate it. The community can then have the conversation about what to do with the works.

With our long-held works, which were often not done as subpages, there are redirects, versions, blah blah blah that need tidying up and are getting disconnected; plus the Wikidata is getting horribly inaccurate. I hate having to unpick these Gordian knots for lack of a simple solution being undertaken at the beginning, especially where they need an admin who can move with subpages and without redirects. There may be exceptions to the rule, but these exceptions should be made with a clear mind, and by consensus. — billinghurst sDrewth 07:30, 3 May 2021 (UTC)

I think there should be a lot of caution in overwriting, it probably needs to go through discussion when the content is sourced and presentable.

Burning redirects has widowed content I linked from Wikipedia in the past; I have opted to link sites like BHL instead, for that and other reasons. I can already see how moving the second edition of Descent of Man on a whim, or on a head-canon rule, may make the deep link I just added there useless. CYGNIS INSIGNIS 11:38, 3 May 2021 (UTC)

Yeah, we need some guidance about editions. We have a lot of non-scan-backed works with no edition information that we would like to update, and a lot of non-scan-backed poems (extracts from books), in addition to non-scan-backed works where the available scan is a different edition. Slowking4Farmbrough's revenge 15:47, 3 May 2021 (UTC)
@Billinghurst: I don't understand why. Look at The Deserted House (Alfred Tennyson): a poem with edition information, but introduced by a user, and we can't verify it here and now. Another is The Sun Rising: no edition information, no source. Why can't we overwrite works of this type? What's wrong if we do it? Tommy Jantarek (talk) 23:47, 3 May 2021 (UTC)
@Tommy Jantarek: Firstly, our version of your first example reputedly comes from The Works of Alfred Lord Tennyson (see its talk page); where is your version coming from? Should there be two versions? What would be the harm, or the issue, with nnn versions?

Secondly, under our current methodology we would set those works as subpages of their published work, and create either a redirect from the root if we only have the one version, or a {{versions}} page pointing to each.

Thirdly, because people have been doing a ham-fisted and half-arsed job of it, and fixing it up from that is an absolute PITA and a waste of (my) time. A version of a poem will most likely have come from another work/source, so it should now be a subpage, not a root work; yet people overwrite it with a different edition and a different source, they don't update the Wikidata, etc.

Fourthly, we try to have half measures and general guidance, and then we get hung by someone saying that our written instructions did not tell them they could not do it. So I would like to see a gentle review process for how we better handle a new version, where no one person makes such a decision and there is at least an independent review to ensure we are not losing works where versions should exist. It does not have to be convoluted or power-based, though it should be reviewed against a set of criteria.

So I think it is best that each new work is transcluded as a new version, with the Wikidata freshly entered, AND THEN we look at what to do with the old version. We can move and overwrite at that stage if we determine that is the best way to progress. — billinghurst sDrewth 03:11, 4 May 2021 (UTC)

I will also note that your Tennyson example is matched directly to the work The Deserted House (Q7729810) rather than being its own edition of the work. There is not one version of many of our works, so we create editions with their provenance. — billinghurst sDrewth 03:16, 4 May 2021 (UTC)
Regarding 3: what if the scan exactly matches the edition information? Do we overwrite the old version and move it to a subpage? And what if the old version has no edition information or Wikidata item? If a user overwrites the old version, they might move the overwritten poem to a subpage. Tommy J. (talk) 12:37, 4 May 2021 (UTC)
That will be for the review process to determine. I am saying that we should typically not replace/overwrite as the first step. If you have a new scan, transclude its edition of the work. Do you see the string of "what ifs"? They make it hard to write rules for newbies, or for those less aware that we work on versions, of our processes, and of all the factors to check. — billinghurst sDrewth 13:48, 4 May 2021 (UTC)
I think the fundamental issue is that most texts are presented as the one and only version of a text. However, every text is an edition. The Deserted House (Alfred Tennyson) is ambiguous because this poem probably exists in several versions. If this site wants to stop having confusion over texts, then it needs to do three things. 1) Ban non-scan-backed copies. This whole situation exists because users are trying to replace non-scan-backed versions with scan-backed versions. Non-scan-backed versions are always going to be dubious because there is no simple way to verify their authenticity or accuracy. 2) Set up rules for publishing works. At minimum, these should require the date. 3) Make the deletion of non-scan-backed versions, when a scan-backed one exists, a criterion for speedy deletion. Languageseeker (talk) 14:42, 4 May 2021 (UTC)
In re your proposal #3, no. As a text repository, we need to ensure that incoming external links don't break. This is the point of what Cygnis is saying and a principal reason for Billinghurst's initial post. The various forms of disambiguation pages and the soft redirect process assist with ensuring this. Beeswaxcandle (talk) 17:40, 4 May 2021 (UTC)
Most texts are the one-and-only one. Once in a long while, a work will get reprinted with some changes, but it's the exception, not the rule. Reams of fiction and non-fiction get written every year and even after you filter for intent to publish and actually getting published, most of it just disappears. This is doubly true if you ignore facsimile reprints and other reprints that don't introduce any interesting changes.
I get it; you have an interest in certain works widely considered important that have important distinct editions. I don't. I'm much more interested in the rare and esoteric. Please understand the differing needs and concerns of some of your fellow users. I can't really imagine that continuing to push for banning non-scan-backed editions is in your best interest; work with the community as it is, don't walk in and try to make huge changes.--Prosfilaes (talk) 17:55, 4 May 2021 (UTC)
@Prosfilaes: As far as I can tell, the issue that billinghurst is raising is the overwriting of non-scan-backed editions with the text of a different edition: a user overwriting Version X of a title with Version Y. This creates confusion in WD. Therefore, billinghurst is requesting that users do not overwrite a non-scan-backed version prior to discussion. Otherwise, WD becomes inaccurate. Cygnis insignis points out that excessive disambiguation breaks links and makes this site less useful to Wikipedia.
I am arguing that this occurs for two reasons. First, many pages are created in a way that can lead to ambiguity. Yes, most texts are one-off, but that is no reason not to take more care in creating page names. Even title (year) would lead to less ambiguity than currently exists. I recently had a long discussion about disambiguation where the administrators stated that the current policy is to allow ambiguity and then disambiguate later. Poor naming leads to more future disambiguation and link breakage. Instead, disambiguation should be the exception and not the norm. Second, users are overwriting these texts because they wish to improve the quality of the text. Many users see scan-backed versions as better than non-scan-backed versions. This will continue until this site makes a firm policy on non-scan-backed copies. If we allow them, then they can never be deleted. If they are to be replaced, then they should stop being made, to avoid creating additional work.
Finally, I'm not against obscure work or only for those considered important. What is important changes over time. Paradise Lost used to be an obscure work. However, I believe that most users will come looking for the texts currently seen as important. Creating scan-backed versions of these works can attract more users that will proofread more works obscure, important, or otherwise. Languageseeker (talk) 00:59, 5 May 2021 (UTC)
Don't presume, ask. It is broader. I have a number of reasons why; essentially, the free-for-all overwriting process is broken in a number of situations. I am saying that the community needs to manage overwrites holistically, and the way that I see it best to do that is to keep it simple: "any new scan-backed work coming in is to be transcluded as a new version". Then if we think a version is redundant, the community manages it by its processes. This is typically not a single-person decision.

At this stage non-scanned editions are within scope, so please do NOT pollute this conversation with that matter. Separate conversation deserving of its own discussion at a later time. Dealing with what we have. — billinghurst sDrewth 02:46, 5 May 2021 (UTC)

My humble opinion and suggestion. The community will do what you all decide; I'm just a guest here. It seems to me that you all place too much value on non-scanned texts. Texts of this type are like Wikipedia articles without sources and footnotes. Are their letters, words, sentences and punctuation correct? Nobody knows (not even the page's author), and verification is very hindered or impossible. Data based on non-scanned texts is like divination by cards. "Create a new text, and the community can then have the conversation about what to do with the old version." OK. But what if the community decides to delete? We would delete not just the text but also the user's contribution and work, as well as the whole history of the page. That is not good or honest practice, and it's destructive. In my opinion, the Proofread project offers the most usefulness and credibility these days. When a user (usually an experienced one) overwrites a non-scanned text, the text's quality and credibility grow and the whole history is preserved. An unverifiable text without a source gains its strength. Proofread pages aren't transcluded by newbies, or only very rarely; few newbies know this project well enough to do it. What about Wikidata? If a user overwrites and moves the old text to an adequate subpage, good practice would be to visit the Wikidata item and update all the information. It is not a lot of work, and it requires just one visit, preferably right after overwriting. I think a user who knows how to transclude and move pages also knows how to visit and update Wikidata items. The plwikisource community has attached much importance to the Proofread Project for many years, and in my opinion it brings great results. Please forgive me for this long scribble. Tommy J. (talk) 21:35, 6 May 2021 (UTC)

@Tommy Jantarek: I've started a discussion about this at the Scriptorium if you'd like to express your excellent points there. Languageseeker (talk) 04:59, 8 May 2021 (UTC)
Comment @Tommy Jantarek: I don't disagree with your opinion that scan-backed texts are preferable to a copy and paste. I don't think that the community disagrees either. This may be an English-language issue alone, I cannot say; however, I started this thread because users' practice has been problematic, so you are hearing my experiences of having to resolve problems.

There may be no updating of Wikidata. There are partial moves, partial overwrites, no link fixes, US editions replacing UK versions, illustrated versus non-illustrated versions. So to me we have to change the practice: instead of users choosing to overwrite, they should create a new version [keep it simple]. The community can then decide what to do about versions, and how to go about it. — billinghurst sDrewth 09:50, 8 May 2021 (UTC)

The text of Template:migrate to needs to be adjusted, as it encourages in-situ replacement with no guarantee that the replacement is appropriate. We have no clear quality control over the placement of the template to even know that a one-for-one replacement is the correct instruction. — billinghurst sDrewth 09:53, 8 May 2021 (UTC)

Tech News: 2021-18

15:43, 3 May 2021 (UTC)

Philosophical Transactions of the Royal Society A – Volume 184

Wasn't sure where to post about this since WikiProject Royal Society Journals seems to be dead, but I'm taking on the task of filling out Volume 184 of the journal Philosophical Transactions of the Royal Society A, published in 1893. Why Volume 184 specifically? Can't remember. This is a pretty big undertaking, and I'm mostly doing whatever random pages I feel like in whatever order, so anyone who's up for learning about chemistry or higher mathematics or astronomy or aether theory, etc., as it existed in the late 19th century: feel free to jump in! I already have about 50 pages "done" (several are still missing images, and a handful have ostensible misprints such as teeny tiny exponents that I can't read; nevertheless, those have been proofread as best I can otherwise), but that's just put a dent in it, let alone the other volumes of the journal. A fair warning that a lot of these pages require prolific use of the 'math' tag, but that certainly doesn't apply to all of them. Either way, I think it's a lot of fun, and it'd be great to have more eyes on the project than just my own. :) TheTechnician27 (talk) 17:33, 3 May 2021 (UTC)

@TheTechnician27: Thanks for letting the community know, and great to have you with us. I am not particularly into proofreading those works, though if you need assistance in working in the Page: ns, or with transclusions, etc., then please let me or the community know here. If you need a bot run through to apply text layers, then please put a detailed note on Wikisource:Bot requests and ping me. When one is jumping about looking for particular articles, having the text there and a tuned {{engine}} to search within the work can be quite handy. And with regard to proofreading, we all just do our best and learn on a daily basis, so don't sweat it too much. — billinghurst sDrewth 01:56, 4 May 2021 (UTC)

Table formatting

The table under "Distances" at Showell's Dictionary of Birmingham/D, which spans three pages in the source, is malformed. What's wrong? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:00, 3 May 2021 (UTC)

It appears to render correctly on my browser, does it not on yours? Where is the 'malformation' appearing? I am unfamiliar with the coding style, but I may be able to fix it (or rewrite it). CYGNIS INSIGNIS 22:24, 3 May 2021 (UTC)
Actually, not correctly, but readable. I notice that something is suppressing the page numbers on those pages; I can't remember what causes that, other than overly clever formatting. CYGNIS INSIGNIS 22:32, 3 May 2021 (UTC)
@Pigsonthewing: the main issue was that you were closing the table on the first page: diff, which was coming through into the main NS. The right way is to close the table in the footer like this:
{{nopt}}
|}
Also, in theory, you should use a {{nopt}} at the head of the subsequent pages to avoid accidentally running rows together if the line break on the previous page is removed by accident.
I'm not actually sure why a paragraph is introduced in the page NS under the header. @Billinghurst: any ideas? Also there appear to be no page numbers at all for the third page onwards. Which is also bizarre, because AFAIK the issue I would expect that to be (Phab:T232477) should not apply if |- is at the top of the page, and the numbers don't even appear with JS off, and they're not fostered, they're just not there at all. Inductiveloadtalk/contribs 22:38, 3 May 2021 (UTC)
  • This is a general table problem; it happens when the table has header content that isn’t included (over a page break). It also happens here. TE(æ)A,ea. (talk) 23:04, 3 May 2021 (UTC)

@Inductiveload: the page numbering is screwed up due to the addition of a table row marker at the end of the body sections; I have fixed those. I don't know why this continues as a community behaviour; I have long harped on about this problem, and long ago changed the help pages from the mistaken guidance. I know of its origins, and that some use of it as a practice is tied in with the table row issues and our subsequent nop/nopt introduction, but it was always a bodgy solution. — billinghurst sDrewth 02:14, 4 May 2021 (UTC)

Ah yes, now I see, I had missed the extra |- at the ends of the pages. Clearly I was up too late last night!
Also, with sleep and caffeine in me in the right order, I now remember that the extra space under the header in page namespace is phab:T275388 and there's not a lot we can do about that as it stands (though the ugliness is contained to the Page NS, so it's not too bad). In theory, some kind of namespace awareness could help, but then we'd have two kinds of {{nopt}} and things will get complicated to explain. Inductiveloadtalk/contribs 08:29, 4 May 2021 (UTC)

end of a page in Page: ns

@Pigsonthewing: Apologies for the length, but it needs to be from first principles.

ProofreadPage (PrP) is a set of JavaScripts that makes a MediaWiki page appear to be three separate sections. At the end of a body section you basically have the text end and a "noinclude" that determines where the footer section starts and ends. In the following examples, the highlighted part represents the Page: ns presentation, and the unhighlighted part is what the wikitext looks like.

standard empty footer
mere bulky tomes,
[footer start]
[footer end]
mere bulky tomes,<noinclude></noinclude>

So where we have table splits, we need to trick MediaWiki into continuing a table. This uses MediaWiki's noinclude markup as it is embedded and interpreted by ProofreadPage.

Incorrect: table end on the first line of the footer [note 1]
|-
|Bilston
|9
[footer start]
|}
[footer end]
|-
|Bilston
|9<noinclude>|}</noinclude>
Questionable: use of placeholder {{nopt}} [note 2], a hard return, then table close [note 3]
|-
|Bilston
|9
[footer start]
{{nopt}}
|}
[footer end]
|-
|Bilston
|9<noinclude>{{nopt}}
|}</noinclude>
Yes: without the placeholder[page 3], so a hard return on the first footer line, then the table close on the following line
|-
|Bilston
|9
[footer start]

|}
[footer end]
|-
|Bilston
|9<noinclude>
|}</noinclude>


Comment: other coding guidance for tables

  • Table row starter (|-): please do not end the body section with one; it can lead to aberrant behaviour, and in particular can foul up the marginal page numbering on transclusion
  • use of {{nop}} at the start of a new body section should be avoided[page 4]
  • we suggest using {{nopt}} at the head of the subsequent body section to give MediaWiki the sense of a new line for table row markers[page 2]
Notes
  1. Doesn't work, as the table close (|}) must be on a new line in wikitext.
  2. Template:nopt is an empty <span> when transcluded
  3. Use of the placeholder is actually redundant and serves only as a visual cue.
  4. Template:nop is an empty <div> when transcluded
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 00:53, 9 May 2021 (UTC)

@TE(æ)A,ea.: I am not sure what you are referring to. The header is just another <noinclude> section that is not transcluded. If you foul up the formatting in that header section, it will bleed through the page in the Page: ns, but will not come through with a transclusion. If you write code that only works in the Page: ns because of specific code inside the header, then it may foul up when you transclude. If you have complex and repeating table headers in the Page: ns, then I always suggest that they become standard templates, as that just makes your life easier. Well, that is how I have handled it in the past.

And this is why I don't do help pages. Too complex, nitpicking that takes me forever. — billinghurst sDrewth 01:49, 4 May 2021 (UTC)

I have previously set up {{Hussey Churches table header}} for a work, though I would not be averse to us thinking about a specific ability to have something work-specific, like transcluding pseudo-templates that could be subpages of something like Index:Notes on the churches in the counties of Kent, Sussex, and Surrey.djvu. Now that we have .css style pages that are subsidiary to an Index:, it is not a long bow to draw to think about how we could have other work-specific components in the Index: namespace, subsidiary to the index. I have had some light discussions about work-level header templates and preloading them. Thoughts @Inductiveload, @Xover, @Samwilson:? Do we push everything to the Template: namespace? What are the pros and cons of using work-specific components? It adds complexity, though it separates out the work-specific aspects. — billinghurst sDrewth 02:31, 4 May 2021 (UTC)

[outdent] Thanks all; I don't follow all of the above, but the table is now working. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:53, 4 May 2021 (UTC)

tl;dr for the record:
  • Do not put |- at the end of a page
  • Do put this at the start of a continued table page
{{nopt}}
|-
  • If a table continues onto the next page, close it on this page in the footer with
{{nopt}}
|}
  • You might get an unexpected gap between the header and first row in the Page NS: this is mildly annoying, but doesn't bleed into mainspace.
  • tl;dr;tl;dr it's all a PITA.
Inductiveloadtalk/contribs 10:19, 4 May 2021 (UTC)
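A rough Python sketch of why the tl;dr rules above work, under stated assumptions: it models each Page: page as wikitext whose header and footer sit in <noinclude> sections, drops those sections as a transclusion does, and joins the page bodies with a single space. The single-space join is an assumption chosen for illustration (it is the case in which {{nopt}} matters); the exact joining behaviour of ProofreadPage is not described in this thread. The helper names (body_of, transclude, row_markers_ok, ends_with_row_marker) and the "Walsall" row are hypothetical.

```python
import re

# Header and footer are stored as <noinclude>...</noinclude> sections.
NOINCLUDE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL)


def body_of(page: str) -> str:
    """Return the body of a Page: page, i.e. everything outside <noinclude>."""
    return NOINCLUDE.sub("", page)


def transclude(pages: list[str], sep: str = " ") -> str:
    """Join page bodies as a transclusion might (the separator is an assumption)."""
    return sep.join(body_of(p) for p in pages)


def row_markers_ok(wikitext: str) -> bool:
    """A |- is only treated as a row marker when it starts a line."""
    return all(m.start() == 0 or wikitext[m.start() - 1] == "\n"
               for m in re.finditer(r"\|-", wikitext))


def ends_with_row_marker(page: str) -> bool:
    """Flag a body that ends with |-, which the advice above says to avoid."""
    return body_of(page).rstrip().endswith("|-")


# Page 1 opens the table in its body and closes it in the footer, on a new line.
page1 = "{|\n|-\n|Bilston\n|9<noinclude>\n|}</noinclude>"
# Page 2 reopens the table in its header; the body starts with {{nopt}} then |-.
page2 = "<noinclude>{|\n</noinclude>{{nopt}}\n|-\n|Walsall\n|12\n|}"
# The same page without {{nopt}}: after the join, |- lands mid-line.
page2_bad = "<noinclude>{|\n</noinclude>|-\n|Walsall\n|12\n|}"

print(repr(transclude([page1, page2])))
print(row_markers_ok(transclude([page1, page2])))      # True
print(row_markers_ok(transclude([page1, page2_bad])))  # False
```

Under this model, {{nopt}} gives the following |- a line of its own after the join; if pages were instead joined with a newline, the bad case would pass too, which is why the separator is left as a parameter.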
Help:Table hopefully contains all that information. — billinghurst sDrewth 10:42, 4 May 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — billinghurst sDrewth 12:42, 10 May 2021 (UTC)

French speaker needed

Would someone proficient in French kindly check the three pages starting at Page:Aerial Flight - Volume 2 - Aerodonetics - Frederick Lanchester - 1908.djvu/374? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 08:26, 4 May 2021 (UTC)

I've reviewed these pages. Accents were missing in the source, e.g. in Pénaud; I've used a SIC only for the first occurrence in the text. I've used the symbols ′ and ″ for prime and second. M-le-mot-dit (talk) 11:05, 4 May 2021 (UTC)

Google OCR does not work

⧼error⧽ undefined cURL error 60: SSL certificate problem: certificate has expired (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)

Does anybody know where this problem comes from? Ankry (talk) 09:04, 6 May 2021 (UTC)

@Ankry: It should be back up now. There was a transient problem with the SSL certificates for all services hosted on Tool Labs. I don't have details on the cause or the remedial actions, but the issue was reported for multiple other tools, and subsequent reports indicated that it was no longer present. Xover (talk) 11:12, 6 May 2021 (UTC)
Done, confirmed. Ankry (talk) 11:38, 6 May 2021 (UTC)

Text in both margins

Suggestions, please, on layout for pages like Page:The Grand junction railway companion to Liverpool, Manchester, and Birmingham; (IA grandjunctionrai00free).pdf/41, which has text in both margins. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:40, 6 May 2021 (UTC)

Push it all left, otherwise it looks like the teeth of a saw, and is just problematic when transcluded. If you are truly concerned about the Page: ns, then there are some templates that show right, and transclude left. — billinghurst sDrewth 14:14, 6 May 2021 (UTC)
And how will that relate to the subheadings "From Birmingham" and "From L'pool & Manch'r"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:12, 6 May 2021 (UTC)
How about {{Sidenotes begin}} and its siblings, as seen in The_Solar_System/Chapter_1? The rendering won't be exact (in particular I'm not sure how you would get the rules down the page), but it will line the notes up with the text nicely, in a way I'm not sure how else to achieve. — Dcsohl (talk) 17:25, 6 May 2021 (UTC)
Footnotes, with the cute arrangement of to and from distances linked to the nearest full stop, period CYGNIS INSIGNIS 18:38, 6 May 2021 (UTC)
I've implemented a form of that on the above page, but with the footnotes attached to the place where the mileposts are mentioned in the text. The footnotes are going to get very repetitive like that though. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:43, 6 May 2021 (UTC)
We also need to account for the change in presentation on pages like scan 64, which differentiates between two types of content. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:08, 6 May 2021 (UTC)
I would be tempted to ignore the double side notes. Some right waypoints are in the text; it's unclear what value they add. Slowking4Farmbrough's revenge 22:34, 6 May 2021 (UTC)
@Pigsonthewing: On second thoughts, don't see them as marginal; just treat it as a table. Set it as a three-column table, centre floated, manage the intervening lines as a merged column, and embed {{RunningHeader}} for that row. (Don't answer questions late late at night when dog-tired.) — billinghurst sDrewth 00:30, 7 May 2021 (UTC)

Tech News: 2021-19

15:10, 10 May 2021 (UTC)

Lower case naming of book by myself

Index:Some feudal coats of arms.djvu - I think I should have used caps when creating this, and I don't know how to fix it. It's because I saw it in lowercase in the WorldCat books catalog; also, the name is sooooo long. I think it should just be Some Feudal Coats of Arms. Can someone fix it? I'm sorry. --The Eloquent Peasant (talk) 21:43, 10 May 2021 (UTC)

@The Eloquent Peasant: the name of the file (and therefore the index page) isn't really important as long as it's not actively misleading. Just update the link to mainspace to be Some Feudal Coats of Arms. No need to be sorry :) Inductiveloadtalk/contribs 22:00, 10 May 2021 (UTC)
Thank you! --The Eloquent Peasant (talk) 23:08, 10 May 2021 (UTC)
@The Eloquent Peasant: and the thing is that the standards used by cataloguers change, so both are right at different times. So for us both forms are right; just ensure that you create redirects between the forms. In the end, as Inductiveload said, as long as it can be found under an accurate title, there are no issues. We rail against the use of all capitals, as that is just butt ugly. — billinghurst sDrewth 00:04, 11 May 2021 (UTC)
I have 1 GF. My husband says she shouldn't come around 'cause she's butt ugly, but I like her, and besides, while he's not lying, it's mean to say so. She's a great person, oh, except she likes to drink too much. So for that reason I keep her at bay, and also because she lives 4540.9 miles away. All caps is crazy for a book name, and all this makes me realize I should have more than just 1 GF. :) Thank you. :) --The Eloquent Peasant (talk) 00:10, 11 May 2021 (UTC)

I have .png and .jpg images for the book and have uploaded both (38 files), as .jpgs and as .pngs, but now I don't know which ones will end up being used in the book here on Wikisource. I cropped and re-uploaded one .jpg, but the book has tons of images. Which is preferable here, .jpg or .png? See here: c:Category:Some feudal coats of arms (Book). The .pngs are a little big. Thank you. --The Eloquent Peasant (talk) 01:57, 11 May 2021 (UTC)

@The Eloquent Peasant: If you are talking about page illustrations, we are file-type agnostic and more interested in the output. Probably the best guidance on which is better in which situation is c:Commons:File types; while their primary use is the transcluded work here, their re-use elsewhere should weigh more in your thinking. — billinghurst sDrewth 02:27, 11 May 2021 (UTC)
@Billinghurst: ok. --The Eloquent Peasant (talk) 02:29, 11 May 2021 (UTC)
@The Eloquent Peasant: in this case, since the images come from a source that is already compressed with a JPEG-like compression scheme (the IA usually uses JP2, but the idea is similar) there is not a lot of value in PNG. This is because PNG expends a lot of file size on slavishly recording all the pseudo-random noise produced by the JP2 compression, which is pretty much incompressible under the PNG format.
However, if you were to extract the images and clean them up so that they are black-on-white diagrams with the image noise removed, then PNG would be a good choice. This is because the sharp edges between colours (i.e. black and white) represent "high-frequency" image data, which produces substantial image noise when compressed lossily as JPG. For example, below is the image noise (coloured red) that is introduced when a diagram is saved as a JPG:
[Image: JPEG noise shown in red on a diagram (April 01-40N-2100-Fieldbook of Stars-025 - JPEG noise.png)]
Also, since the colours are limited in greyscale, you may find that a greyscale PNG of a cleaned diagram is roughly comparable in size to a JPG anyway (YMMV, there are too many variables to make a global statement here).
tl;dr rule of thumb: JPG for photos and things that are already JPGs, PNG for "clean" diagrams. Inductiveloadtalk/contribs 07:18, 11 May 2021 (UTC)
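PNG stores pixels losslessly using DEFLATE, the same algorithm exposed by Python's zlib module, so a small sketch can illustrate the point above about noise. The image here is hypothetical and synthetic: a clean black-on-white "diagram" of horizontal rules, and a copy with small random perturbations standing in for JPEG/JP2 artefacts. This is a simplification: real PNG encoders also apply per-row filters before DEFLATE, so actual file sizes will differ, but the clean version should compress to a tiny fraction of its raw size while the noisy copy compresses far less well.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256


def clean_diagram() -> bytes:
    """8-bit greyscale: white background with a black horizontal rule every 16 rows."""
    rows = []
    for y in range(HEIGHT):
        value = 0 if y % 16 == 0 else 255
        rows.append(bytes([value]) * WIDTH)
    return b"".join(rows)


def add_noise(pixels: bytes, amplitude: int = 4, seed: int = 1) -> bytes:
    """Nudge each pixel by a small random amount, mimicking lossy-compression noise."""
    rng = random.Random(seed)
    return bytes(min(255, max(0, p + rng.randint(-amplitude, amplitude)))
                 for p in pixels)


clean = clean_diagram()
noisy = add_noise(clean)
clean_size = len(zlib.compress(clean, 9))
noisy_size = len(zlib.compress(noisy, 9))
print(f"raw: {len(clean)} bytes, clean: {clean_size} bytes, noisy: {noisy_size} bytes")
```

The noise amplitude of 4 grey levels is an arbitrary choice; larger values make the gap wider.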

Title for a Copyright Office letter

I have at User:BD2412/Affirmance of Refusal for Registration (Prancer DNA Sequence) an untitled letter from the Copyright Office to a copyright applicant. The letter indicates who it is from and to, and has a correspondence ID, but I'm not sure how to title it for the move to mainspace. Wikisource:Style guide appears to offer no guidance on this. BD2412 T 04:33, 11 May 2021 (UTC)