Wikisource:Scriptorium

Scriptorium
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help. Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 425 active users here.

Announcements

Index transclusion status now in the Index page edit form

As many of you may have noticed in your watchlists, Index page transclusion status and validation dates are no longer recorded in a template, but are a proper part of the Index page edit form. All existing uses of the templates have been migrated over. The usage remains unchanged - transclusion status refers to how much of the work is transcluded, and is somewhat independent of the proofread status - it is possible for a work to be fully transcluded but not validated.

There was a brief period where some Indexes had multiple categories while the usage was changed over. These should naturally resolve as the categories update, or you can force it by purging or editing the page. As always, let me know if something is looking broken even after a purge. Inductiveload (talk/contribs) 13:00, 22 April 2021 (UTC)

1000 pages processed in the Monthly Challenge

The first-ever Monthly Challenge, after a fairly sedate start, passed 1000 processed (marked no text, proofread or validated) pages yesterday, and has sustained over 100 processed pages a day for 2 days straight. Very, very approximately, this represents 10% of pages at enWS processed in the last 2 weeks.

In the spirit of friendly competition, there were more pages processed the day before yesterday than Mission 7500, the French equivalent challenge (which recently has been racking up ~12000 pages a month) did yesterday. We're still to reach a milestone of 200 pages in a day, at which point the stats table gets a shiny green row.

There is nothing different about proofreading in the Monthly Challenge, and everyone is free to dive in. Nominations for future months are always open, but if you really want your favoured book to feature, you will need to help make a space for it! Inductiveload (talk/contribs) 01:05, 13 May 2021 (UTC)

Proposals

New Request for Comment on Wikilinking Policy is open

I have just opened Wikisource:Requests for comment/Wikilinking policy. You will find there a proposed complete overhaul/rewrite of the current policy, which is now ready for review by the wider Wikisource community. It is proposed that the RfC will be open for two weeks. Please make your comments there rather than here. Beeswaxcandle (talk) 08:33, 14 March 2021 (UTC)

@Beeswaxcandle: I think 2 weeks / 72 hours is a little bit too aggressive, even for a presumed uncontroversial policy proposal like this. I understand the reasoning, but I just don't think the community is able to move that fast. For example, we have several long-time contributors that are currently in a phase where they check in only every couple of weeks. And I know for my own part that the local Covid status could easily make me too busy to check in here for weeks on end. We could still have an accelerated timeline (just not quite as accelerated as 2/72) if we notify of the proposal in a site notice and maybe even a talk page message to any established contributor that has been active in the last three months (or similar).
PS. And let me repeat my previous private kudos in public: you took my ongoing whining about the old policy and turned it into a concrete proposal for a new policy. Great work, for which I am extremely grateful! --Xover (talk) 09:25, 14 March 2021 (UTC)

Tweak archive settings for the Scriptorium

Currently the configuration for automatic archiving of the Scriptorium is set to archive threads in which there have been no new comments for 30 days, and to archive threads which are explicitly marked as resolved 31 days after the date they are marked as resolved. This means that in practical effect nothing ever gets archived by being marked as resolved.

In order to have the ability to clean out this sometimes a bit overwhelmingly long page I propose we change the interval for resolved sections to something more reasonable like a week, or possibly even 3 days. We rarely explicitly close threads here, and when we do it's the "Hey, how do I do this? / Here's how. / Ok, thanks."-type threads. Conversely, the threads that really need long-term visibility are either marked with the "do not archive until" tag (which lets you set an arbitrary future date before which the thread is ineligible for archiving) or, for things like the RFCs and proposals, once the discussion in the "Proposals" section is closed they should be posted as an announcement in the "Announcement" section where they will stay for an additional minimum of 30 days (the ordinary auto-archive interval).

Absent indications to the contrary my expectation is that this proposal is uncontroversial, so if there are no comments on this proposal I will take that as tacit approval. If anyone has any concerns with it I would appreciate comments to that effect, or even just "Wait, I have to think about it first". --Xover (talk) 09:17, 29 March 2021 (UTC)

And since nobody objected or yelled "Wait!", I've now tweaked the settings accordingly. I'll leave this thread open for a bit for stragglers, and after that anyone that wants to tweak the settings further can just open a new thread. --Xover (talk) 19:27, 9 April 2021 (UTC)
@Xover: Meh. This proposal gets lost in the morass of other components at the top of the page. Personally, I think that we need to rethink the scope of how the page works, as it is not working efficiently. I think that if we are going to have so many active proposals then they need to be subpages of this page and transcluded in, or linked out with some better means of notifications through Mediawiki:Watchlist-announcements (as we used to do). Numbers of the proposals that have come forward are blue sky thinking and more like WS:RfCs for long discussions, rather than simple proposals. — billinghurst sDrewth 00:40, 9 May 2021 (UTC)

I would happily see users who turned up yesterday (apparently) censored from kludging up the page, and this small community's time, with revived proposals. The page would also be a good deal shorter if nearly every thread was spared the often confounding and adversarial commentary of our resident bigot. With that aired, I think the scriptorium provides a lot of information to silent readers wishing to improve their contributions; buried within the commentary are solutions that are difficult to find in the help pages. Three days to a week is a very short time frame for this site, and often discussions need more airing and thoughtful input. I think the proposal addresses some of these concerns, but hope that urgency does not override the development of guidelines through broad and considered opinions. CYGNIS INSIGNIS 11:07, 9 May 2021 (UTC)

@Cygnis insignis: My proposal in this thread (which has been implemented) was only in regard to threads that are manually closed (by adding {{section resolved}}), which we almost never do. All other threads are archived 30 days after the last comment. That's why I considered it appropriate to make the change so quickly and without a real !vote: it makes very little practical difference.
Billinghurst's subsequent comment (which echoes my own thoughts on some of the problems with this page) is directed at the broader issue of this page getting unwieldy, and will need both a broader set of changes to address, and broader discussion before any changes can be decided on. Xover (talk) 12:34, 9 May 2021 (UTC)
@Xover: Excuse the moaning then, I haven't seen any archiving here that was objectionable and what you've implemented sounds reasonable. CYGNIS INSIGNIS 12:41, 9 May 2021 (UTC)

Moral disclaimers for certain works

There are certain works that have a core message or consistently incorporate certain themes that most people would find offensive and morally reprehensible. I'm thinking specifically about works that were made for the purpose of promoting white supremacy. Some notable examples of these are: Thomas Dixon's The Clansman and The Leopard's Spots; D.W. Griffith's films The Birth of a Nation (1915) and Intolerance (1916); Henry Ford's The International Jew (1920); Adolf Hitler's works; etc.

I think works such as these definitely need to be transcribed here, so that they can be viewed for historical purposes (as in, to understand what their arguments were and why they were made), and a transcription could for example make it easier for a user of our content to produce a rebuttal to said work. But the issue is that works like these are so bigoted in tone that their messages are simply indefensible, cruel, and morally reprehensible. I imagine many people who read our transcriptions of those works may get the idea that Wikisource's community, or the users who took the time and effort to work on the transcription, actually support the bigoted messages of these works, despite what Wikisource's project pages say about the project being NPOV.

So I propose that we create a disclaimer template, that we can put in the "Notes" section of the front matter page's header template. The template should say something to the effect of:

This text consistently promotes ideas that are particularly hateful or bigoted in nature. Please remember that Wikisource's community and its contributors do not necessarily endorse any opinions or ideas presented in any of its works, including this one. Works are presented as-is with no censorship involved, as transcription is done with a neutral point of view in mind, without bias for or against any particular ideology.

By the way, I think the disclaimer should only be included in works that have a consistently disreputable tone that may easily cause offense. I don't think that works such as Bobbie, General Manager or The Achievements of Luther Trant, which casually drop the n-word a few times but don't bring up racial issues much at all, should be given the template. However, a book focusing primarily on racial issues, taking a white supremacist stance, would qualify. PseudoSkull (talk) 19:12, 8 April 2021 (UTC)

I very much appreciate the underlying issue, but I'd be inclined not to do this. While it would have benefit for the most extreme and uncontroversial cases, such as those you list, there would be a tremendous number of works in a "grey area" where editors disagree, and/or where we lack the resources to even detect or evaluate subtle but reprehensible views.
Perhaps an alternative would be to put some careful work into a thorough essay along the lines you suggest, and link to it from somewhere prominent on the Wikisource main page. Rather than trying to attach it to every reprehensible work, simply express our position clearly in one central place.
I always think it's worthwhile to think about the precedent of traditional libraries. Would you expect to find your local library had inserted a position statement into its copy of Mein Kampf? It seems unlikely, though I could certainly see them having a general brochure available at the front desk explaining why they carry such works. -Pete (talk) 19:32, 8 April 2021 (UTC)
@PseudoSkull: Just to tie my comment a little more closely to your proposal, and focus on how things would play out in practice: How would you imagine things going if somebody strongly disagreed with you, and felt that Bobbie, General Manager was indeed reprehensible? (I have no familiarity with this particular work, just following your example.) How would we come to a decision? Would the process tend to deplete the time or emotional energy of various volunteers? Would the end result, regardless of what it is, bring much benefit to the reader? -Pete (talk) 20:12, 8 April 2021 (UTC)
Some editors have had similar discussions regarding the practice of using project disclaimers on works such as encyclopedias. I'm pretty sure no consensus was ever reached, and thus no action ever taken. In my personal opinion, a general disclaimer that covers all Wikisource works, perhaps placed prominently on the Main Page and in the footer, should suffice for both purposes. —Beleg Tâl (talk) 20:11, 10 April 2021 (UTC)
@PseudoSkull: I am not averse to such a template being added to the corresponding talk page, and the use of "edition = yes" in the header to put the pointer. I have a preference to keep commentary out of main namespace, and keeping it as clean as possible. — billinghurst sDrewth 13:04, 9 May 2021 (UTC)

Comment: Noting that in the footer of every page that we produce there is a link to Wikisource:General disclaimer. The text there should be reviewed and suggestions made. — billinghurst sDrewth 00:48, 9 May 2021 (UTC)

Creation of Version Pages for the Individual works of Charles Dickens

In the discussion of the proposed move for Oliver Twist to its correct edition name and the creation of a version page on Oliver Twist, User:Xover informed me that such a proposal requires a formal vote prior to a second version of the text being proofread. Therefore, I’m submitting a formal proposal to the community to discuss the pros and cons of creating version pages for the individual works of Charles Dickens. It’s important to note that Dickens usually published 6-12 distinct editions with his editorial revisions and listing them on the author page occupies a significant amount of space. Languageseeker (talk) 15:45, 11 April 2021 (UTC)

I think this discussion (which is occurring in several places) is a bit stuck, and could benefit from a more explicit articulation of the basic problem you're trying to solve. Personally I think you've made it pretty clear, but that's just my perception…it also seems like others have had trouble seeing it. So I'll take a crack at that:
Up until recently, the author page for Charles Dickens dedicated a few lines to the novel Oliver Twist. (The same issue applies to many of his novels, but I'll use this one as an example.) These lines identified several editions of the novel; but they were not comprehensive, nor did they offer complete bibliographic information. More recently, this was fleshed out in much greater detail (see here). But this increased detail, which helps the reader readily learn about the various editions, has an undesirable side effect as well, i.e. it makes the Author page unwieldy and difficult to parse. One might reasonably assume that most readers are interested in an overview, like "which novels did Dickens write," and that only more specialized readers want to delve into the weeds of which editions are which, etc. A natural organizational principle for a website is to put the high-level information on the main page, and then move the more detailed information to the page linked, which was accomplished by creating Oliver Twist versions. This appears (to me) to be in keeping with common practice and with documented procedures at Wikisource.
Other experienced editors seem to disagree. I believe the principle is that a "Versions" page should only exist when there are multiple editions or versions transcribed on Wikisource, not merely published elsewhere.
I think the introduction of this last principle is the root of the present problems. As far as I can tell, this principle is not in Wikisource's documentation of versions pages, so an assumption that other editors would know it strikes me as potentially a bit rude. But more significantly, I think the principle itself is a bit myopic. A reader may well want to learn about multiple editions, and they may be glad to find bibliographic information that will permit them to find them in a traditional library, and/or links where they can find it at Gutenberg, Internet Archive, LibriVox, information at Wikipedia, etc. Does the reader really care, as a primary principle, whether or not they find a transcription specifically on English Wikisource? To me that seems like a principle that overvalues our specific work, at the expense of broadly respecting the work of scanning, transcribing, documenting, and otherwise preserving published works (which occurs at many websites and traditional institutions).
I would argue that we should freely move detailed information about a work that has had multiple editions published to a "versions" page, irrespective of whether or not any particular number of them has been transcribed here at Wikisource, whenever that action serves to make an "Author:" page more readable/useful (or for any number of other reasons). It doesn't strike me as particularly controversial, nor contrary to any policy I've found, to do so. -Pete (talk) 18:45, 11 April 2021 (UTC)
Thank you for your extremely detailed and cogent summary of the matter. You've said it better than I probably could. One of the biggest issues is that the works of Dickens have an extremely complicated publication history that requires access to either his letters or the Clarendon Dickens to sort out. In certain years, Dickens released multiple distinctive corrections of a particular work. Therefore, you can't even say the 1838 edition of Oliver Twist. Instead, you have to describe the title page to identify the edition. Does it say Charles Dickens or Boz? To add to the confusion, in at least one case, Dickens published two different revisions of one edition differing in one plate and several other corrections. Furthermore, Dickens corrected a number of errors in each revision and the printer made even more. Therefore, the scholarly consensus is that the texts got worse as time progressed. Not all of these versions are available as scans on IA. Therefore, to make sure that future users can be sure that they're tracking down the right editions, I'm recording the descriptions of the individual versions on Wikisource so that future editors know what to find. Languageseeker (talk) 19:39, 11 April 2021 (UTC)

Not convinced. Disambiguation pages are meant to be simple directing pages. They are not meant to be long explanatory documents, and definitely not the encyclopaedic articles. If we need to do something special then can we look to do it in the author: ns, probably a subpage where we can curate things in a more holistic sense. LBT has done that with quite a few detailed descriptive and explanatory subpages to her poetic authors. — billinghurst sDrewth 14:38, 12 April 2021 (UTC)

Why not? Either we're going to have these on author pages or author subpages or versions pages, and having them on author pages would make them huge, so put them on the versions page.--Prosfilaes (talk) 23:19, 12 April 2021 (UTC)

Why not? Because there is no version page, and it means unnecessarily disambiguating when we have no guarantee that the works will ever exist on-site. Get the works and we disambiguate. This is not the encyclopaedia, and encyclopaedic explanations do not belong on versions pages. — billinghurst sDrewth 00:14, 13 April 2021 (UTC)
The bigger question is do works by an author belong on the author page even if there are no scans available? Each edition of Oliver Twist is a work by Charles Dickens distinct from his other works. Perhaps, we can just group them in by headings. Languageseeker (talk) 00:29, 13 April 2021 (UTC)
We have tons of lists of works we may never have, some of which won't be uploadable for decades. It seems entirely within our mission and practice to offer lists of important variants of one work. There might be a place where this gets too much, but a carefully selected set of works by Charles Dickens seems pretty far from that place.--Prosfilaes (talk) 00:57, 13 April 2021 (UTC)
(e/c)A list of works by an author belongs on the author's page, even if they are currently redlinked. At some time in the future a scan will appear. The purpose of a versions page is to list the versions we do currently host so as to provide a way of finding them easily via a search for the work's name. Most readers coming here to read a particular work just want to know if we have a copy and don't really care about a list of copies we don't yet have. In other words, I don't support the proposal to create version pages for each of Dickens' works in anticipation of hosting other editions at some undefinable time in the future. Beeswaxcandle (talk) 05:00, 13 April 2021 (UTC)
For many works, a scan may never appear, and if it does, it will be at some undefinable time in the future. There are a lot of works that I'd expect a scan for long after all the editions of Oliver Twist are complete. I'm not dead-set on this going on version pages, but we have works that people are actually interested in eventually doing, and putting them all on Dickens' page is going to make too much noise.--Prosfilaes (talk) 07:48, 13 April 2021 (UTC)

A scenario worth considering: How does a reader come to land on a Wikisource page about Oliver Twist, and does Wikisource serve up something that is appropriate? I imagine that most people who arrive here would be referred by another Wikimedia site, or by a Google search. If they land on a specific edition which provides no indication that there are other versions, or why they are looking at this version, is that a good outcome? If not, how should we address it? One approach would be the one Languageseeker has suggested. I have not yet seen an alternative proposed, but I'd like to. I suppose we could have an extensive "notes" section at Oliver Twist -- is that what those opposing the existence of a versions page would like? Or if not that...what? -Pete (talk) 04:52, 13 April 2021 (UTC)

I've kind of answered this in my previous post. However, we do not want an extensive note about other editions at the sole edition we currently host. However, a note could point to the Author and/or the Wikipedia page for detailed bibliographic information. The quickest way around this issue for Oliver Twist is to locate another edition, proofread it and then make it available. At that point create a versions page and put the {{other versions}} template on each of the hosted editions. The versions page would then have two editions on it. Beeswaxcandle (talk) 05:06, 13 April 2021 (UTC)
Poor Oliver Twist really did come into this world to make trouble. If this is the policy, it should be stated and all pages that violate it should be sdeleted. For now, it seems time to create a version page for The Pickwick Papers because there are two versions. May I suggest calling the present one The Pickwick Papers (Project Gutenberg). Languageseeker (talk) 05:36, 13 April 2021 (UTC)
@Beeswaxcandle: I appreciate your direct answers, and I think they demonstrate clearly where we disagree. (As a side note, I think it's unfortunate that these sort of basic philosophical questions about what we're doing here are so unresolved, and given that they are, it makes sense that we'd have strong disagreements about specific things like this. But I digress.)
Anyway, I disagree with this:
"The purpose of a versions page is to list the versions we do currently host..."
I believe that is a purpose of a versions page, but not the be-all, end-all reason. I believe that readers come here with expectations and needs that vary according to the user, and according to the kind of work they're seeking; but I doubt that very many of them are especially concerned with the distinction of whether a work lives here on Wikisource or not. I believe the reader is well served if our pages provide some context about the work they're viewing; and in some cases, there might be particularly important context to provide. From what Languageseeker has said, it seems that Dickens' works are just such a case: there are many editions with substantively different content, and varying provenance, and the Wikisource editors who happened to choose one of those editions may or may not have even been aware of the variety of choices available, much less made a well-informed decision about which to transcribe. In my view, the least we can do is prominently and concisely provide contextual information about what editions exist, before guiding them to the one we happen to have.
It's worthwhile to keep in mind that Wikimedia's structure beyond Wikisource may deepen the sometimes erroneous impression that Wikisource's transcription is of the most authoritative work. A Wikidata item about the work (as opposed to the edition) may link to the Wikisource transcription of a specific edition. That might be OK in a case where there are no major differences among editions; but where there are such differences, is it really a good thing for Wikisource to have no page for the Wikidata item to link to? The Wikidata item for Moby Dick links to a nice, concise versions page which contains only one transcribed version, and one version that appears not to be transcribed here. Is there a reader that is harmed by the existence of that versions page? Personally, I am pleased to know that there are different UK and US first editions, and I'm fine with clicking through that page to the one that exists on Wikisource. Is it really better that the Wikimedia universe convey the impression, in all the places that consult Wikidata, that Wikisource has nothing about Moby Dick?
In short, I think it's important that our policies around versions page leave some room for individual judgment by people who know something about the work. I'm not arguing that we need to create a versions page for every single work that has multiple editions, but rather that where an editor believes there is value in doing so, we should not generally interfere with that editor creating such a page, linking it appropriately on Wikisource and Wikidata, etc. I do not see the harm in taking this approach, and I'd been under the impression up until now that it's the approach we do take. -Pete (talk) 21:18, 15 April 2021 (UTC)
  • "this last principle is the root of the present problems": I disagree with this assessment. I think the root of the misunderstanding here is a desire for Wikisource to be something which it is not. In particular, Wikisource is not a bibliographic database. The purpose of Wikisource is not to collect all bibliographic data for a single edition, nor to collect bibliographic data for all editions. It is especially not for collecting extensive bibliographic data on an arbitrary subset of editions of a given work that someone finds particularly interesting.
    We do collect such data when it aids our primary work, but then we do it on WikiProject pages (if there is active ongoing work related to it) or Author talk pages (long term storage of research for the benefit of other contributors). If there are (semi-)objective criteria for whatever the subset is, it may also be appropriate to create a Portal: for that subset (a portal being a thematic collection of works or editions of works).
    This is sometimes sinned against: we list lots of editions on Author: pages that should ideally only contain works, and our versions pages tend to amass entries for editions that we do not yet have (for various reasons), and mostly this is fine and bothers no one. It's not something that really has to be enforced as a bright-line rule, and I think a lot of contributors find them useful or at least not in any way bothersome.
    But the problem comes with special pleading for one's pet project to override the wider good of the project. For example by insisting we move aside a proofread work (our primary purpose) for an arbitrary selection of bibliographic data (not at all our purpose), for editions we do not currently have and have no particular expectation of appearing any time soon. Such a plea from someone actually working on proofreading multiple editions of Dickens' works would get a very sympathetic hearing with me, because at that point there is a genuine need to structure pages to accommodate multiple editions. This current request, and all the words expended on it, is not that. --Xover (talk) 06:26, 16 April 2021 (UTC)
But we frequently are a bibliographic database. We have author pages, like Author:Isaac Asimov, stuffed with works that will not be PD in decades. A careful distinction between works and editions is purely bibliographic, artificial and not productive for us. I can see an argument that we should only link works/editions that are actually in the process of being worked on, but that's not what we do. If people on Wikisource want to list a set of works they want to work on, I don't think it matters whether they're "works" or "editions" by some definition.--Prosfilaes (talk) 08:22, 16 April 2021 (UTC)
I would actually support removing all the dashed redlinks for works in Author ns. Completely ugly and unnecessary. I would also comment that works under copyright should use {{copyright until}} and not be plain redlinked. Though having these arguments here will detract from finishing the pertinent argument which I still do not support. — billinghurst sDrewth 12:30, 16 April 2021 (UTC)
No, we frequently amass bibliographic data as a by-product of our main purpose, but that's fundamentally different from being a bibliographic database. Even ignoring the issue of our purpose, we do not have any tools that would set us up to be useful for creating such a database or for using us as such (one primary mode of use would be semantically rich querying: even the monstrosity that is WorldCat can distinguish queries for authors vs. titles). If your goal is a bibliographic database you and everyone else will be better served by working on either Wikidata or OpenLibrary, both of which have far far superior tools for the purpose than we do, and are set up along similar principles (crowd-sourced, openly licensed).
The distinction between works and editions is not artificial, but definitional; and it happens to be a good distinguisher for us in practice. Listing all works by an author is desirable and achievable, but listing all editions quickly becomes impossible. That's why this thread only proposes to collect an arbitrary subset of them. It also makes author pages entirely unwieldy and no longer fit for their purpose, which is why this thread proposes to move them to a separate page. In other words, ignoring the work—edition distinction and the difference between bibliographic data collected as necessary and incidental to our main purpose, and as a primary purpose, would lead to preemptively creating versions pages for every single work and producing—byte for byte, character for character, and page for page—more bibliographic data than actual content.
And you turn the issue on its head with "If people on Wikisource want to list a set of works they want to work on, I don't think it matters whether they're 'works' or 'editions' by some definition." It's always ok to only list the works you actually want to work on. We're a wiki, so someone else will hopefully add the rest at some point. It's preemptively listing editions that you have no intention of ever working on, and supplanting actual proofread content that someone else actually has worked on, that is the problem here. That's why it's necessary to talk of high-falutin' principle stuff like "the purpose of Wikisource": doing that only makes sense if our primary purpose is amassing bibliographic data. It is also only a problem when you start focussing on the editions as the primary information: so long as you stick to works there's no need to move mainspace content around (except possibly to correct titles), until you're actually proofreading an additional edition of it. --Xover (talk) 09:16, 16 April 2021 (UTC)
The distinction between works and editions can be both definitional and artificial; working on LibraryThing has shown me that whether two books are the same work can be a fraught and complex discussion. There are real Ship of Theseus issues in some works; are all the editions of the Encyclopedia Britannica the same work? If yes, you're combining volumes that have no text in common as the same work. Are all editions different works? Then you're designating as different works things with merely orthographic changes. Not only is the third option complex and fact dependent, the whole Ship of Theseus thing means that there are no clear lines.
No, listing all works by an author is often not achievable, and is frequently problematic in the same way that listing all editions is. A newspaper journalist may have at least one work in every daily newspaper for thirty years. A proper list of Dickens' works includes any number of articles from Household Words and other periodicals, plus 12 volumes of letters. I fail to see why adding select editions that people want on Wikisource is a problem.--Prosfilaes (talk) 07:46, 17 April 2021 (UTC)
Shrug. There are certainly edge cases; but the difficulty in making the determination does not affect the nature of the distinction. And as I said, amassing metadata about some additional editions isn't a problem in itself so long as it is treated as a by-product of our primary purpose, a secondary concern. It is when it is promoted to a primary concern that it starts causing problems. For example by making Author:Charles Dickens so unwieldy that those "tidying" it (that is what a tidied author page looks like?!?) feel it imperative to move the one actually proofread edition of Oliver Twist we have aside in favour of their own subjective selection of "important" editions, and, when declined, to create an Oliver Twist versions and a subsidiary network of things like Oliver Twist (Charles Dickens Edition) (and disagreeing is apparently so rude that one becomes persona non grata with the proponents). It leads to pseudo-encyclopedic non-neutral link-farms like David Copperfield (Authoritative? egads! Why don't we just create an Amazon Affiliate account and be done with it?).
Meanwhile, not a single page of any Dickens work appears to have been proofread, beyond a single Match & Split that didn't even bother to clean up the obvious breakage afterwards. Why would we privilege being a poor man's LibraryThing (which is not our purpose, and for which we have no even remotely adequate tools) over actually proofreading works (the purpose for which the project does exist)? I'm more than averagely fond of bibliography too, and would like to do it within the Wikimedia movement, but the way to do that is advocating for improvement to and integration with Wikidata and the front ends / user interfaces to it, that the WikiCite folks have been having conferences about for going on a decade without any measurable progress.
It is entirely appropriate to discuss individual exceptions where merited. Dickens is a big (notable/important) author, and a prolific one. I am sure there is a coherent argument to be made related to the nature of the early editions of his works (it just hasn't been presented yet). As an exception I'd certainly be willing to entertain the notion, though I suspect it would lack some key factors to persuade me (part of the need can be met by one or more Portal: pages, and the other parts are moot unless a significant number of actually proofread works exist to create a practical, rather than theoretical, need). But as a principle I am vehemently opposed, both because it turns the nature of Wikisource on its head for no good reason, and because it just simply is not a good idea. --Xover (talk) 09:54, 17 April 2021 (UTC)
As I said, these are not some random editions that I happened to hear about. These are the editions that Charles Dickens revised himself. Such revisions range from correcting errata, to rewriting passages, to replacing plates. I'm planning on adding transcription projects as soon as Commons fixes its system. Honestly, from this conversation, I believe that we need an actual vote system and not just long discussion threads after which an administrator decides what to do. Languageseeker (talk) 12:44, 17 April 2021 (UTC)
Also, properly defining the edition allows the reader to know which edition they are reading. Imagine if the first version of Hamlet on Wikisource was Q1: would you vote against moving it to Hamlet (Quarto 1) and listing the other Quartos and Folios? This is an exact parallel case. For Oliver Twist, besides the errata, Dickens rewrote much of the book in 1846, so there are two very distinct texts. At this point, a few of the Dickens works have versions created prior to me and others do not. Languageseeker (talk) 12:52, 17 April 2021 (UTC)
@Languageseeker: We are not saying any of that isn't the case, you have told us over and over and we understand. When you or someone else has another edition we will disambiguate it. Until then, record the information that you want in the Author namespace as that is where we have typically done that work, and we have not prematurely moved editions just because there may be another edition. — billinghurst sDrewth 13:32, 17 April 2021 (UTC)
On fr. we do our best so that access to the latest version of the text corrected by the author is the privileged goal of the work of wikisourcists, but all editions are welcome too. --Zyephyrus (talk) 15:10, 17 April 2021 (UTC)
Great to hear from fr., you're the reason why I discovered the English Wikisource. I'm astonished by how many texts of major authors you have scan backed. Merci!
For Dickens, it's actually the opposite. The first editions are the best and the last edition is usually considered the worst. Languageseeker (talk) 15:19, 17 April 2021 (UTC)
The nature of the distinction is that it's an artificial one. For any two works, there is a set of works in Borges' library (or generatable by computer program) that are one letter away from each other, connecting the two works. It's not difficult to make the distinction; it's impossible to make in any non-arbitrary way.
My problem is not that we should collect bibliographic data; it's that we shouldn't. We shouldn't have long lists of works that no one is going to upload any time in the next decade on pages like Author:Isaac Asimov, but as long as we do, we should not use bibliographic rules to decide whether or not an item gets listed.--Prosfilaes (talk) 16:12, 17 April 2021 (UTC)

I recently saw a nicely constructed page of editions in the main-(ie reader-)space, that contained a link to a complete text here, being routinely edited to add a light blue (external) link to IA.org for the 1st ed. A solution to any objection might be to add the index to consolidate its inclusion in main-space, implying that someone [else] might want to labor on proofreading that choice as well; an emergent property of this example is moving that [more or less tolerated] bibliographic data from the Author namespace (our catalogue?). CYGNIS INSIGNIS 12:30, 17 April 2021 (UTC)

Formatting in header template

  1. I propose (again) that the title of the whole work be italicised.
  2. I propose (again) that the author of the work—or their 'contributed section'—not be italicised.
  • Support. I also think that; cheers to nom. CYGNIS INSIGNIS 12:44, 17 April 2021 (UTC)
  • Oppose. Italics are usually fine for native speakers, but having worked on a multi-lingual dictionary, I can attest that italicizing text makes it harder to read for many non-native readers, especially those whose native writing system is not based on the Latin alphabet. There are things we take for granted about italicized text that do not apply in languages like Russian or Japanese. Personally, I do not think the name of the author or translator should be italicized either. --EncycloPetey (talk) 02:44, 18 April 2021 (UTC)
    that's one support for item 2, which is what constantly reminds about item 1. It's an interesting point, interfering with access. CYGNIS INSIGNIS 14:26, 23 April 2021 (UTC)

Allow the Shakespeare Quartos without splitting the Pages

This proposal has two components. First, expand the scope of Wikisource to allow all copies of the Shakespeare Quartos. Second, allow the transcription of these Quartos without splitting the individual pages. Languageseeker (talk) 01:03, 18 April 2021 (UTC)

(1) What is the advantage to Wikisource of transcribing multiple copies of the same printing? (2) How do you propose to preserve pagination if there are multiple pages to each scan page in the scan copy? --EncycloPetey (talk) 02:39, 18 April 2021 (UTC)
All 32 copies of Hamlet have proofread text that needs mainly formatting. There can be printer variants between the texts. Also having all 32 copies will attract visitors, especially since the original site has been taken offline because OUP decided not to upgrade the flash infrastructure to html. For transclusion, you can use Help:Transclusion#Adding_section_labels. Languageseeker (talk) 02:48, 18 April 2021 (UTC)
I am not understanding the proposal as written. How is adding versions of a work expanding a scope? If that is not the proposal, then stop speaking in jargon and have an expectation of an author's work or what is done at another site. Are you proposing no scans, etc.? If you are talking about 32 different sourced versions of Hamlet, +++, that is within scope. Versions are versions, versions are allowed. Otherwise, I simply don't understand the scope of the proposal. — billinghurst sDrewth 03:03, 18 April 2021 (UTC)
@Billinghurst: Because you said it was out of scope and removed the links.

Update Inclusion Guidelines to prohibit future non-scan-backed works

I’m proposing to update the inclusion guidelines to exclude all works that are not transcluded. The veracity and accuracy of non-scan-backed texts cannot be verified without significant effort, while those of scan-backed works can. Currently, there are efforts to replace non-scan-backed versions with scan-backed versions to improve the overall quality of this site, but this is continually made more difficult by the addition of non-scan-backed works. If this proposal passes, all new, non-scan-backed works will be deemed out-of-scope and speedily moved to the user's namespace. For works that are born digital, a Wikisource edition will require the creation of a PDF or other suitable digital reproduction. Languageseeker (talk) 19:44, 7 May 2021 (UTC)

So, you are proposing that we delete ca. 208,277 pages of content (i.e. 41% of our content). This includes works that are digitally native and therefore do not have printed texts to be scanned. Your proposal would do away with the match&split process and make it unavailable for the purpose you recently used it for in bringing in PG texts. Your proposal precludes the possibility of bringing in works for which there are no scans available. Beeswaxcandle (talk) 23:09, 7 May 2021 (UTC)
Question: Are you proposing deleting all current non-scan-backed works, or are you proposing a ban-hammer on future non-scan-backed works? Either way, I’m opposed, but it’d be good to at least have that clarification. — Dcsohl (talk) 00:30, 8 May 2021 (UTC)
I am not proposing to delete past content, but to place a moratorium on future content. For works born digital, a PDF can be created to preserve the content. If the goal of this site is to create accurate copies of textual sources, then there must be an original to compare it to. Otherwise, there is absolutely no way to guarantee accuracy. Placing a moratorium will allow a slow transition process to begin. If there is some overriding compelling reason why a particular work cannot be scan-backed, then that can be treated as an exception.
Match-and-split is a problematic and difficult process. While it can be used to salvage past works, there is no reason to create additional work for the future.
@Dcsohl: Is there any particular reason why you are opposed? Languageseeker (talk) 03:41, 8 May 2021 (UTC)
Your description doesn't match "not proposing to delete past content". You also need to provide an explanation of how this will be policed. How will the exception process work? In terms of the "born digital", what's to stop me uploading a text then creating a pdf of the same content I just uploaded, and voila it matches? There's still no verification of the source or its public domain status. How are we better off? I recommend you withdraw this version of the proposal and rewrite it. A single paragraph of why does not explain the how. Beeswaxcandle (talk) 07:37, 8 May 2021 (UTC)
"all non-scan backed works will be deemed out-of-scope and speedily deleted." is that meant to mean all new non-scanned works or all new and current works? Maybe you forgot to include new in there? MarkLSteadman (talk) 22:02, 8 May 2021 (UTC)
I just think there are other valid sources for works besides scan-backed, the most notable of which would be PG. Are we going to forbid the import of all PG works henceforth? It should definitely be harder to bring in non-scan-backed works, but I’m simply not in favor of making it impossible. — Dcsohl (talk) 02:07, 9 May 2021 (UTC)
I would forbid the import of all PG works henceforth. They're well archived; WP can durably link to them. There's no value in copying them here.--Prosfilaes (talk) 15:03, 9 May 2021 (UTC)
Support, non-scan-backed works are bad for the following reasons:
  • It's hard to determine what the source of our transcription is, if no source is given.
  • It's hard to determine version information: is it the original, is it from a more modern reprint, a revised edition? In the cases of smaller works that are usually published in collections such as songs or poems, which collection was this particular version published in? Another common question: what version contained this introduction by Bob Bobberson, and is that version still under copyright? This information can unfortunately remain unknown without a scan.
  • It's hard to determine copyright status of the work, or our particular version of the work, when necessary.
  • It's harder if not impossible to verify for accuracy. Gutenberg texts have no guarantee of validity (and are sometimes wrong), they often arbitrarily correct typos or perceived typos (which is against Wikisource policy), and are by no means comparable to a scan of an original printed work.
  • It's easy to give an incorrect source link by mistake (see Talk:Winesburg, Ohio as an example), especially if you copy-paste large amounts of text from other sites within a short time span.
Quality over quantity. Editors and readers of our content alike need something original to compare our text to on the fly. Furthermore it is very obvious that modern Wikisource pages with scan backing are of far better quality than those from the past, or the present, that are copy-pasted from other sources with no scan. We have too large a backlog of pages from the 2000s or early 2010s, that were inserted with no scans to back them, simply because the technology wasn't there yet for it or that technology was in its infancy. We certainly don't need any more! PseudoSkull (talk) 04:35, 8 May 2021 (UTC)
  • Multiple many times oh-so-much opposed to this proposal. Requiring scan-backing for new texts is in itself a no-brainer; the hard part is defining what to do with existing non-scan-backed texts, and how to deal with born-digital texts and other new texts for which, for whatever exceptional reason, it is not sensible to require scans. These are the things a modified policy needs to address, and this proposal doesn't even attempt to do so.
    In particular, a dumb requirement for scans without addressing the details will just lead to a proliferation of arbitrary self-created PDFs that correspond to no published edition, hiding the problem in a "scan" instead of making it obvious. Does a Gutenberg text get any less problematic because someone printed it to a PDF file before cut&pasting the text here? Xover (talk) 06:10, 8 May 2021 (UTC)
  • No. I have no issues with encouraging, or strongly encouraging, scans for old works, but as a requirement: NO. There are some things that are not available as a scan, or not available as a scan of suitable quality. Also, modern works are electronic and do not require scans or proofreading, and making a scan to then transclude it just to allow the text is straight daft. — billinghurst sDrewth 09:00, 8 May 2021 (UTC)
  • This kind of moratorium on future works is a very good idea in my opinion; it would bring many benefits and raise the value of the project. I wrote more about it in this overly long thread. Could we delete old pages? Absolutely not. We can't delete anyone's work. I propose placing the main emphasis on texts with scans and replacing old non-scanned texts with new scan-backed ones. From my experience on a much smaller project, I know how many mistakes there are in non-scanned texts: copy-paste errors, distorted words and whole sentences, disastrous punctuation. We also can't prohibit creating versions without scans, especially if scans are not available at the moment. In my opinion, data based on non-scanned texts are worth as much as texts and articles with no source. In my experience, the more texts with scans there are, the more users (even newbies) add texts with scans, and the less often non-scanned texts appear. And, lastly, a small digression: @Billinghurst:, I hear your experiences and I assure you I appreciate them very much. I think we need to oblige users to update Wikidata, preferably right after overwriting old non-scanned texts, if adequate items exist on Wikidata. Maybe it is impossible and I'm just an idealistic dreamer. I think that if a scan exists or appears, we should place the most emphasis on replacing the old, unreliable text with no source with a new, reliable and verifiable one. It is exactly like Wikipedia articles: when a user adds new information based on sources, they can replace old unreliable information. Tommy J. (talk) 13:57, 8 May 2021 (UTC)
  • Oppose. This is a terrible proposal, with obvious, and obviously disastrous, consequences. I oppose any scan-backing requirements, and especially one so draconian as this. While you may find the existence of non-scan-backed works reprehensible, they are nonetheless an integral part of English Wikisource. Certainly, and especially for larger and/or more important works, the encouragement of scan-backing is acceptable; but a requirement of the sort you propose will greatly damage the project. TE(æ)A,ea. (talk) 15:02, 8 May 2021 (UTC)
  • Oppose. I can certainly see where this proposal comes from. However, our backlog of non-scan-backed works that should be scan-backed ("Shouldies") has two origins: 1) very old works that predate modern WS norms, and, indeed, the ProofreadPage extension ("Oldies") and 2) new works that should be scan-backed from the outset ("Newbies").
"Oldies" are uniquely annoying, because they are usually not without value and are often "core" texts, often via PG (which probably made sense in the mid noughties, when there were far, far fewer scans online). I certainly don't think that deletion-without-replacement is a good idea. While these works probably have a chilling effect on building out the core scan-backed corpus, because scan-backing an existing work isn't as "sexy" as a whole new work, deleting them outright with no replacement plan will just be blowing holes in the library for ideology's sake. I think an unsourced or non-scan-backed work is (just slightly) better than no work at all.
"Newbies" are easier to deal with, since they can be addressed at the time by engagement with the contributor.
I don't have a good handle on this, but "Newbies" don't seem to be an overwhelming proportion of "Shouldies". Most "Shouldies" that I come across are from the mid-noughties and are often complete, but unloved.
At the risk of talking the talk without walking the walk (since I am not personally volunteering to take point), I think what we actually need here is something like:
  • As I have previously suggested, figure out a more collegial and less adversarial improvement venue than WS:PD. This will allow us to put eyes on "Newbies" and hopefully get them sorted before they get buried. This might be a hypothetical WS:SCANS (or a whimsical name like The Bindery, whatever), possibly repurposed from the obsolete WS:OCR.
  • For "Oldies", management is broadly similar, but their contributors are generally long gone. My proposal, which is contingent on the Monthly Challenge becoming a "thing", is to reserve a slot or two for "Wikisource-internal" works, which could be an "Oldie" in need of scan backing. This would provide a slow, but hopefully steady, impetus to chip away at the backlog. If we can bring more proofreading firepower to bear on it than the MC can contain, we can set up a separate WikiProject, but, honestly, I can't see that happening.
What I don't really want to see here is writing new policy that will strengthen the (IMO undesirable) tendency to view policy-backed deletion as a quality control measure rather than a last resort for unsalvageable works. I would like to see it made clear to new users that if a scan is available and sensible, a work should be scan-backed. But it is already pretty clear, e.g. at Help:Adding texts.
@Languageseeker: my suggestion to you personally is that if you'd like to see progress made on the issue, firstly you should attempt to quantify it. We have 283 works tagged with {{migrate to}}, and 24 tagged {{scans available}}. That's not an epidemic of "Shouldies", but I certainly do not think that count is anywhere near the true count. Secondly, I suggest that you (continue to) work on the Monthly Challenge as that is, IMO, a very interesting and engaging way to get eyes on specific "important" works (either due to "Shouldie" status, or as a "core" text, FSVO "core"). Finally, if you still feel MC is not enough, I suggest taking your quantified statement of the issue from step one and using that to set up a scan-backing workshop where problematic works can be laid out for review and improvement. Inductiveloadtalk/contribs 22:12, 8 May 2021 (UTC)
... AND users can link to PG from author pages, there is no need for their versions to be here. — billinghurst sDrewth 12:44, 10 May 2021 (UTC)
My thoughts on this in general are: 1) We should be very strict on {{no source}} when new works are created without a source, and if those are not resolved by engagement with the contributors at that point, consider deletion. 2) It would be good to easily track the size of that category over time and see how we can resolve many of the works in it. 3) We as a community should use things like POTM / Monthly Challenge / Community Collaboration / WikiProjects etc. to include some component to identify and encourage migration of the backlog, so that we can bring the proportion of scan-backed content up significantly and bring this https://phetools.toolforge.org/graphs/Wikisource_-_texts_en.svg down rather than flat. MarkLSteadman (talk) 22:40, 8 May 2021 (UTC)
  • Oppose. Filling a void here with proofread content from PG means it can be linked from other texts, a virtue of this site. Adding scan-based texts is all I do, and I've gone to the bother of adding an alternative to a PG text, but I value having a link to anything usable. The minutes taken to convert a text for use here versus the hours of one or more users is not such a delicate balance in evaluating their worth. CYGNIS INSIGNIS 18:12, 9 May 2021 (UTC)

Bot approval requests

Importation of Bookworm Bot from French Wikisource

French Wikisource uses BookwormBot to generate useful statistical information about Indexes, such as the number of pages proofread, validated, blank, etc. Would it be possible to import this bot into English Wikisource? Languageseeker (talk) 22:47, 20 April 2021 (UTC)

Neutral (see below; originally Support) for the purposes of trying out the frWS model of a page-wise monthly goal, but it's less a matter of "importing" and more a matter of asking @User:Coren what we need to do to enable them to turn it on for us and do that, and also gain consensus to grant the bot flag (which is fine by me). Inductiveloadtalk/contribs 19:10, 23 April 2021 (UTC)
Comment @Inductiveload: User:Coren does not seem to be extremely active anymore. Would we need him to set up the bot or could we do it ourselves? Languageseeker (talk) 03:51, 24 April 2021 (UTC)
Where is its code? [I will presume that it has been suitably licensed as Coren was good that way.] What are the prerequisites for running the bot? From where are we planning to run the bot? It says from the DB, rather than pulled from the API. Which DB? Has all the data been added to WD, and pulled from there, or what? — billinghurst sDrewth 08:57, 26 April 2021 (UTC)
Comment It might also be worth investigating if we could build the raw per-index "page-at-status-X counting" functionality into the ProofreadPage extension and then 1) we wouldn't need any bot and 2) all Wikisourcen can benefit. I do not have a handle on how murderous (or not) that might be on the server side with respect to page render times, so it may be a complete non-starter.
I have opened phab:T281195, but I probably don't have the time to dig into a PHP job any time soon. Inductiveloadtalk/contribs 21:56, 26 April 2021 (UTC)
The Wikisource:Monthly Challenge stats are now (I hope!) being automatically generated by a new script, which was a total duplication of effort that exists in BookwormBot, but seems to be at least functional for now. So I am now neutral - I'm not sure we need BwB any more. Inductiveloadtalk/contribs 22:29, 8 May 2021 (UTC)

Index:Who's who in the Far East, 1906-7, June (IA whoswhoinfareast00hongrich).pdf

I'd like to use my bot (User:SLiuBot) to import texts of the book from The Integrated Information System on Modern and Contemporary Characters (a database curated by (Q10875101)). I have already uploaded 10 test entries on my user pages (see User:Stevenliuyi/Who's Who in the Far East (June) 1906-7). Since I am new here, any advice or suggestions are very welcome. Several other English biographical dictionaries in that database are also in the public domain, and I plan to import those dictionaries after completing this one, but that will be a separate bot request. Right now I just want to get the first one right. --Stevenliuyi (talk) 04:20, 30 April 2021 (UTC)

I have fixed some minor issues in the test pages. If there is no objection, I plan to upload all texts in the book. --Stevenliuyi (talk) 18:04, 4 May 2021 (UTC)
I have finished uploading the texts (see Who's Who in the Far East (June) 1906-7). Please let me know if you have any suggestions. --Stevenliuyi (talk) 01:24, 6 May 2021 (UTC)

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

The Pickwick Papers

Move The Pickwick Papers to The Pickwick Papers (Gutenberg) because The Posthumous Papers of the Pickwick Club (Charles Dickens edition) also exists. Languageseeker (talk) 13:50, 17 April 2021 (UTC)

I have moved the work and left a redirect. I am sorry, but The Posthumous Papers of the Pickwick Club (Charles Dickens edition) is simply not ready for display, as it is basically a pointer to another set of pages, and a set of pages under a completely unwieldy name. The target is not a suitable target. Display of the works like that as two volumes is in my opinion an inefficient display methodology. Just because a work was published in two volumes due to limitations in publishing methodology is no reason for us to present it like that. — billinghurst sDrewth 03:15, 18 April 2021 (UTC)
I have converted the page The Posthumous Papers of the Pickwick Club (Charles Dickens edition) to a redirect as that is pretty much how it should be at this stage. — billinghurst sDrewth 03:20, 18 April 2021 (UTC)

Index:Posthumous papers of the Pickwick Club (Serial Volume 19).pdf

Can pickwick19_20 0037a and pickwick19_20 0037b be inserted from [here] into File:Posthumous papers of the Pickwick Club (Serial Volume 19).pdf? Languageseeker (talk) 13:42, 22 April 2021 (UTC)

Index:Rudimentary Treatise on the Construction of Locks (1853).djvu

The text layer in this is off by one page. The original DJVU (from before pages being removed) also had this problem although the cover's text layer didn't contain anything. -Einstein95 (talk) 14:49, 10 May 2021 (UTC)

@Einstein95: Done. I had to compress it quite aggressively due to phab:T278104, so please let me know if any pages have become unreadable and I'll try to tweak it. Xover (talk) 06:24, 11 May 2021 (UTC)
@Xover: FYI: After an unreasonable amount of experimentation and being told that >100MB files are basically user error (rest of rant omitted), I have found that bypassing the upload pages altogether, creating a File: page at Commons with the description, and then uploading a file directly into that with Rillke's chunked upload JS works for me.
Also Pywikibot should now be fixed (pending next release) to allow async uploads, and has been flawless for me since. My hypothesis is that the PHP library used by IA-Upload has the same issue. Inductiveloadtalk/contribs 07:01, 11 May 2021 (UTC)
There is no way I am buying that this is a client-side issue. The exceptions getting thrown are down in the DB layer. If manually pre-creating a file description page works then that must mean it is the metadata part of the overall file upload process that is triggering it (not, perhaps, surprisingly), but it's still a server-side problem for which this is merely a workaround.
Who told you it was user error? The Pywikibot folks? Chunked upload, which exists specifically to upload >100MB files, is definitely supported (fsvo) functionality. --Xover (talk) 07:09, 11 May 2021 (UTC)
Rant continued on your talk page for avoidance of derailing this thread ^_^. Inductiveloadtalk/contribs 07:46, 11 May 2021 (UTC)

Other discussions

Policy on substantially empty works

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way a very, very long time. I think perhaps there might be a case to host a mainspace page for this work, even though there is zero, or almost zero, actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. It also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • Do not have a mainspace page until there's some content. Only host this in terms of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds it does the groundwork, so I think we could have a think about what we really want to see here. Inductiveloadtalk/contribs 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) are not very helpful, and are entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveloadtalk/contribs 20:55, 3 July 2020 (UTC)
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4Rama's revenge 01:55, 5 July 2020 (UTC)
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4Rama's revenge 00:15, 6 July 2020 (UTC)
The entire point of moving this here, and not staying at WS:PD, is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head all those future deletion proposals off at the pass, because deletion proposals often cause friction. Inductiveloadtalk/contribs 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4Rama's revenge 11:53, 9 July 2020 (UTC)
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality of the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion, the second half is. Inductiveloadtalk/contribs 14:18, 9 July 2020 (UTC)
deletion threat has been an habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual following in the gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan backed works, but the wishlist was not supported. Slowking4Rama's revenge 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC)
I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
  4. EITHER
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it is redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than handled within these other general policies.
    For the sake of example; applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked: active being operative, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do, that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)
  • Symbol oppose vote.svg Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveloadtalk/contribs 14:40, 6 July 2020 (UTC)
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice work and a good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sorts of tricky bits like tables of stocks or weather tables and writing into policy that those must be proofread in order to get the "real" articles into mainspace would be a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just in blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)

Proposal

Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)

Since the proposal has now slipped off the main page (to here), with vague support for the first part (collective work inclusion criteria) and a fairly consistent opposition to the second (no-content pages), my plan is to transfer the first part, as guidelines rather than policy, to Wikisource:Periodical guidelines. As non-binding guidelines, they can then be worked on further in situ. Sound OK? Inductiveloadtalk/contribs 08:10, 16 April 2021 (UTC)
The example given in Wikisource:Periodical guidelines might be improved; PSM is and was an exercise that has gone its own way (no offense to @Ineuw:, this is a site under development and that is only one example). CYGNIS INSIGNIS 13:05, 17 April 2021 (UTC)
@Cygnis insignis: You would be wrong to think that I am offended. Remember that when I started, I knew everything. By now, so much of that knowledge is lost that I am happy to listen. Would you elaborate please? — Ineuw (talk) 19:50, 17 April 2021 (UTC)

I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)

Non-ASCII characters in article titles

The Dictionary of National Biography uses date ranges in disambiguation extensions to article titles. There was pressure by some Wikipedia editors to change from dash to ndash to fit in with the Wikipedia policy on this issue. Wikipedia handles the issue by having ndash in the article title and a redirect with a dash.

The use of dashes was justified for Wikisource because it simplifies the URL and that no redirects were necessary.

I am currently working on Wikipedia, linking articles to Wikisource articles in the Royal Naval Biography, and I have just come across these two articles:

There are no ASCII redirects.

So should the articles be moved, or should there be redirects, or are the names fine as they are and there is no need for redirects? -- PBS (talk) 14:59, 25 March 2021 (UTC)

@PBS: They should be moved. For punctuation like that (and dashes, quote marks, etc.) we use plain ascii in page names (nb. in page names: article titles are a different story). --Xover (talk) 15:24, 25 March 2021 (UTC)
I am using the term "article titles" to be the part of the URL that is used to make the URL unique. I think you are using the term page name to mean the same thing. -- PBS (talk) 15:29, 25 March 2021 (UTC)
Oh, yes, sorry; I should have been clearer. I just meant it as an aside in case there was confusion, but assumed you knew that so didn't want to over-explain. I should have just left it out. But, yes, the page name, which is what ends up in the URL, uses ascii punctuation. I'm just hedging around "article" since mainspace wikipages on Wikisource can contain zero or more "articles" (with or without non-ascii titles)—from a newspaper, for example—unlike Wikipedia where a mainspace wikipage by definition is "an article". --Xover (talk) 15:56, 25 March 2021 (UTC)

<-- There could be a lot of these. How do I go about requesting a bot to run through the titles? -- PBS (talk) 12:24, 27 March 2021 (UTC)

Make a request at WS:BOTR. Beeswaxcandle (talk) 17:32, 27 March 2021 (UTC)

Small note: the turned comma in M‘Farland represents a superscript c (McFarland), not an apostrophe. So "McFarland" would be the better ASCII equivalent, perhaps with "M'Farland" as another redirect. (And Mc is just an abbreviation for Mac, as Geo is for George, but that's a separate discussion.) Pelagic (talk) 23:04, 17 April 2021 (UTC)
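As an illustration of what a bot pass over the titles might check, here is a minimal sketch in Python. It is not an existing bot: the punctuation mapping, the function names, and the sample titles are all assumptions made for the example, and the community would need to agree on the actual replacements (in particular whether an em dash should become a hyphen).

```python
# Assumed mapping from typographic punctuation to ASCII; extend as needed.
ASCII_PUNCTUATION = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash (assumption: treated like an en dash)
    "\u2018": "'",   # left single quotation mark (the "turned comma")
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def ascii_page_name(page_name: str) -> str:
    """Return the page name with typographic punctuation replaced by ASCII."""
    return page_name.translate(ASCII_PUNCTUATION)


def pages_to_move(page_names):
    """Yield (old, new) pairs for page names that would change."""
    for name in page_names:
        new_name = ascii_page_name(name)
        if new_name != name:
            yield name, new_name
```

Note that a purely mechanical mapping turns M‘Farland into M'Farland; the McFarland form suggested above would still need a manual move or an extra redirect.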

Universal Code of Conduct – 2021 consultations

Universal Code of Conduct Phase 2

The Universal Code of Conduct (UCoC) provides a universal baseline of acceptable behavior for the entire Wikimedia movement and all its projects. The project is currently in Phase 2, outlining clear enforcement pathways. You can read more about the whole project on its project page.

Drafting Committee: Call for applications

The Wikimedia Foundation is recruiting volunteers to join a committee to draft how to make the code enforceable. Volunteers on the committee will commit between 2 and 6 hours per week from late April through July and again in October and November. It is important that the committee be diverse and inclusive, and have a range of experiences, including both experienced users and newcomers, and those who have received or responded to, as well as those who have been falsely accused of harassment.

To apply and learn more about the process, see Universal Code of Conduct/Drafting committee.

2021 community consultations: Notice and call for volunteers / translators

From 5 April – 5 May 2021 there will be conversations on many Wikimedia projects about how to enforce the UCoC. We are looking for volunteers to translate key material, as well as to help host consultations on their own languages or projects using suggested key questions. If you are interested in volunteering for either of these roles, please contact us in whatever language you are most comfortable.

To learn more about this work and other conversations taking place, see Universal Code of Conduct/2021 consultations.

-- Xeno (WMF) (talk) 20:45, 5 April 2021 (UTC)

Invitation to m:Talk:Universal Code of Conduct/2021 consultations/Discussion

I am interested in hearing the input of Wikisource users about the application of the Universal Code of Conduct, especially from the perspective of interactions on Wikisource. Xeno (WMF) (talk) 23:56, 17 April 2021 (UTC)

Should mdash be surrounded by space or not?

I have always used mdash without surrounding spaces, which I believe is/was the community standard. However, the "Clean up OCR" script surrounds it with spaces as it wraps the paragraph lines. For me, line wrapping is the last step in proofreading and final spell checking.

As shown in the above comparison, I have my own AutoHotkey text standardization and line wrapping script, which is identical in every respect to "Clean up OCR", but I can modify it to whatever the current community requirement is.

I needed an additional AutoHotkey script which does a partial cleanup of the text without line wrap, to help me identify the OCR errors between mdash and the hyphen, by surrounding the mdash with a space, and I left it so in final format. But some editors noted this and that is why this post. Which should it be? — Ineuw (talk) 23:53, 11 April 2021 (UTC)

We replicate the work. In the works that I have seen, typically not. I have no idea whether the grammarians and the editors of the world have a rule that applies at WP. We also have not typically done the half spaces, and all the other unicode that can apply. — billinghurst sDrewth 23:58, 11 April 2021 (UTC)
If there is a space, make sure that it's a non-breaking space with &nbsp;. —Justin (koavf)TCM 02:18, 12 April 2021 (UTC)
IIRC, em dashes will always linebreak even with &nbsp;? They break when set close-up against letters or other punctuation. Pelagic (talk) 01:28, 18 April 2021 (UTC)
Not. CYGNIS INSIGNIS 04:26, 12 April 2021 (UTC)
The "rule of thumb" usually offered in the past is that there should not be spaces around em-dashes. Where spacing appears around such dashes in older works, it is usually half-spacing that flanks the em-dash rather than full spaces. Wikisource has opted not to use half-spacing, and so in most cases we collapse half-spacing around em-dashes. However, there are works where there is clearly a full space or even a double-space next to one (or both) sides of an em-dash, such as at the end of a line of dialogue, so there isn't a simple rule that would apply in every situation. Hence a "rule of thumb", although the Wikisource:Style guide (s.n. Formatting, 7. Punctuation) explicitly advocates for "Whichever dash is used, it should not be flanked by spaces." (emphasis mine) This statement applies in the majority of situations. --EncycloPetey (talk) 23:21, 12 April 2021 (UTC)
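For anyone adapting a cleanup script to the rule of thumb above, the following Python sketch shows one way to close up spacing around em dashes. It is only an illustration under stated assumptions, not the "Clean up OCR" script or the AutoHotkey script mentioned earlier: the function name and the set of space characters are invented for the example. It deliberately leaves alone a dash that sits at the start or end of a line, since those are the cases where a real space may be present in the original.

```python
import re

# Spaces that may flank an em dash in OCR text:
# ordinary space, no-break space, thin space, hair space.
_SPACES = "[ \u00a0\u2009\u200a]*"

# Only match a dash that has non-space text on both sides,
# so dashes at the start or end of a line are not touched.
_SPACED_EM_DASH = re.compile(
    r"(?<=\S)" + _SPACES + "\u2014" + _SPACES + r"(?=\S)"
)


def close_up_em_dashes(text: str) -> str:
    """Remove spaces flanking em dashes that sit between two words."""
    return _SPACED_EM_DASH.sub("\u2014", text)
```

A proofreader would still need to compare the result against the scan, because the sketch cannot tell a half space from a deliberate full space in the printed page.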

Accessibility on this site

I have recently added some accessibility features to tables such as table captions and was encouraged to post about it. These are required by WCAG best practices and provide a very high impact on making the site useful for the blind with very low effort. Note that I have also ported over {{sronly}} so these won't display, so there are no concerns about styling. Is there any good reason to not include table captions on data tables? Should we implement best practices as decided by Web authorities and accessibility advocates or is there some reason why we shouldn't? —Justin (koavf)TCM 02:12, 12 April 2021 (UTC)

I asked you to support your assertion that table captions are required, and you failed to do so. I have also pointed out that the place where you applied them is not a data table, but is applied purely for layout and is temporary for the benefit of proofreaders. The so-called data are copyright statuses for the works we cannot yet import and links to Index pages of works that we have. As works on the list are validated, the so-called "data" is removed and will eventually disappear altogether, hopefully within the next two years. I also asked several times for you to start a discussion, and am glad to see that you have now done so. --EncycloPetey (talk) 03:02, 12 April 2021 (UTC)
Excuse me, please they are data tables. What is it you think that a data table is? I have provided citations for using captions as best practice for data tables. Please do not keep on asserting that data tables are for layout when they are not. I also don't think that the implication that information should only be accessible if it's going to last an indefinite amount of time withstands even the mildest scrutiny. Just because the Sun is going to swallow the Earth in a few billion years, that doesn't justify us not using best accessibility practices. My point is relevant to all data tables, not only the ones that you think someone will remove at some point. Rather than make this discussion about a single table that you keep on misidentifying as not being a data table, I am asking a broad question that refers to a site-wide culture of using best practices for persons with disabilities. Are you in favor of doing that or are you opposed to it? Justin (koavf)TCM 03:33, 12 April 2021 (UTC)
Let’s try to keep this conversation civil. Accessibility is an important issue and should be a priority. Maybe a detailed proposal would be a good start. Proofreading is an important task that recognizes differences in ability and helps to make texts more accessible. However, it depends on certain visual abilities. There’s a need for a discussion of what areas need accessibility features and what kind. Languageseeker (talk) 05:41, 12 April 2021 (UTC)
I propose that data tables have captions. —Justin (koavf)TCM 06:05, 12 April 2021 (UTC)

Comment Use of the term "data tables" is ambiguous; please give specific examples. Please consider using WS:Sandbox for some permanent examples. Thanks. — billinghurst sDrewth 14:01, 12 April 2021 (UTC)

"Data tables" are distinguished from "layout tables" by the former showing the sorts of things that actually belong in a table (e.g. Nations in a certain Olympics and how many medals they won or vendors who owe your company an invoice per month) and the latter is the misuse of a table to provide the layout of elements on a webpage instead of using CSS. Here is a data table of school lunches. Here is a fictional budget and items sold in a data table. All data tables should have captions and semantics for columns and rows. —Justin (koavf)TCM 21:15, 12 April 2021 (UTC)
We reproduce what was published. Are you advocating that we generate captions and add semantics that are not in the published works? — billinghurst sDrewth 06:13, 13 April 2021 (UTC)
To be fair, correct semantics like scope=row/column on header cells actually probably should be used wherever appropriate. Not all tables have a caption in the original, but technically speaking, when they do, we also should be using the |+ syntax rather than a direct styled thing like {{center|Caption}} (per-work CSS can help to target those captions for auto-styling anyway). Perhaps when they don't, a {{sronly}}-type affair could be the table equivalent of an image alt attribute (something else we should really be doing for accessibility). We should also (technically) use {{lang}} whenever encountering non-English text.
For fully-correct semantic table markup, we are a bit hamstrung by the ongoing failure of Mediawiki to provide <col>/<colgroup> elements, as well as the "direct" nature of our data (semantic markup is easier if you're a site generating a table from a database of numbers - you can just change the server to generate the table as such).
I think while there is a lot of work that can be done to improve accessibility, if we're going to get anywhere it needs to be a more structured effort, perhaps in concert with https://phabricator.wikimedia.org/tag/accessibility/ (maybe a column in https://phabricator.wikimedia.org/tag/wikisource/?) than just ad-hoc threads at WS:S with no clear end state or process to get there.
The biggest problem I have with accessibility is that it is, in general, extremely hard for any "abled" editor to know if they have just produced something accessible, or almost-entirely inaccessible - it looks the same to them. Screen-reader software is rare, finicky about platforms and can be expensive. I have (after non-inconsiderable effort) managed to get Orca working, but it's very laborious to check what is written comes out OK. It would be good if someone knew of a service that could directly translate hunks of HTML into "screenreader text" to see how what we currently have comes out. Perhaps some kind of interactive tool could be built around it. User:Koavf: any ideas? Also, if you are serious about improving accessibility, writing documentation about what is and is not good practice will help others (including myself) understand how we can improve. Inductiveloadtalk/contribs 13:10, 13 April 2021 (UTC)
we did do some meetups with DCPL braille library with editing on screen readers (adding alt text to images), and at Gallaudet, but it would take some grant resources to do some UX. but in the mean time, maybe an accessibility user group on meta would be a start. Slowking4Farmbrough's revenge 22:08, 13 April 2021 (UTC)
Barring Foundation-wide standards and lacking any particular local ones, would you be in favor of adopting en.wp's as a rule of thumb? —Justin (koavf)TCM 02:59, 14 April 2021 (UTC)
I have no idea of enWPs, and anyway they generally are free-creating tables rather than replicating a work. I also know that (some of) our tables can be very busy and complex and the idea of adding further complications and complexity does not enchant me. I would want something easy and reproducible that does not pollute the work, can be done in wikitext, and not make me have to work at undoing any system imposed formatting. So at this stage I see that we are needing some developed guidance and priority in what can be done to improve the readability of a work. I would think that for a while we would look to voluntary compliance and encourage the WMF to consider it as part of their development. — billinghurst sDrewth 14:35, 14 April 2021 (UTC)
@Billinghurst: All you do is add "|+Caption" at the beginning. It is done in wikitext and couldn't be much easier. —Justin (koavf)TCM 12:13, 24 April 2021 (UTC)
  • As proposed, this seems apt only to bludgeon other contributors with one's own preference. Accessibility is a large and complex issue that deserves a wider perspective than this. --Xover (talk) 06:18, 14 April 2021 (UTC)
    • The wider perspective is WCAG, the international standard for web accessibility. What Koavf says about table captions is correct. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:52, 16 April 2021 (UTC)
    • @Xover: "this seems apt only to bludgeon other contributors with one's own preference" What? How is this any different from alt text? Why shouldn't we have accessibility on this site? —Justin (koavf)TCM 12:13, 24 April 2021 (UTC)
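To make the markup under discussion concrete, here is a small Python sketch that emits a wikitext table using the |+ caption syntax and scope attributes on header cells, as mentioned in this thread. The function name and the sample data are hypothetical, and it is offered only as an illustration of the output format, not as a recommended tool or an agreed local practice.

```python
def wikitable_with_caption(caption, column_headers, rows):
    """Build wikitext for a data table with a caption and scoped headers.

    Each row is a sequence whose first item is treated as the row header.
    """
    lines = ['{| class="wikitable"', f"|+ {caption}", "|-"]
    # Column headers are marked with scope="col".
    lines += [f'! scope="col" | {header}' for header in column_headers]
    for row_header, *cells in rows:
        lines.append("|-")
        # The first cell of each row is marked as a row header.
        lines.append(f'! scope="row" | {row_header}')
        lines += [f"| {cell}" for cell in cells]
    lines.append("|}")
    return "\n".join(lines)
```

Calling `wikitable_with_caption("Medals by nation", ["Nation", "Gold"], [("Freedonia", 3)])` (fictional data) yields a table whose second line is `|+ Medals by nation`, which is the only line a caption adds to an otherwise ordinary wikitext table.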

Line numbering coming soon to all wikis

-- Johanna Strodt (WMDE) 15:08, 12 April 2021 (UTC)

Questions: (for those who understand the techno-speak) Will line numbering apply in all namespaces? Will it appear in the proofreading window of the Page namespace? And can it be deactivated by an editor if they find it distracting? --EncycloPetey (talk) 23:10, 12 April 2021 (UTC)
It was said "... you can enable line numbering ...". I would suggest that you ask over on the extension talk page whether it was tested and will function in the Page: namespace. — billinghurst sDrewth 06:07, 13 April 2021 (UTC)
@EncycloPetey, I think this only appears if/when you have the colored syntax highlighting turned on. Do you normally use that? If not, then you shouldn't see it. Whatamidoing (WMF) (talk) 05:30, 21 April 2021 (UTC)

Importing the Shakespeare Quarto Archive

So, the Shakespeare Quarto Archive just got taken down this week because of the end of Flash. However, the texts and images are available on [1]. Is there any easy way to import the texts and files of the proofread quartos into Wikisource? The encoding scheme is here [2]. Rescuing all 32 quartos from digital oblivion seems like a worthy project. Languageseeker (talk) 04:16, 17 April 2021 (UTC)

IA-upload offline (error 503)

@Samwilson: If you are around, would you be able to have a look at IA-upload, it is telling me that the service is not available. Thanks if you can. — billinghurst sDrewth 12:55, 17 April 2021 (UTC)

@Billinghurst: Sorry about this! I've restarted the web service and it's back online now. It was down for 23 hours and 23 minutes. The error log was being filled with URL rewriting debug output (I've turned that off now) and I couldn't see at a glance what went wrong. Will keep an eye on it. The uptime log is here: https://stats.uptimerobot.com/BN16RUOP5/782616657 Sam Wilson 00:39, 18 April 2021 (UTC)
@Samwilson: Could you investigate the situation? The IA upload has been mostly down for the past few days. Languageseeker (talk) 02:00, 20 April 2021 (UTC)

The Tragicall Historie of Hamlet Prince of Denmarke

What exactly is this? --EncycloPetey (talk) 00:47, 18 April 2021 (UTC)

The first Quarto of Hamlet from the defunct Shakespeare Quarto Archive. Languageseeker (talk) 00:49, 18 April 2021 (UTC)
Then why are there two different redlinks with bad syntax and a transcription project for one of the redlinks? Is this the First Quarto of Hamlet or a mishmash? --EncycloPetey (talk) 00:52, 18 April 2021 (UTC)
There are two copies in the world. The shelfmark identifies the copy. There is one Index because Commons is not playing nice with Pattypan. Languageseeker (talk)
@Languageseeker, @EncycloPetey: it is out of scope and unnecessary. We don't have either work, and we don't create work pages like that in main namespace. — billinghurst sDrewth 00:58, 18 April 2021 (UTC)
Um, there was a transcription project. Languageseeker (talk) 01:00, 18 April 2021 (UTC)
(ec) I have moved it to user namespace. There is a whole section above about creating ugly stick pages. I keep hoping that I never see {{ext scan link}} in main namespace again. So very tempted to make it so that template does not display in main ns. — billinghurst sDrewth 01:03, 18 April 2021 (UTC)
I moved the links to the main version page of Hamlet and created a proposal to make them in scope. Languageseeker (talk) 02:13, 18 April 2021 (UTC)
And I have taken them out. Get your proposal through first; red links are typically not there unless they appear somewhere else onsite. — billinghurst sDrewth 02:53, 18 April 2021 (UTC)

Comment I have made the above link as a redirect. We would never have that as a versions page, as we only ever have one versions page for a work. If/when there is a completed transcription, they will appear on Hamlet (Shakespeare) as per all other versions. — billinghurst sDrewth 02:56, 18 April 2021 (UTC)

And how are they supposed to be finished if the scan link cannot be posted? Languageseeker (talk) 03:31, 18 April 2021 (UTC)
It can be posted, but not in the mainspace. This template belongs in the Author: and Portal: namespaces. Beeswaxcandle (talk) 05:47, 18 April 2021 (UTC)
@Languageseeker: What makes you think that you are the only person with works that are unfinished and unlinked from the main namespace? These works are no different from any other work, so please use the existing system. If you want to set up a project please look at Wikisource:Wikiprojects just like the rest of us have had to do. — billinghurst sDrewth 06:07, 18 April 2021 (UTC)
@Beeswaxcandle: There is nothing in any of the documentation or pages about style and format that states the template belongs in the Author: and Portal: namespaces only. Nor in fact is there any documentation about its use in any particular namespace(s). But—supposing for the moment the link must be placed in the Author ns—where on Author:William Shakespeare (1564-1616) should the small scan link to the First Quarto of Hamlet be placed? --EncycloPetey (talk) 16:16, 18 April 2021 (UTC)
Templates are typically required for the purpose of the page, and the addition of this external link template to a "versions" page is changing the purpose of the page then to disambiguate our versions. The typical use of the template is to point to things from our curated author and portal namespaces. There may be a special case for its use in main ns, but it is not on version pages. So it is a false argument to say that it doesn't say it cannot be used in main ns, it is the use that you are undertaking. — billinghurst sDrewth 00:39, 19 April 2021 (UTC)
So maybe the argument that you would like to develop is how its use on a "versions" page fits with what we are doing, rather than try to argue that the template doesn't have a rule that it cannot be used in another namespace. Context is far more important. — billinghurst sDrewth 00:43, 19 April 2021 (UTC)
[ec] But the Versions page comes into being when content from the Author namespace moves into the Main ns as a Versions page. That is, when we have only a single edition, that edition is listed on the Author page. When there are multiple editions to manage, that content moves from the Author namespace to the Main namespace. It remains the same content; only its location has changed. Why then would a template apply to that content when it occurs in one location but not in the other?
When I created all the disambiguation and versions pages for the plays of Shakespeare, I advertized that process here in the Scriptorium and asked for feedback on what I was doing. At no point did anyone raise objections about the use of the {{small scan link}} and {{ext scan link}}. I also point out that, prior to the creation of those pages, we had only copy-paste Gutenberg texts for most of Shakespeare's plays. Once the versions pages went up, we had (at least) eight people join in with the transcription of various editions of Shakespeare, and they found those editions because there were links to the Index pages of the scans on the Versions pages. There were Index pages, and some of them had languished for years, but making those pages known led to the completion of several texts and the cleanup of several others.
I still do not understand why there is such strong resistance to the use of {{small scan link}} next to the listed works they contain. They allow editors to see that a transcription has been started and find it. We repeatedly have cases where an unlisted scan is finished, only to find it is a duplicate of an existing work, and the person who spent all that time is usually (and rightly) furious that no one bothered to coordinate listings. Removing {{small scan link}} from versions pages will exacerbate rather than ameliorate that issue. And as I have pointed out, such listings have directly led to the completion of long-languishing works.
If the matter of the listings is the same, why does it matter which namespace it is located in? If we had the Versions pages in a separate "Work:" namespace like the Italian Wikisource has done with "Opera:" would that make it OK to use the template? If not, then why not? If so, then the argument against the use of {{small scan link}} in the Main ns is circular. --EncycloPetey (talk) 00:54, 19 April 2021 (UTC)
So this comes back to the basic questions
  • What is a versions page, our versions, some versions or all versions?
  • When should a versions page exist?
  • When would we make exceptions to either of the prior two questions?
At this stage we appear to be outside the scope of Wikisource:Versions and, as I attempted to address in the next section, we have got there by creep, not through clarity of our scope. If we are going to morph, then we need the central directional pages to move with them. We also need to be clear and explicit to the broader community that scope is being expanded, not implicit or having to guess. — billinghurst sDrewth 03:43, 19 April 2021 (UTC)

Creep of complexity and business of disambiguation pages, especially version pages, and other cruft into main namespace

Over time our simple listing disambiguation pages of works onwiki have been creeping to more and more complexity. They are becoming pseudo-stub pages and getting listings off-site, and pages of less relevance, definitely not transcluded works. We are getting into more and more disagreements about the purposes of these pages. And having to have the argument about what is appropriate on these pages is sucking up good time and simply becoming tiring repeated argument. We need to resolve the issue at a higher level so we don't have to have this continuing bickering of the detail.

Templates in use on the pages include

The past couple of months we are now getting people talking about future works and adding redlinks to works that we don't have, may never have. These pages don't rely on future promises, they should be used for works on site now, and possibly where we have had to disambiguate for them, eg. in a journal that we hold the scans.

Links of pertinence

In my opinion, we are not the encyclopaedia, we shouldn't be having discussions about all the versions of a work, or encyclopaedic argument about the differences or their histories in our main namespace. All that belongs at English Wikipedia, or what we have designated our curated spaces and we have author and portal namespaces for those sorts of pages. For many years we have been trying to tidy up the cruft, and keep our main namespace to be for presented work; it was to be our quality space, just for works. I feel we are drifting to cruft, so what type of product are we trying to put into our namespace? Otherwise what is the purpose of the main namespace? Up until now it has not been designated a find aid for things but works we have. — billinghurst sDrewth 06:03, 18 April 2021 (UTC)

I think that a lot of this creep has to do with a few factors. First, the difficulty of adding a text. Users find a scan on IA, but become too challenged by uploading the file to Commons and then decide to add an external scan link to preserve their work. Uploading to Commons should be performed by a bot to clear that backlog. We can even gather all the IA ids and ask Fæ to upload them. Second, this site lacks a clear direction in creating a central corpus of key works in English. Instead, users decide to work on whatever strikes their fancy. By contrast, French Wikisource has the Mission 7,500 that provides a set of works to work on every month. This has resulted in French Wikisource having transcribed texts. Because English Wikisource does not have a core set of key texts, there are more version pages with no scans or texts or indexes. Versions are a problem of important texts and not of minor texts.
For important texts, version pages can serve a critical function by designating which texts should be on Wikisource. Researching the publishing history of a book takes time and access to certain sources. In general, I believe that Wikisource should only include versions of texts that the author contributed to, those that have important scholarly value (critical editions), or with new illustrations. A version page can serve as a space to detail these editions.
For now, the IA does not have scans of every book ever published. Therefore, there is work to be done by future generations. The version pages can serve as key guides for the future. Research today, proofread tomorrow.
As for the presence of small scan links, they allow the user to see how they can contribute to this project. This is a collective proofreading platform. It makes no sense to hide the proofreading from users. Nor does it make sense to ask an individual user to proofread an entire work by themselves and then post it to mainspace. Transcription projects belong in mainspace.
I would recommend the following changes to Wikisource.
  1. Only allow scan backed new texts unless the user provides a compelling reason why that is not possible.
  2. Restructure the POTM to follow the French model of presenting several texts and focus on building core texts.
  3. Allow for version pages when a text has a complicated publishing history and more than three editions.

Languageseeker (talk) 04:23, 19 April 2021 (UTC)

Comment This issue is presented as something that has happened "The past couple of months", but the issues have been around as long as Wikisource has. Anyone working on large projects such as the 1911 EB knows that we have had long-standing redlinks in the Main namespace for years. With regard to versions and disambiguation pages, we had a case where the community opted in favor of red linked titles in 2015. Calling all of this "cruft" and appealing to abstract philosophy is not a practical approach. Saying that "it has not been designated a find aid for things but works we have" misses entirely the fact that disambiguation pages are find aids for locating other things; they are not themselves works, but are dynamic and changing entities by their very nature. The same is true of versions pages; these pages will and do change. Hiding works and their progress from listing on the basis of some abstract philosophy will not get the work of Wikisource done. --EncycloPetey (talk) 15:33, 19 April 2021 (UTC)
That is a different story with a different argument.
  • We all have texts that we want done, and our means for having these seen and listed has been through the Author and Portal namespaces. This selective means of putting additional editions as available is outside how we have been doing them and outside of our existing guidance per Wikisource:Versions.
  • Redlinks that I mentioned were specifically related to versions/disambig, not the general case. On version pages they would appear to be outside of our guidance at Wikisource:Red link guidelines#Main namespace, and not at this time our guidance on the Versions either. Citing EB1911 is interesting though not relevant to the case.
  • To cite the 2015 example as a community consensus should be seen as a misrepresentation that the community undertook a conversation to change our policy or our guidance, and it should not make it incapable of review and change. Or it could be that this is an example of some exceptions to the rules, rather than the new normal.
  • I am aware that matters evolve, though it should be through open conversation and agreement with the expectation that our guidance will do so too. It is also reviewable, which is what I am doing here. However, the basics are that these are a variant of disambiguation pages. There is a basic concept behind these types of pages.
Are we moving "FINDING AIDS" into the main namespace? Is it only going to be version pages? What makes them special? Why would they include links to off-site? What is the equity? What is the benefit? How does it work in the holistic sense? Should we start version pages for every work that has multiple editions? How will it work for biblical works? It is a can of worms once you open it without thinking about the broader consequences rather than your own particular desires. It is why we have the guidance and why we achieve consensus for these changes and their scope. — billinghurst sDrewth 00:25, 20 April 2021 (UTC)
As I said, I don't think that every work needs a version page. However, having a version page can help identify the specific subset of a work that should be on Wikisource. To give an example, we don't want every copy of Hamlet ever printed on this site. There are only a few specific editions that should be in scope. Posting an edition on a version page can be seen as an invitation to comment on the in-scope nature of the work prior to a user sinking a significant amount of work into it. I've seen many mere reprints of 18th century works that I've replaced with links to scans of original editions.
Second, by providing what you term a "finding aid," Wikisource can prevent users from adding editions that this site would not want to host. For example, A Child's History of England (1900) is the only scan backed copy of this text. However, this edition is a mere reprint with no unique illustrations or input from the author. It's a random edition pulled from the IA because the PG text was based on it. It basically has no value. As soon as another copy of this work appears, I would submit it for deletion.
Third, without knowing about other editions, it is more difficult to properly name a text, leading to future work. For example, imagine that the first version of Hamlet posted on this site was Q1. Would we call this Hamlet? If so, then in the future, an administrator would need to move it to Hamlet (Quarto 1). If we had a version page, we would first call it Hamlet (Quarto 1). Languageseeker (talk) 02:24, 20 April 2021 (UTC)
I think you're too prescriptive on versions. I will not support the deletion of any scan backed version because it's not the "right" version. Every printed edition is in scope. Certainly, with Hamlet, every different bowdlerization and every different abridgment have an interest to someone. We don't generally prescribe the best of anything; we include what people want to add, provided it has been printed. I encourage people spending their time on good versions, but I don't support prescribing what is the good version; if someone insists on a version of Oliver Twist respelled and mildly abridged for an American audience, that's their right to work on that here.--Prosfilaes (talk) 03:11, 20 April 2021 (UTC)
For a specific example of where a random edition is important, if you want to know how The Pilgrim's Progress affected Carl Sandburg's writing, you don't want the best edition; you might want the version the American Tract Society published in the 1850s, because that was the copy in his library. Maybe he also read others, but you want to get the ones he had hands on, not the ones influenced by Bunyan.--Prosfilaes (talk) 05:29, 20 April 2021 (UTC)
This is not outside how we have been doing it; the Weird Tales pages have been up since 2009. Citing EB1911 is also a general example, quite relevant to your last statement. As for your last statement, I think it's pretty clear we who are advocating this want versions pages to point to all versions of works that users are interested in working on. There's some disagreement about details, but it seems that any place we're pointing to multiple works, we should have links to works that people are interested in doing. As for me, we shouldn't be making author pages, links on author pages, or version pages if nobody is realistically going to work on them; Author:Isaac Asimov should be trimmed to the bones. Allowing any authors to have an author page with all their works is a can of worms; what if someone imports the entire LoC authority list? In practice, however, it's not much of an issue.
Recently, you begged off writing guidelines. There doesn't seem to be anyone doing that job here, but it's hard to make official changes if no one is willing to clearly document them, and it's hard to debate when there are unwritten guidelines that are leaned on if and only if it's convenient.--Prosfilaes (talk) 03:02, 20 April 2021 (UTC)
Is this a helpful page: Panchatantra? Is that a versions page or is that an article? Which bit are we wanting for clear navigation at enWS? — billinghurst sDrewth 10:44, 20 April 2021 (UTC)
@Prosfilaes: I think that I am entitled to cry off some things around here, and I never said that they were unimportant. I think that I do enough in a range of areas where no one else is contributing, so please excuse me from one of the things which I find difficult. — billinghurst sDrewth 10:48, 20 April 2021 (UTC)
Meh. That's certainly not how I would write Panchatantra, I would definitely remove all the modern works and cut down on the header, but it's certainly useful to have other editions that aren't yet on Wikisource listed there.--Prosfilaes (talk) 14:33, 20 April 2021 (UTC)

Disabled music scores[edit]

Nine months have already passed since the music scores produced by the score extension were disabled, and looking at task T257066 it seems that nobody really bothers about contributors’ complaints. At w:Help talk:Score#Can we please re-enable display of the score images? it was suggested to replace vorbis="1" with %vorbis="1"% or %sound="1"%, which at least makes the image of the score visible. I tried it and it seems to work. What do others think about such a temporary workaround (especially as "temporary" can mean "really long" in the Wikimedia environment)? --Jan Kameníček (talk) 13:27, 19 April 2021 (UTC)
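For anyone wanting to try the workaround on a local copy of some wikitext, here is a minimal sketch in Python. It is only an illustration and not the bot job that was actually run; the function name `neutralise_audio_attrs` and the regular expressions are my own assumptions. It wraps a `vorbis="1"` or `sound="1"` attribute in percent signs, only inside an opening score tag, and leaves already-wrapped attributes alone.

```python
import re

# Hypothetical helper, not the actual bot code: matches an opening <score ...> tag.
SCORE_OPEN_TAG = re.compile(r"<score\b[^>]*>")
# Matches vorbis="1" or sound="1" unless it is already wrapped in percent signs.
AUDIO_ATTR = re.compile(r'(?<!%)\b(?:vorbis|sound)="1"(?!%)')


def neutralise_audio_attrs(wikitext: str) -> str:
    """Wrap the audio attribute of every <score> tag in percent signs."""

    def fix_tag(match: re.Match) -> str:
        # Only the opening tag is rewritten; the score body is left untouched.
        return AUDIO_ATTR.sub(lambda a: f"%{a.group(0)}%", match.group(0))

    return SCORE_OPEN_TAG.sub(fix_tag, wikitext)


if __name__ == "__main__":
    sample = '<score vorbis="1">\\relative c\' { c d e f }</score>'
    print(neutralise_audio_attrs(sample))
```

Running the function twice gives the same result as running it once, so it should be safe to apply to pages that were already converted.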

@Jan.Kamenicek: I ran a bot job to do that replacement in January (IIRC ~450 instances). I don't think any more have appeared since, because only cached scores can be shown until "they" fix it (i.e. the "fix" will re-enable existing scores, but not allow the creation of new ones, AFAIK). Inductiveloadtalk/contribs 13:35, 19 April 2021 (UTC)
@Inductiveload: I am afraid there are some which the bot probably did not catch, like Page:The music of Bohemia.djvu/32. --Jan Kameníček (talk) 13:39, 19 April 2021 (UTC)
Huh, quite right. Perhaps because it's not transcluded it didn't get caught in the dragnet. Please hold. ^_^ Inductiveloadtalk/contribs 13:45, 19 April 2021 (UTC)
@Jan.Kamenicek: OK, I hit another 40-ish. Let me know if you see any more. Inductiveloadtalk/contribs 14:19, 19 April 2021 (UTC)
It was transcluded, but now it is OK (i.e. at least the picture is visible), thanks! --Jan Kameníček (talk) 14:58, 19 April 2021 (UTC)
Huh, so it is. So now I have another bug in my random pile of JS junk to figure out! Anyway, thanks for the heads-up. Inductiveloadtalk/contribs 15:12, 19 April 2021 (UTC)

Comment The way to get resources for such a fix is through the annual/biannual priorities lists, or finding a hacker. SCORE only came about after a long time as one of our participants, GrafZahl, showed a particular interest and even that took ages due to security issues. This will be a case of developing a case for a fix and lobbying. Sitting, waiting, hoping will not bring the solution. — billinghurst sDrewth 22:44, 19 April 2021 (UTC)

Contributors’ job is to contribute, tech team’s job is to provide technical support, WMF’s job is to provide funding. Any project can work well only when everybody does their job well. It is quite enough when contributors report bugs; why should they lobby for the bugs to be fixed, why should they lobby for something that should be just natural? What else should they do? Sorry, but I am definitely not "sitting". I want to be adding content above all, but I also try to do some little maintenance, report bugs, and in better times I also take part in or organize various wikiactivities in the real world. I do not want to waste more of my wikitime on never-ending pleading with the tech team for support. A system where volunteers have to lobby for support can result only in losing them (or at least in not gaining them). The bug was properly reported in the phabricator which is the place determined for such reports. Now it is the tech team’s turn to fix it (and it has been their turn for 9 months already). --Jan Kameníček (talk) 23:51, 19 April 2021 (UTC)
I hear you, I am not saying this is the ideal, I am mentioning my view of the reality. Reality != perfect world. If we had a perfect world, I would not be manually adding information to WD, writing spam filters, deleting spam, undoing edits and telling people to read the guidance, ... — billinghurst sDrewth 00:35, 20 April 2021 (UTC)
yeah, scores was always unsupported, and a miracle it worked at all. one of these days a musical hacker will reverse engineer lilypond in open source, and we will have another mission to transcribe all that public domain sheet music. but until then, it will remain locked away behind paywalls and hard to find archive scans. Slowking4Farmbrough's revenge 00:19, 22 April 2021 (UTC)

Tech News: 2021-16[edit]

16:48, 19 April 2021 (UTC)

Firefox extension: TitleCase[edit]

Transform strings into Title Case, Proper Case, Start Case, Camel Case, Upper Case, and Lower Case. You have 2 ways to change text. Either by right clicking on the field and changing the case or by highlighting and only changing what you highlighted.

TitleCase

Just tripped over this Firefox extension that allows case manipulation of text through block, right click. I know that I regularly manipulate case when transcribing, though not as easily as this with the forms that I have. — billinghurst sDrewth 14:45, 20 April 2021 (UTC)

I've been using this for a while; it's a great time-saver. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:11, 24 April 2021 (UTC)

A book is currently to be used on a contest[edit]

Dear Wikisource fellows. I would like to inform you that Index:Scented isles and coral gardens- Torres Straits, German New Guinea and the Dutch East Indies, by C.D. Mackellar, 1912.pdf is currently being used in a proofreading contest. You can expect new users editing the book. I will monitor the progress and provide feedback to participants if needed. In case I missed something, kindly let me know. Much appreciated. ··· 🌸 Rachmat04 · 02:49, 22 April 2021 (UTC)

@Rachmat04: Sounds exciting. Thanks for organizing this. Languageseeker (talk) 03:23, 22 April 2021 (UTC)

Suggested Values[edit]

Timur Vorkul (WMDE) 14:08, 22 April 2021 (UTC)

Call for Nomination of Texts[edit]

There are good signs that English Wikisource might try the French Wikisource approach of having a set of texts for the community to work on, with new texts added each month and incomplete texts rotated out after three months. Therefore, I'm calling upon the community to submit nominations for texts that they would like to have on this site. These texts should be important and have a broad appeal. Here is a list of some texts that I can think of and that others have suggested in various places.

  1. Masnavi I Ma'navi (transcription project) (Wikisource Islam)
  2. The Mahabharat (external scan) (Wikisource India)
  3. The Picture of Dorian Gray (start transcription) (Wikisource LGBTQ+)
  4. Paradise Lost (transcription project) (Seventeenth Century)
  5. Clarissa, or the history of a young lady (transcription project) (Eighteenth Century)
  6. Manhattan Transfer (transcription project) Etsu Inagaki Sugimoto, A Daughter of the Samurai (transcription project) (Celebrating the Public Domain)
  7. The Wanderer (Fanny Burney) (transcription project) (Women Writers)
  8. The new Negro: an interpretation (transcription project) (Black Writers)
  9. Uncle Tom's Cabin (1851 First Edition) (transcription project) (Slavery in the USA)
  10. Librarian's Copyright Companion (Legal Texts)
  11. Anna Karénin (transcription project) (Russian Fiction)
  12. Enquiry into plants (transcription project) (Classics of Science)
  13. The sidereal messenger of Galileo Galilei (transcription project) (Classics of Science)
  14. Remarks on prisons and prison discipline in the United States (external scan) (Reformists)
  15. Commentaries on the Laws of England (transcription project) (Legal Texts)
  16. Commentaries on the Constitution of the United States (transcription project) (Legal Texts)

Languageseeker (talk) 00:59, 24 April 2021 (UTC)

Manhattan Transfer is up for an upcoming PotM. It would be inappropriate to poach titles from that project. --EncycloPetey (talk) 01:10, 24 April 2021 (UTC)
Fair enough, I replaced it with a different work. Languageseeker (talk) 01:39, 24 April 2021 (UTC)
Define "important". All texts published through a non-vanity press are/were important to someone—otherwise they would not have been published. To go a step further, all 12,456 works in Category:Index Not-Proofread were important enough for someone to create an Index here for them. But you can't put all of those into your proposed rotation. Beeswaxcandle (talk) 01:51, 24 April 2021 (UTC)
The definition of important is left to the individual user. To prevent an overwhelming number of nominations, I’m asking each user not to nominate more than five texts.
For now, I’m planning on having 15 texts to start with. I know that important is a tricky and nebulous term and I know that I don’t know every text. That is why I’m asking the community for nomination. I’m looking for a diversity of texts. What text from your community or interest would you like in Wikisource? Which text do you think would others find interesting? Languageseeker (talk) 02:00, 24 April 2021 (UTC)
There are definitely some works that are more important than others. Quite a few historically-notable texts only have Project Gutenberg text needing match-and-split (if I had to suggest one: Uncle Tom's Cabin). As for one needing proofreading, there's the recently-declared-C.C. Librarian's Copyright Companion. Mcrsftdog (talk) 02:04, 24 April 2021 (UTC)
@Mcrsftdog: Added to the list. Thanks. Languageseeker (talk) 02:56, 24 April 2021 (UTC)
  • This work looks very nice; I would like to proofread it. However, the text layer is offset in several places; could someone (Xover?) please shift it? (The first is at p. 3; the p. 3 text layer doesn’t exist, and the p. 4 text layer is put there instead. This happens on a number of other pages, as well.) TE(æ)A,ea. (talk) 12:24, 25 April 2021 (UTC)
    @TE(æ)A,ea.: I can probably fix it, but it's not clear to me which File: and Index: we're talking about. Link? Xover (talk) 10:20, 26 April 2021 (UTC)
The Harvard Classics contains many significant works, with Index pages set up already. Among them are Don Quixote, Two Years Before the Mast, and Anna Karenina. We are very weak in world literature. In science, Galileo's The Sidereal (or Starry) Messenger is a short but seminal work which is not scan-backed, and Theophrastus' On the History of Plants is a work of monumental importance that we don't have at all. --EncycloPetey (talk) 04:13, 24 April 2021 (UTC)
I think that these are all good texts to have. My one concern is that Anna Karenina is a reprint. What do you think about the Leo Wiener translation (external scan: https://archive.org/details/completeworksofc09tols/page/n19/mode/2up)? It also has the advantage of being part of a set of complete works. Languageseeker (talk) 04:41, 24 April 2021 (UTC)
I am not experienced enough with the translations to recommend one or another. I will, however make one more recommendation: any works by Dorothea Dix who pushed for reform in the care of the institutionalized. We have zero works from her, which is tragic. --EncycloPetey (talk) 04:49, 24 April 2021 (UTC)
I would like to suggest Commentaries on the Laws of England and Commentaries on the Constitution of the United States. We don't have classic jurisprudence at WS at all. Ratte (talk) 14:01, 24 April 2021 (UTC)
@Ratte: For the Blackstone Commentaries, do you have any strong feeling about featuring the original edition vs the 12th edition? Languageseeker (talk) 15:08, 24 April 2021 (UTC)
You mean the 3rd edition of Book I? We don't have the 12th edition here. When Blackstone published the first edition of Book III in 1768, he decided to republish the other books together with Book III. It's the one-time publication of all four books, which have cross references to each other's pages. Ratte (talk) 15:32, 24 April 2021 (UTC)
@Ratte: Wikipedia states that the 5th and 12th editions are considered the best editions and HathiTrust has the 12th edition available. I'm wondering if it's better to feature the 3rd edition to finish it or go on to the 12th edition. Languageseeker (talk) 17:26, 24 April 2021 (UTC)
Now it is clear. Yes, I agree, it is better to go on to the 12th edition. Can you upload a scan of it? Ratte (talk) 17:47, 24 April 2021 (UTC)
My main reaction is to look at including humanities/philosophy/theology in the next set of texts to pair with the Classics of Science (probably more accurately STEM as we may want math as well). I like the idea of having some categories highlighting geographic (India / Islam / Russian), temporal (18th, 17th), author (LGBTQ+, Women, Black) diversity and then a set around professional (legal), STEM, and the humanities as subject diversity as a breakdown. Then within some of those broad groups we can then pick works (e.g. pick a century --> nominate an important work missing from that time period, pick an area --> pick a work, etc.). MarkLSteadman (talk) 22:48, 24 April 2021 (UTC)
@MarkLSteadman: Glad you like it. Is there any text from humanities/philosophy/theology that you would like to add to the list? Languageseeker (talk) 22:55, 24 April 2021 (UTC)
In philosophy we have no works by Moses Mendelssohn or Johann Joachim Winckelmann. We also don't have any works by Humboldt. Friedrich List was the founder of the historical school of economics and we have none of his. My background is in physics actually so I would defer to any humanists for gaps here... MarkLSteadman (talk) 23:33, 24 April 2021 (UTC)


@EncycloPetey, @Beeswaxcandle, @TE(æ)A,ea., @Mcrsftdog, @Ratte, @MarkLSteadman: I’ve begun to build the page for the Monthly Challenge for May and I would love some feedback. Still adding texts. See user:Languageseeker/MC Languageseeker (talk) 15:04, 25 April 2021 (UTC)

  • Languageseeker: It looks good, more or less, to me, from an aesthetic point. I would recommend putting a border around the flags, because the flags of Japan and England blend in with the white background. Also, the Librarian's copyright companion title stretches over two lines, which is unappealing; I think the title should be shortened (in the display) so that it remains on one line. As for the content, I agree (generally) with the idea of some longer-term works and some shorter-term works, but I believe more care should be taken as to the selection of categories. For new users, especially, some more fiction and less technical or scientific writing would be appreciated. TE(æ)A,ea. (talk) 21:08, 25 April 2021 (UTC)
Maybe an indicator of beginner-friendly or beginner-beware? Also, it is not clear why Islam / India have `Wikisource` before them while the others don't. MarkLSteadman (talk) 22:00, 25 April 2021 (UTC)
@TE(æ)A,ea.: Thank you for your feedback. I'll add the border for the flags to my to-do list. Are there any works of fiction that you think should be featured? I'm especially interested in key texts that have been proofread and need to be validated. Over the next few days, I'm going to add a few more works of fiction.
@MarkLSteadman: I'm trying to make sure that the majority of the texts will not be too difficult. There will also be a separate talk page so that users can ask questions. I see this project as a means of training new users, so I'm trying to strike a balance between providing easy material for users to learn the skills and also giving users a few challenges. Let me know if any text strikes you as being too difficult. Languageseeker (talk) 00:30, 26 April 2021 (UTC)
I thought one of the differences here was that it was supposed to allow collaboration on works with more difficult formatting (e.g. Plays, tables (Heller), or footnotes), characters (e.g. s as in the 17th Century works featured, diacritics, or the Greek characters in the Loeb), or content (e.g. images or equations). Not for all the works, but for some of these categories. Currently I can see at least some of these challenges with the Richardson, Milton, Shakespeare, Theophrastus, and Heller selections. My suggestion was maybe thinking about indicating that these are good ones to get started, while these might be a little more challenging? Maybe something to think about for later? Or looking to provide a little more guidance around some of those issues in the discussion page? MarkLSteadman (talk) 00:48, 26 April 2021 (UTC)

BTW, if anyone wants to add a text to the page the format is {{MC-Cover|Index|Cover Page number|Title to Display|Publication Date|Author|Subject (the green text)|N|Country for the Flag}}

  • page = replaces the author with the specific page
  • cover = use a specific image for the cover
  • author = Text to display in the Author Field
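To make the parameter order above a little more concrete, here is a small Python sketch that assembles such a template call. It is only an illustration: the helper name `mc_cover` and all example values are made up, the meaning of the seventh positional field ("N") is not explained above so it is passed through unchanged, and the template itself may accept or require things this sketch does not know about.

```python
def mc_cover(index, cover_page, title, date, author, subject, n, country, **named):
    """Build an {{MC-Cover|...}} call from the eight positional fields listed above.

    Optional named parameters (page, cover, author) are appended as key=value.
    This is a hypothetical helper, not part of any Wikisource tooling.
    """
    positional = [index, str(cover_page), title, str(date), author, subject, str(n), country]
    extras = [f"{key}={value}" for key, value in named.items()]
    return "{{" + "|".join(["MC-Cover", *positional, *extras]) + "}}"


if __name__ == "__main__":
    # All values below are placeholders, not a real Index page.
    print(mc_cover("Example.djvu", 7, "Example Title", 1900,
                   "Example Author", "Example Subject", "N", "England",
                   cover="Example cover.jpg"))
```

Note that a literal pipe character inside any value would break the call, so titles containing "|" would need escaping by hand.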


@EncycloPetey, @Beeswaxcandle, @TE(æ)A,ea., @Mcrsftdog, @Ratte, @MarkLSteadman, @Billinghurst, @Xover, @Inductiveload: The near-final version of the Monthly Challenge page is done: Monthly Challenge. Any and all feedback welcome as always. Languageseeker (talk) 22:40, 26 April 2021 (UTC)

  • By the way, pinging (by {{re}} or otherwise) only works if you sign your comment. This page looks good, if we are comparing it to the main page of WS:PotM; however, if it will be placed on the front page, there needs to be some small part that can be transcluded there. The Community collaboration is represented on Template:Collaboration, which you can use as reference. A short blurb, for lack of a better word, would be readily accepted on the front page, once you finished setting up this project. TE(æ)A,ea. (talk) 21:45, 26 April 2021 (UTC)
  • Comment Can I ask that this be converted into a project like we have done for all other Wikisource:WikiProjects? This subject matter is probably a replacement project for Wikisource:Community collaboration; that is what we have done on such occasions, and it is part of the current schema and means to address extended projects. It should not be seen as the replacement for PotM which has its particular purpose. — billinghurst sDrewth 23:48, 25 April 2021 (UTC)
I think that the consensus was that this will remain separate from both PotM and Current Collaboration for now. To get this fully running will also require the importation of the Bookworm Bot from French Wikisource, which might take a while because the user who ran the bot has largely left as far as I can tell. The Monthly Challenge is designed to help introduce new users and to improve the core collection of Wikisource. I need to translate/write up the FAQ from the French Wikisource. Most of my time was taken up in creating the template for adding the texts to the page. I'm trying hard to make it visually appealing.
The only consensus is that this should not replace PotM. The possibility of it coming in as a community collaboration has not been discussed. In the 12 years I've been contributing here, there have been a total of seven (7) collaborations in that box on the Mainpage. Billinghurst's suggestion is worthy of consideration. You must also consider how this proposed project will work on the Mainpage. There is limited space in the "Current Collaborations" box, which is where it needs to appear. Beeswaxcandle (talk) 01:15, 26 April 2021 (UTC)
@Beeswaxcandle: My comment is that if setup it can become the current community project, which then becomes a very simple decision of the community to migrate from the existing project to the next. It also then becomes a future decision for the community when we migrate to the next. The community has already had the discussion and the consensus for priority and flag projects, so the angst about having the project doesn't need to be had, we want these flag projects, WHEN someone is willing to run them. — billinghurst sDrewth 01:30, 26 April 2021 (UTC)
This idea is based on the French Wikisource Mission 7500 and the box should ideally be in the same location: Upper Right hand corner of the Front Page above the New Text section of a similar size to the Explore Wikisource box. It should be very visible so that new users can see it right away. The French Wikisource Mission 7500 project is yielding around 8,000 to 12,500 pages proofread and validated a month. So, this seems like a very worthy model to attempt to replicate. Languageseeker (talk) 01:46, 26 April 2021 (UTC)
Meh, you don't ask a lot. It is a community project and belongs in the community project space. Get your project up and sorted, with its requisite documentation and valid pages and system, and prove the concept. If we then want to have a conversation about the main page, and the various positions, then that is entirely a different conversation. — billinghurst sDrewth 04:08, 26 April 2021 (UTC)
I’m asking to give the experiment a fair chance. For me, the placement matters because it makes it extremely visible on the front page. I don’t want to get buried. Even if the Current Collaboration section is replaced with this Monthly Challenge, it would still require a reconfiguration of the template controlling that section. I’m not entirely sure how to do that. I’m happy to do the translation of the French if that is necessary. If this requires a community discussion, I’m happy to do that if you point the way to where. The Bookwormbot also requires the granting of a bot flag and importation that an administrator needs to do. However, it can be plugged into the current template easily as just another field. Languageseeker (talk) 05:13, 26 April 2021 (UTC)
Actually what you are asking for is for more than a fair trial. You are asking for more focus than any other project has had. You are asking for your project to have more focus than our completed works. I don't rate your proposal higher than completed works, especially as the completed works will appear in the completed list. You are suggesting proposed works are ahead of completed content. Anyway, that is getting way ahead of yourself as you don't even have a working project yet. Do the leg work, produce your system. When you have a project to which we can point people then we can progress. — billinghurst sDrewth 08:22, 26 April 2021 (UTC)
@Languageseeker: "Do not remove line breaks, but please remove any hyphens when they are used to break words across lines": I think this is a difference in en/fr process that shouldn't be imported. enWS common practice (rightly or wrongly) is to remove the line breaks. It might be better just to link to WS:MOS and Help:Formatting conventions than to call out a single rule? Inductiveloadtalk/contribs 22:03, 26 April 2021 (UTC)
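To illustrate what "removing the line breaks" amounts to in practice, here is a rough Python sketch. It is an assumption-laden illustration, not an existing enWS gadget: the function name `unwrap_lines` is invented, it treats a blank line as a paragraph break, and it rejoins every word split by a hyphen at a line end, which will be wrong for genuinely hyphenated compounds that happen to break there.

```python
import re


def unwrap_lines(text: str) -> str:
    """Rejoin words hyphenated across a line end, then unwrap single line breaks.

    Hypothetical helper: a blank line (two newlines) is kept as a paragraph break.
    """
    # "jum-\nped" -> "jumped": drop the hyphen and the newline between word characters.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace a lone newline (not part of a blank line) with a single space.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


if __name__ == "__main__":
    page = "The quick brown fox jum-\nped over\nthe lazy dog.\n\nNew paragraph."
    print(unwrap_lines(page))
```

The result still needs to be checked against the scan, since the sketch cannot tell a line-end hyphen from a real one.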


@Inductiveload: I thought that it made more sense not to remove line breaks because it’s much harder to spot and correct mistakes when the line breaks are removed. However, if that’s a hard rule, I’m happy to remove that comment and start a separate discussion about whether removing line breaks makes sense. Languageseeker (talk) 22:40, 26 April 2021 (UTC)
I mean, I get why people leave them in (and we say to take them out, AFAIK for borked interactions between Mediawiki's almost indecent love of P-tags and our templates without consistent newlines in DIVs). But it doesn't make sense to me to have separate style guidelines for MC works. I'd say, just point at the existing guidelines and say "do that". When changing things, generally, change one thing at a time. Inductiveloadtalk/contribs 22:45, 26 April 2021 (UTC)
@Inductiveload: Ok. Removed the offending line. Didn't mean to break any major rules. Languageseeker (talk) 23:00, 26 April 2021 (UTC)
You should not be differentiating from any of the guidance of the style guide. It makes things difficult if you start to have different rule sets. — billinghurst sDrewth 12:03, 27 April 2021 (UTC)
RE: Validation: I see you're adding some texts to be validated. This will require some oversight, as we frequently have new editors who do not properly understand what "Validation" actually entails. Some think it's a quick check, without actually comparing against the scan copy. They therefore rush through the work to get it "done". Some new editors end up using spell-check, and also do not compare against the scan of the original. Any process that advocates (or seems to advocate) for speed or for completion without caution and training risks encouraging this. --EncycloPetey (talk) 02:04, 27 April 2021 (UTC)
@EncycloPetey: I agree that validation can be a tricky thing to master, but I also think that it's important to train new users. To help address your concern, I added a note about the goal of validation and a link to the validation help page. Languageseeker (talk) 02:49, 27 April 2021 (UTC)
We also do have a Wikisource:Validation of the Month, but it is not advertised on the Main page. --EncycloPetey (talk) 04:14, 27 April 2021 (UTC)
@Languageseeker: is there a particular reason Mathnawí is using volume 2, even though volume 1 is virtually empty: Index:The Mesnevī (Volume 2).pdf?
Also, I think it should be a "thing" that the index pages have to be fixed up (i.e. status is "to be proofread") before they can be entered for MC. Inductiveloadtalk/contribs 11:14, 27 April 2021 (UTC)
@Inductiveload: Volume 1 of Mathnawí appears to try to establish the definitive Persian text and so is mostly in Persian as far as I could tell. To your second point, that makes sense. Languageseeker (talk) 13:23, 27 April 2021 (UTC)
@Languageseeker: Ah right, then that makes sense! Inductiveloadtalk/contribs 13:34, 27 April 2021 (UTC)
I was working on proofreading the 20 pages of Introduction and then planning on blanking the Persian part for exactly that reason, to prevent that confusion and get it into proofread status quickly. MarkLSteadman (talk) 13:54, 27 April 2021 (UTC)

IA tool allowing duplicates

Yesterday, the IA tools allowed me to upload a number of duplicates leading to a large waste of time concluding with the duplicate indexes being deleted. Anyone else having the same issue or know what is going on? Languageseeker (talk) 18:25, 25 April 2021 (UTC)

Relatively recently (but I don't remember exactly when) some of us were complaining that the IA Upload tool wasn't permitting us to upload a DjVu file when Faebot had already uploaded a pdf of the same. It looks like this has now been fixed. Beeswaxcandle (talk) 18:47, 25 April 2021 (UTC)
Could it at least warn about duplicates? I don't want to create redundant indexes everywhere wasting everyone's time. Languageseeker (talk) 19:16, 25 April 2021 (UTC)
Often, you need to manually check for duplicates first. Sometimes the IA copy has to be edited to repair pages, remove duplicate pages, or strip Google notices. The IA tool won't always catch those as "duplicates". --EncycloPetey (talk) 19:18, 25 April 2021 (UTC)
The phabricator task T269518, which was closed as resolved a short time ago, should allow duplicates if they are of different formats (after a warning is shown) but should not allow exact duplicates; see the comment of @Samwilson: from 9 April there. So if some duplicates were uploaded in the way described above and without warning, it is a bug. --Jan Kameníček (talk) 21:52, 25 April 2021 (UTC)
They weren't exact duplicates—.djvu vs .pdf, which is correct behaviour. Also, some were a different printing of the same edition, which no tool is ever going to pick up. Beeswaxcandle (talk) 22:57, 25 April 2021 (UTC)
  • Comment The responsibility to check Commons for the existence of a work does not lie with the IA-uploader tool, it lies with the uploader. There are many ways that copies of a work can occur at Commons. There can be multiple editions, there can be copies of the work from multiple sources, so there needs to be a more mature approach than relying on the upload tool, especially as the PDFs were uploaded by a person to cover their needs, though they generally have inferior text layers to the DjVus. There is not even the guarantee that the best copy of the work has been uploaded; nor that the copy uploaded is the best to proofread. If you are using a PDF layer, you are often making things harder for your proofreading and more likely to get errors in the produced work.

    To comment more fully on an identified issue, it is more helpful to have some examples and the processes followed rather than just react to a general complaint. — billinghurst sDrewth 00:14, 26 April 2021 (UTC)

Diskussion:Projekte

I just discovered this at de.wikisource,

Rules For New Projects (English Translation)

Each new project with an extent of more than 50 pages must meet the following points:[1]

  • 1. The script meets our requirements. See text base.
  • 2. Scans must be uploaded to Commons and the quality of the scan has to be good enough for proofreading.
  • 3. To meet the requirement of the 4-eyes principle, point a) or b) must be fulfilled:
    • a) Before or during work on the project, a quid pro quo of equal extent is expected (e.g. proofreading).
      or
    • b) The project has found enough backers to be finished in a reasonable timespan. This page can be used to search for helpers.
  • 4. To be clear, every project over 50 pages must be announced here before it starts. "Before it starts" means: no index or articles should be created before the project is approved. If no concern is raised within ten days, the project can start.[2]

It is strongly recommended to also announce small projects with large portions in non-Latin scripts, such as Greek, Hebrew or handwritten scripts.

Also look at:


Some interesting sentiments are expressed, although it is presumably a reaction as policy by community members to abandoned projects. I don't see a concern where this remains in 'work- or proof-reading space' (the indices and their pages) and it is done with restraint and forethought, but obviously there are many practices here that would not meet the same degree of explicit or tacit approval at that sister's community. CYGNIS INSIGNIS 06:15, 26 April 2021 (UTC)

References

  1. Decided in March 2010 (permalink to the SKR)
  2. See discussion, July 2015
The German Wikisource has decided to do many things differently from the other Wikisources. For one thing, they never adopted the idea of an Author namespace. Their Wikipedia has a number of very different approaches from everyone else as well, such as permanent parallel duplication of categories. It does mean that they often attempt approaches that no one else has tried, because they are not simply doing what everyone else is doing. --EncycloPetey (talk) 17:37, 26 April 2021 (UTC)
I think having such a policy fairly obviously has the exact outcomes you would expect: extremely low proofreading rates (~20k proofread or validated pages/year, vs 230k at enWS and 400k at frWS) as well as low participation (125 vs 428 and 253 active users). A less expected outcome (for me at least) is that the overall deWS proofread:validated ratio is still only ~1:1. While it's "worse" at both enWS and frWS (roughly 3:1), I'd have expected much better at deWS, since it should trend to 0. Also note the very low productivity compared to frWS, where the pages/active user is ~10 times higher (enWS is "only" 4 times higher). For more fun stats: https://phetools.toolforge.org/statistics.php
While I imagine the overall quality in mainspace is much better, at what cost? And could we not achieve the same outcome through being stricter on pre-emptive transclusion of unfinished works? For example, adding a date to {{incomplete}} and having them flag up after n months?
Also, I disagree with the underlying implication that a proofread-but-not-validated text is somehow worse than no text at all. Inductiveloadtalk/contribs 18:07, 26 April 2021 (UTC)
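The suggestion above of adding a date to {{incomplete}} and flagging the tag after n months could be prototyped roughly as follows. This is only a sketch in Python: the function names, the assumption that the tag carries a date, and the whole-calendar-month rule are hypothetical choices for illustration, not an existing template feature or bot.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end (never negative)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # A month only counts once the same day-of-month has been reached.
    if end.day < start.day:
        months -= 1
    return max(months, 0)


def is_stale(tagged_on: date, today: date, n_months: int) -> bool:
    """True if a (hypothetical) dated {{incomplete}} tag is at least n_months old."""
    return months_between(tagged_on, today) >= n_months
```

A maintenance report could then list every tagged page for which `is_stale(...)` is true; what happens to those pages afterwards would still be a community decision.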
With our Translation namespace it was expected that users would be using the Page: ns to do their translations, and this has not particularly been enforced, especially as we move over a number of old translations. deWS didn't/doesn't use ProofreadPage so they can have a different level of approach/tolerance to translations. If we had them in the PrP environment and not naked in Translation: ns, then we would have no ugliness there at all, and accordingly infinite patience. — billinghurst sDrewth 11:50, 27 April 2021 (UTC)
I think "(English Translation)" just means it is an English translation of the deWS rules for all works, not that it only applies to translations. Particularly, since deWS requires permission (and a 10 day delay) on even creating the index page (which is, I guess, why they only have 142 "unproofread" or "incomplete" indexes: petscan:18955661), they do not have infinite patience, even in the working spaces.
Over here, we have had at least one "textbook" translation recently: Translation:The Three Princes of Serendip. Inductiveloadtalk/contribs 12:11, 27 April 2021 (UTC)
Two ideas I recall seeing around are a section on the author page that separately lists active transcription projects and a search button for linked indexes. I would prefer just the latter to avoid so much self reference on the site, and it is a better practice for avoiding concerns like working on a new index that is already half done elsewhere (as mentioned by Encyclopetey somewhere). If links to indices are manually added wherever, then it becomes remiss not to add that to the list of things to do aside from proof reading. CYGNIS INSIGNIS 13:05, 28 April 2021 (UTC)
it is good to have german wikisource as an example of what not to do. at wikimanias, we ask each other what is up with the "sick man of europe" [hopelessly lagging] wikisource. if you think i am exaggerating, check out the statistics: back in 2010, en & de had the same proofread pages around 100,000. now, de is at 300,000 and en is at 1300000. [9] the unhealthy wikis like de wikisource and wikinews are a choice by the admins at those projects: there is no technical reason they cannot be as productive as the english, french, italians and polish. Slowking4Farmbrough's revenge 22:45, 28 April 2021 (UTC)
@Slowking4: please rephrase that second sentence. CYGNIS INSIGNIS 04:05, 29 April 2021 (UTC)
ok. describing chronic failure has historical antecedents. but the lesson remains, do the opposite - no quid pro quo, no preconditions to start an index, no precondition of announcements or support, no requirement of text base (scanned back) pages. Slowking4Farmbrough's revenge 15:28, 29 April 2021 (UTC)
there might be pertinent comment after that attempt at a rejoinder, but I lost interest … CYGNIS INSIGNIS 15:51, 29 April 2021 (UTC)

Looking for (ancient) Armenian speakers to help transcribe a text on Wikisource

Recently the French Wikisource community started to work on the transcription of Grammaire de Denys de Thrace. You might wonder what the relation with Armenian speakers is. Have a look at the full title instead:

GRAMMAIRE de DENIS DE THRACE, tirée de deux manuscrits arméniens de la bibliothèque du roi. Publiée en Grec, en Arménien et en Français, et précédée de considérations générales sur la formation progressive de la Science glossologique chez les anciens, et de quelques détails historiques sur Denis, sur son ouvrage et sur ses commentateurs ; PAR M. CIRBIED, membre de la société royale des antiquaires de france, professeur d’arménien à la bibliothèque du roi. extrait des mémoires de la dite société.

That's a bombastic title if you want my opinion. 😂 However, it gives a good overview of what the book contains. In particular, it makes clear that the book includes a huge load of material in Armenian. The text is basically a translation of, and commentary on, a text in Armenian, itself a translation of an ancient Greek text that is also provided in the book. As the French Wikisource community has limited skills in Armenian, this makes the work of transcription far more complicated. Thus this call.

We are looking for people able to transcribe text in the Armenian alphabet. It would be even better if we could find people who know ancient Armenian, since the text is likely to contain archaic oddities. And if we can find someone who can also speak French or English to interact in discussions with the community, it would be perfect. Note that the main requirement is simply being able to read the Armenian alphabet and to write it in simple Unicode transcription. In particular, there is no expectation that the potential helpers would do any work on the formatting, which often requires dealing with locale templates.

Please be bold in spreading the word wherever you think that could reach potentially interested Armenian speakers.

With all my warm love, Psychoslave (talk) 07:31, 26 April 2021 (UTC)

IP Masking Engagement

Hello Wikisource community, this is about IP Masking engagement which the Anti-Harassment Tools team is carrying out.

The point of the engagement is to understand how the project will impact editors. Also, we want to know which other tools you will need to be able to effectively govern the projects in absence of IPs.

Please read more on the IP Masking project here.

Please add your comments on the talk page.

Best regards,
STei (WMF) (talk) 12:43, 28 April 2021 (UTC)

Tech News: 2021-17

21:24, 26 April 2021 (UTC)

Mass move/rename of pages

We have a simple and effective SQL tool for mass deletions, I was wondering if we also have a mass move/rename script? — Ineuw (talk) 15:04, 27 April 2021 (UTC)

Help with Validating a Text

I was wondering if someone could validate Index:Paradise Lost Manuscript. It's part of the upcoming Monthly Challenge series and I think that it would make an extremely impactful first text to validate. The Index consists of the 34 pages that are book 1 of Paradise Lost and the only extant part of the manuscript. The text was written by an amanuensis. Languageseeker (talk) 00:43, 29 April 2021 (UTC)

Taking a quick look, there are two consistent issues I see. (1) Indentations of lines from the original are not replicated, and line indents are to be replicated for poetic works. (2) You've used the poem-tag throughout; this can cause huge headaches for multi-page works when they are transcluded. The poem tag does not always behave predictably, so for poems that span multiple pages, line breaks are preferred. --EncycloPetey (talk) 05:23, 29 April 2021 (UTC)
I read the indent point (1) as a new stanza, without bothering to check if that was what the printer did. What is the plural for 'amanuensis', because that is how this document was apparently compiled. Point (2) should be policy; the resources of this site have laboured to accommodate a tag that does little more than obviate the need to add breaks. CYGNIS INSIGNIS 13:23, 29 April 2021 (UTC)
@EncycloPetey, @Cygnis insignis: Thank you both for your feedback that helps me to clarify/understand what remains to be done on this manuscript. I think that I would like to reproduce the look of the manuscript as much as possible. There are four major tasks remaining when it comes to formatting 1) Replace poem with br tags 2) add in {{ls}} 3) add missing plines 4) add gaps. Languageseeker (talk) 13:43, 29 April 2021 (UTC)
In a printed text I have no hesitation in disposing of the indent and replacing that with an empty line, a la wiki, but in the case of a new transcript I am not so sure. Perhaps I should nominate for deletion as 'self- or un-published' to save me worrying. CYGNIS INSIGNIS 14:04, 29 April 2021 (UTC)
When the text is in prose, I agree with you about dispensing with line indents, but with poetry an indented line does not signify the start of a "paragraph", and may be an internal line of a stanza that has been indented. Deciding when indents are new stanzas and which are internally indented lines is an editorial decision, and choosing to make a break where there was an indent only, will affect the way it is read. --EncycloPetey (talk) 15:03, 29 April 2021 (UTC)

Score needed

Could somebody please oblige by transcribing the short score on Page:S.S. Bremen - G. Howell-Baker - music by E. Edgar Evans.jpg? It's beyond my skills. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:36, 29 April 2021 (UTC)

  • Andy Mabbett: I have done this, although, because the score feature is still disabled, nothing renders. TE(æ)A,ea. (talk) 19:08, 29 April 2021 (UTC)

dentition renderin'

"the cat proceeded to its bowl of biscuits

CRUNCH CRUNCH CRUNCH!. [apol. to Brautigan] :(

I need to add a dental formula in a work, there is some coding for that at the big sister: this. That could be copied over to Template:DentalFormula for the convenience of those familiar with it, but any stable code that allows me to apply numerals and separators above and below a 'line' (or other demarcator) would be useful for the odd instances I have found. This is useful for odd fractions too, 16ths, but I couldn't find anything mentioned in the Help pages. CYGNIS INSIGNIS 13:12, 29 April 2021 (UTC)

Would it be possible to use Template:Sfrac? To produce, for example, 2.1.3.3/2.1.3.3 ? That template supports rendering all characters as far as I know in both the "numerator" and "denominator," and the Wikipedia Template:DentalFormula is based on it as far as I know. Mathmitch7 (talk) 14:45, 29 April 2021 (UTC)
[e/c] There's {{sfrac}}: 2.2.2.2/2.2.2.2. Export is a bit wonky on some less capable clients. I am unsure of the best practice for fractions in the general case, especially w.r.t. accessibility. Inductiveloadtalk/contribs 14:46, 29 April 2021 (UTC)

Translations and Reprints from the Original Sources of European History

Today I happened to come across the work Translations and Reprints from the Original Sources of European History, published by the University of Pennsylvania's History department from about 1897-1907. As far as I can tell, archive.org has full or nearly full coverage of the series, which has several volumes. I was originally looking for a PD English Translation of "fr:Qu’est-ce que le tiers état ?" by Author:Emmanuel Joseph Sieyès which is available in full on the French Wikisource but not yet validated, and I happened across this series, where excerpts of that text appear in the 6th volume. I guess my question is, would this source be something that others would be interested in having all volumes up on commons and a central page here with a table of contents? It seems quite wide-reaching and may help us expand English-language coverage of minor texts from European authors. I'm just curious if this is a work people think would be useful. Mathmitch7 (talk) 14:40, 29 April 2021 (UTC)

@Mathmitch7: I generally think it's a very good thing to set up this kind of collective work "on spec", because the set-up is quite a lot of faffing about for many users, but dipping in to proofread an article or two is easy once that groundwork is laid. But make sure it's well linked from authors pages and perhaps Portals so it can be found. If it can't be found, no-one will come, even if you build it.
In theory, if you don't proofread any articles, it can't have a mainspace page, however. C.f. Wikisource:Scriptorium/Archives/2021-02#No-content_mainspace_pages for a long but fizzled discussion to try to determine best practices for that. Inductiveloadtalk/contribs 14:56, 29 April 2021 (UTC)
I would be interested in doing some of that, though probably very little overall, however, I have a preference and see no harm in having sections of the volumes with red-linked parent titles that follow this sister's convention of series/vol/sect, eg, I care for about 10% or less of The Emu, linking the 'parent titles' from The Emu/volume 3/Extinct Tasmanian Emu would be, as someone here said, "guaranteed to engender disappointment". CYGNIS INSIGNIS 15:29, 29 April 2021 (UTC)
@Mathmitch7: That looks like an amazing find. I’m sure some of those translations are still probably the only ones ever made. I took a quick look and there appears to be a new series as well. Languageseeker (talk) 19:22, 29 April 2021 (UTC)
Alright, so I've now uploaded one scan of each volume available on the internet archive of this series, you can find them at commons:Category:Translations and Reprints from the Original Sources of European History (UPenn series). I'll try to get them up to wikisource soonish, but I have some other things I have to do today. I'll note that the "new series" mentioned (which I also uploaded) are more monographs than big edited volumes, but I think they will still be helpful. Also, I was unable to find a version of Volume 5 readily available -- there's a version on archive.org but as it's a reprint from 1971 it's not fully available for download, so that's something to look for in the future. Mathmitch7 (talk) 13:29, 10 May 2021 (UTC)

Is this an acceptable Index page name?

Index:Narrative of Henry Box Brown - who escaped from slavery enclosed in a box three feet long and two wide and two and a half high (IA narrativeofhenry00brow).pdf

The document is always referred to as "Narrative of Henry Box Brown"; am I permitted to shorten it? This is the only copy on IA, and I have it as a .djvu file, but it is identical to this pdf. — Ineuw (talk) 04:48, 2 May 2021 (UTC)

My first question would be "but why?". There is no direct relationship between the transcluded work name and the Index: page name. Renames at Commons generally relate to their criteria, so it is more a matter for that space than ours, as we have no requirement. I have shortened filenames at Commons, though only where they have been problematic/wrong. — billinghurst sDrewth 10:57, 2 May 2021 (UTC)
Now that you moved the file name on commons the text had to be moved by an admin. I know that the file name wasn’t ideal, but it’s really creating more work for very little reward. Languageseeker (talk) 11:46, 2 May 2021 (UTC)
Ineuw is an admin so essentially they are just creating work for themself. — billinghurst sDrewth
the 19th century practice of putting all the metadata in the title is a little tiresome, but index title length or sense does not matter, as we can name the work what wikisource wants. and they tend to have machine generated internet archive artifacts. so all pdf names should be acceptable including "qwerty123". Slowking4Farmbrough's revenge 16:01, 3 May 2021 (UTC)

Thanks for all the comments, and apologies for these belated clarifications. I was distracted by unrelated issues.

My primary concern in renaming the index was not to offend Languageseeker, the original uploader. The reasons for doing it: it is aesthetic, it simplifies web search, and it's short, clear, and unambiguous. It is also uncluttered when proofreading. As long as this does not breach WS rules, the critiques are non sequiturs. — Ineuw (talk) 11:38, 6 May 2021 (UTC)

Sections

Please can someone explain to me in simple terms (or point to page which does so) how sections can be transcluded?

I tried on Showell's Dictionary of Birmingham/A, but that failed. What did I do wrong?

I am confused that we appear to have two types of markup: ## A ## and <section end="A" />. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:43, 2 May 2021 (UTC)

  • You needed to add the sections on the final page. On the final page, there are two sections: “A” and “B.” You should put ##A## at the top of the page (where the “A” section begins on that page) and ##B## between the “A” and “B” sections. I have done this, so the page now transcludes properly. TE(æ)A,ea. (talk) 13:19, 2 May 2021 (UTC)
In terms of the two types of markup: the # style is "Easy LST", while the <section /> is the standard style. Which you see/use depends on whether "Easy LST" is switched on in your Preferences. It's on by default. Beeswaxcandle (talk) 17:12, 2 May 2021 (UTC)
  • Essentially they are the same; the Easy LST is simply a javascript that converts the section tag to ##. At one point it was considered easier to simply mark the start of each section, letting the next marker close the previous section and open the next. — billinghurst sDrewth 08:31, 3 May 2021 (UTC)
<section end="A" /> is more commonly used, familiar, and self evident code that is available with a click. CYGNIS INSIGNIS 11:46, 3 May 2021 (UTC)
Noting that the section tag is in the editlist wiki markup drop down, and I know that I have made buttons for my toolbar to make my life easier. — billinghurst sDrewth 13:43, 3 May 2021 (UTC)
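To illustrate how the two markup styles discussed in this thread relate, here is a rough Python sketch that expands ## label ## markers into paired section tags. It is not the actual Easy LST script: the function name is hypothetical, and it assumes that each marker sits on its own line and that a section runs until the next marker or the end of the page.

```python
import re

# A marker line such as "## A ##" or "##A##" (assumed to be alone on its line).
MARKER = re.compile(r"^##\s*(.+?)\s*##\s*$")


def easy_lst_to_section_tags(page_text: str) -> str:
    """Expand '## label ##' markers into <section begin/end> tag pairs."""
    out = []
    current = None  # label of the section currently open, if any
    for line in page_text.split("\n"):
        match = MARKER.match(line)
        if match:
            # A new marker closes the previous section before opening the next.
            if current is not None:
                out.append(f'<section end="{current}" />')
            current = match.group(1)
            out.append(f'<section begin="{current}" />')
        else:
            out.append(line)
    # The last section is closed at the end of the page.
    if current is not None:
        out.append(f'<section end="{current}" />')
    return "\n".join(out)
```

For a page that ends section "A" and starts section "B", calling the function on text with ##A## at the top and ##B## between the two parts yields an "A" begin/end pair around the first part and a "B" pair around the second, which is the labelling described for the final page above.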

[Maintenance] Editions marked as "literary work" at wikidata

Looking at Wikidata we have about 1400 editions of works that are called "literary work" rather than version, edition, or translation (Q3331189).

Only a rough query, but it gives a good example of fixes that we need to make.

I will guess that we have some other variations of work that we will need to fix. Cannot just relabel them as some have merged editions into works. :-(

billinghurst sDrewth 03:04, 3 May 2021 (UTC)

Need to stop overwriting old versions, need to make a new version and disambiguate

Can I ask that as a rule that we typically never overwrite an edition of a literary work, especially a long-held work. When we have a new edition of a work, then we should name and disambiguate it. The community can then have the conversation about what to do with works.

With our long-held works, where works were often not done as subpages, there are redirects, versions, blah blah blah that need tidying up and it is getting disconnected; plus the wikidata is getting horribly inaccurate. I hate having to unpick these Gordian knots for lack of a simple solution being undertaken at the beginning, especially where an admin who can move with subpages, and without redirects. There may be exceptions to the rule, but these exceptions should be made with a clear mind, and by consensus. — billinghurst sDrewth 07:30, 3 May 2021 (UTC)

I think there should be a lot of caution in overwriting, it probably needs to go through discussion when the content is sourced and presentable.

Burning redirects has widowed content I linked from wikipedia in the past, I have opted to link sites like BHL instead for that and other reasons, and I already see how moving the second edition of Descent of Man on a whim or head-canon of a rule may make the deeplink I just added there useless. CYGNIS INSIGNIS 11:38, 3 May 2021 (UTC)

yeah, need some guidance about editions. we have a lot of non-scanned back works with no edition information, that we would like to update. and a lot of non-scanned backed poems (extracts from books). in addition to non-scanned backed works where the available scan is a different edition. Slowking4Farmbrough's revenge 15:47, 3 May 2021 (UTC)
@Billinghurst: I don't understand why? Look at The Deserted House (Alfred Tennyson) - a poem with edition information, but introduced by user, we can't verify it here and now. And other The Sun Rising - no edition information, no source. Why we can't overwrite works of this type? What's wrong if we will do it? Tommy Jantarek (talk) 23:47, 3 May 2021 (UTC)
@Tommy Jantarek: Firstly for your first example our version reputedly comes from The Works of Alfred Lord Tennyson (see its talk page), where is your version coming from? Should there be two versions? What will be the harm, issue with nnn versions?

Secondly, those works by our current methodology we would set them as subpages of their published work, and create either a redirect from root if we only have the one version, or we have a {{versions}} page and point to each.

Thirdly, because people have been doing a ham-fisted and half-arsed job of it, and fixing it up from that is an absolute PITA and waste of (my) time. A version of a poem will most likely have come from another work/source, so it should now be a subpage, not a root work; they overwrite with a different edition and a different source, they don't update the wikidata, etc.

Fourthly, we try to have half measures, and general guidance, and then we get hung by someone saying that they were not told that they could not do it by our written instruction. So I would like to see a gentle review process on how we better handle a new version where no one person makes such a decision, there is at least an independent review that we are not losing works where versions should exist. It does not have to be convoluted or power-based, though it should be reviewed against a set of criteria.

So I see that it is best that the each new work is rendered newly, with the wikidata freshly entered, AND THEN we look what we do with the old version. We can move and overwrite at that stage if we determine that is the best way to progress. — billinghurst sDrewth 03:11, 4 May 2021 (UTC)

I will also note that your Tennyson example, it directly matched to the work The Deserted House (Q7729810) rather than being its own edition of a work. There is not one version of many of our works, so we create editions with their provenance. — billinghurst sDrewth 03:16, 4 May 2021 (UTC)
Regarding 3. What if the scan is exactly the same as in the edition information? Do we overwrite the old version and move it to a subpage? And what if the old version has no edition information or wikidata item? If a user overwrites the old version, they might move the overwritten poem to a subpage. Tommy J. (talk) 12:37, 4 May 2021 (UTC)
That will be the review process to determine what to do. I am saying that we should typically not replace/overwrite as the first step of actions. If you have a new scan, transclude its edition of the work. Do you see the string of "what ifs" that make it hard to write rules for the newbie or less aware that we work on versions, and of our processes, and all the factors to check. — billinghurst sDrewth 13:48, 4 May 2021 (UTC)
I think the fundamental issue is that most texts are presented as the one-and-only version of the text. However, every text is an edition. The Deserted House (Alfred Tennyson) is ambiguous because this poem probably exists in several versions. If this site wants to stop having confusion over texts, then it needs to do three things. 1) Ban non-scan-backed copies. This entire situation exists because users are trying to replace non-scan-backed versions with scan-backed versions. Non-scan-backed versions are always going to be dubious because there is no simple way to verify their authenticity or accuracy. 2) Set up rules for publishing works. At minimum, it should include the date. 3) Make the deletion of non-scan-backed versions, when a scan-backed version exists, a criterion for speedy deletion. Languageseeker (talk) 14:42, 4 May 2021 (UTC)
In re your proposal #3, no. As a text repository, we need to ensure that incoming external links don't break. This is the point of what Cygnis is saying and a principal reason for Billinghurst's initial post. The various forms of disambiguation pages and the soft redirect process assist with ensuring this. Beeswaxcandle (talk) 17:40, 4 May 2021 (UTC)
Most texts are the one-and-only one. Once in a long while, a work will get reprinted with some changes, but it's the exception, not the rule. Reams of fiction and non-fiction get written every year and even after you filter for intent to publish and actually getting published, most of it just disappears. This is doubly true if you ignore facsimile reprints and other reprints that don't introduce any interesting changes.
I get it; you have an interest in certain works widely considered important that have important distinct editions. I don't. I'm much more interested in the rare and esoteric. Please understand the differing needs and concerns of some of your fellow users. I can't really imagine that continuing to push the banning of non-scan backed editions is in your best interest; work with the community as it is, don't walk in and try to make huge changes.--Prosfilaes (talk) 17:55, 4 May 2021 (UTC)
@Prosfilaes: As far as I can tell, the issue that billinghurst is raising is the overriding of non-scan backed editions with the text of a different edition: a user overriding Version X of a title with Version Y. This creates confusion in the WD. Therefore, billinghurst is requesting that users do not override a non-scan backed version prior to discussion. Otherwise, the WD becomes inaccurate. Cygnis insignis points out that excessive disambiguation breaks links and makes this site less useful to Wikipedia.
I am arguing that this occurs for two reasons. First, many pages are created in a way that can lead to ambiguity. Yes, most texts are one-off, but that is no reason not to take more care in creating page names. Even title (year) would lead to less ambiguity than currently exists. I recently had a long discussion about disambiguation where the administrators stated that the current policy is to allow ambiguity and then disambiguate later. Poor naming leads to more future disambiguation and link breakage. Instead, disambiguation should be the exception and not the norm. Second, users are overriding these texts because they wish to improve the quality of the text. Many users see scan-backed versions as better than non-scan backed versions. This will continue until this site makes a firm policy on non-scan backed copies. If we allow them, then they can never be deleted. If they are to be replaced, then they should stop being made to avoid creating additional work.
Finally, I'm not against obscure work or only for those considered important. What is important changes over time. Paradise Lost used to be an obscure work. However, I believe that most users will come looking for the texts currently seen as important. Creating scan-backed versions of these works can attract more users that will proofread more works obscure, important, or otherwise. Languageseeker (talk) 00:59, 5 May 2021 (UTC)
Don't presume, ask. It is broader. I have a number of reasons why; essentially, the free-for-all overwriting process is broken in a number of situations. I am saying that the community needs to holistically manage overwrites, and the way that I see it best to do it is to keep it simple: "any new work coming in scan-backed to be transcluded as a new version". Then if we think a version is redundant, the community manages it by its processes. This is typically not a single-person decision.

At this stage non-scanned editions are within scope, so please do NOT pollute this conversation with that matter. Separate conversation deserving of its own discussion at a later time. Dealing with what we have. — billinghurst sDrewth 02:46, 5 May 2021 (UTC)

My humble opinion and suggestion. The community will do what you all decide; I'm just a guest here. It seems to me you all place too much value on non-scanned texts. Such texts are like Wikipedia articles without sources and footnotes. Are these letters, words, sentences and punctuation correct? Nobody knows (not even the author of the page), and verification is very difficult or impossible. Data based on non-scanned texts is like divination by cards. "Create new text and the community can then have the conversation about what to do with old version". OK. But what if the community decides to delete? We would delete not just the text but also the user's contribution and work, and the whole history of the page. That is not a good or honest practice, and it's destructive. In my opinion the Proofread project offers the most usefulness and credibility these days. When a user (usually an experienced one) overwrites a non-scanned text, the text's quality and credibility grow and the whole history is preserved. An unreliable text without a source gains strength. Proofread pages are rarely, if ever, transcluded by newbies; few newbies know the project well enough to do it. What about Wikidata? If a user overwrites a text and moves the old one to a suitable subpage, good practice would be to visit the Wikidata item and update all the information. It is not a lot of work and requires just one visit, preferably right after overwriting. I think a user who knows how to transclude and move pages also knows how to visit and update Wikidata items. The plwikisource community has attached great importance to the Proofread project for many years, and in my opinion it brings great results. Forgive me for this long scribble, please. Tommy J. (talk) 21:35, 6 May 2021 (UTC)

@Tommy Jantarek: I've started a discussion about this at the Scriptorium if you'd like to express your excellent points there. Languageseeker (talk) 04:59, 8 May 2021 (UTC)
Comment @Tommy Jantarek: I don't disagree with your opinions about scan-backed texts being preferred to a copy and paste. I don't think that the community disagrees either. Now this may be an English language issue alone, that I cannot say; however, I started this thread because the practice of users has been problematic, so you are hearing my experiences with having to resolve problems.

There can be no updating of Wikidata. There are partial moves, partial overwrites, no link fixes, US editions replacing UK versions; illustrated versus non-illustrated versions. So to me we have to change the practice: instead of users themselves choosing to overwrite, create a new version [keep it simple]. The community can then decide what to do about versions, and how to go about it. — billinghurst sDrewth 09:50, 8 May 2021 (UTC)

The text of Template:migrate to needs to be adjusted as it encourages in situ replacement, with no guarantee that the replacement is appropriate. We have no clear quality control on the placement of the template to even know that a one-for-one replacement is the correct instruction. — billinghurst sDrewth 09:53, 8 May 2021 (UTC)

Tech News: 2021-18

15:43, 3 May 2021 (UTC)

Philosophical Transactions of the Royal Society A – Volume 184

Wasn't sure where to post about this since WikiProject Royal Society Journals seems to be dead, but I'm taking on the task of filling out Volume 184 of the journal Philosophical Transactions of the Royal Society A, published in 1893. Why Volume 184 specifically? Can't remember. This is a pretty big undertaking, and I'm mostly doing whatever random pages I feel like in whatever order, so anyone who's up for learning about chemistry or higher mathematics or astronomy or aether theory, etc., as it existed in the late 19th century: feel free to jump in! I already have about 50 pages "done" (several are still missing images, and a handful have ostensible misprints such as teeny tiny exponents that I can't read; nevertheless, those have been proofread as best I can otherwise), but that's just put a dent in it, let alone the other volumes of the journal. A fair warning that a lot of these pages require prolific use of the 'math' tag, but that certainly doesn't apply to all of them. Either way, I think it's a lot of fun, and it'd be great to have more eyes on the project than just my own. :) TheTechnician27 (talk) 17:33, 3 May 2021 (UTC)

@TheTechnician27: Thanks for letting the community know, and great to have you with us. I am not particularly into proofreading those works, though if you need assistance in working in the page: ns, or transclusions, etc. then please let me or the community know here. If you need a bot run through to apply text layers, then please put a detailed note on Wikisource:Bot requests and ping me. When one is jumping about for particular articles having the text there, and a tuned {{engine}} to search within the work can be quite handy. And with regard to proofreading, we all just do our best, and learn on a daily basis, so don't overly sweat that. — billinghurst sDrewth 01:56, 4 May 2021 (UTC)

French speaker needed

Would someone proficient in French kindly check the three pages starting at Page:Aerial Flight - Volume 2 - Aerodonetics - Frederick Lanchester - 1908.djvu/374? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 08:26, 4 May 2021 (UTC)

I've reviewed these pages; accents were missing in the source, e.g. in Pénaud; I've used a SIC only for the first occurrence in the text. I've used symbols ′ and ″ for prime and second. M-le-mot-dit (talk) 11:05, 4 May 2021 (UTC)

Google OCR does not work

⧼error⧽ undefined cURL error 60: SSL certificate problem: certificate has expired (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)

Does anybody know the source of this problem? Ankry (talk) 09:04, 6 May 2021 (UTC)

@Ankry: It should be back up now. There was a transient problem with the SSL certificates for all services hosted on Tool Labs. I don't have details on the cause or remedial actions, but the issue was reported for multiple other tools, and subsequent reports said it was no longer present. Xover (talk) 11:12, 6 May 2021 (UTC)
Done, confirmed. Ankry (talk) 11:38, 6 May 2021 (UTC)

Text in both margins

Suggestions, please, on layout for pages like Page:The Grand junction railway companion to Liverpool, Manchester, and Birmingham; (IA grandjunctionrai00free).pdf/41, which has text in both margins. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:40, 6 May 2021 (UTC)

Push it all left, otherwise it looks like the teeth of a saw, and is just problematic when transcluded. If you are truly concerned about the Page: ns, then there are some templates that show right, and transclude left. — billinghurst sDrewth 14:14, 6 May 2021 (UTC)
And how will that relate to the subheadings "From Birmingham" and "From L'pool & Manch'r"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:12, 6 May 2021 (UTC)
How about {{Sidenotes begin}} and its siblings, as seen in The_Solar_System/Chapter_1? The rendering won't be exact (in particular I’m not sure how you would get the rules down the page), but it will get you lined up with the text nicely in a way I’m not sure how else to do… — Dcsohl (talk) 17:25, 6 May 2021 (UTC)
Footnotes, with the cute arrangement of to and from distances linked to the nearest full stop, period CYGNIS INSIGNIS 18:38, 6 May 2021 (UTC)
I've implemented a form of that on the above page, but with the footnotes attached to the place where the mileposts are mentioned in the text. The footnotes are going to get very repetitive like that though. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:43, 6 May 2021 (UTC)
We also need to account for the change in presentation on pages like scan 64, which differentiates between two types of content. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:08, 6 May 2021 (UTC)
i would be tempted to ignore the double side notes. some right waypoints are in the text. unclear what the value added is. Slowking4Farmbrough's revenge 22:34, 6 May 2021 (UTC)
@Pigsonthewing: On second thoughts, don't see them as marginal, and just see it as a table, and set it as a three-column table, centre floated, and manage the intervening lines as a merged column and embed {{RunningHeader}} for that row. (Don't answer questions late late at night when dog-tired.) — billinghurst sDrewth 00:30, 7 May 2021 (UTC)

Tech News: 2021-19

15:10, 10 May 2021 (UTC)

Lower case naming of book by myself

Index:Some feudal coats of arms.djvu - I think I should have used caps when creating this and I don't know how to fix it. It's because I saw it in lowercase in the WorldCat books catalog, but also the name is sooooo long. I think it should just be Some Feudal Coats of Arms. Can someone fix it. I'm sorry. --The Eloquent Peasant (talk) 21:43, 10 May 2021 (UTC)

@The Eloquent Peasant: the name of the file (and therefore the index page) isn't really important as long as it's not actively misleading. Just update the link to mainspace to be Some Feudal Coats of Arms. No need to be sorry :) Inductiveloadtalk/contribs 22:00, 10 May 2021 (UTC)
Thank you! --The Eloquent Peasant (talk) 23:08, 10 May 2021 (UTC)
@The Eloquent Peasant: and the thing is that the standards used by cataloguers change so both are right at different times. So for us both forms are right, just ensure that you create redirects between the forms. In the end, as Inductiveload said, as long as it can be found as an accurate title, then no issues. We rile about the use of all capitals as that is just butt ugly. — billinghurst sDrewth 00:04, 11 May 2021 (UTC)
I have 1 GF. My husband says shouldn't come around 'cause she's butt ugly - ... but I like her and besides, while he's not lying, it's mean to say she is. She's a great person oh except she likes to drink too much. So for that reason, I keep her at bay and also because she lives 4540.9 miles away. All caps is crazy for a book name and all this makes me realize I should have more than just 1 GF. :) Thank you. :) --The Eloquent Peasant (talk) 00:10, 11 May 2021 (UTC)

I have .png and .jpg images for the book and have uploaded both (38 files) as .jpgs and as .png files but now I don't know which one would end up being used in the book here on Wikisource. I cropped and reuploaded one .jpg but the book has tons of images. Which is preferable here, .jpg or .png? See here: c:Category:Some feudal coats of arms (Book). The .pngs are a little big. Thank you. --The Eloquent Peasant (talk) 01:57, 11 May 2021 (UTC)

@The Eloquent Peasant: If you are talking page illustrations, we are file type agnostic and more interested in the output. Probably the best guidance on which is better in which situation is c:Commons:File types, and while their primary use is the transcluded work here, their re-use anywhere should be more in your thinking. — billinghurst sDrewth 02:27, 11 May 2021 (UTC)
@Billinghurst: ok. --The Eloquent Peasant (talk) 02:29, 11 May 2021 (UTC)
@The Eloquent Peasant: in this case, since the images come from a source that is already compressed with a JPEG-like compression scheme (the IA usually uses JP2, but the idea is similar) there is not a lot of value in PNG. This is because PNG expends a lot of file size on slavishly recording all the pseudo-random noise produced by the JP2 compression, which is pretty much incompressible under the PNG format.
However, if you were to extract the images and clean them up so that they are black-on-white diagrams and the image noise is removed, then PNG would be a good choice. This is because the sharp edges between colours (i.e. black and white) represent "high-frequency" image data, which produces substantial image noise when compressed lossily as JPG. For example, below is the image noise (coloured red) that is introduced when a diagram is saved as a JPG:
[Image: April 01-40N-2100-Fieldbook of Stars-025 - JPEG noise.png]
Also, since the colors are limited in a greyscale, you may find that a greyscale PNG of a cleaned diagram is roughly comparable to a JPG anyway (YMMV, there are too many variables to make a global statement here).
tl;dr rule of thumb: JPG for photos and things that are already JPGs, PNG for "clean" diagrams. Inductiveloadtalk/contribs 07:18, 11 May 2021 (UTC)
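
The rule of thumb above can be illustrated with a rough sketch. PNG stores pixel data with DEFLATE (the same algorithm exposed by Python's zlib module), so compressing raw pixel bytes with zlib gives a feel for why noisy pixels inflate a PNG. Everything here is an assumption made for illustration: the synthetic "diagram", the noise amplitude, and the function names. Real PNG encoders also apply per-row filters, which this ignores, so treat the numbers as indicative only.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256


def clean_diagram() -> bytes:
    """Greyscale pixels for a white page with one black vertical bar."""
    row = bytes(0 if 100 <= x < 140 else 255 for x in range(WIDTH))
    return row * HEIGHT


def add_noise(pixels: bytes, amplitude: int = 8, seed: int = 0) -> bytes:
    """Perturb every pixel slightly, as a stand-in for lossy-compression noise."""
    rng = random.Random(seed)
    return bytes(
        min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in pixels
    )


def deflate_size(pixels: bytes) -> int:
    """Size in bytes after DEFLATE, the compression PNG uses internally."""
    return len(zlib.compress(pixels, 9))


clean = clean_diagram()
noisy = add_noise(clean)
print("clean:", deflate_size(clean), "bytes")
print("noisy:", deflate_size(noisy), "bytes")
```

The clean version compresses to a tiny fraction of the noisy one, which matches the advice to clean diagrams before saving them as PNG.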
@Inductiveload: TY. I'll try to upload vivid, clean files and name all the files so it's easy to know what page they belong to (for eventual use in WS). I found a better scan of this book where the pages don't look yellow. --The Eloquent Peasant (talk) 15:03, 11 May 2021 (UTC)
@The Eloquent Peasant: note that often the yellowish files are actually a better starting point for your own extraction of the image than the black/white ones. The black/white ones are generally automatically processed fairly brutally to remove the paper colour and then recompressed heavily and this often produces a fairly rough image that might look cleaner from a distance, but on closer inspection will be found to have lost a lot of detail. H:EXTRACT has some details (but still needs a lot of expansion). Inductiveloadtalk/contribs 15:58, 12 May 2021 (UTC)
@Inductiveload: That's exactly what I just surmised! I'm starting with the yellow .pdf pages and from there will clean up the files. The .jpgs that were extracted sometimes have entire sections of the image missing. I appreciate the guidance. I'll end up with .pngs and a smaller pixel size and blackish and whitish images that may not be perfect, but good enough! :) --The Eloquent Peasant (talk) 16:10, 12 May 2021 (UTC)
@The Eloquent Peasant: Just checking that you are aware that you can get the best quality individual file images easily from IA. If you look at something like https://archive.org/download/macliseportraitg00macl_0 click and open the VIEW CONTENTS and each page in the work is available right there for download. Usually far better than any manipulated PDF. — billinghurst sDrewth 02:24, 13 May 2021 (UTC)
@Billinghurst: I did spend quite a bit of time at that place and I searched through the different files. I downloaded what I thought would have the images, .jp2 for example but couldn't open. I did try there, but had no luck - not sure why. --The Eloquent Peasant (talk) 02:28, 13 May 2021 (UTC)
I downloaded GIMP for handling JP2000 images, and Inductiveload has some importable filters in his subpages. Though I will note that I am pretty rubbish in graphics tools, beyond the absolute basics. — billinghurst sDrewth 02:30, 13 May 2021 (UTC)
I do have the files in .png format and it's easy to work with them. The difficulty will be on pages with 12-17 images since each will have to be cropped, saved and uploaded. The worst part about it is that the book doesn't give them good captions like "Fig. #" so I'll have to name them like Pg 27 img 1, Pg 27 img 2 etc. In total it's about 2000 images or so the author states. So, I have pretty good organization skills but I'll soon be a little bogged down with real life. I almost downloaded GIMP. -The Eloquent Peasant (talk) 03:09, 13 May 2021 (UTC)
@The Eloquent Peasant: I finally wrote up a guide to how I use GIMP to do this: Help:Image extraction/With GIMP. Your images are much cleaner than the example, but the process is the same.
Also, the GIMP script Billinghurst refers to is User:Inductiveload/Remove-background-colour.scm, which I think will work well for these images: they have nice even, light paper colours and dark ink. If the script works, it's a one-click solution. It doesn't always work, though. Inductiveloadtalk/contribs 03:26, 13 May 2021 (UTC)
Okay. I'll download and try GIMP and (the importable filter). If it works for me, "Some feudal coats of arms" project may be in better shape sooner rather than later. --The Eloquent Peasant (talk) 12:18, 13 May 2021 (UTC)

@Inductiveload: Just an FYI, I like how this one turned out. (Rather than try to learn that program we talked about (which reminds me of Pulp Fiction), I've decided to just use my Photoshop)... In Photoshop, the "Artistic" / "Poster Edge" filter worked well on the last one I tried. What do you think? I like it because it's sharp, compared to the other ones I did. I'm not an expert on images, and I hate to practice on the little guys but, in essence, that's just what I'm doing. --The Eloquent Peasant (talk) 20:22, 14 May 2021 (UTC)

But also, the filter that works on one image will not work as well on another, yet they should look consistent from one page to the next.... I think the best use of my time is to get the images to look pretty good, upload them and map them to the pages... and it's a collaborative work so someone might improve the images in the future. Have a nice weekend! --The Eloquent Peasant (talk) 20:51, 14 May 2021 (UTC)
@Billinghurst: Thank you again. What you do here is really amazing! Anyway this "Some coats of arms" Project is too difficult for me but I'll cleanup, and upload images to Commons and create shell pages in WS. The text review by the Optical Character Recognition software doesn't work on that book at all. Thanks again and take care! --The Eloquent Peasant (talk) 21:27, 14 May 2021 (UTC)

Title for a Copyright Office letter

I have at User:BD2412/Affirmance of Refusal for Registration (Prancer DNA Sequence) an untitled letter from the Copyright Office to a copyright applicant. The letter indicates who it is from and to, and has a correspondence ID, but I'm not sure how to title it for the move to mainspace. Wikisource:Style guide appears to offer no guidance on this. BD2412 T 04:33, 11 May 2021 (UTC)

Not sure either, I don't know enough about patents and their form of letters. Does it have a common usage name? I think that there should be some indication of the target of the letter, and something to indicate the source, or a date. As this presumably sits in a series of letters, I think some consideration should be given to how such letters look in a series, for the work itself, and also how the authority acts, as we are setting some indicators for future works. Consider some redirects too if part of a series. — billinghurst sDrewth 00:17, 12 May 2021 (UTC)
I suppose if it were in a citation the title would be Copyright Office letter from Robert J. Kasunic to Howard Simon affirming refusal to register the "Prancer DNA Sequence" (February 11, 2014). BD2412 T 15:30, 12 May 2021 (UTC)
That may be a tad long. If we have multiples of these in the series, or had others from the Office to another, how and what is suitably unique and distinguishing? — billinghurst sDrewth 01:36, 13 May 2021 (UTC)
I suppose Copyright Office letter affirming refusal to register the "Prancer DNA Sequence" would be sufficiently unique and identifying. I don't have any other documents in this area, but this one is of particular interest in modern times, given that the determination that copyright does not extend to DNA sequences means that you can't copyright the mRNA vaccine sequence. BD2412 T 07:22, 14 May 2021 (UTC)

Wikidata integration

Please see d:Wikidata talk:WikiProject Source MetaData#Wikisource integration project, requesting assistance for documentation + implementation. It reminded me of our recent discussion at Wikisource:Scriptorium#Google often unable to find works at Wikisource. Whatamidoing (WMF) (talk) 15:34, 11 May 2021 (UTC)

Introducing {{Annotate QID}}

For some time I have been noting the Wikidata IDs of people mentioned in works I'm transcribing, who have no author, portal or Wikipedia link, using HTML comments, thus: John Moss <!-- Q27478536 -->

I mentioned on Twitter today that I wanted a template for this, so that the terms are more clearly and semantically tagged, and User:Mfchris84 kindly implemented the idea. I have imported it as Template:Annotate QID (shortcut: {{aqid}}), used thus: {{aqid|Q27478536|John Moss}}.

The name John Moss in this sentence is an example of its use.

It is therefore now possible to write a Wikidata query to interrogate a work on Wikisource, and produce (or process) a list of people or things mentioned in it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:47, 11 May 2021 (UTC)
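
As a local complement to the query idea above, the annotations can also be pulled straight out of a page's wikitext. This is a minimal sketch, assuming the template is always called as {{aqid|QID|label}} or {{Annotate QID|QID|label}} with positional parameters and no nested templates inside the label. The function name and the sample sentence are made up for illustration and are not part of the template's documentation.

```python
import re

# Matches {{aqid|Q123|Label}} and {{Annotate QID|Q123|Label}} (name is case-insensitive).
AQID_PATTERN = re.compile(
    r"\{\{\s*(?:aqid|annotate[ _]qid)\s*\|\s*(Q\d+)\s*\|\s*([^{}|]+?)\s*\}\}",
    re.IGNORECASE,
)


def extract_annotations(wikitext: str) -> list[tuple[str, str]]:
    """Return (QID, label) pairs in order of appearance in the wikitext."""
    return [(m.group(1), m.group(2)) for m in AQID_PATTERN.finditer(wikitext)]


# Hypothetical sample text using the template call quoted in the announcement.
sample = "A letter from {{aqid|Q27478536|John Moss}} was read to the board."
print(extract_annotations(sample))
```

A list of pairs like this could then be fed to whatever tool produces or processes the list of people or things mentioned in a work.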

There's also {{wdl}} which will link to, in order of preference, the first found of: an author/portal, a Wikipedia page, a Commons category and finally to Wikidata (via Reasonator) for a given Q-ID. Inductiveloadtalk/contribs 19:31, 11 May 2021 (UTC)

Question: For both of these usages we need to explore how they work in linking policy, and annotation policy, and improve our guidance accordingly. What is the ultimate outcome? How does it improve our site? Where do we expect it used? How much do we expect it used? Both within a work, and within a page? What are our limitations in its usage? Once per page? Per chapter? Per work? When do we annotate vs. when do we link? [We have a lot of non-fiction works] Also how do both work in the export realm? — billinghurst sDrewth 00:22, 12 May 2021 (UTC)

Wikisource:Requests for comment/Wikilinking policy mentioned it. Where does that stand? @Beeswaxcandle:?
WRT export, they're "just" links, so they export as links. From an ebook, there is no difference between a link to Wikisource and a link to Wikipedia or Wikidata other than the URL - both would prompt the user "do you want to visit this location". Depending on the reader that might or might not work (for a start, it needs network access which not all readers have).
Another issue is that nearly all reader devices have no concept of "hover", so using the title attribute for anything is going to be functionally invisible in the end result. This problem also affects {{SIC}}, {{tooltip}} and friends. Inductiveloadtalk/contribs 15:45, 12 May 2021 (UTC)
@Inductiveload: The new version of the policy should be ready for release next week. Still juggling the various thoughts and coming to a conclusion. Beeswaxcandle (talk) 07:48, 13 May 2021 (UTC)
Not concerned about tooltips generated. SIC as it is, is just meant to be an indicator to the reader who may edit there and then. It is meant to be non-vital information, so never been an issue to not export. — billinghurst sDrewth 01:41, 13 May 2021 (UTC)
Should any template like "Annotate QID" have "tooltip" as an underlying component if we are using it to annotate? Even if we have to fiddle with tooltip itself. — billinghurst sDrewth 01:43, 13 May 2021 (UTC)

"by nationality" categories

Seeking feedback on how many author "by nationality" categories are necessary, and then the lower spread beneath those.

Here is what we have

I can see the value in some of these especially as all authors would normally be added to a category below "Authors by nationality", but all of these seem like duplication.

I will also preface the commentary that something like and its kin are ambiguous in title as they are author: ns pages only, and if we keep them they either need to be policed or renamed to indicate author only. Taking biographical entries down that far also seems to somewhat exceed requirements.

Anyway, asking what the community is trying to achieve with that categorisation, and where we want to go with it. Thanks. — billinghurst sDrewth 00:07, 12 May 2021 (UTC)

Use of notes= for annotations

Is there community consensus for this edit? I always thought notes= is the right place for annotations. Hesperian 04:38, 12 May 2021 (UTC)

That does indeed look like an appropriate use of the notes field to me. Xover (talk) 04:48, 12 May 2021 (UTC)
Always has, always will be. How else have we ever added {{wikipediaref}} statements in situ. I have reversed their edit. — billinghurst sDrewth 09:59, 12 May 2021 (UTC)

Removed the "proposals" hatnote from Wikisource:Annotations

The document has been long in place and been pointed to and treated as de facto policy for an extended period, and not been targeted to be removed. So I have removed {{proposal}} which allows it to be a clear guiding document. If there is any dissent, then please revert that edit, and tell us what is clearly problematic with the document and how it should be otherwise actioned. Thanks. — billinghurst sDrewth 10:04, 12 May 2021 (UTC)

Thanks a lot, billinghurst, you have asked if I have another point of view to add to this discussion. Well, my point of view is: Wikisource:Annotations is exactly what I would have liked to write myself!
I am translating it now into French in order to complete it with French wikisorcerers' additions, if any, to the French Wikisource corresponding page, and will tell you of the result.
Many thanks and many regards here,
--Zyephyrus (talk) 19:58, 14 May 2021 (UTC)
Then I am confused Zyephyrus. If you think that it is okay, why shouldn't we remove the proposal tag, and accept it as our current official guidance? — billinghurst sDrewth 00:54, 16 May 2021 (UTC)

FYI: Wikisource-bot has stopped archiving

A note to those who watch for archiving around the place. The bot stopped working April 30th, and I am seeking assistance to get the script investigated and the issue resolved. Bear with us. — billinghurst sDrewth 06:08, 13 May 2021 (UTC)

Resolved. — billinghurst sDrewth 11:03, 13 May 2021 (UTC)

RfC: Nomenclature for categorisation of person portal pages

Back to the people categorisation nomenclature ...

Background

Two previous nomenclature conversations about authors, and biographical works/entries

so authors pages are now "occupation as authors" and "biographies of occupation"

The top of this part of the tree is Category:Occupations

So where does the community wish for person portal pages to be categorised? To their own subcategory tree?, eg.

  • Category:Biologists
    • Category:Biologists as authors [Author ns:]
    • Category:Biographies of biologists [Main ns]
    • Category:(something about portals and biologists) [Portal: ns]

or is there a wish to combine them with an existing category? Not categorise? Or just leave them at the Biologists level? There will never be a lot, though it may look a little unusual.

If it is its own category, what do you wish to have it called?

  • Portals of biologists
  • Biologists portals (grammatically incorrect or we can add grammar)
  • (your suggestions)

At this stage people portals are generally only categorised to Category:People in portal namespace, currently sitting in the 180s, and most through use of template:person (special:whatlinkshere/Template:Person) though not all. There are still some transfers to do by anyone interested (petscan:19073593).

Thanks for your feedback. — billinghurst sDrewth 12:41, 13 May 2021 (UTC)

Woman in Art

This book is listed with a year of 1900. I knew this to be incorrect, as I have observed numerous later (pre 1926) dates being mentioned in the text while proofing it. Now on page 175 I see the date 1927 mentioned. Is this work now a copyright violation, and if so does it need to be withdrawn from the site? If so can the work I have already done be archived until it becomes out of copyright? Thanks Sp1nd01 (talk) 21:46, 13 May 2021 (UTC)

@Sp1nd01: there's no date, publisher or location mentioned (WorldCat agrees). This copy was signed by the author in 1930, so that puts it c. 1927–1930. And since Armstrong was American and presumably this is an American book (also considering the WorldCat holdings are all in the US), and there's no copyright notices, I think it would be covered by {{PD-US-no-notice}}. Inductiveloadtalk/contribs 21:57, 13 May 2021 (UTC)
Great, thank you for checking it out! I'll continue working on it unless told otherwise. Sp1nd01 (talk) 22:15, 13 May 2021 (UTC)

Template:author link and shortcut => substitution

I am asking that we reconsider how we use Template:Author link and its shortcut {{al}}. The template makes disambiguation (automatic / semi-automatic / tools) more difficult so I would like for us to consider either making it a requirement to substitute it or we have a process to go through and substitute it by bot. — billinghurst sDrewth 01:04, 14 May 2021 (UTC)

Old English texts: potential for a project?

I've been looking at the English literature portal and it looks like there could be a project to input pre-1926 editions of Old English works on Wikisource. There ought to be a Portal for OE literature which can list all the texts already uploaded; then, the aim of the project would be to input at least one edition for each text. Lists of texts of the OE corpora can be found in the respective poetry and prose category templates on Wikipedia. A lot of these are available in HathiTrust, but there is no one readable (as opposed to purely searchable) corpus of Old English works.

The Category 'Old English works' is also a hodge-podge of editorial anthologies, translations into 'revived' Old English (which don't seem to be indicated as such as to distinguish them from real historical documents) and unsourced, contextless poems. So another aim of the project would be to reorder the portal along the lines of the Wikipedia template (i.e. organised on a thematic, historical or Manuscript basis).

Thoughts? unsigned comment by Rho9998 (talk) .

@Rho9998: Wikisource:WikiProjects are able to be started by anyone; they are coordinating hubs for people of like interests. Help:Portal will guide you on portals, and they are generally aligned with a subject, so no issue from my PoV. For us categories are generally less populated than elsewhere, so we tend to not drill down too far until truly required—lots of near empty categories aren't useful. I would rather see 1 category of 40 items, than 20 categories of 2 items. Also noting that Category:Bright's Anglo-Saxon Reader additions is not usual for us, we tend to only categorise the root page in such a circumstance. A WikiProject is a great space to discuss and coordinate categorisation for such a subject. — billinghurst sDrewth 06:45, 17 May 2021 (UTC)
@Rho9998: I think an Old English Wikiproject is a great idea. The Old English section of Portal:English language certainly needs expanding, and if it expands over, say, 10 items, or gets thematic or other divisions, Portal:Old English should probably be created (containing both literature and language unless those need further splitting, which I think is unlikely for now). It's certainly odd that we have Portal:Old Norse literature but nothing at Portal:Old English!
I have no domain knowledge of Old (or Middle) English, so I can't do much other than provide moral and technical support, so you'll have to do the subject legwork (or find someone who does know their thorn from their wynn to help!)
If you choose a fairly "easy" text (one that doesn't need too much esoteric formatting or detailed domain knowledge of Old English, and is fairly short), it could also be a candidate for the Wikisource:Monthly Challenge.
If you would like a text or two from Hathi, let me know. If you want a lot of texts from Hathi, that's OK too, but I'd like some information from you in that case. Inductiveloadtalk/contribs 09:07, 17 May 2021 (UTC)

Tolstoy (Wiener)...

We seem to have 2 incomplete sets:-

Can someone make a decision and upload one complete set in a consistent format? Thanks :) ShakespeareFan00 (talk) 10:01, 17 May 2021 (UTC)