Wikisource:Scriptorium

Scriptorium
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help. Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 453 active users here.

Announcements

Proposals

Collective work inclusion criteria

[This is a proposal stemming from the #Policy on substantially empty works section below.]

Since there has been no more input for a month, here we go. This is only a proposal, so any part of it can be changed, or the whole idea rejected. Inductiveloadtalk/contribs 10:58, 25 August 2020 (UTC)

Inclusion criteria for articles

Some works are composed of multiple parts that can stand alone as independent pages. These works are generally encyclopedias, biographical dictionaries, anthologies and periodicals such as magazines and newspapers and so on. Such "collective works" have slightly different criteria for inclusion in the main namespace. The aim of these criteria is:

  • To allow individually-useful articles, or sets of articles, to be transcribed to the main namespace without requiring active transcription of hundreds of pages of unrelated articles
  • To nevertheless make it easy for other users to "drop in" and add more articles to the work.

To be eligible for inclusion, a component of a collective work (e.g. a single magazine article), should satisfy the following criteria:

  • The component should be "non-trivial" in scope and importance. For example, only a title page or single-paragraph "notice to subscribers" in a magazine is unlikely to be considered useful on its own. However, it would still be part of a full transcription of the rest of the parent unit (e.g. a magazine issue).
  • The work should be scan-backed.
  • Main namespace pages should be created for the work at the top level and any intervening levels (e.g. Volume and Issue/Number ranks should exist). Sometimes, the Issue/Number rank redirects to a section on the Volume page.
  • Front matter of each intervening level (the "parent unit", e.g. a magazine volume and issue) should be transcribed and transcluded
  • A table of contents is required for the parent unit in question. Use {{AuxTOC}} if the original work doesn't contain a TOC.
  • Appropriate infrastructure around the work should exist. This might include internal plain link templates ("lkpl"), dedicated article link templates for use on author pages, formatting templates for repeated formatting elements, etc. All templates should be fully documented.
  • The article should be linked to from any relevant author pages and suitable portals
  • Oppose. An article is a complete work. The only requirement for inclusion should be that it actually is an article. This proposal would result in, for example, the deletion of huge numbers (at least hundreds) of perfectly good short stories and similar articles created over more than a decade for no good reason. I can see no reason for demanding every piece of front matter, which might consist of large quantities of indexes, adverts and other material of no great importance but massive bulk and technical difficulty. Insisting on scan backing would be extremely damaging if a particular article is or should be used as a source for Wikipedia. The need to provide online copies of sources to maintain and improve Wikipedia is overwhelmingly more important than the luxury of scan backing. Requiring the creation of templates would be a crushing burden, because most people do not know how to create them. It is in any event wholly unnecessary. Whether the article is linked to is irrelevant to inclusion. I can understand the desire for a main page that links to the article (and even that would take a lot of effort to effect in some cases where a lot of articles have already been created), but the rest is just obstructive. The problem with this proposal is that it would create a massive crushing burden that is wholly unnecessary and produces no useful benefit to the project or readers. It is burdensome restrictions for the sake of restrictions. James500 (talk) 20:18, 29 August 2020 (UTC)
  • Support. Without a system like the one you have described in place, sub-pages of works could be created wantonly without any means of completing the works from which they were derived. If an article, which is a selection from a larger work, is created without any infrastructure, it will be very difficult for other Wikisourcerors to complete the work which has been started, as they will have to find and upload a scan and set up the complicated not-article material without the aid of the person who created the first article. The new system will also make it easier for other contributors to work on smaller parts of a larger work, without worrying about demanding formatting concerns. TE(æ)A,ea. (talk) 12:30, 30 August 2020 (UTC).
    • Content creation should not be described as "wanton". There are means of completing the works from which the sub-pages were derived. If a periodical article is created without so-called infrastructure, it is very easy for other Wikisource editors to complete the work which has been started. It only becomes difficult when someone goes on a deletion spree. And it is massive numbers of nominations that cause problems. James500 (talk) 18:33, 30 August 2020 (UTC)
      • This page is a fine example of what I refer to. A novel contributor, with no previous involvement with this work, or one like it, would have to generate an entire system for reproducing (transcluding) articles from that work. The example I provide is more complete than other pages, and is much more complete, in relation to the whole work, than a single article. It would be very difficult to add to larger works, where the basis is merely articles or other pages in the state of which I complain. TE(æ)A,ea. (talk) 21:21, 30 August 2020 (UTC).
        Oh sheesh is that happening again. Fully agree with you TE(æ)A,ea that it is wanton and of little value. That content does not belong in main namespace. Main namespace is for transcribed work. Constructs and curation belong in portal namespace. I have created the portal and moved the non-mainspace material. — billinghurst sDrewth 23:17, 30 August 2020 (UTC)
        • That page was created more than a year ago. Nothing is "happening again". You did not move the bibliographic information from the mainspace page to the portal. I had to add it to the portal myself. If that important bibliographic information had been deleted by mistake, that is an example of how seriously disruptive the proposed deletion criteria could be. The word "wanton" is needlessly offensive. The primary meaning of the word "wanton" is "sexually promiscuous" and it is applied to other things by analogy. Please do not use that word. James500 (talk) 00:49, 31 August 2020 (UTC)
          • what's "happening again", is the periodic pearl clutching of the deletionists, who are opposed to an open project, and seek to provide a tl;dr of the "one right way" to do transcription. if a text is useful, and people can work to organize it, then we should include it. put a maintenance category, and move on. making up exclusion rules is a waste of time with the prospect of a growing backlog, or filters turning away newbies. take a look at german wikisource, if you want to know how that turns out. [1] Slowking4Rama's revenge 21:38, 1 October 2020 (UTC)

Comment @Inductiveload:

The proposal, as is, would inhibit the ad hoc transcription of articles from "The Times", e.g. The Times/1914 and things linked from {{The Times link}}. Is that in or out of scope for your proposal? Maybe there should be a declaration of some governing principles first. What is looking to be achieved, and indications of what is trying to be stopped. Then we can get onto a structure. I know that we created {{header periodical}} to capture where we have more sporadic collections of articles from newspapers. [Now I could be convinced that such constructions are better to be in the portal namespace rather than main ns.]

Some examples of pages considered problematic would be useful for context. If the proposal is an effort to have articles from a periodical becoming part of a hierarchy of the periodical, ie. subpages, then YES, I fully support that, in contrast to a random root level pages without context to the publication. If the proposal is to set up a fully qualified structure for every periodical where we just want to reproduce one article, then NO. This is self-interest as I regularly want to reproduce an obituary for an author to establish biographical information and we are never going to get all that requisite newspaper construct data, and we are virtually never going to get the scans.

For any newspaper article I have transcribed I will generally do "Periodical name/YYYY/Article name" to give it grounding, and the article would have some "notability". The Times I did an extra hierarchy level. I will accept that there will be early works that I transcribed that may be incomplete by that standard and I would not transcribe them that way today. — billinghurst sDrewth 15:31, 30 August 2020 (UTC)

To be eligible for inclusion, a component of a collective work (e.g. a single magazine article), should satisfy the following criteria:

  • The component should be "non-trivial" in scope and importance. For example, only a title page or single-paragraph "notice to subscribers" in a magazine is unlikely to be considered useful on its own. However, it would still be part of a full transcription of the rest of the parent unit (e.g. a magazine issue).
  • The work should be scan-backed.
  • Main namespace pages should be created for the work at the top level and any intervening levels, with a suitable, logical subpage hierarchy developed (e.g. Volume and Issue/Number ranks should exist). Sometimes, the Issue/Number rank redirects to a section on the Volume page.
  • Front matter of each intervening level (the "parent unit", e.g. a magazine volume and issue) should be transcribed and transcluded
  • A means to navigate the subpages of the work is required; a table of contents is preferred, though alternatives exist. A table of contents is required for the parent unit in question. Use {{AuxTOC}} if the original work doesn't contain a TOC.
  • Appropriate infrastructure around the work should exist. This might include internal plain link templates ("lkpl"), dedicated article link templates for use on author pages, formatting templates for repeated formatting elements, etc. All templates should be fully documented. (additional) Parent template exist to make this readily easy.
  • The article should be linked to from any relevant author pages and suitable portals; (additional) orphaned pages are not acceptable.
    • If an article is orphaned, that is certainly a reason to add links to the relevant author page or portal. It is not a reason to delete the article. Issues that can be addressed in a very straightforward way by adding links to other pages are not suitable for use as deletion criteria. Why would you delete the page instead of just adding the links? This kind of thing belongs in a style guide. I suggest the words "eligible for inclusion" are the problem with some of these criteria. James500 (talk) 01:33, 31 August 2020 (UTC)
      We are wanting to get people to link. We don't delete a work for lack of a linking, we are not that petty. What that criteria does is limit the transcription and addition of the trivial, linking indicates that it requires some relevance. — billinghurst sDrewth 14:57, 31 August 2020 (UTC)
    • @Billinghurst: I mostly agree with your formulation - that's more flexible in the case of newspapers. @TE(æ)A,ea.: has already given an example, but there are several more examples in the #Policy on substantially empty works below.
    • I do still think we should be requiring the front matter, but perhaps only when we have scans. Usually, it's just a title page or issue banner, it usually provides the date and number as in the original and it prevents the main-space page being just a floating TOC: e.g. The Chinese Repository/Volume 1 and The Chinese Repository/Volume 1/Number 1, versus, say, The London Quarterly Review/39 (which doesn't have a scan, so it's kind of fair enough in this case, but if it had a scan, it should get the front matter).
    • I was going to disagree with the removal of the scan section, but if it is downgraded to "if possible", since the current global policy is pretty much "scans if at all possible", it doesn't need to be repeated.
    • For clarification: by "Parent template exist to make this readily easy." do you mean things like Template:Authority/lkpl? Inductiveloadtalk/contribs 11:11, 31 August 2020 (UTC)
      I was meaning template:article link primarily as it is more what we have used for journals. template:authority/link is more aligned to dictionaries and the like. But yes, one of those as the parent template, or used directly. If we have a scan, then yes to front matter, so we can qualify in the regard of its existence.
  • I have a question; let's take Golfers Magazine. I expect that there will be exactly one article ever transcribed from this--Ask the Egyptians, by Rex Stout, an obscure short story by a not so obscure author. I'm glad to provide scans; I think we should demand scans for stuff that wasn't originally published digital. And it will get tucked under a Golfers Magazine/Volume 28/Issue 3/Ask the Egyptians. But how much work do you expect here? I would begrudgingly create a ToC for the issue, but messing with templates seems completely unnecessary.--Prosfilaes (talk) 14:03, 31 August 2020 (UTC)
    Personally I think that scans are nice, maybe preferred, not mandatory. Sometimes getting scans is either not possible, or just problematic. I have numerous newspapers to which I can get access through subscription sites, but producing scans to upload is just MEH! especially if I just want an obituary reproduced. (Noting that where I just want a rough transcription or a snippet that these days I put it on an author talk page.) Have a poke at Category:Obituaries for a range of sources that myself and others have used.

    For your example, I would have gone for "Golfers Magazine/YYYY/article name" and then slapped down {{header periodical}} at the root level, as we get more years, then we can break it down further. — billinghurst sDrewth 14:57, 31 August 2020 (UTC)

  • @Prosfilaes:, what I think would be nice here might be:
    • The top level page, pretty much as it is. Doesn't look like there's much more to say about this work.
    • I can't really see any sensible templates (note "might include" in the proposal) to create for this work. It's not a dictionary so it doesn't obviously need a lkpl, and it's not big enough to merit an article link template of its own. Perhaps if all the headers are identical, there could be a formatting helper, but not critically needed.
    • Personally, I'd like to see the cover if there is one and it's "nice" like this one (obviously not a library binding), and the issue header on the issue sub-page, but I can see the argument that it's a bit pointless if there is no intention to transcribe the rest of the issue. The TOC (which already exists in the original work) is something I'd prefer to see if possible, but I do get that it's a bit of an imposition in this case, where only one article is "interesting".
    • A list of the known scans somewhere (90% of periodicals seem to do this in the mainspace, but that's evidently controversial). It looks like Hathi has an incomplete list and the IA has another Google-fied copy of v.12, so in this case probably just what Hathi has. A lot of the time a mish-mash is needed to get a set of links. Uploading is strictly optional - obviously preferred, but we all know how much of a pain it is, and page-listing and checking periodicals is pretty masochistic, so it's absolutely not needed.
    • Again personally, I prefer "Golfers Magazine/Volume 28/Issue 3/Ask the Egyptians" than "Golfers Magazine/1916/Ask the Egyptians" since we might as well put things in the correct place ahead of time and it provides the obvious place for things like front matter. But I know that's not how it's always done, especially for newspapers where the content is often even more sparse, proportionally speaking, than magazines. Inductiveloadtalk/contribs 15:54, 31 August 2020 (UTC)
      @Inductiveload: If we can get that data, then that is definitely preferred, and I would think that for journals we would encourage it. For newspapers, I doubt that we are going to get the coverage, and they are just a lot harder due to how those beasts are constructed. Probably a case of differing guidance, and different tolerances. — billinghurst sDrewth 14:23, 20 September 2020 (UTC)
  • Caveat: I was hoping I would find the time to really dig into this and contribute something with some thought behind it, but I keep being disappointed, so instead I'm just going to do the drive-by thing. Sorry!
    I Support Inductiveload's proposal as written. I disagree with Billinghurst's proposed softening, in particular regarding scans. We need to start getting a hard scan requirement (with the obvious exceptions) into policy, and partial works like this are where the requirement is most urgent as it is a de facto requirement for other contributors to be able to work effectively on completing the work. I am open to, and lean towards, removing the templates requirement. Templates are very hard for most people, and a somewhat tall order even for long-term Wikimedians, and I don't consider bespoke templates to be a critical factor.
    I also support soft application of this policy, the same way we allow for {{incomplete}} and {{missing image}}. Billinghurst's concern regarding gigantic efforts required for front and end matter (long tables of contents, indices, etc.) is a legitimate one, but I think this is better handled by softening application than softening the policy. If the text is put in a sub-page structure, is scan-backed, and the front matter is coarsely there, I can live with something like a hypothetical {{toc part missing}} or {{issue toc missing}}. With all the coarse structure in place, filling in detail is eminently doable by crowdsourcing.
    I also stress that I don't consider the establishment of this policy a bright-line immediate cause for deleting existing texts. I oppose an explicit grandfather clause in this policy, but I !vote in favour of it in the context that our practice is not to proactively mass-delete historical texts just because we raise the standard for quality. I do, however, expect that individual texts that do not meet this new policy will be proposed for deletion piecemeal over time, as people happen to run across them, with no progress toward meeting the standard, or are too pathological to fix (which should certainly be the first approach whenever possible). And my expectation is that in those discussions those texts will either be improved to comply with this policy or they will be deleted in accordance with this policy. I also very much expect contributors who disagree with this to express their disagreement politely and constructively: prioritising different factors (e.g. quality over quantity) is in no way shape or form cause for name-calling or ascribing ulterior motives to other contributors. --Xover (talk) 13:22, 22 November 2020 (UTC)

No-content mainspace pages

This one is probably even more controversial so it's a separate proposal:

Collective works are commonly referenced by other works. Due to this, it is permitted to pre-emptively create the top-level main namespace page to collect incoming links, even when there is no content ready for transclusion. This also allows labour-intensive research into location of scans to be preserved and presented to users even when no transcribed work has been completed. The following is required for such a work:

  • A header with a brief description including active dates, major editors, structure (e.g. series) and so on
  • Redirects from alternative names (e.g. when a work has changed name or is referred to by other names)
  • A listing of volume scans should be added, and it should be as complete as possible, based on availability of scans online. As always, creating Wikisources index pages is preferred, but external scans are acceptable.
  • Creating sub-pages (volumes or issues) should follow the article inclusion criteria. This means a sub-page should not be created if there is no content.
  • Oppose As above these restrictions are an unnecessary burden that would produce no real benefit and presumably result in a lot of deletions. We do not need lists of editors. We do not need a complete list of volumes. (There may be hundreds of volumes of a particular periodical that have scans. For example, a page with links to scans of twenty volumes should not be deleted because the creator failed to link to scans of another eighty volumes.) Lack of redirects is not a reason to delete these pages either. James500 (talk) 20:37, 29 August 2020 (UTC)
  • Support, mostly. Generally speaking, I think that if a periodical changed its name, then there should be a separate page under the new name; however, redirection pages from alternate titles would be preferable. The other requirements are not overmuch burdensome, and would make useful a page that is otherwise empty, due to a lack of transclusions. TE(æ)A,ea. (talk) 12:30, 30 August 2020 (UTC).
    • None of our periodical pages includes the names of the editors, as far as I am aware. Not one. Under this proposal, every single periodical we have would be deleted. Further, it is not possible to include the names of the editors when they are anonymous. James500 (talk) 18:24, 30 August 2020 (UTC)
      • @James500: "every single periodical we have would be deleted" - or we could make the effort to improve such works as we find them. Generally, an excerpt from Wikipedia or some other source would do just to provide some context. E.g. The Condor vs The Journal of Jurisprudence, which has the dates, but not other useful info, not even the country. For example, even a quick trawl would allow one to write something like "The Journal of Jurisprudence was a Scottish law journal published in Edinburgh from 1857 to 1891. The first successful Scottish law journal, it covered all aspects of the Scottish legal system and included editorials, biographies and short articles as well as case law and reporting of legislation. It merged with the Scottish Law Magazine in 1867. It was largely replaced by the Juridical Review in 1891.". The editors aren't particularly obvious here (so they're not "major editors"), but sometimes editors are important to the work's history and are explicitly noted, e.g. All the Year Round or The New-England Courant.
      • Basically, if a page has zero or near-zero transcribed content, in my mind it can edge over the line into acceptable as long as it's providing useful auxiliary bibliographic information, which might also include collation of various names. This is somewhere WS can actually provide value-add - nowhere else online, as far as I know, provides a venue for this information (IA/Google metadata is terrible, OCLC is not very good at periodicals, Hathi scans can't be downloaded easily, none are editable, often a complete scan list uses various sources, etc). However, "it was a periodical and here's a handful of raw external links, kthxbai" doesn't quite cut it, even for someone who thinks these pages can be useful like me.
      • I've said it before several times, but the aim here is not, not, not to get all the pages like The Journal of Jurisprudence deleted, but instead figure out what needs to happen to keep them. To me, a decent blurb and a tidy list of volumes and scans will do it, but that's far from consensus. As it stands, as far as I can tell, the only reason half of Portal:Periodicals isn't getting unceremoniously dumped into Portal space (something I personally would like to find an alternative outcome to) is no one really wants to deal with it. We can fix that by coming up with a minimum level which the pages should meet and then fixing them up. Inductiveloadtalk/contribs 12:37, 31 August 2020 (UTC)
    • @TE(æ)A,ea.: about the names, above is an example, where The Journal of Jurisprudence absorbed the Scottish Law Magazine in 1867. Though technically after the merge TJJ became The Journal of Jurisprudence and the Scottish Law Magazine (e.g. here, but not the title pages), it was still the same work. So in my mind, we could have The Scottish Law Magazine running up to 1867 and then The Journal of Jurisprudence for 1857–1891, with notes about the merge in both headers.
    • Another example of a work that changed name, but remained the same fundamental work is Monthly Law Reporter, which was just The Law Reporter for the first 10 years, and even kept the volume sequencing over the name change (though it added a "new series" number). So The Law Reporter should probably be a redirect. Inductiveloadtalk/contribs 12:37, 31 August 2020 (UTC)
      • The Scottish Law Magazine [and Sheriff Court Reporter] was originally called the Scottish Law Journal and Sheriff Court Record. It has a page already which includes the volumes up to 1867. James500 (talk) 15:10, 1 September 2020 (UTC)
        • @James500: Then a link to it should have been in the description already. I have added it and expanded the description as above. Feel free to add more details. Inductiveloadtalk/contribs 15:50, 1 September 2020 (UTC)
  • Comment Periodical main namespace pages should not contain the curated information of scans, etc., that is the job of the Portal: namespace. Main namespace should only contain published information for works that we have prepared. So under your proposal, the main ns can exist, and it should contain contents of works that we have transcribed, and there should be a corresponding portal: or there can be a constructed Wikisource: project page where there is a project to do the work. This was discussed years ago, and we have been moving those constructs to portal namespace for years. If there is zero content at the page, and we are unlikely to have it, then it can be redlinked, or maybe if it is that obvious then we don't need a link at all. Examples would be useful. — billinghurst sDrewth 15:42, 30 August 2020 (UTC)
    • You are the only person moving these pages into the portal space. I would like to see a link to the alleged discussion you refer to. James500 (talk) 18:24, 30 August 2020 (UTC)
  • @Billinghurst: I personally don't see huge value in simply shunting just scan links to Portal and leaving them there:
    • It eventually leads to having two parallel volume lists, one with links and one without, sometimes with divergence.
    • It tends to end up with "scratchpad-level" content in Portal, which is supposed to be a nice presentation space.
    • Portals are badly integrated and will probably not be noticed by casual users, or even many Wikisource editors. Especially as the Portal headers never seem to actually link to the mainspace works that exist, but we can fix that.
  • I suggest Portals like Portal:Punch provide some useful value-add, whereas Portal:Notes and Queries does not (yet), and its current content, if anywhere, should be on a WikiProject, just on the mainspace talk page, or even nowhere now all the volumes are uploaded. If the consensus truly is to shunt this all to Portal and move back once there's content, then fine, but I do wonder if that's truly the most ideal strategy. From a pure "only reproduced content in mainspace" angle, perhaps, but does that serve readers best? Inductiveloadtalk/contribs
    @Inductiveload: Main namespace is content for the reader. There is nothing worse for a reader than to go to a page and have to drill down multiple pages to find that there is no content, just some dashed skeleton of hierarchy. Main namespace is not built to drive transcribers and transcriptions, that is our other content spaces. We can create a page there once we have content to display what we have to read, and point to the portal for what we have to transcribe. It is the reason we put in place the portal namespace. — billinghurst sDrewth 15:08, 31 August 2020 (UTC)
    I also wish to avoid the really ugly situation of people uploading a work, creating the front page, and then just leaving it for other people. That facadism of a work is just problematic, and we know that nothing happens to it. It is why we developed {{ext scan link}} and {{small scan link}} for use in the author namespace to do that role of managing that list build. So portal and author namespaces play that role and keep main namespace cleaner and more functional. — billinghurst sDrewth 15:15, 31 August 2020 (UTC)
  • @Billinghurst: I'm not saying that we should be creating pre-emptive "empty" hierarchies. I'm saying that I don't really see the point of shunting all the scan links off to a portal where they will basically never be found by anyone who isn't extremely familiar with Wikisource and the mainspace/portal split. If a casual reader is after, say, Volume 22 of The Atlantic Monthly, for which we have neither scans nor content, do we serve them better by placing a scan link to the IA on the mainpage next to the redlink so that they can at least find what they wanted, or is it better to have no redlink at all, skip Volume 22 in the list and maybe put the IA link at a portal? If the latter, I'm fairly certain 95%+ of people will just not find that link at WS. We can certainly adopt a stance of if it doesn't exist here, we don't even want casual readers to be presented with an external resource, but that seems slightly walled-gardenish for an open project.
  • "Facadism" is annoying, and it (or the perception of it) is what has brought us to this point via the proposals at WS:PD. As an example from that page, I don't find the concept of the page American Law Review intrinsically offensive in mainspace, even without any content (though perhaps it's a little untidy as-is), but I don't really see the point of American Law Review/Volume 1 as it stands (only a title page and redlinked TOC, though it's a single article away from being useful to me).
    • Notably, I find "facadism" of a collective work much less annoying than, say, only having the preface to a novel. Collective works can have individually-useful things slotted in bit by bit, and if there's a framework around the work, it's even easy to do.
  • And if we do want to ditch this proposal and be strict with Portals in this way, then 1) it needs to be documented that that's how it works (Wikisource:Portal guidelines and Help:Portals don't mention use of Portals for this purpose at all, they focus more on thematic curation) and 2) most existing periodicals need to be converted over: many people reasonably imitate existing structures, we can't blame them for that.
  • And do we allow redirection from a non-existent mainspace page to the portal so it can be found via "normal" linking until such time as there is content? Inductiveloadtalk/contribs 17:09, 31 August 2020 (UTC)
  • The word "facadism" is needlessly offensive and should be deprecated in favour of something that doesn't sound like it refers to habitual dishonesty. I would urge that care be taken when coining neologisms to consider how these words might be taken. James500 (talk) 15:32, 1 September 2020 (UTC)
    What? It means that there is a face only. Nothing more. There is nothing offensive in it and I don't even see where you can draw that inference. You are digging too deep or looking for insult. Front-pageism is meh! So unless you can find a better term, can you please AGF. — billinghurst sDrewth 18:58, 1 September 2020 (UTC)
  • Oppose I disagree with Inductiveload's position, and agree with Billinghurst's (provided I have understood them both correctly, which is not a certainty). We should significantly raise the bar in this area for mainspace pages, and anything that is not (part of) an actual published work should be shunted to other namespaces. I acknowledge the downsides to that approach that Inductiveload brings up, but I think we should find other ways to ameliorate those. I also agree that the main purpose in setting a higher bar is to have a clear and predictable standard for contributors to aim for to enable keeping a work, with deletion being an admission of failure (i.e. deletion is a sometimes necessary, but never a desirable, outcome). I disagree that shunting content to other namespaces is a bad thing, as it is a great way to preserve content that would otherwise be deleted. Maintaining clear purposes for the namespaces makes possible technical innovation in the long term, through better integration with Wikidata and similar measures. --Xover (talk) 13:44, 22 November 2020 (UTC)
@Xover: Re: Maintaining clear purposes for the namespaces: I think part of my problem here is that Portal namespace is overloaded with two kinds of content: curated "exhibition-style" information and a dumping ground for lists of links shunted from mainspace, where they are all-but-invisible to the average user. IMO, either all the "volume list" pages should be in one namespace or the other. For example, The Times and Portal:New York Times are basically the same thing, but one is in Portal space and one is not. And very rough lists probably should go somewhere in Wikisource-space if they're so rough they're not suitable for public display.
As an aside, this is somewhere I think a cross-namespace redirect (if the mainspace page doesn't exist yet) isn't a summary hanging offence. Inductiveloadtalk/contribs 17:33, 8 January 2021 (UTC)
@Inductiveload: I agree, I think, with all your points here; but I fall down on the other side of the line on them. Portal:'s purpose is a bit overloaded, but I prefer that to overloading mainspace's purpose. I think The Times and Portal:New York Times is a bit of a distinction without a difference, and I have no clear idea of what we'd actually put at The Times that would be materially different (in terms of the principles we're discussing here) from Portal:New York Times, but I'd rather have a bright-line rule for mainspace with common-sense exceptions (after community discussion) for works like The Times.
Or put another way, my highest priority is raising the quality bar for our main presentation namespace. I also care about the quality of other user-visible namespaces (Author:, Portal:, Translation:), and about practical issues like organization of work, findability of scans and bibliographic info for not-yet-proofread works, and barriers to entry and effort required to contribute; but all with a lower priority than maintaining quality in mainspace.
My immediate instinct regarding the duality of Portal: is not to "pollute" mainspace, but to find some good way to clean up Portal:. Typically by thinking up some better alternative along the lines of a new namespace or pseudo-namespace (like WikiProjects) for those purposes. Ideally with some form of technical innovation that would make that alternative desirable, not just tolerable, for the relevant stakeholders. Perhaps there's an opportunity for tooling to manage scan links and bibliographic data in a structured format, possibly even integrated with Wikidata? Overlapping with the WikiCite/Worldcat-killer/VIAF-replacement effort perhaps? This might even fit into a grander vision of tooling and integration for structured data on enWS, where everything that's in {{header}} in mainspace pages today would be editable in a GUI, backed by Wikidata, and inherently structured; and where we have a defined and tool-supported workflow from creation of Author: pages, populating them with works, adding scans, creating indexes, proofreading, transcluding, promoting, etc. There is a lot of potential there, and a lot of it can be solved piecemeal: maybe a better alternative for "scan-list pages" could be the first piece of that puzzle? --Xover (talk) 09:28, 9 January 2021 (UTC)

Bot approval requests[edit]

Inductivebot[edit]

Hi! Could I please request the bot flag back for User:InductiveBot? I'm starting to think about making a fix for the {{TOC begin}} family, and that might need a bit of bot finagling to remove things like blank lines that will cause issues after the fix is made.

Also, I'd like to use it for general maintenance tasks (moves, replacements, etc.), like it used to do 10 years ago.

None of the tasks it would run are run constantly; they're started manually and supervised. Inductiveloadtalk/contribs 17:09, 8 January 2021 (UTC)
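
As an illustration of the kind of blank-line cleanup mentioned above, here is a minimal sketch in Python. It is not InductiveBot's actual code: the function name, the regular expression, and the assumption that each block is closed by {{TOC end}} are all hypothetical, and the {{TOC row ...}} lines in the usage note are sample data only.

```python
import re

# Hypothetical pattern: a {{TOC begin}} call (with optional parameters),
# everything up to the next {{TOC end}}, and the {{TOC end}} call itself.
# Assumes the parameters of {{TOC begin}} contain no nested braces.
TOC_BLOCK = re.compile(
    r"(\{\{TOC begin[^}]*\}\})(.*?)(\{\{TOC end\}\})",
    re.DOTALL | re.IGNORECASE,
)


def strip_blank_lines_in_toc(wikitext: str) -> str:
    """Remove empty lines inside TOC blocks; text outside them is untouched."""

    def clean(match: re.Match) -> str:
        # Keep only lines that contain something other than whitespace.
        kept = [line for line in match.group(2).split("\n") if line.strip()]
        # Re-join so that begin, rows and end each sit on their own line.
        return "\n".join([match.group(1), *kept, match.group(3)])

    return TOC_BLOCK.sub(clean, wikitext)
```

Called on text such as "{{TOC begin}}", a blank line, a row, a blank line, another row, "{{TOC end}}", the function returns the same block with the blank lines removed. A real run would still be started manually and supervised, as described above.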

  • Comment Just noting that since this is a request for reactivation of a previously approved bot (and one with an extremely low potential for controversy or disruption at that), the bot policy allows for an abbreviated approval process rather than a full minimum 4 days discussion + 7 days trial period. It does require the flag be granted by a `crat though (so ping Hesperian and Mpaa). See the original (2010) authorisation and the 2013 confirmation; I think the flag was removed when we purged the inactive bot accounts in 2017 or thereabouts, but I couldn't be bothered to dig it up just now. --Xover (talk) 09:47, 9 January 2021 (UTC)
Flag set. I did not wait long, given the history of Inductiveload and their bot here. In case of disagreement, please continue the discussion and the outcome will be considered, as per process. Mpaa (talk) 18:31, 9 January 2021 (UTC)
Thank you! Inductiveloadtalk/contribs 23:47, 10 January 2021 (UTC)

Repairs (and moves)[edit]

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

Animal Life and the World of Nature/1903/06/Notes and Queries[edit]

Please move file:Animal life and the world of nature - Notes and Queries - Alice Foljambe - 1903-06.pdf and the two associated pages to "Animal Life and the World of Nature - Notes and Comments - Alice Foljambe - 1903-06.pdf". Apologies for the error. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:04, 3 January 2021 (UTC)

Other discussions[edit]

PD-anon-1923 again[edit]

The discussion of Happy Public Domain Day! has slipped into the archives without reaching a conclusion, so I would like to remind everyone that the last suggestion in the above-mentioned discussion was to create {{PD-US|year of death}} and deprecate {{PD/1923}} and {{PD-anon-1923}}. Is this solution OK?

BTW: if we decide to keep calling the license templates for pre-1925 works {{PD/1923}} and {{PD-anon-1923}}, it would be necessary at least to adapt the latter one so that it could be used for 1924 anonymous works too. --Jan Kameníček (talk) 16:21, 20 February 2020 (UTC)

Support the change — I don't really care but it makes sense —Beleg Tâl (talk) 16:36, 20 February 2020 (UTC)
  • Support likewise —Nizolan (talk) 01:54, 21 February 2020 (UTC)
  • Oppose because the name emphasizes US. The point of the templates is to cover both US status and international status. A template that names the US will cause confusion, especially to newcomers. --EncycloPetey (talk) 02:02, 21 February 2020 (UTC)
    @EncycloPetey: So in your opinion, even fixing a math error requires consensus? Without consensus we should believe 1+1=3 rather than 1+1=2? --Liuxinyu970226 (talk) 01:37, 1 April 2020 (UTC)
    Changes to established templates require consensus. We've had previous discussions and the community is divided on the issue concerning these templates. Proceeding with a change when the community has expressed such division is inappropriate because of the community discussion, not because of my opinion. --EncycloPetey (talk) 02:05, 1 April 2020 (UTC)
  • Support. We are US-centric in our copyright approach. Given the number of times I've had to look up these types of templates here and on Commons, I might buy the idea that we should copy them, but otherwise, I think this is going to be as non-confusing as we get.--Prosfilaes (talk) 04:35, 21 February 2020 (UTC)
  • Comment In your proposal, how do we code the year of the author's death for anonymous works? --EncycloPetey (talk) 04:38, 21 February 2020 (UTC)
    I am afraid I do not understand the question: anonymous works do not have any known author. I propose that for anonymous works we would have a template with similar wording as {{PD-anon-1923}}, but it would be called {{PD-anon-US}}. --Jan Kameníček (talk) 09:42, 21 February 2020 (UTC)
    That's also problematic, because the US is just one place that we display license information for. The current template displays that information for both the US and for countries with 95 years pma. --EncycloPetey (talk) 19:46, 21 February 2020 (UTC)

Comment If there is a consensus to act, my recommendation is that we just move/rename the templates:

  • pd/1923|yyyy -> PD-US|yyyy, yyyy=YoD, displays two templates as now
  • PD-1923 -> PD-US, where no $1 parameter it displays the one template
  • PD-anon-1923 -> PD-anon-US|yyyy, year of publication

and update the documentation around the place. Do any required tidying of the template internals, and fix double redirects. No need to deprecate anything; just move to the new nomenclature, and do not worry about any of the old usage, or anyone continuing its use, as it matters not. — billinghurst sDrewth 11:15, 21 February 2020 (UTC)
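
For anyone who did want to update existing usage anyway, the rename list above can be expressed as a simple text substitution. The following Python sketch is only an illustration under stated assumptions: the mapping is copied from the list above, while the function name and the regular expression are hypothetical. It matches names case-insensitively, which is looser than MediaWiki (where only the first letter of a template name is case-insensitive), and it does not add the year-of-publication parameter suggested for PD-anon-US.

```python
import re

# Old name -> new name, taken from the proposed list above.
RENAMES = {
    "pd/1923": "PD-US",
    "PD-1923": "PD-US",
    "PD-anon-1923": "PD-anon-US",
}

# Case-insensitive lookup table (an assumption made for simplicity).
_LOOKUP = {old.lower(): new for old, new in RENAMES.items()}

# Match "{{<old name>" only when followed by "|" or "}", so that a longer
# template name that merely starts with an old name is left alone.
_PATTERN = re.compile(
    r"\{\{\s*(" + "|".join(re.escape(name) for name in RENAMES) + r")\s*(?=[|}])",
    re.IGNORECASE,
)


def rename_templates(wikitext: str) -> str:
    """Replace old licence template names with new ones, keeping parameters."""
    return _PATTERN.sub(lambda m: "{{" + _LOOKUP[m.group(1).lower()], wikitext)
```

For example, rename_templates("{{PD/1923|1943}}") gives "{{PD-US|1943}}", with the year-of-death parameter carried over unchanged. Since the old names would remain as redirects after a move, such a pass would be optional.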

  • Oppose Firstly, because of the US emphasis. Yes, we follow US copyright law, but we also serve an international readership, not to mention contributors who are also bound by the copyright laws of other countries. Secondly, I think replacing "PD-1923" with "PD-US" is confusing. "PD-US" sounds like a generic template for "this work is PD in the US", but under this proposal it would mean "this work is PD in the US for the specific reason that it was published more than 95 years ago". BethNaught (talk) 22:16, 21 February 2020 (UTC)
    I do not understand in what way "the readership" is concerned in this… They see only the text of the template which is going to stay the same. --Jan Kameníček (talk) 23:08, 21 February 2020 (UTC)
    Comment I do not think that the suggested name of the template is more American-centred than the old one. E.g. {{PD/1923|1943}} has got two parts: "1923" is the American part referring to the American copyright laws, and the parameter "1943" is international, referring to the countries where PD depends on the year of death. Nothing would change; only the American part would be called "US" instead of the nowadays nonsensical 1923. I really do not see any problem in that. --Jan Kameníček (talk) 23:08, 21 February 2020 (UTC)
    @BethNaught: The thing is that the only consideration we give to copyright compliance with regard to hosting is to the US copyright. Unlike Commons, we don't really care whether it is copyright in the country of origin. It is for this reason that I am reasonably comfortable with just stating PD-US and variants. The additional PD-old-70 and variants are for information only. — billinghurst sDrewth 00:43, 22 February 2020 (UTC)
  • Comment I think this is an important issue, and I'd like to weigh in. I'm probably as familiar as (almost) any Wikimedian with the considerations around copyright law in various countries. But I do not see a clear statement of what the problem is that we're aiming to solve, or what the pros and cons are. I'm sure if I took an hour or two to dig through various archives, I could probably figure it out, but I'm not likely to have the time for that...nor should we expect every voter to do that. So given all that, I'm inclined to gently oppose, simply because I can't figure out what's going on, and it seems unwise to make a change that is difficult for community members to evaluate. Is it possible to sum up the issues more concisely so that I can give it more proper consideration, without having to do all the research myself? -Pete (talk) 22:44, 21 February 2020 (UTC)
    The problem I see is this: Until 1923 it made quite good sense to have a template called PD-1923, because it referred to the fact that only pre-1923 works are in the public domain. However, the situation has changed: currently the time border is 1925-01-01 (or 1924-12-31) and it shifts every year. I perceive it as very confusing to call the template for pre-1925 works PD-1923 (why 1923???). At the same time it does not make sense to change the name of the template every year (PD-1923, …, PD-1925, …); it would be better to find a fitting universal name. --Jan Kameníček (talk) 23:16, 21 February 2020 (UTC)
    Ah, that's very helpful @Jan.Kamenicek:, thank you. I had misunderstood, I thought you were proposing a change to the functionality in addition to the name change.
    I agree that changing the name (a) such that it specifies "US" and (b) such that it references the 95 year rule, rather than the (now outdated) 1923 rule would be worthwhile. I agree with others that we should be cautious about US centrism; but the reality is, with a current title that assumes that it relates to US law, without stating it, we already have a high degree of US centrism in the title. In my view, it's better to state "US" as part of the name, to make it clear to editors (who are the primary audience for a template name) that it's about US law. So, my suggestion would be {{PD-US-95}} or similar. That conveys that it's about US law, and it's about the 95 year rule. Text on the template page/docs could clarify that the 1923 rule is now outdated, and subsumed under the 95 year rule.
    A related issue that I find confusing: I don't understand why we need two separate templates for {{PD-1923}} and {{PD/1923}}. I think this proposal only relates to the latter; would we be leaving PD-1923 intact? A decision on this is probably a matter for a separate discussion, but I'd like to know for sure what the intent of this proposal is. -Pete (talk) 23:45, 21 February 2020 (UTC)
    PD-1923 has no decision-making; it applies just a single template and does not add the PD-old-nn variants. It has been utilised where we have been unable to determine a date of death, or for corporate publications which do not have PMA decisions. I addressed above that they would morph into PD-US, though we would need to handle them as parameterless. — billinghurst sDrewth 00:51, 22 February 2020 (UTC)
    Jan, that's not quite correct. Works published before 1923 are still in PD in the US for the same reason they were before. The 1923 date was a cutoff date beyond which we have never had to check. What has changed is that works that were under copyright later than that (from 1923 and 1924), and had their copyright renewed at one point, have now had that copyright protection expire. The works published before 1923 were not eligible for renewal and entered PD for a different reason than the works published in 1923 and 1924. It is one view to see the date as a shifting cutoff, but the cause of works from 1923 and 1924 entering public domain is actually different from those that were published prior to 1923. --EncycloPetey (talk) 03:13, 22 February 2020 (UTC)
    All works published more than 95 years ago are out of copyright because of the time since publication, no matter whether that's due to copyright notices, or renewals, or being in copyright for a full long term. For a work published before 1923, we've never been concerned about copyright notices or renewals, nor how long work published with copyright notice and renewal got in copyright. Why does it matter that a work published in 1924 may have got 95 years of copyright, whereas a work published in 1922 may have only got 75, when we don't really care about that 95 or 75 in the first place? We have no tag for "published abroad before non-US works got copyright in the US in 1891", because we don't care; it has always been sufficient for our purposes to say that it was published before 1923, and I don't see why it is not now sufficient to say that it was published more than 95 years ago.--Prosfilaes (talk) 04:59, 22 February 2020 (UTC)
    @Prosfilaes: I am presuming that this is in reference to the primary notice about copyright within the US, not the secondary notice for PD-old-nn which relates to copyright elsewhere in the world. The secondary notice can still apply for those of us not in the US, which is why we added it. — billinghurst sDrewth 05:08, 22 February 2020 (UTC)
    Yes, the primary notice. There's no need to worry about now-historical features of non-US countries, but certainly helpful to list the years since death.--Prosfilaes (talk) 05:18, 22 February 2020 (UTC)
    Yes and no. There are authors who have works published prior to 1925 who died late enough to still have works in copyright in their home country, so those notices are still very pertinent per Category:Media not suitable for Commons. — billinghurst sDrewth 05:30, 22 February 2020 (UTC)
    Right; I didn't mean to imply we should change the current secondary notices.--Prosfilaes (talk) 06:42, 22 February 2020 (UTC)
  • Support U.S. copyright is of primary concern to Wikisource. Fixing the license so more 1923 and 1924 works appear on Wikisource even if still under copyright in other countries is so important. Abzeronow (talk) 19:46, 16 March 2020 (UTC)
  • Support as this seems like the least problematic solution to the problem, and it doesn't make sense for us to keep delaying a resolution. Kaldari (talk) 18:09, 14 April 2020 (UTC)
  • Comment It looks as though some people are hedging their bets: arguing for deprecating the template on the one hand but arguing for improving the template on the other. Since the template content has now changed, before this discussion has concluded, then procedurally we should recast all votes, since the template named in this discussion thread no longer has the content it had at the start of this discussion. --EncycloPetey (talk) 20:42, 24 April 2020 (UTC)
    Hedging their bets? It is somehow improper to try and improve Wikisource for now, whether or not this template gets deleted? If we're going to get pedantic about policy, where is it written on the English Wikisource that we should recast all votes?--Prosfilaes (talk) 06:41, 25 April 2020 (UTC)
    No need to restart the votes, as the changes have been reverted. The template is the same as it was before the voting started. No changes should be made to any template if there is a discussion and voting ongoing about its future. If the changes were allowed and at the same time we would have to restart the voting after every change, we may never come to a conclusion; not everybody has time to vote about the same problem again and again. --Jan Kameníček (talk) 09:50, 25 April 2020 (UTC)
  • Support If consensus is needed to fix math errors, so be it. --Liuxinyu970226 (talk) 09:01, 7 May 2020 (UTC)
  • Comment Please note that the new date, 1925, applies to all works except sound recordings (and maybe architecture). The date for sound recordings is 1923. That isn't shown in the local summary of the Hirtle chart, but is in the original. (I dropped a more detailed comment below.)--Sphilbrick (talk) 14:29, 20 July 2020 (UTC)
    Interesting point. If it is really so and if we need to show a license for sound recordings somewhere, we would probably have to create a specialized template for them.--Jan Kameníček (talk) 11:44, 2 December 2020 (UTC)
    Yeah. Sound recordings have a tortured history in US copyright law, but the end point is that the first recordings to have their copyright expire in the US will be in 2022, for those published before 1923. See w:Public_domain_in_the_United_States#Sound_recordings_under_public_domain.--Prosfilaes (talk) 00:51, 3 December 2020 (UTC)

So it seems to me that there is a weak consensus for the change. If so, it might be better to make it before the end of the year, so that works newly entering public domain can already be added with new templates.

The less important change is renaming the templates from {{PD/1923|year of death}} and {{PD-anon-1923}} to {{PD-US|year of death}} and {{PD-anon-US}}. It is only a change of the names of the templates; what the readers see will not be affected by this.

The more important change is adapting the latter one so that it automatically calculates the year as {{CURRENTYEAR}}-95, similar to what has been done e.g. here.

--Jan Kameníček (talk) 11:44, 2 December 2020 (UTC)
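
The arithmetic behind the {{CURRENTYEAR}}-95 rule described above can be checked with a short Python sketch. The function names are hypothetical and the code only illustrates the calculation; it is not how the template itself is implemented. As noted earlier in the thread, sound recordings follow different dates and are not covered by this rule.

```python
from datetime import date
from typing import Optional

# Term used in the discussion above: works published more than 95 years ago.
COPYRIGHT_TERM_YEARS = 95


def pd_cutoff_year(current_year: Optional[int] = None) -> int:
    """Return the year before which published works count as PD in the US."""
    if current_year is None:
        current_year = date.today().year
    return current_year - COPYRIGHT_TERM_YEARS


def is_pd_by_publication_year(publication_year: int,
                              current_year: Optional[int] = None) -> bool:
    """True if the work was published before the cutoff year."""
    return publication_year < pd_cutoff_year(current_year)
```

For 2020 the cutoff is 2020 - 95 = 1925, so a work published in 1924 qualifies and one published in 1925 does not, which matches the "1925-01-01" border mentioned above. For 2021 the cutoff moves to 1926 without any change to the template name.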

Looks like an interesting, but a very long, discussion. Is there a way for a newbie to get involved without spending hours and hours? Thanks in advance, Ottawahitech (talk) 18:24, 10 December 2020 (UTC)

I have updated {{PD-anon-1923}} and moved it to {{PD-anon-US}}, and also moved {{PD-1923}} to {{PD-US}}, per discussion above. However, {{Pd/1923}} is locked, so I asked for it to be moved to {{PD/US}} at the Admins' noticeboard. --Jan Kameníček (talk) 14:20, 26 December 2020 (UTC)

Policy on substantially empty works[edit]

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way for a very, very long time. I think there might perhaps be a case for hosting a mainspace page for such a work, even though there is zero, or almost zero, actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. It also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • Do not have a mainspace page until there's some content. Only host this in the form of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds it also adds content, so I think we could have a think about what we really want to see here. Inductiveloadtalk/contribs 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) are not very helpful, and are entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveloadtalk/contribs 20:55, 3 July 2020 (UTC)
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4Rama's revenge 01:55, 5 July 2020 (UTC)
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4Rama's revenge 00:15, 6 July 2020 (UTC)
The entire point of moving this here, and not staying at WS:PD, is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head all those future deletion proposals off at the pass, because deletion proposals often cause friction. Inductiveloadtalk/contribs 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4Rama's revenge 11:53, 9 July 2020 (UTC)
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality of the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion, the second half is. Inductiveloadtalk/contribs 14:18, 9 July 2020 (UTC)
deletion threat has been an habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual, following the gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan backed works, but the wishlist was not supported. Slowking4Rama's revenge 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC)

I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
  4. EITHER
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it are redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than handled within these other general policies.
    For the sake of example; applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked: active being operative, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do, that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)
  • Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveloadtalk/contribs 14:40, 6 July 2020 (UTC)
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice work and a good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sort of tricky bits like tables of stocks or weather tables and writing into policy that those must be proofread in order to get the "real" articles into mainspace would be a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just in blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)

Proposal

Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)


I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)

Anne of Green Gables

I recently did an independent proof-read of Anne of Green Gables. In doing so, I compared my transcription with yours. It helped me find errors in my text. At the same time, it led me to errors in the wikisource text. Here is the list of issues.

https://johanley.github.io/anne-of-green-gables/index.html#issues-wikisource

Could you get in touch with the original wikisource proof-readers, and let them know?

Second question: the nature of the issues is in many cases repetitive - missing quotes, period-versus-comma. Are we sure this work was actually edited by at least one human? unsigned comment by John O'Hanley (talk) .

Thanks. Do you have the page numbers? Fixing mistakes that a sincere comparison has found would be welcomed here, especially as the Wikisource edition has the scans to compare against. ShakespeareFan00 (talk) 21:05, 21 December 2020 (UTC)

No page numbers. But the listing has chapter number, and the text that starts off the paragraph. In addition, about half the cases have links to the page on archive.org, so that will have the page number. John O'Hanley (talk) 21:10, 21 December 2020 (UTC)

Meant the scan page numbers, because that's what the Page: structure typically uses.
BTW If you find errors, don't feel you need to ask to repair them if what's been nominally validated doesn't agree with the scan. If you want to mark something that looks like a genuine error in the original printing, use {{SIC}}; I've done this for stuff I've proofread. ShakespeareFan00 (talk) 21:23, 21 December 2020 (UTC)
No scan page numbers, no. I can apply repairs if needed. But given the nature and number of the errors, it's best to let you folks know explicitly, in case something unusual has happened with this text. John O'Hanley (talk) 21:38, 21 December 2020 (UTC)


@BethNaught: You are typically good at finding typos and scan errors others may have overlooked. Your comments? ShakespeareFan00 (talk) 21:07, 21 December 2020 (UTC)

I don't have any comments that someone else wouldn't be able to give. I would only note for John O'Hanley that yes, each page of this book has been checked by at least two Wikisource users. You can see this by clicking the "Source" tab at the top of the work, which takes you to the scan index: pages in green are "validated" i.e. checked by two people. The fact that they didn't catch these errors is unfortunate, but we don't expect people to be perfect. BethNaught (talk) 21:44, 21 December 2020 (UTC)

Comment To help find the errors in Page: ns, I have added a search box to the Index: page for quick searches. — billinghurst sDrewth 04:22, 22 December 2020 (UTC)

and if they are "repetitive - missing quotes, period-versus-comma" then a find and replace in visual editor should expedite another sweep through. Slowking4Rama's revenge 21:30, 22 December 2020 (UTC)
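For anyone who wants to pre-screen text before such a sweep, here is a minimal sketch of how the two repetitive error types mentioned above (missing quotes, period-versus-comma) could be flagged automatically. It is an illustration only, not an existing Wikisource tool: the function name `flag_suspect_paragraphs` and both heuristics are assumptions, and it expects plain text with straight double quotes and blank lines between paragraphs. It will produce false positives, for example multi-paragraph speeches that legitimately omit the closing quote until the last paragraph, or abbreviations followed by a lowercase word, so every hit still needs checking against the scan.

```python
import re


def flag_suspect_paragraphs(text):
    """Return (paragraph_index, reason) pairs for paragraphs worth a second look.

    Hypothetical helper for illustration; both checks are rough heuristics.
    """
    findings = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        # An odd number of straight double quotes suggests a missing
        # opening or closing quote somewhere in the paragraph.
        if paragraph.count('"') % 2 == 1:
            findings.append((index, "unbalanced double quotes"))
        # A period followed by whitespace and a lowercase letter may be
        # a comma that was misread as a period.
        if re.search(r"\.\s+[a-z]", paragraph):
            findings.append((index, "period followed by lowercase"))
    return findings


if __name__ == "__main__":
    sample = 'She said, "Hello there.\n\n"Fine," he said. and left.\n\nAll good here.'
    for index, reason in flag_suspect_paragraphs(sample):
        print(index, reason)
```

The output is only a list of candidate paragraphs; the actual correction should still be made by hand in the Page: namespace with the scan alongside.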


FYI - Wikisource editors have fixed most of the issues. Updated listing.

thank you for the error reporting. you can also leave notes on the talk page [2] and leave a note at scriptorium, and people will respond. cheers. Slowking4Rama's revenge 01:05, 1 January 2021 (UTC)


Barging in, but regarding the "checked by two people" thing... I did a test on one work several months ago. (I could find the discussion where I mentioned this if wanted) I closely reviewed between 50 and 100 pages that had been gone over very nicely by quite competent people here. However, there was still an error every 10 pages or so. I even found an error *I* had missed. It left me with the feeling there will be lots of errors in the average review done here.

You might see me occasionally peek into other works, not only for possible interest, but also to check 'quality'. I *often* find errors in the first or second page I peek at, even of 'validated' pages. I sometimes end up downright snippy in my edit summaries, when a long series of missed opportunities is found.

Please take this not so much as a criticism of the project, but rather a reflection on one aspect of work here - fidelity to the source. It is not at a point we should be comfortable with. It is that resulting discomfort that certainly motivates me in my review pass. Shenme (talk) 05:10, 10 January 2021 (UTC)

Are you referring to this thread? If not, I'm still pointing it out as I found it interesting.
It's alleged there that "Validations are not being done properly … so the texts are often only little improved after the initial proofreading". Naming no names, but I can sympathise with that. Some validators of texts I proofread go very fast, such that I'm not confident they're checking everything properly. Perhaps some of them are professionals, or skilled enough to justify that—but for some, when I check works they proofread, I see lots of errors.
I think some may have a "quantity over quality" mindset. The appropriate balance can be debated, but the fact that (at least for this book) we've been empirically shown to be worse than DP Canada should give us pause. To illustrate, people sometimes praise me for being a high-quality and thorough proofreader, but I think the truth is just that I go more slowly, double-check easily-scannoed punctuation, and spellcheck the result.
Also, the discussion of The Great Gatsby at Wikisource talk:Proofread of the Month#June (Fiction: Novel) was illuminating to me. I didn't realise people routinely made assumptions about the strengths and weaknesses of the OCR/uncorrected transcription: to me, marking something as proofread or validated meant signing off on every aspect of the page.
In short, I don't think we have clear enough guidance or expectations about what level of checking "proofread" and "validated" require. BethNaught (talk) 11:23, 10 January 2021 (UTC)
PS. I'm not claiming to be perfect, I do sometimes miss things still. BethNaught (talk) 11:23, 10 January 2021 (UTC)
@BethNaught: Help:Page status and Help:Proofread are the likeliest spaces to add further information, where would you want it? 20 cents ... to me it means aligns to Wikisource:style guide

1. all words are correct
2. general formatting reflects the work (hyphenation, italics, dashes, quotations, bold, size, ...)
3. requisite linking is undertaken (eg. q.v. links done). Sometimes works are progressed to proofread without links which is okay, though I would not expect them to be validated without required links.

I also think that we need to look to similar guidance relating to presenting a work/transclusion about what is best practice, and including something about WD items. — billinghurst sDrewth 02:09, 11 January 2021 (UTC)

@Billinghurst: I agree, we specifically need something somewhere with a brief list of expectations, especially for being able to turn a page "yellow/proofread". Quite a few new editors don't know about the bajillions of templates and we end up with things like Page:Five_Irish_comic_songs.pdf/1 going yellow a little prematurely, which means it moves into the "to be validated" queue and loses eyes on it. It's unfair, IMO, to expect newcomers to intuit all the expectations themselves.
WD processes are absolutely also required to be documented somewhere, because they're not obvious at all, and the work/edition ontologies are confusing (at least to me!). I know we have the WE-framework gadget, but it's a bit "secret". Inductiveloadtalk/contribs 09:36, 11 January 2021 (UTC)
WD is "documented" at d:Wikidata:Wikisource, then follows d:Wikidata:Books, though not in an easy access process. — billinghurst sDrewth 10:16, 11 January 2021 (UTC)
and we have Wikisource:Wikidata. AND I am the worst to write such pages, and even to go and read them. :-/ — billinghurst sDrewth 10:19, 11 January 2021 (UTC)

Magazines for scan backing[edit]

I've been working on a lot of magazines recently, and I'd like to make a list of the more notable ones that I should pull from IA or HathiTrust when I get a chance. A lot of this information may be on Wiki; e.g. Author:Zona Gale says that certain volumes of The Smart Set, Everybody's, Sunday Magazine, Harper's Magazine, American Magazine, The Century Magazine, and Harper's Monthly Magazine would scan-back existing works. I know about Lovecraft and Author:Robert Ervin Howard; I am working on 1925 Weird Tales, but not any post-1925 pulps at this time. Any other suggestions of authors or magazines that I should look for for works that need scan backing on Wikisource?--Prosfilaes (talk) 10:27, 25 December 2020 (UTC)

Dropinitial: paragraph is gone despite a blank line[edit]

1, 2, 3. But it works with two paragraphs: 4. Caused by recent changes in {{Dropinitial}}. --Ratte (talk) 12:19, 26 December 2020 (UTC)

@Ratte: I reverted the problematic change. It was intended to fix drop initials in an indented container, but it's not that simple (and might not be possible). Thanks for reporting. Inductiveloadtalk/contribs 12:42, 26 December 2020 (UTC)
Ok, thanks. Ratte (talk) 12:47, 26 December 2020 (UTC)

Deblacklisting YouTube, Amazon, eBay etc. for autoconfirmed users[edit]

It'd be really handy to be able to show at Wikisource:WikiProject Film/Not uploaded to Commons where an encode of a film is available to download/buy, as a reminder for users (such as myself) to get the film from that source before it goes out of stock or gets deleted, and rip it for Wikimedia Commons. But the filters on Wikisource prevent all of these types of links for any user except administrators apparently. But I am an autoconfirmed user, and this filter I presume is mostly protecting against spammers, who would almost certainly not be autoconfirmed. As what I am doing is not spam, could you please lift these filters for autoconfirmed users, so I do not have to avoid the filters by only including video IDs or the ends of Amazon links? PseudoSkull (talk) 13:08, 27 December 2020 (UTC)

I (as an administrator) tried to add the link you had a problem with (for The Primitive Man) and was also prevented from doing so. Per phab:T36928, it is not possible to allow certain user groups to override the blacklist.
Thoughts on some options:
  1. Ask an admin to manually whitelist all your desired links. This is probably too unwieldy.
  2. Hack your way around the blacklist somehow. I don't know if this would work, and it's probably a bad idea per w:en:WP:BEANS.
  3. Deblacklist all of YouTube. I think this might be fine if we also add an AbuseFilter rule to stop new users posting YouTube links.
  4. Shepherd that bug through to a fix and obtain the relevant user right.
BethNaught (talk) 16:25, 27 December 2020 (UTC)
you could go complain at [3], but they think blacklists are a good thing, and like the "admin may i" gatekeeping. Slowking4Rama's revenge 22:14, 3 January 2021 (UTC)

Comment

  • Blacklists are blacklists; except where we whitelist, there are no exemptions for any level of editor. Making it simpler to add items to the whitelist is always worth asking about.
  • I would argue that it is not our job to be adding links to commercial products. How would you like us to try and differentiate between your adding commercial links, another editor account adding commercial links, an IP address adding commercial links, and spammers adding commercial links?
  • I see no reason to remove Amazon or Ebay from the blacklist, though I can see some argument that YouTube has limited value and could be added to work's talk pages for use as a source within {{textinfo}}. [I will note that it is heavily abused by spambots]
  • Any user is able to utilise search engines to search and find commercial products and hardly needs our help.
  • To Slowking4 -- living a life of snideness must be marvellous. That may be how you would approach the role, but let me say that blacklists save a whole lot of work. special:log/spamblacklist

billinghurst sDrewth 04:50, 4 January 2021 (UTC)

thanks for making the admin case. the notorious commercial pirate UK, and US governments persist in using youtube as a reference. as we saw in the The Report of the Iraq Inquiry - Executive Summary, filtering youtube prevented doing actual work on that text. but it is a small price to pay for admin comfort. (that was not snide, that was an accurate reflection of admin attitudes, which you have confirmed.) filters are so adversive and abrupt that even veteran editors are confused, as we saw at Scots wikipedia. Slowking4Rama's revenge 17:53, 4 January 2021 (UTC)
As I said, if you have a case for removing youtube from the blacklist, then present it. Don't give me the sarcasm, the snideness, the bitterness, just be pleasant and present the case. (If those expressions are not your intent, they are how they come across.) At this time, admins have said that we will whitelist addresses, or suspend the blacklist entry as required.

FWIW we have substantially more hits of youtube from spammers than we do from users. And it is far easier to assist in whitelisting than the repeated removal of the spam. Minor inconvenience is a two-way street, and would be identified as part of any conversation about the best means to move forward with removing a domain from the blacklist. — billinghurst sDrewth 01:06, 5 January 2021 (UTC)

Duplication of government works (again)[edit]

I noticed this issue before, but it has come to life again. The whistleblower letter on the Trump–Ukraine scandal is now scan-backed twice from two different scans at two different locations—Letter to Chairman Burr and Chairman Schiff, August 12, 2019 (the original) and Trump–Ukraine whistleblower complaint. The files were created at almost the same time, the latter only two hours after the former, but the latter was transcluded nearly one year after the former. The problem with the duplication of the transcript needs to be resolved as well. TE(æ)A,ea. (talk) 23:03, 27 December 2020 (UTC).

Done. I have redirected to one work, and deleted the other components. I have also resolved at Commons. Thanks for the alert. — billinghurst sDrewth 01:16, 5 January 2021 (UTC)

Bug in text file export?[edit]

In proof-reading Anne of Green Gables, I found some occasional issues with whitespace in exported text files. The general idea is that the core data is fine when viewed/edited using the normal means on Wikisource, but the exported text is not. This means that no fix can be applied by normal editing.

It seems like there's a bug in the text export code. For a list of instances of this problem, see the listing here, where items are marked as exp (for export). The steps I took to generate the text file export are described here.

The issues result in extra spaces, or a problem with a paragraph break.

Some of the issues occur near page breaks in the underlying book, where a page with a captioned picture occurs. John O'Hanley (talk) 20:01, 28 December 2020 (UTC)

@John O'Hanley: I flicked through the "exp" labelled lines. From my looking at the whitespace errors, they are not caused by the production technique; they come from the compilation by a person. ProofreadPage transclusion inserts a space between pages, so the errors that we made are things like not including the apostrophe inside {{hwe}} when the hyphenated word spans a page, and not using exclude = when transcluding to skip blank pages (and their added space). Some of the whitespace issues that you reported seem to have been fixed by another user in mid-Dec. @Samwilson: can we look for the generation of a string of (soft) spaces and remove duplicates?
FYI - I have added exclude attributes for several other blank pages. John O'Hanley (talk) 15:11, 29 December 2020 (UTC)
I don't see the paragraph break issues when I quickly throw into PDF documents, and none of them are at page breaks so that sounds weird and I ask that you check those reported again. [I edited four Page: ns, and amended two transclusions in main ns]
Thanks for your report. — billinghurst sDrewth
Addendum, YES, I can regenerate the paragraph issue. It seems that they fail where the previous paragraph is incomplete through some sort of break, eg. interrupted by an image => example https://wsexport.wmflabs.org/?lang=en&page=Anne+of+Green+Gables+(1908)/Chapter+XXXVIII&format=txt&fonts=&images=false and compare with last page of text Page:Anneofgreengables-rbsc.djvu/457. billinghurst sDrewth 05:29, 29 December 2020 (UTC)
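For anyone wanting to re-run the export for other chapters, here is a minimal Python sketch that builds a link of the same shape as the one above. It assumes the export tool keeps accepting the query parameters visible in that link (lang, page, format, fonts, images); the function name export_url is hypothetical and not part of the tool.

```python
from urllib.parse import urlencode

# Host and parameter names are copied from the export link quoted above;
# whether the tool accepts other values is an assumption.
EXPORT_BASE = "https://wsexport.wmflabs.org/"


def export_url(page, fmt="txt", lang="en"):
    """Build an export link for one Wikisource page title (hypothetical helper)."""
    query = urlencode({
        "lang": lang,
        "page": page,
        "format": fmt,
        "fonts": "",
        "images": "false",
    })
    return f"{EXPORT_BASE}?{query}"


print(export_url("Anne of Green Gables (1908)/Chapter XXXVIII"))
```

The parentheses and the slash in the title come out percent-encoded here, which is an equivalent spelling of the same page name.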
I can see what is happening here with the wikitext. When transcluded, paragraphs seem to generate </p><p> to start a paragraph; in these examples I see that the new paragraph just starts with a <p> with no prior termination. I am unable to determine what is happening with the prior component's generation of the wikitext to cause that difference. Best we can do is force the issue to indicate that it is a new paragraph with something like a {{nopt}}. — billinghurst sDrewth 22:31, 29 December 2020 (UTC)
How would I do that workaround? John O'Hanley (talk) 23:16, 29 December 2020 (UTC)
My theory is that 1) last line of a page and first line of next page are enclosed in <p> ... page numbering span etc. ... but no divs... </p> 2) since there is an image between the two "text" pages, a div is inserted, making the <p> of the page before and the </p> of the page after, orphans, so they are stripped. And the poor first line after the image page is broken. Mpaa (talk) 01:17, 30 December 2020 (UTC)
That is
<p>"I forgave you that day by the pond landing,&#32;<span><span class="pagenum ws-pagenum" id="428" data-page-number="428" data-page-name="Page:Anneofgreengables-rbsc.djvu/454" data-page-index="454" title="Page:Anneofgreengables-rbsc.djvu/454"><span id="pageindex_454" class="pagenum-inner">&#8203;</span></span></span>although I didn't know it. What a stubborn little goose I was. I've been—I may as well make a complete confession—I've been sorry ever since."
</p><p>"We are
As suggested above, I added a {{nop}} and it works, but one must be careful not to leave blank lines in between, otherwise an extra <br /> is added. It works, but it is easy to get it wrong. Mpaa (talk) 16:17, 30 December 2020 (UTC)
Imitating your example, I have applied the {{nop}} workaround to the remaining 3 cases. You may want to verify the changes. Thanks to everyone for your help in this regard. Well done. John O'Hanley (talk) 18:08, 30 December 2020 (UTC)


I see some zero-width spaces in the text export of Anne of Green Gables (1908)/Chapter XIV. Is that intentional? This comes from the space between just and found:
  • U+0074 : LATIN SMALL LETTER T
  • U+0020 : SPACE [SP]
  • U+200B : ZERO WIDTH SPACE [ZWSP]
  • U+0066 : LATIN SMALL LETTER F
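A listing like the one above can be reproduced with a few lines of Python. This is only a sketch using the standard unicodedata module; the function name and the sample string are illustrative assumptions, not the tool used for the report.

```python
import unicodedata


def describe_code_points(text):
    """Return one 'U+XXXX : NAME' line per character in text."""
    lines = []
    for ch in text:
        # Fall back to a placeholder for characters without a Unicode name.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} : {name}")
    return lines


# Hypothetical sample: "t", a space, a zero-width space, then "f".
sample = "t \u200bf"
for line in describe_code_points(sample):
    print(line)
```

Running it over a suspicious stretch of exported text makes invisible characters such as U+200B show up by name.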

I have retested today, and updated the listing. Many items are now fixed, but not all. There are 7 items remaining, 3 with a ZERO WIDTH SPACE [ZWSP], and 4 with a missing paragraph-break issue (as noted above by billinghurst). John O'Hanley (talk) 16:56, 29 December 2020 (UTC)

I forgot to commit my changes to github - the listing should be updated in a moment... John O'Hanley (talk) 18:09, 29 December 2020 (UTC)

There's something wacky with this page. The image of the page is a mismatch for the text (wrong page). This page has one of those ZWSP issues. John O'Hanley (talk) 17:58, 29 December 2020 (UTC)

The thumb is frozen on the old page, if you put any other number than 1024 here, it displays the right page: https://commons.wikimedia.org/w/thumb.php?f=Anneofgreengables-rbsc.djvu&w=1024&p=164 I have not been able to refresh it. Mpaa (talk) 21:20, 29 December 2020 (UTC)
@John O'Hanley: we are typically visual checkers of text, and really until now that has not been identified as an issue unless we have the occasional link and template issue. Is it a deal breaker with the exported text, or is it more that you are seeing these issues? I would say that we have similar issues in many places due to OCR'd text and visual checking, and are fully unaware.
I understand. I appreciate that you are interested mainly in the visual appearance. But if you publish text exports, you don't have any control over other use cases for the text. It would be nice if the text output was a bit cleaner. It's not a serious issue for me at all, but it would, I think, be a nice improvement to your site. Perhaps a filter of some sort could be done on the text output? I can't see any reason not to strip out those ZWSP characters, for example. That would be simple to implement, no? Similarly, I can't see any reason not to change double-spaces into single-spaces. Just strip them out; you wouldn't need to waste time tracking down the exact cause... The issue with missing paragraph breaks is a more important defect, though, and should be addressed. John O'Hanley (talk) 23:08, 29 December 2020 (UTC)
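To make the suggested filter concrete, here is a rough Python sketch of what such post-processing might look like. It is not the export tool's code, and the function name clean_exported_text is hypothetical; it just removes zero-width spaces and collapses runs of ordinary spaces, leaving line breaks alone.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def clean_exported_text(text):
    """Strip zero-width spaces, then collapse runs of spaces to one."""
    # Remove ZWSP first, so "a \u200b b" becomes "a  b" and is then collapsed.
    without_zwsp = text.replace(ZWSP, "")
    # Only ordinary spaces are collapsed; newlines and paragraph breaks survive.
    return re.sub(r" {2,}", " ", without_zwsp)


print(clean_exported_text("just \u200bfound  it"))
```

A filter like this would not help with the missing paragraph breaks, which need a fix at the source.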

Comment re "dinner-time", "dinner time" and "dinnertime". We proofread against what is there, not what dictionaries show. There will even be variations within a work, and I noticed in that work two variations of the same word with regard to hyphenation, so not overly fussed with that one. — billinghurst sDrewth 22:05, 29 December 2020 (UTC)

  • &#8203; is part of the page numbering span, I don't see it in the text. — billinghurst sDrewth 22:15, 29 December 2020 (UTC)
I see ZWSP in the text. I use this tool. John O'Hanley (talk) 23:13, 29 December 2020 (UTC)
I see a ZWSP for each page numbering span, most likely as they are inside a valid span they are considered text and exported. I agree that stripping them in the export tool would be the easiest thing to do. E.g.
the pond landing,&#32;<span><span class="pagenum ws-pagenum" id="428" data-page-number="428" data-page-name="Page:Anneofgreengables-rbsc.djvu/454" data-page-index="454" title="Page:Anneofgreengables-rbsc.djvu/454"><span id="pageindex_454" class="pagenum-inner">&#8203;</span></span></span>although I
Mpaa (talk) 21:20, 29 December 2020 (UTC)
If we don't want the pagenumber spans to go to export at all, we can just add ws-noexport to the inner span, and the export tool will drop it, leaving an empty outer span. Or drop the whole thing with ws-noexport in the outer span.
The ZWSP was inserted after a discussion in 2019. @Xover: is it still something that belongs in MediaWiki:Proofreadpage pagenum template and not in the JS? Inductiveloadtalk/contribs 09:02, 30 December 2020 (UTC)
@Inductiveload: In the message you just edit-conflicted—:)—I was going to suggest ws-noexport'ing the whole page number span. So far as I can tell we neither need nor want it in the output; but ebook export is not something I've looked at, which was why I was going to ping you for input on that. :)
The zero-width spaces are still needed in the template, and for the same reasons. But now that we can more easily hack the script we can prevent it from immediately overwriting them (grr!). There are some possible approaches that might obviate the need which we could explore, but that's a ways down the line. --Xover (talk) 10:52, 30 December 2020 (UTC)
@Xover:, OK I ws-noexport'd the entire thing. There's no useful text content there anyway at export. If we can think of some useful way to use the pagenum span in export one day, we can revisit it easily enough. Inductiveloadtalk/contribs 11:01, 30 December 2020 (UTC)
Confirmed: I see no more ZWSP issues. Thank you! John O'Hanley (talk) 14:32, 30 December 2020 (UTC)

Template:Engine to index pages by default[edit]

A short time ago I learnt about an excellent template, {{Engine}}, thanks to billinghurst’s contribution in a discussion above, and about the possibility of adding it to an index page. This proved so useful to me that I would like to suggest making it accessible at all index pages by default. Is it technically possible? --Jan Kameníček (talk) 14:20, 30 December 2020 (UTC)

I would prefer not to. Primarily it only works after the text has been added to the pages, so of no value to a new work. Secondarily, with compiled works we make it more general—multiple Index:, rather than specific to the Index:—and it is in those multi-volume works that it has best value. If you think that there is value to it, then possibly we can mention it on the page that discusses building an Index: page, and give some instruction about it. — billinghurst sDrewth 22:40, 30 December 2020 (UTC)

Public Domain Day 2021[edit]

A reminder that many works will enter public domain on Jan 1, 2021. We have a page Wikisource:Requested texts/1925 that lists some of these works.

Please keep in mind that:

  • Some works enter public domain in their country of publication but not in the US, in which case we would not host them on Wikisource.
  • Some works enter public domain in the US, but not in their country of publication, so scans of these works must be hosted locally on Wikisource and not on Commons. There are notes about the date an Author died on the page listings to make this determination easier. Most often, the issue is a UK publication, where copyright applies until 70 years after the author's death.
  • And some works are entering public domain in the US, and were published in the US or were already in public domain in their country of publication. These works should have their scans uploaded to Commons.

Also keep in mind that some databases will have imperfect scans. It is always a good idea to give a new scan a thorough check to ensure no pages are duplicated, missing, upside-down, or otherwise problematic before starting transcription. Some scans entering public domain have not had many sets of eyes checking them. And a happy Public Domain Day to everyone. --EncycloPetey (talk) 00:03, 1 January 2021 (UTC)

I wish you a happy Public Domain Day, and TBH I care about that more than the actual "New Year" holiday. As a side question, do works all officially go into the public domain at 12:00 AM EST? Or is it when all US time zones have crossed to 2021? PseudoSkull (talk) 00:37, 1 January 2021 (UTC)
That's a tricky question, since the US publications were generally published in New York (EST) but the servers are in PST. I generally start once the scan of a work is available at whatever scan repository hosts it, which can be several days after. I do not know what timing Commons uses for this issue. --EncycloPetey (talk) 00:43, 1 January 2021 (UTC)
yeah, the copyright lawyers tend to go for date and year only, so no case law for time of day, either UTC, EST, or NZST. it's five o'clock somewhere. Great Gatsby and New Yorker have the most buzz, but hard to find first editions. Slowking4Rama's revenge 01:13, 1 January 2021 (UTC)

Please undelete previous versions of The Great Gatsby. Thanks. —Justin (koavf)TCM 02:54, 1 January 2021 (UTC)

We will want to do that when the clock hits 12 AM EST at least. That's 1hr8m from time of posting. (But we need to make sure there's a source DJVU or PDF first.) I'm hyped! PseudoSkull (talk) 03:53, 1 January 2021 (UTC)
I took a look. Only one of the deleted Gatsby edits contains the text of the novel, as a single-page info-dump, from Gutenberg Australia. I do not know how reliable their texts are. --EncycloPetey (talk) 05:17, 1 January 2021 (UTC)
that scan version. [4] has a 1953 renewal. Slowking4Rama's revenge 21:25, 1 January 2021 (UTC)
As I understand it, renewals for 1925 works should not be relevant any more… Am I right? --Jan Kameníček (talk) 21:37, 1 January 2021 (UTC)
Correct. And thus, our tags of 1925-not-renewed templates should just be changed to 1925-expired, as that trumps even the lack of renewal. PseudoSkull (talk) 21:59, 1 January 2021 (UTC)
it is not a first edition, so you should not put a 1925 on it. which may not matter for this work, but some authors revised considerably, and the general lack of firsts is a problem. Slowking4Rama's revenge 01:39, 3 January 2021 (UTC)
That is true. However, there is the original 1925 edition available too. --Jan Kameníček (talk) 01:46, 3 January 2021 (UTC)
And the first edition is now available at File:The Great Gatsby (1925).djvu and Index:The Great Gatsby (1925).djvu, at the full resolution HathiTrust has (but won't let you download). Work is ongoing at Index:The Great Gatsby - Fitzgerald - 1925.djvu using the post-1953 reprint, but I might actually be tempted to dive in on this one myself just so we have the first edition too. Time permitting… --Xover (talk) 03:26, 3 January 2021 (UTC)
it seems to be fully downloadable now. don't know that the editions are different enough to matter. we should really cultivate a wikipedia library / academic librarian with hathi access to work a download list. Slowking4Rama's revenge 21:42, 3 January 2021 (UTC)

Well let's make it official[edit]

Happy Public Domain Day, as it is now 2021 in EST, a time zone within the US! Please welcome our new handful of public-domain works, now freed of their needlessly restrictive copyrights. PseudoSkull (talk) 05:03, 1 January 2021 (UTC)

If we want to be generous, I suppose we should wait three more hours for PST to go into 2021, though. PseudoSkull (talk) 05:05, 1 January 2021 (UTC)
If you want to be really thorough, wait for Hawaii. BD2412 T 07:31, 1 January 2021 (UTC)
I would like to wait for UTC-12 to enter New Year next time to be fully thorough, despite no one living on Baker nor Howland Island.--Jusjih (talk) 04:27, 16 January 2021 (UTC)
I agree with this. BD2412 T 17:41, 23 January 2021 (UTC)
I don't see any point in being so careful. Even pedantically speaking, I can't see it being worse than filing in the federal courts in California, under PST, and even that I think would be tossed out as de minimis. It's cutting such thin straws to push it after that.--Prosfilaes (talk) 18:59, 23 January 2021 (UTC)
Okay, then, on January 1, 2022 at 7:00 AM EST, several films from 1926, under copyright right now, will have transcriptions automatically added to Wikisource by PastLovingBot. By then no incredibly litigious lawyer with nothing better to do could argue that we uploaded something 1 hour before it technically went PD for all US time zones, because some EVIL SAILOR who is stranded on Howland Island might PIRATE the movie there via satellite Internet, and have it finished downloading 1 hour before midnight. What kind of HORRIBLE man that sailor would be, isn't that right, copyright lawyer? Shame on him! PseudoSkull (talk) 18:11, 23 January 2021 (UTC)
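As a quick check of the arithmetic in this thread, the Python sketch below converts midnight on 1 January in UTC-12 (the Baker and Howland Island offset mentioned above) into EST. Fixed offsets are assumed, so daylight saving is ignored, which is fine for 1 January; the names used are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; an assumption that is safe for 1 January (no daylight saving).
EST = timezone(timedelta(hours=-5), "EST")
UTC_MINUS_12 = timezone(timedelta(hours=-12), "UTC-12")


def new_year_in(zone, year):
    """Return the instant at which the given year starts in the given zone."""
    return datetime(year, 1, 1, 0, 0, tzinfo=zone)


last_midnight = new_year_in(UTC_MINUS_12, 2022)
print(last_midnight.astimezone(EST))           # same instant, shown in EST
print(last_midnight.astimezone(timezone.utc))  # same instant, shown in UTC
```

Under those assumptions the last place to reach the new year does so at 7:00 AM EST, which matches the time given above.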

Feedback requested on font template[edit]

I have used {{Blackletter}} before for these Fraktur forms on Wikisource but I was surprised to see that there was no equivalent for w:en:Uncial script, so I made {{Uncial}}. I'm requesting others provide feedback for 1.) which fonts would be best to cascade and 2.) how to import those fonts to our CSS (is this via a phab: ticket?). Thanks and happy Public Domain Day! —Justin (koavf)TCM 21:43, 1 January 2021 (UTC)

Reminder: The Wikipedia Library Card Platform[edit]

This probably hasn't been said in a while; however, through Wikimedia we can get ready access to a range of research databases through your Wikimedia SUL, where anyone with a modicum of editing qualifies.

There is also access to other resources based on an application process. — billinghurst sDrewth 03:56, 2 January 2021 (UTC)

Thank you for the reminder and link. I've often been stymied trying to fix an obviously mistyped quote over at Wikipedia. This will help. Shenme (talk) 00:34, 3 January 2021 (UTC)

The Center of the Web, Part 1?[edit]

There is something confusing about this film, in that in the title card of this print it is labelled as "The Centre of the Web / Part 1" (not yet uploaded to Commons, I have requested another user upload it for me as I don't have the tools to edit it available to me). The print comes from the British Film Institute, and this title card is clearly one of two intertitles instated by the BFI themselves (the other being within the film, as a reconstructed intertitle). So, the title card is not the original title card (the original intertitle would probably look like the ones at A Dog's Love or Shep's Race with Death, both made by the same director in the same year). I find this claim of it being the first part of the film to be a bit dubious, but it's not entirely out of the realm of possibility that there are more "parts" to this film that have now been lost, or perhaps were planned at the time but never made/finished.

The film does seem to leave you with a cliffhanger (see the letter at the end), suggesting that there may be more to the story ahead...although admittedly this film was a bit difficult for me to understand on first watch, so I could be wrong that that was a cliffhanger. There are no listings for a second part to the film, anywhere that I could find. The only place I could find where it is ever mentioned as a "Part 1" is on this print itself; elsewhere, the film is described as if it is a single work in and of itself; the plot summaries only describe the content of this print. So, I have no idea why the film preserver decided to place "Part 1" there. There's no way to know if it was present in the original film title. So I think I'm going to assume in the transcription for Wikisource, like other places are doing, that there is no Part 2. PseudoSkull (talk) 07:04, 2 January 2021 (UTC)

You can always include an abbreviated explanation in the "notes = " about the general doubt that a part 2 exists. You can also include full details of what you've found on the Talk page. If you've already done the research, then summarizing it for future readers would probably be appreciated. --EncycloPetey (talk) 19:23, 3 January 2021 (UTC)
Upon some speculation, I have thought of the possibility that perhaps whoever compiled the print at one point separated it into parts, and once they combined the two parts later, they forgot to remove the title card containing "Part 1" and replace it with a more generic title card. But we'll never know. PseudoSkull (talk) 20:44, 3 January 2021 (UTC)

Index:Mrs Beeton's Book of Household Management.djvu[edit]

Proofreading is still needed on the index, and cross referencing it to the main work. ShakespeareFan00 (talk) 13:02, 4 January 2021 (UTC)

Just focus on proofreading and validating the text. There are bot tools to add in the links, but these are dependent on having accurate page numbers in the index, and must wait until the text is validated. Beeswaxcandle (talk) 16:32, 4 January 2021 (UTC)

Can anyone comment on this website of LOC documents?[edit]

getarchive.net — Ineuw (talk) 01:23, 5 January 2021 (UTC)

Appears to be some kind of database that wraps Public Domain data sources like LoC and Getty.edu's collections and then tries to get you to pay for access to high resolution assets and/or sell you some prints and/or hawk the company's other wares. Could be useful for search, but probably need to head upstream for the original documents. Inductiveloadtalk/contribs 01:32, 5 January 2021 (UTC)
I knew there had to be a catch. Thanks for checking. — Ineuw (talk) 01:57, 5 January 2021 (UTC)

Video games[edit]

I noticed there are no video game transcriptions here as far as I can tell, and I know there are freely licensed games that are popular and would probably meet our standards for inclusion. So I started this WikiProject/proposal page that explains my thoughts on 1.) why we should have video game transcriptions at Wikisource, to follow the same rules as for books or film, and 2.) some ideas on how transcription of games could be implemented. Video games are culturally significant and do count as works, so I think transcribing them here would be acceptable. As mentioned on the page though, a great majority of the video games we would normally include are going to be copyrighted in the US for 60+ more years (and in other countries much longer), but we should start now with what we have that's free if we're going to do it.

I'm willing to create a prototype of a transcription for a freely licensed video game, just to show my thoughts on what such a thing might look like. Any thoughts/comments/interest? PseudoSkull (talk) 07:46, 6 January 2021 (UTC)

Which games would be copyright-okay here, please?--Jusjih (talk) 05:08, 19 January 2021 (UTC)
@Jusjih: I started a list here. These are a few games that were self-released into the public domain. Note that any game we cover not only has to have freely licensed source code, but freely licensed data—that would include the text you see during gameplay. Some games are open source but the data is copyrighted. For more info, see also Wikisource:WikiProject_Video_Games#What_video_games_are_free? PseudoSkull (talk) 05:15, 19 January 2021 (UTC)

Google spreadsheet based tool to help TOC, chapter pages creation

I created a Google Sheet tool to help prepare content for TOC pages and chapter pages. I have tried it on Telugu Wikisource. Make a copy of the spreadsheet and run the Google script by accessing the Wikisource menu on the spreadsheet. You need to authorise the script to access your Google Drive and display the sidebar the first time you run it. By entering basic book details like index filename, author, year, chapter prefix, chapter name and page number, you can easily prepare content for TOC and chapter pages. This content appears in a sidebar, from which it can be copied and pasted into Wikisource. Chapter content is formatted for use with pwb's pagefromfile script. I request users to try it and give feedback. If any better tool is available, please share.--Arjunaraoc (talk) 07:22, 7 January 2021 (UTC)

Please join the English Wikisource Discord server

@Reboot01 and I a while back had suggested we create a Discord server for the English Wikisource, and now I've been bold and created it. Please join if you're interested in connecting with others on this project via Discord. I will give any administrators who join the Administrators role.

Invite link: https://discord.gg/g5UfBT6epz (permanent) PseudoSkull (talk) 18:25, 7 January 2021 (UTC)

@PseudoSkull: Any particular reason this isn't a separate channel on the English "Wikimedia Community" Discord server? Mahir256 (talk) 18:30, 7 January 2021 (UTC)
@Mahir256: Because we're a different community from Wikipedia, and their community etc. is very different from ours. The amount of chat that the server could (possibly) end up having might make it excessive for the two communities to be intertwined. For context, I also started the English Wiktionary's Discord server which is also completely separate from the Wikipedia server, and the server is successful anyway, as a large portion of the Wiktionary community is still active on that server to this day. I don't know how many users here at Wikisource use Discord, but I set it up in case they want a chat outlet through that service. PseudoSkull (talk) 18:37, 7 January 2021 (UTC)
there is also a wikisource telegram, but it is more indic wikisource. nice to have another channel, but unclear use case. Slowking4Rama's revenge 03:26, 15 January 2021 (UTC)

Impermanence of Sexual Phenotypes by I. A. T. Savillo

Authorship for this article and the article itself is requested to be added in Wikisource. This is to disseminate this information about the unchallenged content of this article (The recent finding that there is no gay gene). In fact, Regarding the recent rape slay case (if ever there is a rape) in the Philippines done by "gays" is of note that this article applies to. This article is linked in Phenotypic plasticity in main Wikipedia but it is much better if it is also in Wikisource so people will be aware of it. I do not have to expound on the article but this is for the people to read and judge themselves and to be careful. 110.54.251.176 22:52, 7 January 2021 (UTC)

We store modern free works, but this is very unlikely to be free, and I certainly don't appreciate issue pushing.--Prosfilaes (talk) 00:49, 8 January 2021 (UTC)
Wikisource:What Wikisource includes — billinghurst sDrewth 01:21, 8 January 2021 (UTC)
Thanks. 110.54.251.176

How do I make corrections to the djvu file?

See for example History_of_the_Municipalities_of_Hudson_County,_New_Jersey,_1630-1923/Volume_3/Freudenberg,_Arthur_Oscar, how would I fix errors in the text? Is the text housed at Commons? I have only dealt with the text that is right on the page here. --RAN (talk) 18:28, 9 January 2021 (UTC)

@Richard Arthur Norton (1958- ):There are page numbers in square brackets to the left of the text. When you click one of them, you get to the page, e. g. when you click on [729] in the above mentioned article on A. S. Freudenberg, you get to Page:History of the Municipalities of Hudson County (1924), Vol. 3.djvu/461. On the right there is the scanned page and on the left there is the editable text. Then you simply click the edit button and the rest is easy. --Jan Kameníček (talk) 18:39, 9 January 2021 (UTC)
Thanks! Are we allowed to add in links to Wikidata and Wikipedia in djvu files? Can you remind me how to handle typos in the original printed text, mark them with "sic", but where would you show how it should read? Do you add a footnote? Do you just ignore it? For instance an article may say someone was born in 1900, and a more reliable source, like a birth certificate stored at Commons, may say that he was born in December 1899. We see that a lot in obituaries where they count back from the person's age, and can be off by a year. --RAN (talk) 19:33, 9 January 2021 (UTC)
@Richard Arthur Norton (1958- ): Unlike Wikipedia, we do not try to make here reliable articles but we just transcribe original sources as faithfully as possible. So, if a source contains a factual error, we reproduce it as it happened, because the error is simply an inseparable part of the original historical document. When the original text contains a typo, we keep it as well, although it can be pointed out using the template {{SIC}} (written in capital letters). As for e.g. mistakes in death years in obituaries, it can sometimes be considered to be just a typo as well, and so it is imo acceptable to indicate it using the SIC template too. We do not mark it in the printed document, only in the transcribed text. As for the footnotes, these are used only when there are some original footnotes in the original text. --Jan Kameníček (talk) 20:39, 9 January 2021 (UTC)
Oh, I forgot to answer about the links. Many people consider placing hyperlinks leading to Wikipedia articles into the transcribed text to be a sort of annotation and avoid it, although from time to time you can see texts where such links were added. I personally do not do it either. However, if the transcribed text is about the same topic as a Wikipedia article, such a link can be placed into the "wikipedia" parameter of the header, see the template {{Header}}. --Jan Kameníček (talk) 21:11, 9 January 2021 (UTC)
there is some limited wikilinks in the work header template. also Help:Contents is your friend. Slowking4Rama's revenge 03:29, 15 January 2021 (UTC)

Not leaving redirects behind in moves

Is there an option to not leave a redirect behind when you move an entry? The entry with the old title will still show in the index, and I want to avoid that. --RAN (talk) 21:44, 9 January 2021 (UTC)

Administrators have that ability. For non-admins, you can tag the resulting redirects for speedy deletion. BD2412 T 23:58, 9 January 2021 (UTC)
I started answering this a couple of hours ago, but then my wifi died spectacularly. Just got back to find the above response. I'll extend it a little with:

Only sysops can suppress a redirect when moving a page. If the old place is a recent page that has no on-wiki incoming links, mark it for speedy deletion using {{sdelete|M2 - unneeded redirect}}. If it's an older page that may well have external links to it, then change it to a dated soft redirect {{Dated soft redirect}}. Beeswaxcandle (talk) 01:03, 10 January 2021 (UTC)

Best practices for a magazine

I've started working on the first issue of The New Yorker and while I've searched the project pages, I don't really see a good guide on how to best transclude scans of magazines. Other periodicals seem to have individual articles broken up into text, which is probably wise for readability but it seems like there is also value in transcribing pages as a single document, too. Additionally, I seem to have screwed up the very simple task of transcluding pages as well, so I guess I'm a total mess. Is anyone interested in helping me? Thanks. —Justin (koavf)TCM 10:28, 10 January 2021 (UTC)

that's very good work, 44 issues to go. you might want to cut each issue out from file:The New Yorker volume 1, numbers 1-19.pdf, and file:The New Yorker volume 1, numbers 20-45.pdf (thank you hathi and google books) in the future, you will want to combine single pages into a multipage pdf on commons to make it easier. Slowking4Rama's revenge 02:56, 15 January 2021 (UTC)
@Slowking4: "in the future, you will want to combine single pages into a multipage pdf on commons to make it easier" It is: File:The New Yorker 0001 1925-02-21.pdf. What do you mean? —Justin (koavf)TCM 05:16, 15 January 2021 (UTC)

Export links in the sidebar

Hi everyone! Next week we (the Community Tech team, of which I'm a part) are going to roll out phab:T256392 which will move the functionality of the MediaWiki:Gadget-WSexport.js gadget into the Wikisource extension. The main change for English Wikisource will be that we can remove the gadget, but for other Wikisources it'll mean that they have the export links (lots are missing the gadget, and quite a few have it but it's not translated into their language). I and the other CommTech engineers will be scooting around cleaning up scripts wherever we can, but if anyone notices anything amiss please let me know! This is a smallish change and is a precursor to the larger change that hopefully will come soon of enabling a 'download' button at the top of works (as some Wikisources already do). See phab:T266262 for more about that. —Sam Wilson 10:49, 10 January 2021 (UTC)

@Samwilson: excellent news! Thank you for this. I just added the "PDF (beta)" option to the enWS sidebar, because the normal "PDF" option is almost entirely useless for Wikisource works.
Can we have that removed as part of this, or should I raise another Phab ticket? As well as the "Compile a book" link which is currently not working at all, as far as I can tell. It's confusing for users to have these two broken tools presented above the WS-export links. Inductiveloadtalk/contribs 14:43, 10 January 2021 (UTC)
@Inductiveload: Yes, good point: the Special:DownloadAsPdf link will be removed in this change (because it's replaced by WS Export; the function will still exist though, if people want to use it manually e.g. for Help pages), but the Special:Book link won't be removed (although I suspect it should be because it is pretty confusing… I can't find a task for this yet so will create one). —Sam Wilson 00:21, 11 January 2021 (UTC)

@Samwilson: I am guessing that the translations will be held at Translatewiki as part of the WM message kit. Are you able to send a list of the required labels to Wikisource-L so that those communities can start preparing, and also to ensure that the Tech newsletter has good instructions on how to complete the translations. [Many will not be Translatewiki-aware.] — billinghurst sDrewth 14:48, 10 January 2021 (UTC)

@Billinghurst: Yep, they'll be done (and some already have been) on TranslateWiki. Here's the full list. I've copied over all the existing translations that I could find on the various gadgets that are already in use, so those Wikisources that already had translations will continue to have the same ones. Here's the translation progress overview. I'll post to the list. I don't think it's worth putting a note in the tech newsletter because the missing translations will be done in due course, and we don't usually announce the existence of new interface messages. —Sam Wilson 00:21, 11 January 2021 (UTC)

This change has been completed. No more multiple PDF links for us! :-) Let me know if you notice anything odd. —Sam Wilson 23:03, 13 January 2021 (UTC)

Tech News: 2021-02

15:42, 11 January 2021 (UTC)

TOC templates

The templates {{TOC begin}} and friends have been updated following clarification at phab:T232477. This means that page numbers half-way through ToCs should now work (example).

There should be no changes to any page, but there is a new behaviour. Previously, whitespace between row templates was ignored. Now, it will be part of the last cell. One space between rows (including newlines at the end of the last parameter) is OK, but more than that will be added to the rows:

{{TOC begin}}
{{TOC row 2-1|This row is OK|1}}

{{TOC row 2-1|This row is OK|2}}
{{TOC row 2-1|This row is OK|3
}}
{{TOC row 2-1|This row will have extra space in it|4}}


{{TOC row 2-1|This row will also have extra space in it|5
}}

{{TOC row 2-1|This row is OK|6}}
{{TOC row 2-1|This row is OK|7}}
{{TOC end}}
This row is OK 1
This row is OK 2
This row is OK 3
This row will have extra space in it 4


This row will also have extra space in it 5


This row is OK 6
This row is OK 7

I believe that I have found and fixed any cases where this could have occurred. If you see anything that has not been fixed, please let me know. Inductiveloadtalk/contribs 17:43, 11 January 2021 (UTC)
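For anyone who wants to tidy a ToC before saving, the whitespace rule described above can be sketched in JavaScript. This is only an illustration, not part of the templates or of any existing gadget: the function name tidyTocRows is hypothetical, and it assumes rows are written with {{TOC row ...}} templates and closed by {{TOC end}}. It trims whitespace before a closing }} and collapses whatever sits between one template and the next TOC template to a single newline, which stays within the "one space between rows" limit.

```javascript
// Hypothetical helper (not an existing gadget): normalise the whitespace
// around TOC row templates so that no stray blank lines end up in a cell.
function tidyTocRows(wikitext) {
  // Match: optional whitespace, a closing "}}", optional whitespace,
  // but only when the next thing is another "{{TOC row ..." or "{{TOC end".
  // Replace the whole run with "}}" plus exactly one newline.
  return wikitext.replace(/\s*\}\}\s*(?=\{\{TOC (?:row|end))/g, '}}\n');
}

// Example input using the same kinds of spacing as the rows shown above.
var sampleToc = [
  '{{TOC begin}}',
  '{{TOC row 2-1|This row is OK|1}}',
  '',
  '',
  '{{TOC row 2-1|This row had extra space|2',
  '}}',
  '',
  '{{TOC end}}'
].join('\n');

console.log(tidyTocRows(sampleToc));
```

Because the pattern only fires when the closing braces are followed by another TOC template, inline templates inside a cell (which are followed by a pipe or more text) should be left alone. Treat it as a starting point and check the diff before saving.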

RFC: Manual addition of license categories to pages???

Can anyone think of a reason why we would manually categorise works, authors or portals by the categories subsidiary to Category:Works by license or Category:Authors by license? I would think that we would be looking for all additions to be by application of a license template (see Help:Copyright tags).

I am just doing some further configuration of HotCat and I would like to exclude all those categories from a search selection through HotCat; though seeking comment prior to actioning my thoughts. — billinghurst sDrewth 06:22, 13 January 2021 (UTC)

If you want to test, I have excluded both Category:PD-old-80 and Category:Author-PD-old-80 as a proof of concept, and you have to get to the 8 to do a decent test. — billinghurst sDrewth 06:29, 13 January 2021 (UTC)
I'm not aware of any reason.
I did a search to see if I could find a use case and found some interesting statistics:
  • [[Category:PD has 989 hits
  • [[Category:PD-USGov has 906 hits
  • [[Category:PD-USGov-POTUS has 897 hits
  • {PD-USGov.*[[Category:PD-USGov-POTUS has 888 hits
Seems to me like the latter group should be stripped of the category ({{PD-USGov-POTUS}} was deleted) and that'll fix 90% of the problem for PD tags. BethNaught (talk) 09:00, 13 January 2021 (UTC)

Collaboration with Google around Children's literature

Hello everyone,

Wikimedia Foundation is planning a collaboration with Google Read Along for a project around illustrated children’s literature. While the entire project is being scoped out, in the first phase Google Read Along will take works from Wikisource and integrate on their platform. This will allow younger audiences to access works from Wikisource in a mobile-friendly platform and also learn how to read the language, which, as the name suggests, the Google Read Along platform is basically designed to do.

We have shared the Portal:Children's literature with Google so far. There are certain books here which match the expectations and requirements of Google, such as the poems in The Real Mother Goose. It has good illustrations and short poems. I would like to request your help in transcluding this book.

Additionally, it would be really helpful if you can help identify other children’s books that have short tales, poems, stories etc. accompanied by good illustrations. Perhaps you can add those in the portal itself.

You can read more about Google Read Along here: https://readalong.google/

Feel free to ask any questions that you might have. Thank you so much!

--SGill (WMF) (talk) 07:32, 14 January 2021 (UTC)

@SGill (WMF): What does Wikisource get out of this? Step one basically sounds like "help Google slurp our content". Of course our mission is to create high quality, open source digital books, but I don't want to let Google take the credit. Or, why don't they hire or train a Wikimedian to improve the content they want to reuse, instead of asking us to do that work for free? BethNaught (talk) 08:39, 14 January 2021 (UTC)
Seconded. At the very least, I hope they put a nice big pretty notice saying (paraphrased) "This comes from, and was prepared by, Wikisource. They'd love for you to get involved in preparing more Public Domain books for children and adults alike. Come on over and have a chat :-)".
I'll be thrilled if it genuinely gets us more exposure, but less chuffed if we're just a cost-saving on a Mountain View management spreadsheet to avoid them having to pay an intern to transcribe a couple of hundred poems (a task that would be substantially easier if the millions of books they've scanned weren't scanned in bottom-drawer quality, and compressed so hard the words literally fall out). If that's their aim, I might suggest Portal:IWW instead! Inductiveloadtalk/contribs 09:48, 14 January 2021 (UTC)
Wikisource gets someone else getting access to these works, which is the point of the site. You won't find many who dislike FAANG/Frightful Five companies more than me but 1.) they are undeniably good sources for getting our work out to the public and 2.) free licenses mean free licenses. I second Inductiveload's proposal for more I. W. W. works being propagated. —Justin (koavf)TCM 10:06, 14 January 2021 (UTC)
@BethNaught:, @Inductiveload: and @Koavf: Thank you for your comments. The idea here is definitely to get more exposure to Wikisource and its content. We have been talking with other organizations regarding the same as well and hopefully we will share those with you soon. There will definitely be attribution for Wikisource, perhaps even a separate section on the application, but exactly how this happens still needs to be figured out. The first phase essentially is to integrate existing content which meets their specific needs (good illustrations and relatively short pieces of text). We are already discussing the idea to hire/train a Wikimedian to help them in this in the next phases. The question before the next phases can be envisioned is: Are there a lot of works on Wikisource that would meet their requirements, or only a handful?
It would be really good if you can help surface some of the proofread/validated texts that you feel might fit into these requirements (good illustrations and relatively short pieces of text).
Also, what do you think would be the best way to search for texts on Wikisource that have illustrations and are written for the age group 3 to 8?
I looked at I.W.W. works but they don't look like works written for children that Google Read Along can integrate as it is basically a platform for children, ideally between 3 and 8 years of age. --SGill (WMF) (talk) 10:18, 14 January 2021 (UTC)
@SGill (WMF): To be clear, the I. W. W. reference was (partially) a joke: I don't think that anyone would seriously expect that Google would make a bunch of union songs available for children. (But we can always hope!) That said, I think the best way to find relevant texts is thru the portal linked above, which also includes some suggested reading levels. There is also Category:Children's literature and the related portals here. I suppose I would be concerned that kids just may not be interested in older material but I'd be happy to be proven wrong (I loved The Box-Car Children when I was nine.) Additionally, a Wikisource-adjacent project is Wikijunior which has some original works intended as reference volumes for children. I'm not sure if Google are motivated to include those as well but I figured I would point it out as specifically kid-friendly Wikimedia content. —Justin (koavf)TCM 10:28, 14 January 2021 (UTC)
@SGill (WMF): Yes, that was supposed to be a joke. I'm not against free use of Wikisource content. Indeed, that's the whole point of WS. And I'm CERTAINLY all for exposing Wikisource more, because awareness of Wikisource is near zero in the general public. Even people who know Project Gutenberg don't know us. I just don't want to end up having WS users faffing about to help Google scrape up content, for free, to put into a walled-garden Google-branded mobile-only app without some kind of reciprocity. Inductiveloadtalk/contribs 12:12, 14 January 2021 (UTC)
hardy hardy har. really laughing at the google bashing, smdh. as someone in the room, when author's guild said "hathi trust is just a pirate like google", i regret i did not respond: "arrrrgh". the enemy of my enemy is my friend. Slowking4Rama's revenge 02:46, 15 January 2021 (UTC)
There is also Portal:Children's fairy tales and Portal:Children's poetry. --Jan Kameníček (talk) 11:49, 14 January 2021 (UTC)
@SGill (WMF): The Real Mother Goose is done. It may not be the best example since some of its formatting is awkward and the images relatively low resolution, but it's probably roughly representative. @Inductiveload: Could you give it a "Ready for export" check?
Also, since this is a relatively novel application for us, it would be interesting to get feedback on our markup structure, metadata, and visual layout from the perspective of the Read Along project. What makes it hard for them to reuse our content and what made it easier? What would be their wishlist for what we should provide? Does connecting our pages to Wikidata help them at all? Are they able to reuse our formatting or will they end up having to do a lot of transformation?
Let me also take the opportunity to echo Inductiveload's comments regarding Google Books. If an opportunity should present itself to channel feedback in that direction there are a number of issues that make Google Books close to net negative value for our purposes, and we would love for the opposite to be true. Scan quality is the biggest issue (and without decent scans we can't add works with nice pictures for the Read Along guys!), but there's also things like lack of stable URLs, arbitrary copyright practices, hard to download scan images, bad bibliographic metadata, etc.
@Xover: Done; a few tweaks were needed (an align=center; the first few poems were using blank lines to separate lines), but it now renders well in Koreader and "OK" in Moon+Reader. The images are actually pretty nice and big at Commons (e.g. File:The Real Mother Goose pg 5.jpg is over 1000px wide), we just don't embed them at a very large size (in this case, using the default "frameless" size of 180px). Presumably Google has enough nous to get the originals for processing into whatever internal format they need for their app. Inductiveloadtalk/contribs 13:32, 14 January 2021 (UTC)
yeah, +1 picking picture books which is a wikisource>gutenberg, but a single book of fifty pages is not much of a challenge. rather they should double down with a list from Category:Children's books or https://ufdc.ufl.edu/juv, or Roller Skates or Newbery Medal, with some prizes to get some traction. (we need better tools to generate list of works by subject with status; now that all the IA books are at commons, a work list would be helpful ) Slowking4Rama's revenge 02:30, 15 January 2021 (UTC)
FYI, Roller Skates' copyright was renewed (Renewal: R341460), so it's not PD until 2031. Though Google can probably license the rights from Sawyer's grandkids, we can't host it yet. In fact, every Newbery winner from 1926–1950 was renewed. :-( Now, if Google really cared about providing full access, they'd buy the rights to them all and donate them to the PD. Inductiveloadtalk/contribs 11:19, 15 January 2021 (UTC)
i see 9 texts on Newbery 1922-25. but they are older more text than pictures. Slowking4Rama's revenge 00:57, 16 January 2021 (UTC)
  • SGill (WMF): How long is the longest individual work that is desired? Certainly, short, multi-page poems would be appropriate, but for stories, how long would be too long? I am interested in adding new works for this project, but I don’t want to add any work which would be too long to be useful. TE(æ)A,ea. (talk) 01:46, 20 January 2021 (UTC).

Page clean-up observation.

Sorry if this issue is known about already, thought I'd raise it just in case.
I have noticed that when I use the page clean-up button after I have created a table with a pad left entry in it, that my {{ts|pl2}} entry is changed to {{ts|p12}} which breaks the formatting. Not a major issue but just wanted to highlight it. The page I created is here Sp1nd01 (talk) 10:47, 15 January 2021 (UTC)

@Sp1nd01: This will be the rule .replace(/[il]([0-9])/g, '1$1'). I think generally it's better to run cleanup before adding templates, as it can be tricky to tell Wikicode from OCR mistakes, because Wikicode is not plain text, and the cleanup assumes it's looking at page text. Inductiveloadtalk/contribs 11:01, 15 January 2021 (UTC)
Thanks for the explanation, I'll ensure that I only run page clean-up before I add templates from now on, I have been running it after I edit a page just in case I inadvertently add issues. Sp1nd01 (talk) 11:27, 15 January 2021 (UTC)
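To make the effect of the rule quoted above concrete, here is a minimal JavaScript sketch. The first function applies exactly that replacement. The second is a hypothetical guard, not something the cleanup script is known to do: it skips text inside simple {{...}} templates, and it is a simplification that only recognises innermost, non-nested templates. Both function names are made up for this illustration.

```javascript
// The cleanup rule quoted above: an "i" or "l" immediately before a digit
// is assumed to be a mis-read "1" from OCR.
function applyDigitRule(text) {
  return text.replace(/[il]([0-9])/g, '1$1');
}

// Hypothetical guard (an assumption, not the gadget's behaviour):
// leave simple, non-nested {{...}} templates untouched and only
// apply the rule to the text between them.
function applyDigitRuleOutsideTemplates(text) {
  // split() with a capture group keeps the templates in the result,
  // at the odd-numbered positions of the array.
  return text
    .split(/(\{\{[^{}]*\}\})/)
    .map(function (part, index) {
      return index % 2 === 1 ? part : applyDigitRule(part);
    })
    .join('');
}

console.log(applyDigitRule('{{ts|pl2}}'));                       // {{ts|p12}}
console.log(applyDigitRuleOutsideTemplates('{{ts|pl2}} l9l0'));  // {{ts|pl2}} 1910
```

This shows why running cleanup before adding templates is the safer habit: the plain rule cannot tell a table-style code such as pl2 from an OCR slip such as l9l0.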
I had not known this gadget which looks useful, so I searched for it and found out that it is provided by Wikisource:TemplateScript. So, I copied the code into my common.js as suggested there, but nothing changed, at least I do not see any change, no new button appeared, nothing. I tried clearing my browser’s cache, but it did not help either. Did I miss anything? --Jan Kameníček (talk) 15:16, 15 January 2021 (UTC)
@Jan.Kamenicek: The script is this one:
mw.loader.load('//en.wikisource.org/w/index.php?title=User:Samwilson/PageCleanUp.js&action=raw&ctype=text/javascript');
I also have a more in-depth one here, but it's pre-alpha in terms of reliability and it needs better config options:
mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/cleanup.js&action=raw&ctype=text/javascript');
Inductiveloadtalk/contribs 15:34, 15 January 2021 (UTC)
@Inductiveload: Thanks! I have just tried it and looks really great. Can I update Wikisource:TemplateScript, replacing the advised code there? --Jan Kameníček (talk) 15:46, 15 January 2021 (UTC)
@Jan.Kamenicek: makes sense to me. Though I think really it's not a "templatescript" script (even if it uses it internally), so it could really go on a new page, maybe Wikisource:Tools and Scripts/PageCleanup. And then link there from Wikisource:Tools and scripts. Inductiveloadtalk/contribs 17:15, 15 January 2021 (UTC)
If the whole page should be moved to Wikisource:Tools and Scripts/PageCleanup, then I will probably leave it to somebody else, as I do not understand the page in detail and there might be more things to take care of. For example the other code mentioned on the page, which should enable some script dealing with typography, diacritics and letter case, probably needs to be updated too. --Jan Kameníček (talk) 21:31, 15 January 2021 (UTC)
@Inductiveload: There is one more piece of advice which does not work: "Go to Special:TemplateScript to disable scripts you don't use", but the provided link does not work. Where can I disable it? The gadget as a whole is very useful, but I do not like that it changes all curly quotes and apostrophes into straight ones, although the curly quotes have already been approved here, and so I would like to disable this particular feature. --Jan Kameníček (talk) 13:42, 16 January 2021 (UTC)
Instead of using this sidebar version, use the toolbar pair mentioned on Wikisource:Tools and scripts. There's one for page cleanup and the following one for curly quotes. Once they're in, it's just two mouse clicks and done. The sidebar version is used by contributors like myself who don't have the toolbar switched on. Beeswaxcandle (talk) 17:22, 16 January 2021 (UTC)
Ah, I can see it, thanks very much, I will use it. However, it seems quite redundant to turn all the curly quotes into the straight ones by one tool and then turn them back again by another tool… --Jan Kameníček (talk) 17:57, 16 January 2021 (UTC)
Not redundant really. I frequently get OCR'd pages with a mixture of curly and straight on them. Best to turn them all to straight first, which allows the correct pairings to happen with the optional second tool. Beeswaxcandle (talk) 18:44, 16 January 2021 (UTC)
Ah, true. --Jan Kameníček (talk) 20:01, 16 January 2021 (UTC)
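The two-step approach described above (straighten everything first, then pair the quotes) can be sketched in JavaScript as follows. This is a rough illustration under simple assumptions, not the code of either toolbar tool: the function names are hypothetical, and the pairing heuristic only treats a quote as "opening" when it follows the start of the text, whitespace or an opening bracket.

```javascript
// Step 1: turn every curly quote or apostrophe into its straight form,
// so a page with mixed OCR output starts from a uniform state.
function straightenQuotes(text) {
  return text
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2018\u2019]/g, "'");
}

// Step 2 (simplified heuristic, an assumption for this sketch):
// a quote after start-of-text, whitespace or an opening bracket opens;
// every remaining quote closes (or is an apostrophe).
function curlQuotes(text) {
  return text
    .replace(/(^|[\s(\[{])"/g, '$1\u201C')
    .replace(/"/g, '\u201D')
    .replace(/(^|[\s(\[{\u201C])'/g, '$1\u2018')
    .replace(/'/g, '\u2019');
}

// Mixed input: curly at the ends, straight in the middle.
var mixedQuotes = '\u201CHello," she said. "It\u2019s fine.\u201D';
console.log(curlQuotes(straightenQuotes(mixedQuotes)));
```

Running the second step directly on mixed input would leave the pre-existing curly quotes as they were, right or wrong, which is the reason for straightening first.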

For the sake of simplicity

Can I use this "<!-- -->" universally, regardless if table, or end of paragraph, or page?— Ineuw (talk) 19:39, 15 January 2021 (UTC)

Or, can someone please suggest a single universal {{nop}} that is acceptable in a table structure and elsewhere?— Ineuw (talk) 04:59, 17 January 2021 (UTC)
html comment <!-- --> can be used for their effect anywhere within a page / page section that is html. Which means they cannot be used within themselves, they cannot be used in css or js pages, or where the html is interrupted, eg. with <nowiki>.

{{nop}} is a <div>-based interruption to wikitext rendering and {{nopt}} is a <span>-based version. Typically in a table we should use the "nopt", the span-based version, not the div, which is stuff that we worked out over the years and have discussed here. — billinghurst sDrewth 01:06, 18 January 2021 (UTC)

CSS help wanted

Page:The New Yorker 0001 1925-02-21.pdf/25. I spent hours on this unholy mess. I give up. Can anyone make this work? —Justin (koavf)TCM 05:22, 16 January 2021 (UTC)

{{overfloat image}} is good in combination with plain text but much worse for combinations with table-like designs.
I suggest uploading images of single bags instead of one image of both bags. Then we can create a table with a cell on the left spanning across four rows and containing an image of the bag, a column of four cells in the middle, and a cell with the other bag on the right. Alternatively, it should be OK to use only one image of the bag twice. For the middle cells I would use {{dotted cell}}. If you agree with this solution, I can do it. --Jan Kameníček (talk) 10:07, 16 January 2021 (UTC)
There is a single bag image already on Commons but that is not tabular data, so it's bad semantics. —Justin (koavf)TCM 14:44, 17 January 2021 (UTC)
Can I ask the question "why?" We are here to represent the text in a way that reasonably portrays the publication. We are not slaves to an absolute facsimile. If something is an artefact, like that image, then stick in the image as a png/jpg, as long as people can see what the page contained, then so what that it somewhat differs in look, that will happen due to screen width and the toggle of screen layout. When it gets to be an epub, it will be different again, so why do we fuss beyond a representation? — billinghurst sDrewth 03:50, 18 January 2021 (UTC)
yeah, i would be happy with a centered image in the middle of text in one column, and push corner image (not text) to bottom. - like side footnotes, verisimilitude is more trouble than it is worth. thanks for trying though. and only 44 issues to work on. Slowking4Rama's revenge 01:05, 19 January 2021 (UTC)
@Billinghurst: The reason why is because accessibility is not optional. I agree that we are not obliged to have complete photographic reproduction, just typographic and as long as we can make the text appear in some logical fashion, that's what matters (e.g. I did not try to retain all of the pagination-related formatting such as arbitrary columns). The frustrating thing is that HTML and CSS can do this but I'm just failing to make it actualized on this page. —Justin (koavf)TCM 07:30, 19 January 2021 (UTC)
@Koavf: Why can't you add the image, and add the text as alt=. That is generally what I have done when I have added pictures of monuments where the words are evident in the photographs. Or simply give a good alt description when it is an inconsequential part of the reproduction. — billinghurst sDrewth 07:45, 19 January 2021 (UTC)
@Billinghurst: That is also bad semantics: the purpose of alt text is to serve as an alternative to the image in cases where (e.g.) someone is blind or someone has images turned off or someone has bandwidth constrictions and the image won't load. Including alt text for a monument, grave marker, sign, etc. where the actual object has writing on it as an integral part is appropriate but adding it here would not be. That said, were someone to do this, where would you actually put the text itself for users who are reading it? —Justin (koavf)TCM 07:55, 19 January 2021 (UTC)
It is a trade-off where we are reproducing a work from ages ago where the work is not envisioned to be presented. With these works, add the image, if there is text on the image then transcribe it; explain it. Trying to reproduce the text in a scroll box at an angle seems like a waste of good time and effort to try and facsimile. — billinghurst sDrewth 08:04, 19 January 2021 (UTC)
@Slowking4: Yes, unfortunately, there is still a lot to do in an ideal sense (such as converting line drawings to SVG and having transparencies instead of opacities, etc.). I'm reasonably happy with what I've been able to accomplish there minus this CSS problem. I'm not sure how motivated I am to personally go thru the Herculean task of the entire first year of The New Yorker but if I have assistance, I'd be motivated to continue. —Justin (koavf)TCM 07:30, 19 January 2021 (UTC)
oh, it is not that herculean. just have to make some proofread compromises. should be all red, if not yellow, by year end, just in time for next year's batch. Slowking4Rama's revenge 18:57, 20 January 2021 (UTC)
Not at the rate it's going now: it's taken three weeks to do one issue and that's without all of the best practices I mentioned above. Plus, that has probably been c. 100 hours, so that would be two full-time jobs doing it by myself thru the year (which I will not commit to) and the fact that I've done a lot of the heavy lifting solo (altho, of course, I am very appreciative of any of the work that my collaborators have added and continue to add). Additionally, the scans of subsequent issues are much lower rez and missing pages, which will require a decent amount of forensics and reconstruction. I would be pleasantly surprised if we were half-way done by the end of the year before an even bigger crop of issues drops. —Justin (koavf)TCM 21:45, 20 January 2021 (UTC)
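The "two full-time jobs" estimate above can be checked with a little arithmetic. The sketch below uses the figures quoted in this thread (about 100 hours per issue, 44 issues to work on) and, as an assumption not stated in the thread, a full-time year of roughly 2,000 working hours (40 hours a week for 50 weeks).

```python
# Rough check of the workload estimate discussed in this thread.
HOURS_PER_ISSUE = 100                 # "c. 100 hours" for one issue (from the thread)
ISSUES_TO_DO = 44                     # "only 44 issues to work on" (from the thread)
FULL_TIME_HOURS_PER_YEAR = 40 * 50    # assumption: 40 h/week for 50 weeks


def full_time_equivalents(issues, hours_per_issue,
                          hours_per_year=FULL_TIME_HOURS_PER_YEAR):
    """Return how many full-time person-years the work represents."""
    return issues * hours_per_issue / hours_per_year


total_hours = HOURS_PER_ISSUE * ISSUES_TO_DO
fte = full_time_equivalents(ISSUES_TO_DO, HOURS_PER_ISSUE)
print(f"{total_hours} hours, about {fte:.1f} full-time jobs for one year")
```

Under these assumptions the total comes to 4,400 hours, or a little over two full-time positions for a year, which is consistent with the estimate given above.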

Tech News: 2021-03

16:10, 18 January 2021 (UTC)

Don Q, Son of Zorro

It has come to my attention that several people on here wanted a copy of the silent film "Don Q, Son of Zorro" (1925) in one part. I am currently uploading the movie (single part) to Wikimedia Commons. It should be available on there soon. (SurprisedMewtwoFace (talk) 22:48, 18 January 2021 (UTC))

@SurprisedMewtwoFace: That would be me. Thank you so much for doing that! I will transcribe the Zorro series fairly soon then. PseudoSkull (talk) 23:04, 18 January 2021 (UTC)
@PseudoSkull: The upload is now complete on Wikimedia Commons. Hope you find this useful! You can find it at https://commons.wikimedia.org/wiki/File:Don_Q_Son_of_Zorro.webm (SurprisedMewtwoFace (talk) 04:47, 19 January 2021 (UTC))
Incredibly useful, and a good quality encode, thanks! PseudoSkull (talk) 04:58, 19 January 2021 (UTC)
(Keep an eye out for Don Q, Son of Zorro if you're interested in the movie) PseudoSkull (talk) 04:58, 19 January 2021 (UTC)
Thanks so much for all your work! I'm sure you'll do a great job transcribing it. (SurprisedMewtwoFace (talk) 05:26, 19 January 2021 (UTC))

Proposal to convert usage of header "categories" to proper categories—ongoing

Hi to all. The use of categories as a parameter within {{header}} is painful as they are not recognised by any of the tools that manage/manipulate categories (eg. hotcat, cat-a-lot, autowikibrowser, etc.), meaning that any maintenance of categories needs to be done manually. At this stage I am not proposing that we remove the ability for categories in this manner, though I am proposing that where this style of addition is used, that there is a bot replacement to convert those categories into traditional style cat, and that this bot be run as an ongoing tool. (Detail not yet worked out; I just wish to propose these changes and seek agreement first.) — billinghurst sDrewth 04:28, 19 January 2021 (UTC)
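To make the proposed conversion concrete, here is a minimal sketch of the text transformation such a bot might perform. It is not an existing bot: the function name is hypothetical, and it assumes that the parameter appears on its own line as `| categories = A / B` with names separated by forward slashes. A real run would presumably go through a bot framework and a proper wikitext parser rather than a regular expression.

```python
import re

# Assumption: the header parameter sits on its own line, e.g.
#   | categories = Poetry / Sonnets
# with category names separated by "/". This format is illustrative only.
CATEGORIES_PARAM = re.compile(
    r"^[ \t]*\|[ \t]*categories[ \t]*=[ \t]*(.*?)[ \t]*\n",
    re.MULTILINE,
)


def convert_header_categories(wikitext: str) -> str:
    """Move names from a 'categories' parameter into [[Category:...]] links.

    Hypothetical helper: pages without the parameter are returned unchanged.
    """
    match = CATEGORIES_PARAM.search(wikitext)
    if match is None:
        return wikitext
    names = [name.strip() for name in match.group(1).split("/") if name.strip()]
    # Drop the parameter line from the template call.
    stripped = wikitext[:match.start()] + wikitext[match.end():]
    if not names:
        return stripped
    links = "\n".join(f"[[Category:{name}]]" for name in names)
    # Append conventional category links at the end of the page.
    return stripped.rstrip("\n") + "\n\n" + links + "\n"


sample = (
    "{{header\n"
    " | title = Example\n"
    " | categories = Poetry / Sonnets\n"
    " | notes =\n"
    "}}\n"
    "Text of the work.\n"
)
print(convert_header_categories(sample))
```

Because a page without the parameter is returned unchanged, running the function a second time on its own output has no further effect, which suits a bot that is meant to run on an ongoing basis.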

Alice in Wonderland (1903 film)

As I was doing some maintenance on all films across the project, I noticed this film's encode was deleted from Commons (see the deletion discussion). Having a transcript with no film is not good. It could still be reuploaded here though, though it will be PD enough for Commons' standards in just a few years. PseudoSkull (talk) 04:56, 19 January 2021 (UTC)

On hold: in the process of migrating the file, though pywikibot on toolforge is failing for me; phabricator ticket in play. — billinghurst sDrewth 09:56, 19 January 2021 (UTC)
@Billinghurst:, task T272345 is resolved. Mpaa (talk) 21:20, 21 January 2021 (UTC)

The Unholy Three (1925 film)

I have noticed that none of the pages on Wikipedia or Wikisource have the film "The Unholy Three" by Tod Browning, which is now public domain. I have found a high-quality no-music upload of this film on Internet Archive and am now uploading it to Wikimedia. I hope this is helpful and increases access to the film. (SurprisedMewtwoFace (talk) 12:30, 19 January 2021 (UTC))

Nice! Thank you again! I hadn't even stumbled across that film yet and so it's not in the WikiProject Film list, so well done. @SurprisedMewtwoFace: By any chance, could you grab a copy of Stella Dallas (1925 film)? Here's a copy, for example, but the audio and unofficial title and ending need to be stripped. PseudoSkull (talk) 12:38, 19 January 2021 (UTC)
By the way, if you'd like, you could join Wikisource:WikiProject Film, a collaborative effort on Wikisource's film coverage, with a lot of resources on films listed. PseudoSkull (talk) 12:39, 19 January 2021 (UTC)
@PseudoSkull: Upload is now complete. You can find it at https://commons.wikimedia.org/wiki/File:The_Unholy_Three.webm (SurprisedMewtwoFace (talk) 18:57, 19 January 2021 (UTC))

Doctor Dolittle's Circus (1924)

Is there some reason why the US Gutenberg and other sites don't have public domain versions of this yet? It should have gone public domain last year, and I can find it on Gutenberg Australia, but the availability seems more limited than it should for whatever reason. (SurprisedMewtwoFace (talk) 14:55, 20 January 2021 (UTC))

I have found a Gutenberg version of this book uploaded to archive.org, but it is strange that the scans of the 1924 original are not to be found anywhere. --Jan Kameníček (talk) 15:15, 20 January 2021 (UTC)
I guess I could convert the Archive one if worse came to worst and upload it here, it just seems very strange that nobody has the original. I didn't have much trouble finding the Oz books once they went public domain, in contrast to this (SurprisedMewtwoFace (talk) 15:24, 20 January 2021 (UTC))
FYI, if you do feel the need to use a PG source, there's no point uploading a PDF made from PG source material - you might as well just copy-paste the text directly. All a PDF achieves is wasting time matching up pages. Inductiveloadtalk/contribs 22:30, 20 January 2021 (UTC)
it is not at all surprising that there are scan gaps. the scan effort tends to be at university libraries, and first editions are rare. have to go to library with book scanner and book, and upload to IA, to get a meaningful pdf. Slowking4Rama's revenge 05:50, 21 January 2021 (UTC)
If all goes well, then leave it to me to search out a copy in my state's (FL) library systems, lockdown/curbside pickup or not. (My search only began a couple moments after I read this.) --Slgrandson (talk) 06:12, 21 January 2021 (UTC)
only reprints in worldcat. you might try here https://baldwin.uflib.ufl.edu/ but i'm not finding it there either. Slowking4Rama's revenge 14:20, 21 January 2021 (UTC)
Looks like we just found a candidate (from the Dupont-Ball collection of Stetson University). --Slgrandson (talk) 20:51, 21 January 2021 (UTC)
The WS:NLS has a copy too. They have an active WikiProject here. @Chime Hours: is it possible to request that things like this go onto a scanning queue of some sort? Lofting died in '47, so it's PD in the UK too. Inductiveloadtalk/contribs 21:18, 21 January 2021 (UTC)
Excellent! I assume this means we'll eventually be able to get a copy up. Question; if Lofting is public domain in the UK now, shouldn't the file be on Wikimedia when we upload it? And is there any difference between the US and UK edition? (SurprisedMewtwoFace (talk) 04:39, 22 January 2021 (UTC))

Category:Professors is not an occupation

As Professor is a title rather than an occupation, I was thinking that this may be better described as "Academic" though would like to hear the opinion of others of what to do with this category. Thanks. — billinghurst sDrewth 09:55, 21 January 2021 (UTC)

that is a German view: for Americans, titles are a job description. but i leave the ontology fight to you. Slowking4Rama's revenge 14:36, 21 January 2021 (UTC)
Hm, does it mean that in American English all titles including PhD etc. qualify as "jobs"? I also would not include "professor" or any other academic title into the category of jobs or occupations, but I admit that my native language connotations interfere here too. --Jan Kameníček (talk) 17:09, 21 January 2021 (UTC)
I think it kind of does double service in English. "Sam is a professor at the university" kind of implies Sam works there and performs certain duties (and might be an answer to "what does Sam do?"), whereas "Sam is a professor of geology" speaks mostly to the title rather than the occupation (and might be an answer to "is there anyone who can comment at length on this rock?"). Inductiveloadtalk/contribs 17:28, 21 January 2021 (UTC)
Well, Billinghurst’s suggestion of making it a subcategory of Academics in fact admits them as a sort of occupation too, as Academics are a subcategory of Authors by occupation. --Jan Kameníček (talk) 21:15, 21 January 2021 (UTC)
My recommendation is that wherever we see occupation listed as "professor" that we are replacing it with "academic". So authors who are professors will be "academics as authors" and where we have biographies they will be "biographies of academics". I was not planning on retaining an open active category for professors, that would become a category redirect. — billinghurst sDrewth 00:46, 22 January 2021 (UTC)
Across the parts of academia that I'm most closely associated with (NZ, Australia, UK), Professor is a title of a senior academic. An institution only has a certain number of professorships and it is rare to grant a professorship to someone who does not hold a doctorate. A family member's academic career saw them move through the ranks of Tutor, Senior Tutor, Junior Lecturer, Lecturer, Senior Lecturer, Associate Professor, and Professor. The whole way through, they regarded themself as an academic. From this perspective, I agree with the proposal to convert any "Professor" categories to "Academic". [I am, of course, aware of the American way of calling most of those ranks "Professor", but regard this as a regional quirk.] Beeswaxcandle (talk) 06:45, 22 January 2021 (UTC)
I don't think it's correct to say that professor is a title rather than occupation. w:Professor describes it as an occupation; arguably without that title, academic is just a job. w:Academy#Academic_personnel puts academic as a larger, somewhat fuzzier category (sometimes including academic librarians). For me, using "academic" reminds me of the way that the Library of Congress uses "cookery" whereas LibraryThing users use "cooking" or "cookbook". As for regional quirk... 64.3% of the world's native English speakers live in the US. I'd be interested to see what India uses, but even tossing Ireland into the countries Beeswaxcandle mentioned, you still haven't hit 1/4 of the world's native English speakers.--Prosfilaes (talk) 22:21, 22 January 2021 (UTC)
@Prosfilaes: I will agree that it can be a labelled position/rank, which may be more accurate than my original statement of title, eg. Professor of English. It is a title that shows a rank (there are of course further subdivisions these days within professorship). We don't categorise someone based on "Reverend", we call them clergy or similar. I will also say that I have seen the title used by people heading up 19th century schools, though with no clarity on how or what it portrays, beyond a leadership role. Someone may achieve the rank of professor, however, what is it as an occupation? They all start out in a profession at a lower rank and work their way up.

Maybe we are missing classification based on "rank" or "position", although in a tenured sense it is hard to capture that and I would hate to have to manually assign it. — billinghurst sDrewth 23:31, 22 January 2021 (UTC)

As a general note, that WD item you linked to is professor (Q121594), "academic title at universities and other education and research institutions", and the article lede is => "Professor (commonly abbreviated as Prof.) is an academic rank at universities and other post-secondary education and research institutions in most countries. Literally, professor derives from Latin as a 'person who professes'. Professors are usually experts in their field and teachers of the highest rank." (Excerpted from Professor on Wikipedia, the free encyclopedia.) billinghurst sDrewth 23:38, 22 January 2021 (UTC)
professor tends to imply a tenured faculty, academic implies that plus adjuncts, non-tenured teaching assistants, etc. but not "Herr Professor", as Americans really don't do titles. "Is professor a formal title or just an academic qualification? Neither. It is a job description. Like a military rank, you only have it while you are doing the job." [14] but if you insist on imposing UK/German nomenclature,[15] go for it. Slowking4Rama's revenge 04:56, 23 January 2021 (UTC)
As a practical matter with respect to the category tree itself, one can be employed as a professor without being an author. Although it is almost impossible to land a professorship without at least writing a thesis and dissertation along the way, one need not be either publishing or recognized for one's writing to get such a position. I suppose that professors who aren't authors will be of little interest to this project. BD2412 T 23:59, 22 January 2021 (UTC)
Not certain where that is leading. I am currently working on author pages, so all are authors while redoing the framework for occupations so we can list "as authors", "biographies of"—later we can work out what to do with portal namespace pages. I am currently adding new hierarchy, moving pages, and converting the occupations to utilise {{meta category occupation}} making them unable to be automatically categorised with HotCat. If "Professor" is an occupation it should be dealt with, if it fits elsewhere then it will be moved and reviewed at another time. — billinghurst sDrewth 00:24, 23 January 2021 (UTC)
I see what you mean. We don't have as a subcategory of Category:Authors by occupation people whose occupation is "author". BD2412 T 01:46, 23 January 2021 (UTC)
Well ... who, as a paid author, has ever been just a paid writer? Authors will typically do something else whilst they create or supplement a literary career. If you are looking for classifications of "professional" writers then you typically see them sorted into Category:Authors by type or Category:Authors by genre; again these are a different issue. I am trying to resolve our issue where main namespace works are sorted into author: namespace categories. — billinghurst sDrewth 05:12, 23 January 2021 (UTC)
At the risk of falling into something of a more philosophical discussion, we have, for example, Author:Benjamin Disraeli in Category:Novelists because he wrote novels, but this was by no means his occupation; we similarly have Author:Victor Hugo in that category, and this was very much his profession, the labor that was the source of his income. We should have some way to delineate people for whom writing was their occupation from those in other occupations who wrote works of a particular type or in a particular genre, but not as an occupation. This is, of course, completely tangential from the question of professors. BD2412 T 06:54, 23 January 2021 (UTC)
call me lazy, but i would just import the values from wikidata. this is the kind of ontology wrangling, and maintaining that should be done at a central location, rather than deciding on each wiki. Slowking4Rama's revenge 20:26, 23 January 2021 (UTC)

PDF icon

Can we change our PDF download icon from Pdfreaders-f.svg to something like Document-pdf.svg (which is what frWS uses)? Or something else from commons:Category:PDF icons under Public Domain. Using the FSFE "free PDF reader" graphic is a noble aim, but PDF icons are white with a red trefoil for the vast, vast majority of people, and the "f and green" means nothing to most people, so there's no instant visual recognition. If we really can't tolerate the trefoil, the icon should at least use a "red" metaphor. Inductiveloadtalk/contribs 11:39, 21 January 2021 (UTC)

@Inductiveload: Are you watching the work Community Tech is doing on export? At least one of the tasks covers icons and descriptions, and one way or another these two should be sync'ed. --Xover (talk) 14:02, 21 January 2021 (UTC)
I am, but AFAIK there's not been much about icons so far. There are some very small icons on phab:T271869, but they won't be obvious at all without a label next to them. Perhaps if we just have a textual "download" button we won't need any icons, but I wonder how obvious that will be for passers-by. And the front page especially is text-heavy (by necessity), so icons help differentiate from the content of the works.
And we can always swap it out for a really nice themed set later on if one appears. Inductiveloadtalk/contribs 14:11, 21 January 2021 (UTC)
@Inductiveload: Well, I was thinking more along the lines that it won't help if we find nice icons if they use crap ones, but… :) --Xover (talk) 14:18, 21 January 2021 (UTC)
@Xover: that sounds like a bun fight for a future day? For now, it's just {{featured download}} and {{export}} (and the icon from Mediawiki:Mobile.css). Inductiveloadtalk/contribs 14:26, 21 January 2021 (UTC)

Comment: I have no preference as long as it is suitably indicative of what is being presented. — billinghurst sDrewth 00:48, 22 January 2021 (UTC)

Add export/ebooks links to {{new texts}}

I think it would be a good idea to take a leaf out of frWS's book and add export links to the {{new texts}} list. For example, see Template:New texts/item/testcases for a comparison.

The export links being visible drive a lot of downloads: the featured texts get around 4000 downloads each: https://wsexport.wmflabs.org/statistics (which, if I'm reading that page right is about 2/3 of all epub downloads) and nothing that's not a featured text gets a look in. Inductiveloadtalk/contribs 13:37, 21 January 2021 (UTC)

@Inductiveload: (CC Samwilson) I think the current plan at Community Tech puts a big honkin' download button right next to the title of every single mainspace page. However that ends up we should rethink our own strategy in light of that.
PS. Sam: I've not had the cycles to think this through, but it occurred to me that that "in your face" download button is a bit of a different beast than the sidebar links. On enWS we have typically put download links front and center only on featured texts, and more recently we've been trying to surface texts that are ready for export. My gut tells me we may want to be able to control when that Download! button appears, possibly in the form of only showing up on pages in a given category. Or, you know, something smarter. In any case, throwing it out here for lack of a better way to raise the issue. --Xover (talk) 14:15, 21 January 2021 (UTC)
@Xover: I think the current plan at Community Tech puts a big honkin' download button right next to the title, sure, but that still needs a click-through, whereas this is about getting some download links on the front page as a "hook". Inductiveloadtalk/contribs 14:19, 21 January 2021 (UTC)
@Xover: Yes, thank you for raising this, we've been thinking about it a bit. One idea was to add the download button on every page but make it possible to remove (e.g. via a magic word). The current task for it is phab:T271869. One issue with hiding it by default is that for lots of wikis it may never be enabled, and so readers will be less likely to know they can download epubs etc. Very happy for any suggestions of course! And there's always scope for per-wiki differences. :) — Sam Wilson 01:58, 22 January 2021 (UTC)

Share your feedback on the OCR improvements!

Hello, everyone! We (the team responsible for the Community Wishlist Survey) have just launched the project for OCR improvements! With this project, we aim to improve the experience of using OCR tools on Wikisource. Please refer to our project page, which provides a full summary of the project and the main problem areas that we have identified.

We would then love if you could answer the questions below. Your feedback is incredibly important to us and it will directly impact the choices we make. Thank you in advance, and we look forward to reading your feedback! SGrabarczuk (WMF) (talk) 00:35, 22 January 2021 (UTC)

RfC: HotCat and better customisation

HotCat is a gadget that enables easier categorisation of pages throughout the wiki. There has been customisation undertaken to make things easier for our needs, or to stop duplications, or to allow for finding categories after they have been moved. There is also possible customisation that users can do individually, and some of these we can do collectively.

I invite users to read the help pages at Commons => c:Help:Gadget-HotCat and consider whether we should be doing some better local configuration. The current local configuration is at MediaWiki:Gadget-HotCat.js/local_defaults.

At the moment we have it configured to traverse up and down hierarchies, to use its redirect functionality (present in {{category redirect}}), and to use the feature to not populate certain categories. We have not got any shortcuts in place globally, though I use a couple for some of my current project work.

Just putting it out there in case anyone can think of valuable changes we could discuss. — billinghurst sDrewth 09:06, 24 January 2021 (UTC)