Wikisource:Scriptorium/Archives/2020-10

Please do not post any new comments on this page.
This is a discussion archive first created in October 2020, although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

Wikilivres is back

Wikilivres is back at wikilivres.org as of October 2020. The original site is now an Amazon book review site. --kathleen wright5 (talk) 13:32, 3 October 2020 (UTC)

The website doesn't seem to be fully functional, sadly. Most of the pages seem to redirect back to the main page. JesseW (talk) 03:05, 4 October 2020 (UTC)

Tech News: 2020-41

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

There is a new tool where you can see which home wiki users have in discussions on Meta. This can help show which communities are not part of the discussion on wikis where we make decisions that affect many other wikis.
You can now thank users for file uploads or for changing the language of a page. [1]

Problems

There were many errors with the new MediaWiki version last week. The new version was rolled back. Updates that should have happened last week are late. [2]
Everyone was logged out. This was because a user reported being logged in to someone else's account. The problem should be fixed now. [3]
Many pages have JavaScript errors. You can read more and now see a list of user scripts with errors.

Changes later this week

The new version of MediaWiki will be on test wikis and MediaWiki.org from 6 October. It will be on non-Wikipedia wikis and some Wikipedias from 7 October. It will be on all wikis from 8 October (calendar).
Letters immediately after a link are shown as part of the link. For example the entire word in [[Child]]ren is linked. On Arabic wikis this works at both the start and end of a word. Previously on Arabic wikis numbers and other non-letter Unicode characters were shown as part of the link at the start of a word but not at the end. Now only Latin and Arabic letters will extend links on Arabic wikis. [4]

Future changes

You will be able to read but not to edit the wikis for up to an hour on 27 October around 14:00 (UTC). It will probably be shorter than an hour. [5]

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

16:25, 5 October 2020 (UTC)

Call for feedback about Wikimedia Foundation Bylaws changes and Board candidate rubric

Hello. Apologies if you are not reading this message in your native language. Please help translate to your language.

Today the Wikimedia Foundation Board of Trustees starts two calls for feedback. One is about changes to the Bylaws mainly to increase the Board size from 10 to 16 members. The other one is about a trustee candidate rubric to introduce new, more effective ways to evaluate new Board candidates. The Board welcomes your comments through 26 October. For more details, check the full announcement.

Thank you! Qgil-WMF (talk) 17:10, 7 October 2020 (UTC)

Disambiguation of Psalm numbers

Background: the book of Psalms is an ancient collection of songs that is also part of the Bible. Each of the 150 songs in this book is its own individual Work, and some of them (such as Psalm 23 and Psalm 130) have their own Translations page on Wikisource because we have translations of them that are published outside of a complete edition of Psalms.

Now, there is a very good chance that I will be adding a lot more Translations pages for individual Psalms in the near future. Therefore I want to plan ahead and do it properly, like we did with Shakespeare's Sonnets.

The problem I have, which I am bringing to WS:S, is this: Psalms are usually identified by number (i.e. Psalm 1, Psalm 2, etc.). However, this number is not unique! There are two different numbering systems in use: the Hebrew/Masoretic system (used by Jews and Protestants and unofficially by Catholics) and the Greek/Septuagint system (used by Orthodox and officially by Catholics).

Thus, the title "Psalm 23" actually refers to two different songs:

"The Lord is my Shepherd; I shall not want", numbered as Psalm 23 in the Hebrew/Masoretic system
"The earth is the Lord's, and the fulness thereof", numbered as Psalm 23 in the Greek/Septuagint system

Our standard solution, of course, is to have Psalm 23 be a Disambiguation page which links to both of these two songs, and this is what I intend to do.

The question for all of you, therefore, is: What should be the title of the actual Psalm version page itself? —Beleg Tâl (talk) 17:48, 8 October 2020 (UTC)

The best solution I have come up with so far, is to use the practice common in Catholic bibles, of using the Hebrew/Masoretic number, and then putting the Greek/Septuagint number in parentheses. Thus: "The Lord is my Shepherd" would be Psalm 23 (22); "The earth is the Lord's" would be Psalm 24 (23); etc. However, I am open to better suggestions. —Beleg Tâl (talk) 17:48, 8 October 2020 (UTC)
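
As a rough illustration of the parenthesised scheme above, the following Python sketch derives the Greek/Septuagint number from the Hebrew/Masoretic one and builds a candidate page title. The function names are hypothetical, and the handling of psalms that are joined or split between the two systems (9-10, 114-116, 147) is an assumption made for illustration, not something settled in this discussion.

```python
def septuagint_numbers(hebrew: int) -> list[int]:
    """Return the Greek/Septuagint number(s) for a Hebrew/Masoretic psalm number."""
    if not 1 <= hebrew <= 150:
        raise ValueError("psalm numbers run from 1 to 150")
    if hebrew <= 8 or hebrew >= 148:
        return [hebrew]            # both systems agree
    if hebrew in (9, 10):
        return [9]                 # Hebrew 9 and 10 are joined as Greek 9
    if hebrew <= 113:
        return [hebrew - 1]
    if hebrew in (114, 115):
        return [113]               # Hebrew 114 and 115 are joined as Greek 113
    if hebrew == 116:
        return [114, 115]          # Hebrew 116 is split in the Greek numbering
    if hebrew <= 146:
        return [hebrew - 1]
    return [146, 147]              # Hebrew 147 is split in the Greek numbering


def proposed_title(hebrew: int) -> str:
    """Build a title in the hypothetical 'Psalm H (G)' form."""
    greek = "-".join(str(n) for n in septuagint_numbers(hebrew))
    return f"Psalm {hebrew} ({greek})"


print(proposed_title(23))  # "The Lord is my Shepherd"
print(proposed_title(24))  # "The earth is the Lord's"
```

How to title the psalms where both numbers coincide (1-8 and 148-150) is left open here; the sketch simply repeats the number.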

Other solutions I have thought of, which I personally think are less good:

Psalm 22 or 23 (this one's not so bad tbh)
Psalm 22/23 (I actually think this one would be the best if not for the problems with subpage detection)
Tehilim 23 (explicitly using the Hebrew name and number)
The Lord is my Shepherd (Psalm) or The Lord is my Shepherd (David) (using the first line, this one being disambiguated from The Lord Is My Shepherd (Montgomery) — note, not all Psalms are attributed to David)
Dominus reget me (Latin titles for Psalms are not uncommon, especially for works based on the BCP)

—Beleg Tâl (talk) 17:00, 12 October 2020 (UTC)

en.Wikipedia has w:en:Psalm 23 as "The Lord is my Shepherd", with an explanatory hat note; why not follow suit? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:45, 10 October 2020 (UTC)

@Pigsonthewing: Because this is one of the places where Wikisource and Wikipedia differ in policy. On Wikipedia, Psalm 23 is the title of the "Lord is my Shepherd" because this is the most common meaning of "Psalm 23" in English. If they needed a disambiguation page, they would put it at w:Psalm 23 (disambiguation). Wikisource, on the other hand, always places the disambiguation page under the ambiguous title, even if the title almost always refers to only one of the disambiguated items. —Beleg Tâl (talk) 16:50, 12 October 2020 (UTC)

Tech News: 2020-42

Problems

Because of the problems with the MediaWiki version two weeks ago, last week's updates are also late. [6][7][8]

Changes later this week

Live previews didn't show the templates used in the preview if you just edited a section. This has now been fixed. You can also test CSS and JavaScript pages even if you have the live preview enabled. Previously this didn't work well. [9][10]
The new version of MediaWiki will be on test wikis and MediaWiki.org from 13 October. It will be on non-Wikipedia wikis and some Wikipedias from 14 October. It will be on all wikis from 15 October (calendar).

Future changes

A new stable version of Pywikibot is coming soon. [11]

15:24, 12 October 2020 (UTC)

Removing DNB pages, Executive Orders and US Supreme Court decisions from "Random Works"

The "Random Work" function is pretty overloaded with DNB articles, Executive Orders and US Supreme Court documents, because there are thousands of each and they're all top level pages. This means the button returns these documents very frequently, more than half the time from a highly unscientific trial. This is a little bit monotonous, compared to the diverse set of works available.

I wonder if it's possible to petition for a change to the enWS SpecialRandomGetRandomTitle hook to exclude pages that:

End in (DNB[0-9]{2})
Contain \bv.\b (almost certainly a SCOTUS decision)
Start with Executive Order
(any other suggestions?)
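
The three rules above could be prototyped offline before petitioning for a hook change. This is only a Python sketch of the title test, not the MediaWiki hook itself; the names `EXCLUDE_PATTERNS` and `is_excluded` are hypothetical. It assumes the court-case rule is meant to catch titles like "X v. Y", so it uses `\bv\.\s` rather than the pattern as posted, because a word boundary does not occur between a full stop and a following space.

```python
import re

# Hypothetical exclusion rules, adapted from the list above.
EXCLUDE_PATTERNS = [
    re.compile(r"\(DNB[0-9]{2}\)$"),    # ends in (DNB00), (DNB01), (DNB12), ...
    re.compile(r"\bv\.\s"),             # "Party v. Party", probably a court case
    re.compile(r"^Executive Order\b"),  # starts with "Executive Order"
]


def is_excluded(title: str) -> bool:
    """Return True if a page title matches any of the exclusion rules."""
    return any(pattern.search(title) for pattern in EXCLUDE_PATTERNS)


sample_titles = [
    "Ramsay, William (DNB00)",
    "Roe v. Wade",
    "Executive Order 13769",
    "Treasure Island",
]
for title in sample_titles:
    print(title, "->", "skip" if is_excluded(title) else "keep")
```

Running the rules over a dump of main-namespace titles would give a rough idea of how many pages each one removes from the pool.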

Inductiveload—talk/contribs 11:35, 12 October 2020 (UTC)

I think a more pertinent solution to this problem would be to move all DNB articles to sub-pages, move all court cases to sub-pages of their respective volumes (of the U. S. Reports), and to have all Executive Orders as sub-pages of “Executive Orders President [name],” possibly adding the year to the latter. Of these, the last is a suggestion for better navigation, but the other two should happen anyway. TE(æ)A,ea. (talk) 18:23, 12 October 2020 (UTC).

Agree about moving the DNB pages to be subpages. It was always the plan to do it once we had them all finished. I disagree about moving the court cases, as they are works in their own rights, and many have not come via those publications, and I prefer to not be a case of half pregnant. — billinghurst sDrewth 11:18, 13 October 2020 (UTC)

I will note that one of the reasons we did defer was that, when pages are categorised, they have a VEEEEEEERRRRRY long page name, which is a bit of a PITA when they display in categories, though this is now an issue for so many of our subpages of our biographical works, so that is just a cross we bear. The other impediment was the issue of typeahead for page names, which has been resolved with improved indexing and the ability to set in preferences how you search. There is a fair bit of work to do to get DNB moved, though it is all worthwhile. — billinghurst sDrewth 11:28, 13 October 2020 (UTC)

I think that, e. g., this court (found by random search) should be under United States Reports/Volume 490, because that is where it is published. Newer cases are also published individually, but they are eventually consolidated as well. As the source of these older court cases is the collected volumes, rather than independent publication, they should be given under the volume sub-page. This would be especially helpful to reduce the number of pages in the main namespace that aren’t really independent works. TE(æ)A,ea. (talk) 21:25, 13 October 2020 (UTC).

If someone wishes to produce a volume and transclude them that way, then they are most welcome. Forcing them under a volume for what is an independent case because it is (later) published in a volume is not the right approach. @Inductiveload: might it be possible to exclude based on categorisation? — billinghurst sDrewth 22:14, 13 October 2020 (UTC)

Comment I spoke with Reedy, and he says not really, though he said ...

There's no hooks or anything
  $this->extra[] = 'page_title NOT ' . $dbr->buildLike( $dbr->anyString(), '/', $dbr->anyString() );
The parent page does have some hooks...
  $this->getHookRunner()->onRandomPageQuery( $tables, $conds, $joinConds );
But can't differentiate between random page or random root page

I hope that helps someone. — billinghurst sDrewth 15:35, 16 October 2020 (UTC)

Löbel Schottländer (Q1879596)

How would I let the reader know by linking that Löbel Schottländer is the person in Guide through Carlsbad and its environs/The Mineral Waters for Exportation and a few other Wikisource entries? --Richard Arthur Norton (1958- ) (talk) 05:17, 16 October 2020 (UTC)

Very good question in the general case.

The best case, in my personal opinion, is to dig up documents that we can link to him and then give him an author or Portal page. In this specific case we can probably add Index:The Morning Call - 1890-05-07.pdf as a document and then Löbel Schottländer can have an author page for his advert on page 3. Inductiveload—talk/contribs 10:48, 16 October 2020 (UTC)

If the chapter of the work is primarily about the person, when you create a wikidata item for the chapter, you would use the main subject field for the chapter item to link to the person. — billinghurst sDrewth 15:18, 16 October 2020 (UTC)

Narrow footer and editing window in some layouts

When a work is switched to some narrower layout like Layout 2, not only does the text get narrower, but so does the footer with the navigation (while the header stays wide), which is very inconvenient if the footer includes some long titles. What is more, when you click the edit button and then the preview, the editing window gets narrow too, which makes further editing very difficult. Btw, in the common Layout 1 the footer is also slightly narrower than the header for some reason. Is it possible to exclude both the footer and the editing window from the width change in various layout modes? --Jan Kameníček (talk) 17:31, 17 October 2020 (UTC)

Gadget to resolve issues with HTML entities like &#39; in Page OCR

There is an issue where some ASCII characters are being replaced by "HTML entity codes" that look like &#39; in the preloaded OCR text of new pages in the Page namespace.

A quick-fix gadget has been deployed to undo the transformation when new Page-namespace pages are created. This should result in you not noticing any problems. The gadget is ~~enabled by default, so users are opted in automatically~~. You can turn it off at any time by un-checking the "Automatically convert HTML entities mistakenly replaced in the Page namespace due to phab:T265571." checkbox in your gadget preferences, under the "Editing tools for Page: namespace" section.
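
For anyone cleaning up affected text offline, the transformation the gadget undoes can be approximated in a few lines. This is a Python sketch rather than the gadget's actual JavaScript, and `fix_ocr_entities` is a hypothetical name; the sample input assumes the stray codes are numeric or named character references such as an encoded apostrophe.

```python
import html


def fix_ocr_entities(text: str) -> str:
    """Decode HTML character references back into plain characters.

    Caveat: html.unescape decodes every reference it recognises, so it
    would also convert entities that were meant to stay encoded.
    """
    return html.unescape(text)


print(fix_ocr_entities("the King&#39;s men said &quot;no&quot;"))
```

Because the decoding is indiscriminate, this is best applied only to freshly loaded OCR text, not to pages that already contain deliberate markup.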

For discussion of the issue in general, there is a thread here: Wikisource:Administrators'_noticeboard#OCR_change?. The issue has been reported at Phabricator as phab:T265571 and a fix is likely within a week or so, at which point the gadget will be removed.

Any issues with the gadget in the meantime can be reported here. Inductiveload—talk/contribs 10:20, 16 October 2020 (UTC)

As a fix was rushed through upstream out-of-cycle, this is no longer required. Thus, it will be made non-default, and the gadget will be removed entirely when the phab:T265571 ticket is closed. If you still see spurious HTML entities on new page creations, turn the gadget on and report at Phabricator. Inductiveload—talk/contribs 12:06, 16 October 2020 (UTC)

The gadget has now been removed since the upstream fix appears to be working. Inductiveload—talk/contribs 13:59, 18 October 2020 (UTC)

Tech News: 2020-43

Changes later this week

The new version of MediaWiki will be on test wikis and MediaWiki.org from 20 October. It will be on non-Wikipedia wikis and some Wikipedias from 21 October. It will be on all wikis from 22 October (calendar).

Future changes

You will be able to read but not to edit the wikis for up to an hour on 27 October around 14:00 (UTC). It will probably be shorter than an hour. [12]
In the AbuseFilter extension, the rmspecials() function will be updated soon so that it does not remove the "space" character. Wikis are advised to wrap all the uses of rmspecials() with rmwhitespace() wherever necessary to keep filters' behavior unchanged. You can use the search function on Special:AbuseFilter to locate its usage. [13]
Some gadgets and user-scripts use the HTML div with the ID #jump-to-nav. This div will be removed soon. Maintainers should replace these uses with either #siteSub or #mw-content-text. A list of affected scripts is at the top of phab:T265373.

16:31, 19 October 2020 (UTC)

Comment We have no abusefilter using rmspecials(). — billinghurst sDrewth 12:52, 20 October 2020 (UTC)

Side text ?

Does the short left side text "Life of Sri Ramakrishna by European Scholars" also come in the text ?

From : The Gospel of Râmakrishna

--Riquix (talk) 06:45, 17 October 2020 (UTC)

@Riquix: It definitely does. You may try {{Left sidenote}}. --Jan Kameníček (talk) 17:49, 17 October 2020 (UTC)

Ok Thank you ! --Riquix (talk) 05:39, 18 October 2020 (UTC)

i would not use left sidenote, i would use template:PT Shoulder Heading see if you like it. Slowking4 ⚔ Rama's revenge 23:35, 21 October 2020 (UTC)

Tom Lehrer

Tom Lehrer has put all of his lyrics into the public domain; see: https://tomlehrersongs.com/ Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:28, 20 October 2020 (UTC)

Awesome. See https://web.archive.org/web/20201020064817/https://tomlehrersongs.com/ if you're around after tomlehrersongs.com is taken off the web. But we should upload all of those songs long before that.--Prosfilaes (talk) 20:00, 20 October 2020 (UTC)

See Author:Thomas Andrew Lehrer, where some people have already started working on it.--Prosfilaes (talk) 20:03, 20 October 2020 (UTC)

to do it scan backed, someone should knit together all the pdfs, i.e. [14] and upload them to commons, Slowking4 ⚔ Rama's revenge 23:19, 21 October 2020 (UTC)

@Slowking4: ta-da: Index:Tom Lehrer song lyrics (website).pdf

Are we going to need OTRS for this, does anyone think? Inductiveload—talk/contribs 10:58, 22 October 2020 (UTC)

thank you very much - i certainly hope not; i do not trust the otrs admins to come to the correct conclusion, given the MacArthur decision. i would not submit anything to them, given the lack of accountability. Slowking4 ⚔ Rama's revenge 23:52, 23 October 2020 (UTC)

Import pagelist gadget

There is a new gadget for importing pagelists. Currently the Internet Archive is supported.

Please see Help:Gadget-ImportPagelist for more documentation and instructions. It can be found in the "experimental" section of Special:Preferences#mw-prefsection-gadgets.

Hopefully, this will be useful for people when building pagelists. The IA pagelists aren't perfect, but they're a decent starting point. Inductiveload—talk/contribs 19:43, 24 October 2020 (UTC)

Have I mentioned lately, that you're made of pure awesome? :) --Xover (talk) 08:04, 25 October 2020 (UTC)

DNB biographies have been moved to subpages

Following a discussion above about random pages, and a follow up discussion on the DNB project page, the DNB biographies (DNB00), (DNB01) and (DNB12) have (finally) been moved to be subpages of the works [redirects in place]. Accordingly, the templates are being updated—some done—locally, and after that we can start to look off-wiki.

As part of the updates of templates I have modernised what was an old implementation of header prior to some newer parameters. I have also default utilised {{import enwiki}} to leverage the main subject and the person interwiki to the enWP article. At some point, I will look at some maintenance to match the automatic parameters and linked parameters and rectify and then remove.

If people see issues, please leave me a message on my talk page. — billinghurst sDrewth 02:40, 24 October 2020 (UTC)

I have updated enWP's {{cite DNB}} series and related citation, attribution and post templates. If anyone sees others, then please let me know. — billinghurst sDrewth 10:50, 24 October 2020 (UTC)

Hooray! Thank you @Billinghurst: for the effort required. I'll keep my eyes peeled for bustications. Inductiveload—talk/contribs 20:40, 24 October 2020 (UTC)

The following are all pages with “(DNB00)” which are either not redirection pages, or, if they are a redirection page, do not lead to a sub-page of “Dictionary of National Biography, 1885-1900:”
- Ramsay, William,
- Lewis Atterbury,
- James Balfour,
- Lambe, John, and
- Robert Manners, which are all disambiguation pages for articles with the same beginning; and
- Seton, Alexander (d.1555?-1622) (DNB00), which is dated.
There are no pages which fit the same respective requirements under either “(DNB01)” or “(DNB12).” TE(æ)A,ea. (talk) 23:41, 24 October 2020 (UTC).

1 through 5 as disambiguation pages should be dealt with under normal processes; 6 is a soft redirect and I have deleted it. We probably should look to deal with the dated soft redirects. Someone with python skills may wish to look at TalBot's scripts. — billinghurst sDrewth 06:12, 25 October 2020 (UTC)

Tech News: 2020-44

Problems

You will be able to read but not to edit the wikis for up to an hour on October 27 around 14:00 (UTC). It will probably be shorter than an hour. [15]
Last week, links to "diffs" from mobile watchlists and recentchanges were linking to page-revisions instead of diffs. This has now been fixed. [16]

Changes later this week

There is no new MediaWiki version this week.

Future changes

Since the introduction of the interface administrators user group in 2018, administrators couldn’t view the deleted history of CSS/JS pages. Now they can. [17]
There was a problem with the Change Tags. The software would apply the "Reverted" tag to any page actions such as page-protection changes if they came directly after a reverted edit. This has now been fixed for new edits. [18]
The Reply tool will be offered as an opt-in Beta Feature on most Wikipedias in November. Another announcement will be made once the date is finalized. [19]

17:38, 26 October 2020 (UTC)

Important: maintenance operation on October 27

The Wikimedia Foundation tests the switch between its first and secondary data centers. This will make sure that Wikipedia and the other Wikimedia wikis can stay online even after a disaster. To make sure everything is working, the Wikimedia Technology department needs to do a planned test. This test will show if they can reliably switch from one data centre to the other. It requires many teams to prepare for the test and to be available to fix any unexpected problems.

They will switch all traffic back to the primary data center on Tuesday, October 27 2020.

Unfortunately, because of some limitations in MediaWiki, all editing must stop while the switch is made. We apologize for this disruption, and we are working to minimize it in the future.

You will be able to read, but not edit, all wikis for a short period of time.

You will not be able to edit for up to an hour on Tuesday, October 27. The test will start at 14:00 UTC (14:00 WET, 15:00 CET, 10:00 EDT, 19:30 IST, 07:00 PDT, 23:00 JST, and in New Zealand at 03:00 NZDT on Wednesday October 28).
If you try to edit or save during these times, you will see an error message. We hope that no edits will be lost during these minutes, but we can't guarantee it. If you see the error message, then please wait until everything is back to normal. Then you should be able to save your edit. But, we recommend that you make a copy of your changes first, just in case.
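
The local times quoted above can be double-checked with a short calculation. This Python sketch uses fixed UTC offsets rather than a time zone database; the offsets are the ones in effect on 27 October 2020 for the abbreviations listed in the message, and the names `START_UTC`, `OFFSETS_HOURS` and `local_times` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Announced start of the read-only window.
START_UTC = datetime(2020, 10, 27, 14, 0, tzinfo=timezone.utc)

# Fixed UTC offsets in hours for each abbreviation in the announcement.
OFFSETS_HOURS = {
    "WET": 0, "CET": 1, "EDT": -4, "IST": 5.5,
    "PDT": -7, "JST": 9, "NZDT": 13,
}


def local_times(start: datetime = START_UTC) -> dict[str, str]:
    """Convert the start time into each listed zone as 'YYYY-MM-DD HH:MM'."""
    result = {}
    for name, hours in OFFSETS_HOURS.items():
        zone = timezone(timedelta(hours=hours))
        result[name] = start.astimezone(zone).strftime("%Y-%m-%d %H:%M")
    return result


for name, stamp in local_times().items():
    print(f"{name}: {stamp}")
```

Only the New Zealand time falls on the following calendar day, which matches the note in the announcement.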

Other effects:

Background jobs will be slower and some may be dropped. Red links might not be updated as quickly as normal. If you create an article that is already linked somewhere else, the link will stay red longer than usual. Some long-running scripts will have to be stopped.
There will be code freezes for the week of October 26, 2020. Non-essential code deployments will not happen.

This project may be postponed if necessary. You can read the schedule at wikitech.wikimedia.org. Any changes will be announced in the schedule. There will be more notifications about this. A banner will be displayed on all wikis 30 minutes before this operation happens. Please share this information with your community.

-- Trizek (WMF) (talk) 17:11, 21 October 2020 (UTC)

This section was archived on a request by: complete — billinghurst sDrewth 15:25, 27 October 2020 (UTC)

Findability

In the WikiCite conference just now, we had a presentation from a professional librarian who showed how a full copy of the so-called "Finch Report" (formally "Accessibility, sustainability, excellence: how to expand access to research publications") is very hard to find online. Several of us tried to find it, initially without success. After some digging, I eventually found that it has been on Commons since January 2014, and from there discovered that it was published here the following month. What can we do to improve findability? It lacks a header template - would that help? Is there a problem specific to this work, or is it more general? I have now linked the Wikidata item on the report to the Wikisource page; is that omission unusual, or do we need a concerted effort to link other works? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 00:26, 27 October 2020 (UTC)

@Pigsonthewing: It does have a header; it is built in on generation and its data is populated from the index page. I think that the question you ask is the one that probably defeats many of us locals whose expertise lies in proofreading and transcluding works, not in findability. We have long tried to put the metadata into the headers through the tags, and I thought that they were COinS aligned, though I could be wrong. I would think that we need the advice of someone like your librarian, and some data search experts, to tell us what we are missing in our metadata. I note that the WD item comes in at #13 in a Google search for the title, and enWS does not appear at all based solely on a title search, though it appears if you add wikisource.org to the search. It almost seems that our headers may even obfuscate works. Can we turn the question outwards and implement others' guidance? — billinghurst sDrewth 15:02, 27 October 2020 (UTC)

One also wonders whether we need to somehow do better at Wikidata Accessibility, sustainability, excellence: how to expand access to research publications (Q19028392), it has a full article link pointing elsewhere, and though the interwiki is present, nothing so overt for enWS's copy. I would hope that we could have been better served from Wikidata, but we don't even have good tools to easily populate the data from here to there, and you have to then manually set flags like "proofread" which I have just done, so then it leverages Kaldari's highlight script. The simple triangular linking of WD <> Commons <> WS is fiddly and manual. — billinghurst sDrewth 15:14, 27 October 2020 (UTC)

yeah, searching for works by title is hopeless. i try to aid findability by adding to an author list, linking at wikipedia and wikidata. but we should not imagine that by just transcribing, the world will find us. it is going to take some interwiki cooperation (and promotion on social media). and it is all manual since we do not have tools / bots to propagate work links. Slowking4 ⚔ Rama's revenge 23:17, 28 October 2020 (UTC)

btw, please white list this link, that is in a footnote, so that we can edit the document youtube.com/watch?v=niyYWVa2w6w. Slowking4 ⚔ Rama's revenge 00:14, 29 October 2020 (UTC)

whitelisted — billinghurst sDrewth 10:37, 29 October 2020 (UTC)

Community collaboration

In case no one noticed, the Community collaboration is finished and needs to be replaced with something new. Kaldari (talk) 06:37, 29 October 2020 (UTC)

I boldly updated it with the next collaboration in line on the project page. Hope that's OK. Kaldari (talk) 05:33, 30 October 2020 (UTC)

Scottish Chapbooks with blurred pages.

Just for information, I have noticed some of the Scottish Chapbooks are displaying blurred pages.

I have found that by clicking the link to the original source page you can download the page and it is not blurred.

I'm not sure of the cause of this issue, but thought I'd flag it here in case anyone else is having difficulty with them.

For example a recent blurred page I encountered is Index:Young Gregor's ghost in three parts (NLS104184433).pdf with the clear page found at [20]

Sp1nd01 (talk) 12:25, 27 October 2020 (UTC)

This is caused by over-compression by the LuraDocument compressor used by NLS, which looks like it's failed to separate text and image on the first page. Either the PDF can be regenerated at our end from the source images, which needs a bit of faffing about, or maybe NLS can just re-run the derivation step. I have no idea if their workflow can do that easily. @LilacRoses: any idea? Inductiveload—talk/contribs 12:34, 27 October 2020 (UTC)

Hi @Inductiveload:, apologies for the long wait. I have looked into this issue. Since their initial upload to Wikimedia, I understand our LuraTech compressor settings and version have been updated and therefore we would be able to redo the PDFs, but it would need to be done as more of a batch rather than individual items. With that in mind, I wondered if it would be possible to replace the files on Wikimedia Commons which connect to the Wikisource items without too much of an issue? It's an area I'm not too familiar with so any guidance on this would be much appreciated. The main concern I have is that I would want the new file to link to the same item on Wikisource, so for the example used above, if we were to redo the file https://upload.wikimedia.org/wikipedia/commons/6/66/Young_Gregor%27s_ghost_in_three_parts_%28NLS104184433%29.pdf, the new file would need to still link to https://en.wikisource.org/wiki/Index:Young_Gregor%27s_ghost_in_three_parts_(NLS104184433).pdf as we use these links in order to find and track the items, and it will really complicate things on our end if this is changed.

If you are able to please advise on the best way to overwrite the files, that would be great! Thanks in advance, LilacRoses (talk) 12:47, 16 November 2020 (UTC)

@LilacRoses: hi! It's easy to add new versions of files at Commons:

Go to the commons page commons:File:Young Gregor's ghost in three parts (NLS104184433).pdf
Find the "Upload a new version of this file" link which is just below the "history" table
Follow that link and select a new file to upload from your computer
Enter a description (e.g. "regenerated PDF with clearer compressor settings")
Click OK and let it upload
As long as the file has the same pages as the original, the Wikisource index will not need any changes and the page images will update automatically. Sometimes there is a short delay while the caches regenerate at Commons.
The whole process can be automated - Pywikibot and friends will happily do this as well (I'm not sure how NLS has been uploading files).

Tl;dr update at Commons, everything else "just works". Inductiveload—talk/contribs 13:37, 16 November 2020 (UTC)

@Inductiveload: thank you for the help! It will be scheduled to be done later this month. Pattypan was used to upload the files to Commons. As we have quite a few blurry items, do you have any guidance on how the process could be automated? Thanks in advance! LilacRoses (talk) 14:41, 16 November 2020 (UTC)

A script to upload files would be pretty simple using Pywikibot. It really depends what data you have. As long as you know the filename at commons, it's very easy. Otherwise if you only have the NLS ID, you might need to first search for the file ending in (NLS xxxxxxxx).pdf. It really depends how many files there are to replace: if there are 10, it'll be quicker to do it manually, if there are 10k, then you need a script (or you need to buy the interns more coffee and pizza). Inductiveload—talk/contribs 14:50, 16 November 2020 (UTC)
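
A minimal sketch of such a script, assuming Pywikibot is installed and configured with an account on Commons, and that each regenerated PDF on disk has exactly the same file name as the file it replaces. The helper names `plan_uploads` and `upload_new_versions` are hypothetical, and the upload comment is only an example.

```python
import re
from pathlib import Path

# Matches names ending in "(NLS104184433).pdf" or "(NLS 104184433).pdf".
NLS_ID = re.compile(r"\(NLS\s?(\d+)\)\.pdf$")


def plan_uploads(local_files):
    """Pair each regenerated PDF with the Commons title it should overwrite.

    Files whose names do not end in an NLS identifier are skipped.
    """
    plan = []
    for path in map(Path, local_files):
        if NLS_ID.search(path.name):
            plan.append((f"File:{path.name}", str(path)))
    return plan


def upload_new_versions(plan, comment="Regenerated PDF with updated compressor settings"):
    """Upload each local file as a new version of the existing Commons file."""
    import pywikibot  # third-party; imported here so planning works without it

    site = pywikibot.Site("commons", "commons")
    for title, path in plan:
        page = pywikibot.FilePage(site, title)
        # ignore_warnings=True allows overwriting a file that already exists.
        site.upload(page, source_filename=path, comment=comment,
                    ignore_warnings=True)


if __name__ == "__main__":
    files = sorted(Path(".").glob("*.pdf"))
    for title, path in plan_uploads(files):
        print(f"would replace {title} with {path}")
    # Call upload_new_versions(plan_uploads(files)) once the plan looks right.
```

Printing the plan first makes it easy to check a batch by eye before anything is overwritten.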

Call for feedback on archiving POTUS tweets

I would appreciate hearing the community's thoughts on archiving President Trump's communications to the public via tweeting.

If you are new to the topic of the status of POTUS tweets, this article from NPR is a good introduction which happens to namecheck Wikipedia while discussing crowdsourcing of Presidential records.

My take is that post-11/3/2016 tweets from the @realDonaldTrump account - even those that have been subsequently deleted - are official Presidential records within the scope of being archived here. Here is why I believe this:

The Presidential and Federal Records Act was amended in 2014 to expand the definition of records to electronic content, including social media communications. The Obama administration complied with this by auto-archiving Obama's posts made from the @POTUS twitter account, and publishing a searchable archive of those tweets shortly before he left office. link

Trump's press secretary said on June 6, 2017, when asked whether POTUS tweets are official statements: "The President is the President of the United States, so they're considered official statements by the President of the United States."

Trump affirmed that he considered tweeting part of his presidential duties in July 2017 when he tweeted that "My use of social media is not Presidential - it's MODERN DAY PRESIDENTIAL."

This issue of the status of deleted POTUS tweets was asked about in this letter from two U.S. Senators to the Archivist of the United States. The Archivist responded that the National Archives and Records Administration "...has advised the White House that it should capture and preserve all tweets that the President posts in the course of his official duties, including those that are subsequently deleted, as Presidential records, and NARA has been informed by White House officials that they are, in fact, doing so." link

On March 13, 2018 Secretary of State Rex Tillerson learned that he was fired via Twitter. The firing announcement was tweeted from the @realDonaldTrump account. The @POTUS account set up by the Obama administration, which during the Trump administration has consisted mostly of retweets from @realDonaldTrump, was silent on the firing. This is an example of why there is general agreement that when someone talks about "President Trump's tweets", they are referring to those from the @realDonaldTrump account.

Wikimedia Commons has two screengrabs of @realDonaldTrump tweets archived there, and some content sourced to Congressperson twitter accounts. Since there hadn't been any discussion specifically about the copyright status of POTUS tweet screengrabs I asked for clarification there. They agreed with my take that a screengrab of a basic POTUS tweet showing text and a profile picture is PD-USGOV, but that a screengrab showing anything more within it has to have those interior items separately evaluated, and blurred out if they are not PD.

Thanks! Dennis the Peasant (talk) 02:51, 10 October 2020 (UTC)

Unfortunately, given the above notes on Copyright status and the guidance at WS:WWI on documentary sources, they do appear to meet the criteria for being here. However, per the precedent exclusions given at WS:WWI, they must be complete and not fragmentary. I would expect them to be verifiable on Wiki. I say "unfortunately", because I'm not convinced that they will have a long-term value here at enWS. They will be archived in other places, because of what they are. I would anticipate that they would become a vandalism target, as are the letters from the Zodiac killer. Beeswaxcandle (talk) 04:41, 10 October 2020 (UTC)

The copyright status is a separate issue (and, NB, note that retweets are not PD-USGov under any circumstance!); my main concern is that these do not fit the purpose of Wikisource. There are lots of services that archive tweets and there is very little we can do to add value to them. They are some kind of bastard hybrid between off-the-cuff verbal communication and extremely informal and short written communication. They are not published in any sense that is relevant to our inclusion criteria. With a book or news article-style publication, subject to editorial control, sure: we could figure out the copyright situation and, if compatible, host. But indiscriminate inclusion of all, or a random excerpt of some, of an account's tweets makes absolutely zero sense. If any tweets should be permitted it would certainly be the tweets from a sitting President of the US, but I just don't see it. This is not what Wikisource is for. --Xover (talk) 07:03, 10 October 2020 (UTC)
I agree with the above, this is not a good use of Wikisource. On the other hand, we could definitely host content along the lines of The Tweets of President Donald J Trump (2020) provided that the work as a whole is freely licensed or PD. —Beleg Tâl (talk) 12:19, 10 October 2020 (UTC)
Indeed. --Xover (talk) 12:41, 10 October 2020 (UTC)
Comment I don't see it within our scope. The overarching conversations and the retweets are not within scope, and by their nature they are neverending conversations. Trump's tweets are excerpts of the conversations. Aside I don't see that it is within the indication of our scope of published works. — billinghurst sDrewth 13:24, 10 October 2020 (UTC)
comment there are other sites doing this work, http://trumptwitterarchive.com/archive and can be a citation for quotes. this community tends to concentrate on excavating reference texts not available elsewhere. Slowking4 ⚔ Rama's revenge 15:19, 10 October 2020 (UTC)

Thanks for the helpful, albeit discouraging comments!

As I noted above, the Presidential and Federal Records Act Amendments of 2014 revised the definition of official "records" to include all recorded information, regardless of form or characteristics. To summarize the feedback, it seems that a subset of Presidential official records, including but presumably not limited to posts on Twitter, possess characteristics which put them outside the scope of Wikisource.

To help out future Wikisourcians thinking about archiving Presidential and Federal Records, may I ask for clarity on what exactly are the forbidden characteristics? Length, formality, interactivity, possible vandals, lack of publication elsewhere, and the existence of other archives have all been mentioned; what are the red lines in these categories?

Thinking about other social media platforms commonly used by Congresspeople and Presidents, are reddit or Facebook posts (which typically exceed 280 characters but can involve interactivity) also outside of the scope of Wikisource? How about longer posts, without any interactivity, on a digital-only platform like Medium?

I'll toss out two test cases of digital Presidential communications which may help structure the discussion. Here is the URL to an archived Medium post by Obama: medium.com/obama-white-house/to-my-fellow-americans-649af4c5fc49; it is lengthy, contains images but no hyperlinks, and is not part of any conversation. To me it reads like an ordinary press release, or a transcript of a speech. The post's embedded images would certainly be OK to upload on Commons. Does archiving the text of this post fall within the scope of Wikisource, alongside the existing material at Author:Barack_Hussein_Obama?

For a second test case let's consider a tweet, from Obama to separate out the issues of potential vandals and alternative archives. With Obama's Twitter communications, the administration complied with its archival responsibilities in two ways. The most public archive is the @POTUS44 account which had all Obama @POTUS tweets migrated to it. Currently this account is easy to access and use, but of course there is nothing preventing Twitter from going out of business, deciding to delete the account, putting the information behind a paywall, etc.

The administration also made available for download a zipped archive with the text of the tweets in CSV and JSON formats, and included an html file to allow searching and reading within a browser. While this form of archiving has a lot going for it, it requires multiple actions and software to get the browser access going, and while this functionality worked well on my desktop, I couldn't get it to work on my Android phone. Additionally, the raw data is incomplete (ending on 11/16/2016), and in minor aspects often wrong (many tweets are mislabeled as retweets, probably due to the migration activity).

It seems to me that the public would benefit (admittedly, only a tiny bit) by having access to an archive of Obama's tweets in an easily readable and searchable format outside of Twitter. These would have to be reformatted from CSV or JSON to be readable, and the t.co redirection links would need to be replaced with URLs to their destination. These tasks are straightforward to automate, and here's a sample reformatted tweet:

[16-10-01 02:27 PM] Paid leave shouldn't be a luxury. It's a basic necessity that we should secure for every working American. nytimes.com/2016/09/30/busines...

This sample POTUS tweet seems pretty anodyne to me, but it seems the community feels strongly that archiving tweets like it does not fall within the scope of Wikisource. OK, but why? The brevity? Thanks again! Dennis the Peasant (talk) 20:10, 11 October 2020 (UTC)
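The reformatting described in the comment above (JSON in, readable timestamped lines out, t.co links replaced by their destinations) could be sketched roughly as follows. The field names "timestamp", "text", "urls", "url" and "expanded_url", and the input timestamp format, are assumptions for illustration; the actual schema of the downloadable archive would need to be checked first.

```python
import json
from datetime import datetime


def format_tweet(tweet):
    """Render one tweet record as '[yy-mm-dd hh:mm AM/PM] text'."""
    text = tweet["text"]
    # Swap each t.co short link for its destination URL (assumed fields).
    for link in tweet.get("urls", []):
        text = text.replace(link["url"], link["expanded_url"])
    # Assumed input format: '2016-10-01 14:27:00'.
    posted = datetime.strptime(tweet["timestamp"], "%Y-%m-%d %H:%M:%S")
    return f"[{posted.strftime('%y-%m-%d %I:%M %p')}] {text}"


def format_archive(json_text):
    """Parse a JSON array of tweet records and format them oldest first."""
    tweets = json.loads(json_text)
    tweets.sort(key=lambda t: t["timestamp"])
    return [format_tweet(t) for t in tweets]
```

Tweets that the raw data mislabels as retweets would still need separate handling; this sketch formats every record it is given.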

i tend to be more tolerant of scope than most, but i have several questions: who is going to transcribe and maintain this? who is going to build the index? how are you going to find anything? where is the pdf text? did you upload the text to internet archive? how are you going to deal with deleted tweets? you realize how large the federal government document backlog is? you realize this community gets grumpy when people dump non-scan backed text and leave? you realize that archiving social media is a challenge for the library of congress and national archives? Slowking4 ⚔ Rama's revenge 03:48, 12 October 2020 (UTC)

^ this is exactly how I feel as well. —Beleg Tâl (talk) 17:06, 12 October 2020 (UTC)

sorry to rain on your bright idea. the problem being, there are a lot of bright people here with ideas; the sticking point is always the implementation plan, and the team recruitment. (it is a wikimedia pain point) Slowking4 ⚔ Rama's revenge 01:40, 13 October 2020 (UTC)

I appreciate the questions, and apologize for the delay in answering them. Deletion is a thorny issue, so allow me to pivot from suggesting we archive President Trump's tweets [2016 - present] to suggesting we archive President Obama's tweets [2015-17], a simpler project. We can move on the Trump case later, if warranted. So with the proposal on the table now being to archive Obama's @POTUS tweets, on to your questions:

Who is going to transcribe and maintain this? I am volunteering to transcribe them, and since this is a pretty small project I wouldn't need collaborators although I would welcome them. I'm also happy to work on their maintenance, although since Obama's are static I do not know what is needed beyond keeping the pages on my watch list to catch vandalism.

you realize that archiving social media is a challenge for the library of congress and national archives? Yes I am aware of the challenge, and the very rapid pace of software development further increases the difficulty. With Obama's tweets, the National Archives and Records Administration (NARA) has taken action - they maintain the @POTUS44 archival Twitter account - but I don't know of any other archiving actions by them.

where is the pdf text? did you upload the text to internet archive? Currently there is no official pdf text archive of Obama tweets to scan and upload, but one can make links at each tweet here to the corresponding tweet at the official NARA online archive. (I did this in the above sample Obama entry, it's the first link.) So each tweet archived here would be readily verifiable, in perpetuity since the NARA is maintaining the archives.

you realize this community gets grumpy when people dump non-scan backed text and leave? Understandable, but in this case there are no backing documents existing on paper or as pdf. So what is the verification process, or does one need to be decided upon?

Commons has a "trust but verify" copyright verification process - if an uploader claims that some content is CC licensed at YouTube, it is posted but with an automatic notice that an admin will verify this claim is true at some point. Maybe something similar could be done here, with an admin or proofreader clicking on each verification link after initial posting, and then noting on the page's notes that the transcription checks out.

who is going to build the index? I volunteer to also build an index, perhaps one modeled on the index for Obama's Presidential Weekly Addresses would work. I envision 20 pages (one for each month), with subsections for each day.

how are you going to find anything? I anticipate three major ways:

People who are interested in a subject would use keyword searching
People interested in a specific time period would navigate using the index and the pages' TOC
People interested in a specific tweet could find it either through searching (if they know some specific wording) or via timestamp anchors (if they know the date and time of the tweet).

The timestamps also offer an easily sharable entry to the archive, as the URL will indicate the month, day and time of the tweet. So if one shared a Wikisource URL containing "/wiki/President_Obama_Tweets_2015-10#01-02:27PM", it is clear that the link refers to an Obama tweet from 10/1/2015, tweeted at 2:27PM EST.
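As a rough illustration of the anchor scheme just described, the monthly page title and the per-tweet anchor can both be derived from the tweet's timestamp. The function names are hypothetical, and the page-title prefix is taken from the example URL above rather than from any existing page.

```python
from datetime import datetime

# Prefix taken from the example URL in the comment above.
PAGE_PREFIX = "President_Obama_Tweets_"


def page_title(posted):
    """Monthly page for a tweet, e.g. 'President_Obama_Tweets_2015-10'."""
    return PAGE_PREFIX + posted.strftime("%Y-%m")


def anchor(posted):
    """Day-and-time anchor within the monthly page, e.g. '01-02:27PM'."""
    return posted.strftime("%d-%I:%M%p")


def wiki_path(posted):
    """Shareable path combining the monthly page and the tweet's anchor."""
    return f"/wiki/{page_title(posted)}#{anchor(posted)}"
```

Two tweets posted within the same minute would produce the same anchor under this scheme, so a real index would need some tie-breaker such as a trailing counter.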

you realize how large the federal government document backlog is? Yes, but it is natural to update which documents are archived. Trump discontinued the time-honored Presidential Weekly Address tradition entirely in June 2018 in favor of other forms of communication, the most important of which (for him) is tweeting. Since Presidential Weekly Addresses are no longer given, it makes sense to think about archiving the communications which displaced them.

And while POTUS Twitter communications are sometimes no more than barbaric yawps, there have been others which have had great historical significance. As a category, it seems to me that they deserve to be archived here. Dennis the Peasant (talk) 06:25, 24 October 2020 (UTC)

This discussion is about to get archived without any comments on my last post. @Slowking4, @Beleg Tâl: did you have any thoughts about my responses to your questions? @Billinghurst: can you interpret for me how this discussion comes down on the question of whether I should go ahead with the archiving of Obama tweets? Thanks! Dennis the Peasant (talk) 05:01, 17 November 2020 (UTC)

I don't see a consensus of opinion that a tweet or a collection of tweets is in scope, nor that we should expand our scope to have them included. My personal opinion is unchanged. — billinghurst sDrewth 05:43, 17 November 2020 (UTC)

I agree with Billinghurst on this one. --Xover (talk) 06:15, 17 November 2020 (UTC)

Self-published lectures

What is our attitude to works like On Marquez's One Hundred Years of Solitude, originally self-published at [21]? I personally can imagine inclusion of such works, but Wikisource:What Wikisource includes states that works hosted at Wikisource "… must have been published in a medium that includes peer review or editorial controls; this excludes self-publication." Is it possible to alter the particular criterion to include such works somehow (e.g. making an exception for self-published lectures of well-known authors), or should this work rather go? --Jan Kameníček (talk) 14:38, 29 October 2020 (UTC)

Whatever the criteria, this work (and the others listed on his author page) should definitely be included. If a change is necessary (which it really is for all guidelines), it should occur. TE(æ)A,ea. (talk) 16:23, 29 October 2020 (UTC).

OK, so to keep the work here as well as enable inclusion of others from Author:Ian Courtenay Johnston#Lectures (in fact I am considering adding some of them here), I suggest replacing the part of the sentence "…this excludes self-publication" with "This usually excludes self-publication; rare exceptions can be considered provided that the writer of the self-published analytical work is a renowned academic author". --Jan Kameníček (talk) 08:25, 30 October 2020 (UTC)

It was accepted at the time, it is in scope. Anything can be discussed within scope, as there are edge cases. I don't think that we need any change in the policy or the wording, just bring forward items that are those edge cases. We already have many self-published old works, the rule is primarily aimed at conflict of interest and self-interest additions. — billinghurst sDrewth 09:02, 30 October 2020 (UTC)

How can something about which our rules say that it cannot be included to WS be in scope?

Johnston’s self-published lectures are definitely not an edge case at the moment. Currently the rule clearly and explicitly says that such works are excluded and forbids adding them, which is a pity. Here we can state an opinion that such works can be included, but generally it does not solve anything if we do not write this opinion into the rule’s page. If later some other contributors come to similar cases and they start wondering whether the work can be added to WS, they will most probably (similarly as I did) go to the "What Wikisource includes" page, where they will learn that the work cannot be included. So, if the work can be included, the rule should reflect it.

@"the rule is primarily aimed at conflict of interest and self-interest additions": I am not sure if this is only an opinion or a fact. The rule itself does not say it.

One more problem: Let’s say that on the basis of the opinion expressed above I will add the lectures from the list to WS. That would require some amount of work. How can I be sure that later it will not be deleted because of the current rule stating explicitly that such works are excluded? It cannot be required that people should work against our rules. If something can be acceptable, the rules should at least admit that it can be acceptable. Edge cases will always happen, but this is not an edge case, it is clearly behind the current fence.

Suggested addition makes adding such works possible and makes it known to everybody searching for such information in our rules. --Jan Kameníček (talk) 09:57, 30 October 2020 (UTC)

The word "renowned" makes this suggested change untenable for me. Who is to define renowned in any particular case? How broadly should the person be known as an academic author? Outside their institute of higher learning? Outside a geographical region? Outside their discipline? I'm also not convinced that a publication of a lecture given in the context of a course of learning is covered by a clause titled Analytical and artistic works. The lecture in question is neither.

In terms of self-publishing, what is the reason the work was self-published? To prevent censorship or suppression? Or self-aggrandisement? What's the difference between the vanity presses of the late 19th and early 20th centuries and the blog-posts of today? What's the distinction that allowed us to take Tom Lehrer's lyrics from his website and put them up, but not a piece of fan-fiction? I don't have definitive answers to these philosophic questions, other than to note that the Consensus section of WS:WWI allows us to agree to include or not include particular works by discussion here. A policy is meant to be read and applied in toto. Beeswaxcandle (talk) 18:04, 30 October 2020 (UTC)

Amen. If someone has a work that they wish considered by enWS then point to it, and ask about it. In the range of the works that we reproduce they are definitely edge cases. And of course that part of the rule is aimed at modern self-addition, how else will we exclude someone from publishing their poetry, their writings etc.? While irregular now, it used to be a consistent issue. — billinghurst sDrewth 13:15, 31 October 2020 (UTC)

OK, so I am abandoning my attempt at a more general pardon for such works.

Despite that, it seems that nobody raised any objections against adding Johnston’s lectures as such. Unless some objections appear, I will probably add some of them to WS. --Jan Kameníček (talk) 22:24, 31 October 2020 (UTC)

@Jan.Kamenicek: Just for the record, I think this should ideally have led to an amendment to the policy to make the scope for discussing edge cases clear and explicit, and for pretty much the reasons you articulated above. I think Beeswaxcandle makes good points that should be addressed, but I see that as a matter of "how best to" and not "whether to". But as I don't have the spare cycles to participate meaningfully in such an effort, I'll limit myself to just expressing general support for the idea and leave it at that.

PS. Please link to this discussion from the works' talk page or similar (even having it in an edit summary helps), so any future deletion discussion will have easy reference to it. --Xover (talk) 07:37, 2 November 2020 (UTC)

"How can something about which our rules say that it cannot be included to WS be in scope?" = IAR. i find all the rules lawyering, and amendment, and strict constructionism, to be a waste of time. go ahead an rewrite rules if it makes you feel better, but it will not stop a deletion, if a rouge admin wants to assert "out of scope" as we have seen on other projects (like commons) Slowking4 ⚔ Rama's revenge 01:50, 3 November 2020 (UTC)

This section was archived on a request by: Jan Kameníček (talk) 22:25, 25 November 2020 (UTC)

Contents

Wikilivres is back

Tech News: 2020-41

Call for feedback about Wikimedia Foundation Bylaws changes and Board candidate rubric

Disambiguation of Psalm numbers

Tech News: 2020-42

Removing DNB pages, Executive Orders and US Supreme Court decisions from "Random Works"

Löbel Schottländer (Q1879596)

Narrow footer and editing window in some layouts

Gadget to resolve issues with HTML entities like &#39; in Page OCR

Tech News: 2020-43

Side text ?

Tom Lehrer

Import pagelist gadget

DNB biographies have been moved to subpages

Tech News: 2020-44

Important: maintenance operation on October 27

Findability

Community collaboration

Scottish Chapbooks with blurred pages.

Call for feedback on archiving POTUS tweets

Self-published lectures
