Wikisource:Scriptorium/Archives/2020-10

Warning Please do not post any new comments on this page.
This is a discussion archive; the comments it contains may have been posted before or after the archive was created.
See current discussion or the archives index.

Wikilivres is back

Wikilivres is back at wikilivres.org as of October 2020. The original site is now an Amazon book review site. --kathleen wright5 (talk) 13:32, 3 October 2020 (UTC)

The website doesn't seem to be fully functional, sadly. Most of the pages seem to redirect back to the main page. JesseW (talk) 03:05, 4 October 2020 (UTC)


Call for feedback about Wikimedia Foundation Bylaws changes and Board candidate rubric

Hello. Apologies if you are not reading this message in your native language. Please help translate to your language.

Today the Wikimedia Foundation Board of Trustees starts two calls for feedback. One is about changes to the Bylaws mainly to increase the Board size from 10 to 16 members. The other one is about a trustee candidate rubric to introduce new, more effective ways to evaluate new Board candidates. The Board welcomes your comments through 26 October. For more details, check the full announcement.

Thank you! Qgil-WMF (talk) 17:10, 7 October 2020 (UTC)

Disambiguation of Psalm numbers

Background: the book of Psalms is an ancient collection of songs that is also part of the Bible. Each of the 150 songs in this book is its own individual Work, and some of them (such as Psalm 23 and Psalm 130) have their own Translations page on Wikisource because we have translations of them that are published outside of a complete edition of Psalms.

Now, there is a very good chance that I will be adding a lot more Translations pages for individual Psalms in the near future. Therefore I want to plan ahead and do it properly, like we did with Shakespeare's Sonnets.

The problem I have, which I am bringing to WS:S, is this: Psalms are usually identified by number (i.e. Psalm 1, Psalm 2, etc.). However, this number is not unique! There are two different numbering systems in use: the Hebrew/Masoretic system (used by Jews and Protestants and unofficially by Catholics) and the Greek/Septuagint system (used by Orthodox and officially by Catholics).

Thus, the title "Psalm 23" actually refers to two different songs:

  • "The Lord is my Shepherd; I shall not want", numbered as Psalm 23 in the Hebrew/Masoretic system
  • "The earth is the Lord's, and the fulness thereof", numbered as Psalm 23 in the Greek/Septuagint system

Our standard solution, of course, is to have Psalm 23 be a Disambiguation page which links to both of these two songs, and this is what I intend to do.

The question for all of you, therefore, is: What should be the title of the actual Psalm version page itself? Beleg Tâl (talk) 17:48, 8 October 2020 (UTC)

The best solution I have come up with so far, is to use the practice common in Catholic bibles, of using the Hebrew/Masoretic number, and then putting the Greek/Septuagint number in parentheses. Thus: "The Lord is my Shepherd" would be Psalm 23 (22); "The earth is the Lord's" would be Psalm 24 (23); etc. However, I am open to better suggestions. —Beleg Tâl (talk) 17:48, 8 October 2020 (UTC)
Other solutions I have thought of, which I personally think are less good:
Beleg Tâl (talk) 17:00, 12 October 2020 (UTC))
en.Wikipedia has w:en:Psalm 23 as "The Lord is my Shepherd", with an explanatory hat note; why not follow suit? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:45, 10 October 2020 (UTC)
@Pigsonthewing: Because this is one of the places where Wikisource and Wikipedia differ in policy. On Wikipedia, Psalm 23 is the title of the "Lord is my Shepherd" because this is the most common meaning of "Psalm 23" in English. If they needed a disambiguation page, they would put it at w:Psalm 23 (disambiguation). Wikisource, on the other hand, always places the disambiguation page under the ambiguous title, even if the title almost always refers to only one of the disambiguated items. —Beleg Tâl (talk) 16:50, 12 October 2020 (UTC)
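The "Hebrew number (Greek number)" naming scheme proposed earlier in this thread can be sketched in a few lines. This is only an illustration: the function names are made up, and it assumes the usual correspondence between the two systems, in which Hebrew 9-10, 114-116 and 147 are merged or split on the Greek side and so have no single counterpart. It also assumes that Psalms numbered identically in both systems need no parenthetical.

```python
from typing import Optional


def septuagint_number(hebrew: int) -> Optional[int]:
    """Return the Greek/Septuagint number for a Hebrew/Masoretic number.

    Returns None where there is no one-to-one counterpart
    (Hebrew 9, 10, 114, 115, 116 and 147) or the number is out of range.
    """
    if 1 <= hebrew <= 8 or 148 <= hebrew <= 150:
        return hebrew          # numbering agrees
    if 11 <= hebrew <= 113 or 117 <= hebrew <= 146:
        return hebrew - 1      # Greek numbering runs one behind
    return None


def version_page_title(hebrew: int) -> str:
    """Build a title such as 'Psalm 23 (22)' (hypothetical helper)."""
    greek = septuagint_number(hebrew)
    if greek is None:
        raise ValueError(f"Psalm {hebrew} needs a hand-written title")
    if greek == hebrew:
        return f"Psalm {hebrew}"
    return f"Psalm {hebrew} ({greek})"


print(version_page_title(23))
print(version_page_title(24))
```

The handful of merged or split Psalms are deliberately left to be titled by hand, since the thread does not say how they should be named.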


Removing DNB pages, Executive Orders and US Supreme Court decisions from "Random Works"

The "Random Work" function is pretty overloaded with DNB articles, Executive Orders and US Supreme Court documents, because there are thousands of each and they're all top level pages. This means the button returns these documents very frequently, more than half the time from a highly unscientific trial. This is a little bit monotonous, compared to the diverse set of works available.

I wonder if it's possible to petition for a change to the enWS SpecialRandomGetRandomTitle hook to exclude pages that:

  • End in (DNB[0-9]{2})
  • Contain \bv.\b (almost certainly a SCOTUS decision)
  • Start with Executive Order
  • (any other suggestions?)

Inductiveloadtalk/contribs 11:35, 12 October 2020 (UTC)
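To get a feel for what the proposed exclusions would catch, the three rules can be tried out against a list of titles outside MediaWiki. The sketch below is not the hook itself, just a stand-alone filter; the function name is hypothetical, and the "v." rule is written as `\bv\.\s` (a literal "v." followed by whitespace), which is an assumption about what the second pattern is meant to match.

```python
import re

# Patterns taken from the list above; the second one is an
# assumed tightening of "\bv.\b" to a literal "v." plus whitespace.
EXCLUDE_PATTERNS = [
    re.compile(r"\(DNB[0-9]{2}\)$"),   # ends in (DNB00), (DNB01), ...
    re.compile(r"\bv\.\s"),            # "Foo v. Bar" style case names
    re.compile(r"^Executive Order\b"), # starts with "Executive Order"
]


def is_excluded(title: str) -> bool:
    """Return True if a page title matches any exclusion rule."""
    return any(p.search(title) for p in EXCLUDE_PATTERNS)


titles = [
    "Smith, John (DNB00)",
    "Roe v. Wade",
    "Executive Order 13769",
    "Treasure Island",
]
print([t for t in titles if not is_excluded(t)])
```

Running a dump of main-namespace titles through a filter like this would also give a less unscientific estimate of how much of "Random Work" these three groups account for.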

  • I think a more pertinent solution to this problem would be to move all DNB articles to sub-pages, move all court cases to sub-pages of their respective volumes (of the U. S. Reports), and to have all Executive Orders as sub-pages of “Executive Orders President [name],” possibly adding the year to the latter. Of these, the last is a suggestion for better navigation, but the other two should happen anyway. TE(æ)A,ea. (talk) 18:23, 12 October 2020 (UTC).
Agree about moving the DNB pages to be subpages. It was always the plan to do it once we had them all finished. I disagree about moving the court cases, as they are works in their own rights, and many have not come via those publications, and I prefer to not be a case of half pregnant. — billinghurst sDrewth 11:18, 13 October 2020 (UTC)
I will note that one of the reasons we did defer was that when pages are categorised they have a VEEEEEEERRRRRY long page name, which is a bit of a PITA when they display in categories, though this is now an issue for so many of the subpages of our biographical works, so that is just a cross we bear. The other impediment was the issue of typeahead for page names, which has been resolved with improved indexing and the ability to set in preferences how you search. There is a fair bit of work to do to get DNB moved, though it is all worthwhile. — billinghurst sDrewth 11:28, 13 October 2020 (UTC)
I think that, e. g., this court case (found by random search) should be under United States Reports/Volume 490, because that is where it is published. Newer cases are also published individually, but they are eventually consolidated as well. As the source of these older court cases is the collected volumes, rather than independent publication, they should be given under the volume sub-page. This would be especially helpful to reduce the number of pages in the main namespace that aren’t really independent works. TE(æ)A,ea. (talk) 21:25, 13 October 2020 (UTC).
If someone wishes to produce a volume and transclude them that way, then they are most welcome. Forcing them under a volume for what is an independent case because it is (later) published in a volume is not the right approach. @Inductiveload: might it be possible to exclude based on categorisation? — billinghurst sDrewth 22:14, 13 October 2020 (UTC)

 Comment I spoke with Reedy, and he says not really, though he said ...

There's no hooks or anything
  $this->extra[] = 'page_title NOT ' . $dbr->buildLike( $dbr->anyString(), '/', $dbr->anyString() );
The parent page does have some hooks...
  $this->getHookRunner()->onRandomPageQuery( $tables, $conds, $joinConds );
But can't differentiate between random page or random root page

I hope that helps someone. — billinghurst sDrewth 15:35, 16 October 2020 (UTC)

Löbel Schottländer (Q1879596)

How would I let the reader know by linking that Löbel Schottländer is the person in Guide through Carlsbad and its environs/The Mineral Waters for Exportation and a few other Wikisource entries? --Richard Arthur Norton (1958- ) (talk) 05:17, 16 October 2020 (UTC)

Very good question in the general case.
The best case in my personal opinion, is to dig up documents that we can link to him and then give an author or Portal page. In this specific case we can probably add Index:The Morning Call - 1890-05-07.pdf as a document and then Löbel Schottländer can have an author page for his advert on page 3. Inductiveloadtalk/contribs 10:48, 16 October 2020 (UTC)
If the chapter of the work is primarily about the person, when you create a wikidata item for the chapter, you would use the main subject field for the chapter item to link to the person. — billinghurst sDrewth 15:18, 16 October 2020 (UTC)

Narrow footer and editing window in some layouts

When a work is switched to a narrower layout such as Layout 2, not only does the text get narrower, but so does the footer with the navigation (while the header stays wide), which is very inconvenient if the footer includes some long titles. What is more, when you click the edit button and then the preview, the editing window gets narrow too, which makes further editing very difficult. Btw, in the common Layout 1 the footer is also slightly narrower than the header for some reason. Is it possible to exclude both the footer and the editing window from the width change in various layout modes? --Jan Kameníček (talk) 17:31, 17 October 2020 (UTC)

Gadget to resolve issues with HTML entities like ' in Page OCR

There is an issue where some ASCII characters are being replaced by "HTML entity codes" that look like ' in the preloaded OCR text of new pages in the Page namespace.

A quick-fix gadget has been deployed to undo the transformation when new Page-namespace pages are created. This should result in you not noticing any problems. The gadget is enabled by default, so users are opted in automatically. You can turn it off at any time by un-checking the "Automatically convert HTML entities mistakenly replaced in the Page namespace due to phab:T265571." checkbox in your gadget preferences, under the "Editing tools for Page: namespace" section.

For discussion of the issue in general, there is a thread here: Wikisource:Administrators'_noticeboard#OCR_change?. The issue has been reported at Phabricator as phab:T265571 and a fix is likely within a week or so, at which point the gadget will be removed.

Any issues with the gadget in the meantime can be reported here. Inductiveloadtalk/contribs 10:20, 16 October 2020 (UTC)

As a fix was rushed through upstream out-of-cycle, this is no longer required. Thus, it will be made non-default, and the gadget will be removed entirely when the phab:T265571 ticket is closed. If you still see spurious HTML entities on new page creations, turn the gadget on and report at Phabricator. Inductiveloadtalk/contribs 12:06, 16 October 2020 (UTC)
The gadget has now been removed since the upstream fix appears to be working. Inductiveloadtalk/contribs 13:59, 18 October 2020 (UTC)


 Comment We have no abusefilter using rmspecial. — billinghurst sDrewth 12:52, 20 October 2020 (UTC)

Side text ?

Does the short left side text "Life of Sri Ramakrishna by European Scholars" also come in the text ?

From : The Gospel of Râmakrishna

--Riquix (talk) 06:45, 17 October 2020 (UTC)

@Riquix: It definitely does. You may try {{Left sidenote}}. --Jan Kameníček (talk) 17:49, 17 October 2020 (UTC)
Ok Thank you ! --Riquix (talk) 05:39, 18 October 2020 (UTC)
i would not use left sidenote, i would use template:PT Shoulder Heading see if you like it. Slowking4Rama's revenge 23:35, 21 October 2020 (UTC)

Tom Lehrer

Tom Lehrer has put all of his lyrics into the public domain; see: https://tomlehrersongs.com/ Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:28, 20 October 2020 (UTC)

Awesome. See https://web.archive.org/web/20201020064817/https://tomlehrersongs.com/ if you're around after tomlehrersongs.com is taken off the web. But we should upload all of those songs long before that.--Prosfilaes (talk) 20:00, 20 October 2020 (UTC)
See Author:Thomas Andrew Lehrer, where some people have already started working on it.--Prosfilaes (talk) 20:03, 20 October 2020 (UTC)
to do it scan backed, someone should knit together all the pdfs, i.e. [14] and upload them to commons, Slowking4Rama's revenge 23:19, 21 October 2020 (UTC)
@Slowking4: ta-da: Index:Tom Lehrer song lyrics (website).pdf
Are we going to need OTRS for this, does anyone think? Inductiveloadtalk/contribs 10:58, 22 October 2020 (UTC)
thank you very much - i certainly hope not; i do not trust the otrs admins to come to the correct conclusion, given the MacArthur decision. i would not submit anything to them, given the lack of accountability. Slowking4Rama's revenge 23:52, 23 October 2020 (UTC)

Import pagelist gadget

There is a new gadget for importing pagelists. Currently the Internet Archive is supported.

Please see Help:Gadget-ImportPagelist for more documentation and instructions. It can be found in the "experimental" section of Special:Preferences#mw-prefsection-gadgets.

Hopefully, this will be useful for people when building pagelists. The IA pagelists aren't perfect, but they're a decent starting point. Inductiveloadtalk/contribs 19:43, 24 October 2020 (UTC)

Have I mentioned lately, that you're made of pure awesome? :) --Xover (talk) 08:04, 25 October 2020 (UTC)

DNB biographies have been moved to subpages

Following a discussion above about random pages, and a follow up discussion on the DNB project page, the DNB biographies (DNB00), (DNB01) and (DNB12) have (finally) been moved to be subpages of the works [redirects in place]. Accordingly, the templates are being updated—some done—locally, and after that we can start to look off-wiki.

As part of the updates of templates I have modernised what was an old implementation of header prior to some newer parameters. I have also default utilised {{import enwiki}} to leverage the main subject and the person interwiki to the enWP article. At some point, I will look at some maintenance to match the automatic parameters and linked parameters and rectify and then remove.

If people see issues, please leave me a message on my talk page. — billinghurst sDrewth 02:40, 24 October 2020 (UTC)

I have updated enWP's {{cite DNB}} series and related citation, attribution and post templates. If anyone sees others, then please let me know. — billinghurst sDrewth 10:50, 24 October 2020 (UTC)
Hooray! Thank you @Billinghurst: for the effort required. I'll keep my eyes peeled for bustications. Inductiveloadtalk/contribs 20:40, 24 October 2020 (UTC)


Important: maintenance operation on October 27

-- Trizek (WMF) (talk) 17:11, 21 October 2020 (UTC)

This section was archived on a request by: complete — billinghurst sDrewth 15:25, 27 October 2020 (UTC)

Findability

In the WikiCite conference just now, we had a presentation from a professional librarian who showed how a full copy of the so-called "Finch Report" (formally "Accessibility, sustainability, excellence: how to expand access to research publications") is very hard to find online. Several of us tried to find it, initially without success. After some digging, I eventually found that it has been on Commons since January 2014, and from there discovered that it was published here the following month. What can we do to improve findability? It lacks a header template - would that help? Is there a problem specific to this work, or is it more general? I have now linked the Wikidata item on the report to the Wikisource page; is that omission unusual, or do we need a concerted effort to link other works? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 00:26, 27 October 2020 (UTC)

@Pigsonthewing: It does have a header; it is built in when the page is generated, and its data is populated from the index page. I think that the question you ask is the one that probably defeats many of us locals whose expertise lies in proofreading and transcluding works, not in findability. We have long tried to get the metadata into the headers through the tags, and I thought that they were COinS aligned, though I could be wrong. I would think that we need the advice of someone like your librarian, and some data search experts, to tell us what we are missing in our metadata. I note that the WD item comes in at #13 in a Google search for the title, and enWS does not appear at all for a search based solely on the title, though it appears if you add wikisource.org to the search. It almost seems that our headers may even obfuscate works. Can we turn the question outwards and implement others' guidance? — billinghurst sDrewth 15:02, 27 October 2020 (UTC)
One also wonders whether we need to somehow do better at Wikidata Accessibility, sustainability, excellence: how to expand access to research publications (Q19028392), it has a full article link pointing elsewhere, and though the interwiki is present, nothing so overt for enWS's copy. I would hope that we could have been better served from Wikidata, but we don't even have good tools to easily populate the data from here to there, and you have to then manually set flags like "proofread" which I have just done, so then it leverages Kaldari's highlight script. The simple triangular linking of WD <> Commons <> WS is fiddly and manual. — billinghurst sDrewth 15:14, 27 October 2020 (UTC)
yeah, searching for works by title is hopeless. i try to aid findability by adding to an author list, linking at wikipedia and wikidata. but we should not imagine that by just transcribing, the world will find us. it is going to take some interwiki cooperation (and promotion on social media). and it is all manual since we do not have tools / bots to propagate work links. Slowking4Rama's revenge 23:17, 28 October 2020 (UTC)
btw, please white list this link, that is in a footnote, so that we can edit the document youtube.com/watch?v=niyYWVa2w6w. Slowking4Rama's revenge 00:14, 29 October 2020 (UTC)
whitelisted — billinghurst sDrewth 10:37, 29 October 2020 (UTC)

Community collaboration

In case no one noticed, the Community collaboration is finished and needs to be replaced with something new. Kaldari (talk) 06:37, 29 October 2020 (UTC)

I boldly updated it with the next collaboration in line on the project page. Hope that's OK. Kaldari (talk) 05:33, 30 October 2020 (UTC)

Scottish Chapbooks with blurred pages.

Just for information, I have noticed some of the Scottish Chapbooks are displaying blurred pages.

I have found that by clicking the link to the original source page you can download the page and it is not blurred.

I'm not sure of the cause of this issue, but thought I'd flag it here in case anyone else is having difficulty with them.

For example a recent blurred page I encountered is Index:Young Gregor's ghost in three parts (NLS104184433).pdf with the clear page found at [20]

Sp1nd01 (talk) 12:25, 27 October 2020 (UTC)

This is caused by over-compression by the LuraDocument compressor used by NLS, which looks like it's failed to separate text and image on the first page. Either the PDF can be regenerated at our end from the source images, which needs a bit of faffing about, or maybe NLS can just re-run the derivation step. I have no idea if their workflow can do that easily. @LilacRoses: any idea? Inductiveloadtalk/contribs 12:34, 27 October 2020 (UTC)
Hi @Inductiveload:, apologies for the long wait. I have looked into this issue. Since their initial upload to Wikimedia, I understand our LuraTech compressor settings and version have been updated and therefore we would be able to redo the PDFs, but it would need to be done as more of a batch rather than individual items. With that in mind, I wondered if it would be possible to replace the files on Wikimedia Commons which connect to the Wikisource items without too much of an issue? It's an area I'm not too familiar with so any guidance on this would be much appreciated. The main concern I have is that I would want the new file to link to the same item on Wikisource, so for the example used above, if we were to redo the file https://upload.wikimedia.org/wikipedia/commons/6/66/Young_Gregor%27s_ghost_in_three_parts_%28NLS104184433%29.pdf, the new file would need to still link to https://en.wikisource.org/wiki/Index:Young_Gregor%27s_ghost_in_three_parts_(NLS104184433).pdf as we use these links in order to find and track the items, and it will really complicate things on our end if this is changed.
If you are able to please advise on the best way to overwrite the files, that would be great! Thanks in advance, LilacRoses (talk) 12:47, 16 November 2020 (UTC)
@LilacRoses: hi! It's easy to add new versions of files at Commons:
  • Go to the commons page commons:File:Young Gregor's ghost in three parts (NLS104184433).pdf
  • Find the "Upload a new version of this file" link which is just below the "history" table
  • Follow that link and select a new file to upload from your computer
  • Enter a description (e.g. "regenerated PDF with clearer compressor settings")
  • Click OK and let it upload
  • As long as the file has the same pages as the original, the Wikisource index will not need any changes and the page images will update automatically. Sometimes there is a short delay while the caches regenerate at Commons.
  • The whole process can be automated - Pywikibot and friends will happily do this as well (I'm not sure how NLS has been uploading files).
Tl;dr update at Commons, everything else "just works". Inductiveloadtalk/contribs 13:37, 16 November 2020 (UTC)
@Inductiveload: thank you for the help! It will be scheduled to be done later this month. Pattypan was used to upload the files to Commons. As we have quite a few blurry items, do you have any guidance on how the process could be automated? Thanks in advance! LilacRoses (talk) 14:41, 16 November 2020 (UTC)
A script to upload files would be pretty simple using Pywikibot. It really depends what data you have. As long as you know the filename at commons, it's very easy. Otherwise if you only have the NLS ID, you might need to first search for the file ending in (NLS xxxxxxxx).pdf. It really depends how many files there are to replace: if there are 10, it'll be quicker to do it manually, if there are 10k, then you need a script (or you need to buy the interns more coffee and pizza). Inductiveloadtalk/contribs 14:50, 16 November 2020 (UTC)
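As a rough sketch of what such a script might look like: the first two functions pair regenerated local PDFs with existing Commons titles by the NLS ID in the "(NLS…).pdf" suffix, and the last one overwrites a single file with Pywikibot. The function names, the local file naming and the upload summary are all hypothetical, and the upload step assumes a configured Pywikibot installation with an account that is allowed to overwrite files; it has not been run against Commons.

```python
import re
from typing import Dict, Iterable, Optional

# Matches the "(NLS104184433).pdf" or "(NLS 104184433).pdf" suffix.
NLS_SUFFIX = re.compile(r"\(NLS\s?(\d+)\)\.pdf$")


def nls_id(name: str) -> Optional[str]:
    """Extract the NLS ID from a file name or Commons title, if present."""
    match = NLS_SUFFIX.search(name)
    return match.group(1) if match else None


def match_replacements(commons_titles: Iterable[str],
                       local_files: Iterable[str]) -> Dict[str, str]:
    """Map each Commons title to the local file carrying the same NLS ID."""
    by_id = {nls_id(path): path for path in local_files if nls_id(path)}
    return {title: by_id[nls_id(title)]
            for title in commons_titles if nls_id(title) in by_id}


def upload_new_version(title: str, path: str, summary: str) -> None:
    """Overwrite an existing Commons file (requires Pywikibot and a login)."""
    import pywikibot  # imported lazily so the matching code runs without it

    site = pywikibot.Site("commons", "commons")
    file_page = pywikibot.FilePage(site, title)
    # ignore_warnings=True is needed because the target file already exists.
    site.upload(file_page, source_filename=path, comment=summary,
                ignore_warnings=True)
```

A dry run that only prints the result of `match_replacements` is a sensible first step, so that any file without a counterpart is spotted before anything is overwritten.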

Call for feedback on archiving POTUS tweets

I would appreciate hearing the community's thoughts on archiving President Trump's communications to the public via tweeting.

If you are new to the topic of the status of POTUS tweets, this article from NPR is a good introduction which happens to namecheck Wikipedia while discussing crowdsourcing of Presidential records.

My take is that post-11/3/2016 tweets from the @realDonaldTrump account - even those that have been subsequently deleted - are official Presidential records within the scope of being archived here. Here is why I believe this:

  • The Presidential and Federal Records Act was amended in 2014 to expand the definition of records to electronic content, including social media communications. The Obama administration complied with this by auto-archiving Obama's posts made from the @POTUS twitter account, and publishing a searchable archive of those tweets shortly before he left office. link
  • Trump's press secretary said on June 6, 2017, when asked whether POTUS tweets are official statements: "The President is the President of the United States, so they're considered official statements by the President of the United States."
  • Trump affirmed that he considered tweeting part of his presidential duties in July 2017 when he tweeted that "My use of social media is not Presidential - it's MODERN DAY PRESIDENTIAL."
  • This issue of the status of deleted POTUS tweets was asked about in this letter from two U.S. Senators to the Archivist of the United States. The Archivist responded that the National Archives and Records Administration "...has advised the White House that it should capture and preserve all tweets that the President posts in the course of his official duties, including those that are subsequently deleted, as Presidential records, and NARA has been informed by White House officials that they are, in fact, doing so." link
  • On March 13, 2018, Secretary of State Rex Tillerson learned that he was fired via Twitter. The firing announcement was tweeted from the @realDonaldTrump account. The @POTUS account set up by the Obama administration, which during the Trump administration has consisted mostly of retweets from @realDonaldTrump, was silent on the firing. This is an example of why there is general agreement that when someone talks about "President Trump's tweets", they are referring to those from the @realDonaldTrump account.

Wikimedia Commons has two screengrabs of @realDonaldTrump tweets archived there, and some content sourced to Congressperson twitter accounts. Since there hadn't been any discussion specifically about the copyright status of POTUS tweet screengrabs I asked for clarification there. They agreed with my take that a screengrab of a basic POTUS tweet showing text and a profile picture is PD-USGOV, but that a screengrab showing anything more within it has to have those interior items separately evaluated, and blurred out if they are not PD.

Thanks! Dennis the Peasant (talk) 02:51, 10 October 2020 (UTC)

Unfortunately, given the above notes on Copyright status and the guidance at WS:WWI on documentary sources, they do appear to meet the criteria for being here. However, per the precedent exclusions given at WS:WWI, they must be complete and not fragmentary. I would expect them to be verifiable on Wiki. I say "unfortunately", because I'm not convinced that they will have a long-term value here at enWS. They will be archived in other places, because of what they are. I would anticipate that they would become a vandalism target, as are the letters from the Zodiac killer. Beeswaxcandle (talk) 04:41, 10 October 2020 (UTC)
  • The copyright status is a separate issue (and, NB, note that retweets are not PD-USGov under any circumstance!); my main concern is that these do not fit the purpose of Wikisource. There are lots of services that archive tweets and there is very little we can do to add value to them. They are some kind of bastard hybrid between off-the-cuff verbal communication and extremely informal and short written communication. They are not published in any sense that is relevant to our inclusion criteria. With a book or news article-style publication, subject to editorial control, sure: we could figure out the copyright situation and, if compatible, host. But indiscriminate inclusion of all, or a random excerpt of some, of an account's tweets makes absolutely zero sense. If any tweets should be permitted it would certainly be the tweets from a sitting President of the US, but I just don't see it. This is not what Wikisource is for. --Xover (talk) 07:03, 10 October 2020 (UTC)
  • I agree with the above, this is not a good use of Wikisource. On the other hand, we could definitely host content along the lines of The Tweets of President Donald J Trump (2020) provided that the work as a whole is freely licensed or PD. —Beleg Tâl (talk) 12:19, 10 October 2020 (UTC)
    Indeed. --Xover (talk) 12:41, 10 October 2020 (UTC)
  •  Comment I don't see it within our scope. The overarching conversations and the retweets are not within scope, and by their nature they are neverending conversations. Trump's tweets are excerpts of the conversations. Aside I don't see that it is within the indication of our scope of published works. — billinghurst sDrewth 13:24, 10 October 2020 (UTC)
  • comment there are other sites doing this work, http://trumptwitterarchive.com/archive and can be a citation for quotes. this community tends to concentrate on excavating reference texts not available elsewhere. Slowking4Rama's revenge 15:19, 10 October 2020 (UTC)

Thanks for the helpful, albeit discouraging comments!

As I noted above, the Presidential and Federal Records Act Amendments of 2014 revised the definition of official "records" to include all recorded information, regardless of form or characteristics. To summarize the feedback, it seems that a subset of Presidential official records, including but presumably not limited to posts on Twitter, possess characteristics which put them outside the scope of Wikisource.

To help out future Wikisourcians thinking about archiving Presidential and Federal Records, may I ask for clarity on what exactly are the forbidden characteristics? Length, formality, interactivity, possible vandals, lack of publication elsewhere, and the existence of other archives have all been mentioned, what are the red lines in these categories?

Thinking about other social media platforms commonly used by Congresspeople and Presidents, are reddit or Facebook posts (which typically exceed 280 characters but can involve interactivity) also outside of the scope of Wikisource? How about longer posts, without any interactivity, on a digital-only platform like Medium?

I'll toss out two test cases of digital Presidential communications which may help structure the discussion. Here is the URL to an archived Medium post by Obama: medium.com/obama-white-house/to-my-fellow-americans-649af4c5fc49; it is lengthy, contains images but no hyperlinks, and is not part of any conversation. To me it reads like an ordinary press release, or a transcript of a speech. The post's embedded images would certainly be OK to upload on Commons. Does archiving the text of this post fall within the scope of Wikisource, alongside the existing material at Author:Barack_Hussein_Obama?

For a second test case let's consider a tweet, from Obama to separate out the issues of potential vandals and alternative archives. With Obama's Twitter communications, the administration complied with its archival responsibilities in two ways. The most public archive is the @POTUS44 account which had all Obama @POTUS tweets migrated to it. Currently this account is easy to access and use, but of course there is nothing preventing Twitter from going out of business, deciding to delete the account, putting the information behind a paywall, etc.

The administration also made available for download a zipped archive with the text of the tweets in CSV and JSON formats, and included an HTML file to allow searching and reading within a browser. While this form of archiving has a lot going for it, it requires multiple actions and software to get the browser access going, and while this functionality worked well on my desktop, I couldn't get it to work on my Android phone. Additionally, the raw data is incomplete (ending on 11/16/2016), and in minor aspects often wrong (many tweets are mislabeled as retweets, probably due to the migration activity).

It seems to me that the public would benefit (admittedly, only a tiny bit) by having access to an archive of Obama's tweets in an easily readable and searchable format outside of Twitter. These would have to be reformatted from CSV or JSON to be readable, and the t.co redirection links would need to be replaced with URLs to their destination. These tasks are straightforward to automate, and here's a sample reformatted tweet:

This sample POTUS tweet seems pretty anodyne to me, but it seems the community feels strongly that archiving tweets like it does not fall within the scope of Wikisource. OK, but why? The brevity? Thanks again! Dennis the Peasant (talk) 20:10, 11 October 2020 (UTC)
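For anyone curious what the automation mentioned above might involve, here is a minimal sketch of the JSON route. It assumes each record carries `created_at`, `text` and an `entities.urls` list pairing every t.co link with its `expanded_url`, as Twitter's own JSON exports generally do; whether the official archive follows that layout has not been checked. The function names and the sample record are invented for illustration and are not a real tweet.

```python
import json


def expand_links(tweet: dict) -> str:
    """Replace each t.co short link in the text with its destination URL."""
    text = tweet["text"]
    for link in tweet.get("entities", {}).get("urls", []):
        text = text.replace(link["url"], link["expanded_url"])
    return text


def format_tweet(tweet: dict) -> str:
    """Render one tweet as a readable 'date: text' line."""
    return f'{tweet["created_at"]}: {expand_links(tweet)}'


# Made-up record, used only to show the expected shape of the input.
sample = json.loads("""
{
  "created_at": "2016-01-01 12:00:00",
  "text": "Placeholder text with a link https://t.co/abc123",
  "entities": {"urls": [{"url": "https://t.co/abc123",
                         "expanded_url": "https://example.org/page"}]}
}
""")
print(format_tweet(sample))
```

Records without an `entities` block pass through unchanged, so a short link that the archive does not expand would still need to be resolved some other way.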

i tend to be more tolerant of scope than most, but i have several questions: who is going to transcribe and maintain this? who is going to build the index? how are you going to find anything? where is the pdf text? did you upload the text to internet archive? how are you going to deal with deleted tweets? you realize how large the federal government document backlog is? you realize this community gets grumpy when people dump non-scan backed text and leave? you realize that archiving social media is a challenge for the library of congress and national archives? Slowking4Rama's revenge 03:48, 12 October 2020 (UTC)
^ this is exactly how I feel as well. —Beleg Tâl (talk) 17:06, 12 October 2020 (UTC)
sorry to rain on your bright idea. the problem being, there are a lot of bright people here with ideas; the sticking point is always the implementation plan, and the team recruitment. (it is a wikimedia pain point) Slowking4Rama's revenge 01:40, 13 October 2020 (UTC)
I appreciate the questions, and apologize for the delay in answering them. Deletion is a thorny issue, so allow me to pivot from suggesting we archive President Trump's tweets [2016 - present] to suggesting we archive President Obama's tweets [2015-17], a simpler project. We can move on the Trump case later, if warranted. So with the proposal on the table now being to archive Obama's @POTUS tweets, on to your questions:
Who is going to transcribe and maintain this? I am volunteering to transcribe them, and since this is a pretty small project I wouldn't need collaborators although I would welcome them. I'm also happy to work on their maintenance, although since Obama's are static I do not know what is needed beyond keeping the pages on my watch list to catch vandalism.
you realize that archiving social media is a challenge for the library of congress and national archives? Yes I am aware of the challenge, and the very rapid pace of software development further increases the difficulty. With Obama's tweets, the National Archives and Records Administration (NARA) has taken action - they maintain the @POTUS44 archival Twitter account - but I don't know of any other archiving actions by them.
where is the pdf text? did you upload the text to internet archive? Currently there is no official pdf text archive of Obama tweets to scan and upload, but one can make links at each tweet here to the corresponding tweet at the official NARA online archive. (I did this in the above sample Obama entry, it's the first link.) So each tweet archived here would be readily verifiable, in perpetuity since the NARA is maintaining the archives.
you realize this community gets grumpy when people dump non-scan backed text and leave? Understandable, but in this case there are no backing documents existing on paper or as pdf. So what is the verification process, or does one need to be decided upon?
Commons has a "trust but verify" copyright verification process - if an uploader claims that some content is CC licensed at Youtube, it is posted but with an automatic notice that an admin will verify this claim is true at some point. Maybe something similar could be done here, with an admin or proofreader clicking on each verification link after initial posting, and then noting on the page's notes that the transcription checks out.
who is going to build the index? I volunteer to also build an index, perhaps one modeled on the index for Obama's Presidential Weekly Addresses would work. I envision 20 pages (one for each month), with subsections for each day.
how are you going to find anything? I anticipate three major ways:
  • People who are interested in a subject would use keyword searching
  • People interested in a specific time period would navigate using the index and the pages' TOC
  • People interested in a specific tweet could find it either through searching (if they know some specific wording) or via timestamp anchors (if they know the date and time of the tweet).
The timestamps also offer an easily sharable entry to the archive, as the URL will indicate the month, day and time of the tweet. So if one shared a Wikisource URL containing "/wiki/President_Obama_Tweets_2015-10#01-02:27PM", it is clear that the link refers to an Obama tweet from 10/1/2015, tweeted at 2:27PM EST.
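As a rough illustration of the anchor scheme described above, here is a minimal Python sketch that builds such a link from a tweet's date and time. The function name `tweet_anchor_url` and the `base` default are hypothetical, chosen only to reproduce the example URL fragment; the sketch assumes the timestamp has already been converted to the intended time zone.

```python
from datetime import datetime

def tweet_anchor_url(posted: datetime,
                     base: str = "/wiki/President_Obama_Tweets_") -> str:
    """Build a month-page path plus a day-and-time anchor for one tweet.

    Hypothetical helper: the page naming and anchor layout follow the
    example in the discussion, not any existing Wikisource convention.
    """
    # Convert the 24-hour clock to a 12-hour clock with an AM/PM suffix.
    hour12 = posted.hour % 12 or 12
    meridiem = "AM" if posted.hour < 12 else "PM"
    # One page per month, e.g. ..._2015-10
    page = f"{base}{posted.year:04d}-{posted.month:02d}"
    # Anchor is day-of-month, then time, e.g. 01-02:27PM
    anchor = f"{posted.day:02d}-{hour12:02d}:{posted.minute:02d}{meridiem}"
    return f"{page}#{anchor}"

print(tweet_anchor_url(datetime(2015, 10, 1, 14, 27)))
```

Two tweets posted within the same minute would collide under this layout, so a real index would likely need a tiebreaker such as a seconds field or a sequence suffix.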
you realize how large the federal government document backlog is? Yes, but it is natural to update which documents are archived. Trump discontinued the time-honored Presidential Weekly Address tradition entirely in June 2018 in favor of other forms of communication, the most important of which (for him) is tweeting. Since Presidential Weekly Addresses are no longer given, it makes sense to think about archiving the communications which displaced them.
And while POTUS Twitter communications are sometimes no more than barbaric yawps, there have been others which have had great historical significance. As a category, it seems to me that they deserve to be archived here. Dennis the Peasant (talk) 06:25, 24 October 2020 (UTC)

This discussion is about to get archived without any comments on my last post. @Slowking4, @Beleg Tâl: did you have any thoughts about my responses to your questions? @Billinghurst: can you interpret for me how this discussion comes down on the question of whether I should go ahead with the archiving of Obama tweets? Thanks! Dennis the Peasant (talk) 05:01, 17 November 2020 (UTC)

I don't see a consensus of opinion that a tweet or a collection of tweets are in scope, nor that we should expand our scope to have them included. My personal opinion is unchanged. — billinghurst sDrewth 05:43, 17 November 2020 (UTC)
I agree with Billinghurst on this one. --Xover (talk) 06:15, 17 November 2020 (UTC)

Self-published lectures

What is our attitude to works like On Marquez's One Hundred Years of Solitude, originally self-published at [21]? I personally can imagine inclusion of such works, but Wikisource:What Wikisource includes states that works hosted at Wikisource "… must have been published in a medium that includes peer review or editorial controls; this excludes self-publication." Is it possible to alter the particular criterion to include such works somehow (e.g. making an exception to selfpublished lectures of well-known authors), or should this work rather go? --Jan Kameníček (talk) 14:38, 29 October 2020 (UTC)

  • Whatever the criteria, this work (and the others listed on his author page) should definitely be included. If a change is necessary (which it really is for all guidelines), it should occur. TE(æ)A,ea. (talk) 16:23, 29 October 2020 (UTC).

OK, so to keep the work here as well as enable inclusion of others from Author:Ian Courtenay Johnston#Lectures (in fact I am considering adding some of them here), I suggest to replace the part of a sentence "…this excludes self-publication" by "This usually excludes self-publication; rare exceptions can be considered provided that the writer of the self-published analytical work is a renowned academic author". --Jan Kameníček (talk) 08:25, 30 October 2020 (UTC)

It was accepted at the time, it is in scope. Anything can be discussed within scope, as there are edge cases. I don't think that we need any change in the policy or the wording, just bring forward items that are those edge cases. We already have many self-published old works, the rule is primarily aimed at conflict of interest and self-interest additions. — billinghurst sDrewth 09:02, 30 October 2020 (UTC)
How can something about which our rules say that it cannot be included to WS be in scope?
Johnston’s self-published lectures are definitely not an edge case at the moment. Currently the rule clearly and explicitly says that such works are excluded and forbids adding them, which is a pity. Here we can state an opinion that such works can be included, but generally it does not solve anything if we do not write this opinion into the rule’s page. If later some other contributors come to similar cases and they start wondering whether the work can be added to WS, they will most probably (similarly as I did) go to the "What Wikisource includes" page, where they will learn that the work cannot be included. So, if the work can be included, the rule should reflect it.
@"the rule is primarily aimed at conflict of interest and self-interest additions": I am not sure if this is only an opinion or a fact. The rule itself does not say it.
One more problem: Let’s say that on the basis of the opinion expressed above I will add the lectures from the list to WS. That would require some amount of work. How can I be sure that later it will not be deleted because of the current rule stating explicitly that such works are excluded? It cannot be required that people should work against our rules. If something can be acceptable, the rules should at least admit that it can be acceptable. Edge cases will always happen, but this is not an edge case, it is clearly behind the current fence.
Suggested addition makes adding such works possible and makes it known to everybody searching for such information in our rules. --Jan Kameníček (talk) 09:57, 30 October 2020 (UTC)
The word "renowned" makes this suggested change untenable for me. Who is to define renowned in any particular case? How broadly should the person be known as an academic author? Outside their institute of higher learning? Outside a geographical region? Outside their discipline? I'm also not convinced that a publication of a lecture given in the context of a course of learning is covered by a clause titled Analytical and artistic works. The lecture in question is neither.

In terms of self-publishing, what is the reason the work was self-published? To prevent censorship or suppression? Or self-aggrandisement? What's the difference between the vanity presses of the late 19th and early 20th centuries and the blog-posts of today? What's the distinction that allowed us to take Tom Lehrer's lyrics from his website and put them up, but not a piece of fan-fiction? I don't have definitive answers to these philosophic questions, other than to note that the Consensus section of WS:WWI allows us to agree to include or not include particular works by discussion here. A policy is meant to be read and applied in toto. Beeswaxcandle (talk) 18:04, 30 October 2020 (UTC)

Amen. If someone has a work that they wish considered by enWS then point to it, and ask about it. In the range of the works that we reproduce they are definitely edge cases. And of course that part of the rule is aimed at modern self-addition, how else will we exclude someone from publishing their poetry, their writings etc.? While irregular now, it used to be a consistent issue. — billinghurst sDrewth 13:15, 31 October 2020 (UTC)
OK, so I am leaving my attempt for more general pardon of such works.
Despite that, it seems that nobody raised any objections against adding Johnston’s lectures as such. Unless some objections appear, I will probably add some of them to WS. --Jan Kameníček (talk) 22:24, 31 October 2020 (UTC)
@Jan.Kamenicek: Just for the record, I think this should ideally have led to an amendment to the policy to make the scope for discussing edge cases clear and explicit, and for pretty much the reasons you articulated above. I think Beeswaxcandle makes good points that should be addressed, but I see that as a matter of "how best to" and not "whether to". But as I don't have the spare cycles to participate meaningfully in such an effort, I'll limit myself to just expressing general support for the idea and leave it at that.
PS. Please link to this discussion from the works' talk page or similar (even having it in an edit summary helps), so any future deletion discussion will have easy reference to it. --Xover (talk) 07:37, 2 November 2020 (UTC)
"How can something about which our rules say that it cannot be included to WS be in scope?" = IAR. i find all the rules lawyering, and amendment, and strict constructionism, to be a waste of time. go ahead and rewrite rules if it makes you feel better, but it will not stop a deletion, if a rouge admin wants to assert "out of scope" as we have seen on other projects (like commons) Slowking4Rama's revenge 01:50, 3 November 2020 (UTC)
This section was archived on a request by: Jan Kameníček (talk) 22:25, 25 November 2020 (UTC)