Wikisource:Scriptorium/Archives/2022-11

Warning Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

PDF split needed

Please could someone split File:Ohio State Exhibit - A Century of Progress International Exposition brochure.pdf into separate pages (and convert to DjVu if necessary)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:14, 19 November 2022 (UTC)

@Pigsonthewing I prepared the file but I am unsure about the copyright status, as this was published in 1933 and William Mark Young died in 1946. Mpaa (talk) 22:44, 19 November 2022 (UTC)
It was published without a copyright statement, which in the U.S. before 1977 means it's public domain (see {{PD-US-no-notice}}). —Mahāgaja · talk 17:26, 20 November 2022 (UTC)
See File:Ohio State Exhibit - A Century of Progress International Exposition brochure.djvu Mpaa (talk) 21:30, 20 November 2022 (UTC)
This section was archived on a request by: Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:01, 21 November 2022 (UTC)

Asking on behalf of Reboot01. This file has been moved to File:An Investigation of the Laws of Thought (1854, Boole, investigationofl00boolrich).djvu so it and all its pages need to be moved over on Wikisource. PseudoSkull (talk) 15:39, 17 November 2022 (UTC)

Done PseudoSkull (talk) 18:41, 17 November 2022 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:28, 22 November 2022 (UTC)

Google OCR

It seems there are some problems with Google OCR. When pressing the button, often the OCR text is not obtained and a message <error> undefined undefined appears in the top right corner. Usually two or even more attempts are needed to get the OCR. Is anybody else experiencing these problems? -- Jan Kameníček (talk) 21:38, 6 November 2022 (UTC)

So finally it has stopped working completely. --Jan Kameníček (talk) 09:37, 7 November 2022 (UTC)
@Jan.Kamenicek: The error indicates a problem in the backend somewhere (i.e., nothing we can fix on our side). But as the Google OCR gadget is no longer supported (it was replaced by the new Wikisource OCR tool), it is unlikely it'll be fixed, or at least fixed quickly, if it is due to anything more than a transient error. I recommend using the new Wikisource OCR tool (because it is supported), or the Phetools OCR (which I am keeping alive for now). Xover (talk) 09:51, 7 November 2022 (UTC)
Thanks for the reply. The new Wikisource OCR tool is a big improvement in comparison with the situation before it was created, but Google has meanwhile developed its tool even further, and it 1) produced better results in most cases and 2) was much quicker. In comparison with this, the Wikisource OCR tool does not seem to have improved since it was created. So if the Google tool is not fixed, it will be a loss. --Jan Kameníček (talk) 10:17, 7 November 2022 (UTC)
@Jan.Kamenicek: The new Wikisource OCR has an option (dropdown at the right side of the button) to select a Google backend rather than a local Tesseract backend. It is, unfortunately, not the same Google backend as the Google OCR gadget (Google provides two different APIs that do OCR), but I would expect the two to be roughly on par over time (and the Wikisource OCR tool gets these improvements automatically). Does this not provide the results you want?
The Tesseract-based backend will be improved intermittently as the upstream Tesseract project improves, and eventually our local installation of Tesseract will be updated (the update speed depends on infrastructure issues, so it won't be hyper fast most of the time; detailed explanation on request). And if speed is of utmost importance, the old Phetools OCR was pretty fast compared to most alternatives. I also have a local hack that I use myself that beats it in most scenarios, but currently at the cost of being a bit buggy. If I ever get around to fixing it up to be other than a personal toy I'm pretty sure I can make it effectively instantaneous for 99% of pages. Anybody who most cares about speed can drop me a note and I'll keep you in mind if I get around to improving it. Xover (talk) 10:45, 7 November 2022 (UTC)
@Xover: Hm, good point about the Google option in the Wikisource OCR tool, I had forgotten about that! I will try it. Good to hear about future Tesseract improvements too! --Jan Kameníček (talk) 10:55, 7 November 2022 (UTC)
@Xover: I think the two OCR tools do use the same Google service, the Cloud Vision API. It's the Indic OCR tool that uses a different one; it uses the Google Drive API. phab:T295842 is the task for integrating that (there's no reason it shouldn't also be an option, I think). Sam Wilson 11:37, 7 November 2022 (UTC)
Ah, thanks, you're right and I'm getting those confused. But then it's odd that the old Google OCR gadget is failing (it's getting a 400 Bad Request back) when the new one still works. Was there some recent change to parameters or something perhaps? Xover (talk) 11:41, 7 November 2022 (UTC)
It works again. --Jan Kameníček (talk) 14:14, 17 November 2022 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:25, 22 November 2022 (UTC)

There have been some suggestions concerning Author:George Walker Bush by an IP at the author's talk page, so I am posting the link here if anybody is interested to have a look at it. -- Jan Kameníček (talk) 11:11, 8 November 2022 (UTC)

I am sorry, it was just a hidden spam of an LTA. My fault, I should have read it better. --Jan Kameníček (talk) 17:41, 9 November 2022 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:26, 22 November 2022 (UTC)

displaying <ref> tag in Help:Editing

The <ref> tags are displayed as &lt;ref&gt; in Help:Editing, section What you type in Wikitext, and it confuses new editors. I failed to find out the reason. Can somebody check it, please? -- Jan Kameníček (talk) 17:26, 8 November 2022 (UTC)

Reverted a change I made in 2019 to get highlighting in markup/row. Thanks for spotting this. ShakespeareFan00 (talk) 19:14, 8 November 2022 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:27, 22 November 2022 (UTC)

Final Report of the Northwest Territory Celebration Commission

Is this 1938 PDF a federal work, and so eligible for inclusion here? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:33, 14 November 2022 (UTC)

The Commission who wrote this work appears to have been created by America's federal legislature and individual state governments (such as my own) view it as a federal work. Per page 3 of the work you listed, it also seems to be a creation of the federal government and paid for with federal funds (note that the feds can give out grants and that does not mean they own the work, of course). As a non-lawyer, I think it's reasonable to conclude this is a public domain work of the United States federal government. —Justin (koavf)TCM 12:44, 14 November 2022 (UTC)
@Pigsonthewing: And even if it somehow wasn't, there's no copyright notice on the PDF, so it would be okay anyway. PseudoSkull (talk) 13:59, 14 November 2022 (UTC)
This section was archived on a request by: Thank you, both. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:33, 22 November 2022 (UTC)

Table is broken in Index space

Does anyone know what's going on at Index:A Welsh Grammar, Historical and Comparative? The table showing the pages and their statuses is broken. —Mahāgaja · talk 17:23, 20 November 2022 (UTC)

Just a space before the table seems to fix the display. M-le-mot-dit (talk) 19:13, 20 November 2022 (UTC)
Thanks for fixing it! Weird that it worked for many years and then suddenly stopped, but at least it's fixed now. —Mahāgaja · talk 20:24, 20 November 2022 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:34, 22 November 2022 (UTC)

The Antelope (23 U.S. 66) and other imported US caselaw..

This was originally an imported text from a third-party site (the link given on the talk page doesn't seem to be active currently). I've recently cleaned it up further to tackle the reference errors it was seemingly generating, by comparing at least 3 versions side by side. I'd appreciate a review/validation of this against a combination of the good sources: https://supreme.justia.com/cases/federal/us/23/66/, https://law.resource.org/pub/us/case/reporter/US/23/23.US.66.html and https://babel.hathitrust.org/cgi/pt?id=mdp.49015002890615&view=1up&seq=274.

As the work concerned was part of a semi-bulk import, I have a concern that other US caselaw might be affected by related issues. The good faith efforts of other contributors to standardise, proofread and validate these against known sources would be appreciated. I would trust that using multiple sources to ensure a qualitatively accurate version is not an issue contributors to Wikisource would have in the absence of uploaded scans. ShakespeareFan00 (talk) 10:22, 1 November 2022 (UTC)

Hmm, it seems this bulk import of US caselaw is in places just a mess. Deletion and starting again with KNOWN scanned editions would be a better course of action. ShakespeareFan00 (talk) 14:02, 1 November 2022 (UTC)

  • ShakespeareFan00: See here for the original printing. The U.S. Reports mass-import was very messy, but I think it is now too difficult to go in and determine which are good and which bad, except on an individual basis. Some entries are spelled or titled incorrectly; some are missing large amounts of text; and yet many others are completely fine, and there is no easy way to determine the good from the bad. TE(æ)A,ea. (talk) 21:15, 1 November 2022 (UTC)

Tech News: 2022-45

MediaWiki message delivery 00:32, 8 November 2022 (UTC)

Guarani language ...

An expression on a user talk page got flagged by Google Translate as being Guarani, a South American language family I'd not heard of before.

I found : Internet Archive identifier: brasilianlanguag00cavauoft which seems to be the only English language work on IA that's related.

Most of the other works seem to be in Spanish (which is understandable). Do we have any contacts in es.wikisource that might be interested in getting a major work like w:Tesoro_de_la_lengua_guaraní into a scan-backed online version? (Of course, if someone with very great expertise wanted to attempt to translate it into English, they'd need an entire University department backing them though :( ) ShakespeareFan00 (talk) 11:10, 9 November 2022 (UTC)

Sorry, is your question whether we want to have someone translate a 17th-century bilingual Guarani–Spanish dictionary into English? My Spanish is good enough that I could probably do a lot of that with some online help, but I really don't see the value in that as an exercise. I do agree that having it at mul: makes sense, tho. For what it's worth: File:The Brasilian language and its agglutination.pdf. Justin (koavf)TCM 11:20, 9 November 2022 (UTC)
For any onlookers who are interested in the topic: Index:The Brasilian language and its agglutination.pdf. —Justin (koavf)TCM 21:48, 10 November 2022 (UTC)

Poll regarding 7th Wikisource Triage meeting

Hello fellow Wikisource enthusiasts!

We will be organizing the seventh Wikisource Triage meeting in the last week of November and we need your help to decide on a time and date that works best for the most number of people. Kindly share your availabilities at the wudele link below:

https://wudele.toolforge.org/3rNRZDjKn1oaXM7h

Meanwhile, feel free to check out the page on Meta-wiki and suggest topics for the agenda.

Regards

KLawal-WMF and PMenon-WMF

Sent via MediaWiki message delivery (talk) 10:06, 14 November 2022 (UTC)

Wikidata has introduced a new concept and one that can have an impact on how we do things, or could do things around here.

Sitelinks to redirect pages on client wikis have been requested via a community vote and were made available in October 2022. The main goal of sitelinks to redirect pages is to improve interlanguage links across Wikimedia projects when concepts are described with different granularity in different projects.

Wikipedia articles often cover multiple topics on a single page that are represented by separate Wikidata items. For instance, en:Bonnie and Clyde on English Wikipedia covers the concepts described by the Wikidata items Bonnie and Clyde (Q219937), Bonnie Parker (Q2319886) and Clyde Barrow (Q3320282). Redirect pages on English Wikipedia that are used as sitelinks on Bonnie Parker (Q2319886) and Clyde Barrow (Q3320282) allow users to find the article about the duo and even within the article the section that covers Bonnie Parker (Q2319886) and Clyde Barrow (Q3320282), respectively. Using redirects this way allows the articles such as cs:Bonnie Parker on Czech Wikipedia to have an interlanguage link to the corresponding section in the English Wikipedia article that discusses Bonnie Parker (Q2319886).

  1. Notability
  2. Badges
  3. Local policies
    1. Template:Wikidata redirect
  4. Use cases for intentional sitelinks to redirects
    1. Individual entities in a collection of entities
    2. Subclasses of entities
    3. History of an entity and the entity
    4. Location of an entity and the entity
    5. Disambiguation page and family name
    6. Similar disambiguation pages
  5. Redirects that are unsuitable as sitelinks
  6. Notes

d:Wikidata:Sitelinks to redirects

The thing that I see as an immediate desire is our disambiguation pages: as we typically eliminate the articles A, An, The, etc., these can be captured where these alternatives are in use. An example being The Young American.

I think that there is some value in us working through the needed concepts one by one, though I am not certain that doing it here for the detail is the ideal, though can be convinced otherwise. From there we can think about how we fix our local guidance, and add to d:Wikidata:Wikisource (the latter for the broader collective of Wikisources, not just enWS). [Noting that I have still to fully embed myself in the conceptual change, so I am not into the knowledgeable answering of questions yet.] — billinghurst sDrewth 21:23, 14 November 2022 (UTC)

Created an example Discovery Discovery (Q211051) which has "The Discovery" as a redirect, and The Discovery (Q16911473) flagged for intentional redirect. — billinghurst sDrewth 00:36, 15 November 2022 (UTC)
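The convention described above, dropping a leading A, An or The so that variant titles end up on one disambiguation page, can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, it assumes that only the three English articles are stripped, and it is not how any existing template or gadget works.

```python
import re

# Assumption: only the English articles "A", "An" and "The" are stripped,
# and only when followed by whitespace and more text.
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def strip_leading_article(title: str) -> str:
    """Return the title without one leading article (hypothetical helper)."""
    return _LEADING_ARTICLE.sub("", title.strip(), count=1)


def group_by_base_title(titles):
    """Group titles whose article-stripped forms are identical."""
    groups = {}
    for title in titles:
        groups.setdefault(strip_leading_article(title), []).append(title)
    return groups


if __name__ == "__main__":
    print(group_by_base_title(["The Discovery", "Discovery", "The Young American"]))
```

Titles that share a key here are the ones where a redirect such as "The Discovery" could carry its own sitelink while pointing at the page titled without the article.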

Tech News: 2022-46

MediaWiki message delivery 21:54, 14 November 2022 (UTC)

Print page 56 and the blank page opposite are missing (after djvu position 85). They are available on HathiTrust's copy here. Please interpolate and adjust pagelist accordingly. Thanks, Beeswaxcandle (talk) 20:19, 17 November 2022 (UTC)

Starting at page 5, the text of this upload does not correspond to the page images. BD2412 T 21:27, 21 November 2022 (UTC)

Tech News: 2022-47

MediaWiki message delivery 23:21, 21 November 2022 (UTC)

Difference between "Template:Incomplete" and "Template:Under construction"

I recently stumbled across Template:Under construction (th:แม่แบบ:กำลังก่อสร้าง over at Thai Wikisource) and it seems to have the same wording and information as th:แม่แบบ:ไม่เสร็จ (the Thai version of Template:Incomplete) (I even asked other people about how similar these phrases are). The Thai version of "Template:Incomplete" seems to have no use at this point in time. From that, I want to know whether there's any difference between these two templates. If so, what is it? --Bebiezaza (talk) 17:29, 21 November 2022 (UTC)

My understanding is that Incomplete is a long term template denoting a long term problem, whereas Under Construction is placed by the editor working on it to ask people to hold off for a while on major changes, while that editor finishes it. UC should be a short term template, quickly removed by the placer after the work is done.--Prosfilaes (talk) 18:00, 21 November 2022 (UTC)
I would typically use "under construction" for a page that I am building, and usually manually, whereas "incomplete" I would use for a work's status, be it manual or through page transclusion. "Under construction" can be in all of our namespaces, though we wouldn't use the template much. Whereas the "incomplete" template we use a lot, though for main namespace pages. For example, the work A Cyclopaedia of Female Biography has now been fully proofread and validated and I am stepping through the transclusions, and I have not added all respective links, so for me that page is under construction, and less so incomplete. For other works, we have the first page and there is a lot of proofreading to be done, so the page is constructed, however, content is missing, so that is incomplete. — billinghurst sDrewth 05:05, 22 November 2022 (UTC)

Invitation to participate in Wikisource Triage Meeting (26 November 2022)

Hello fellow Wikisource enthusiasts!

We are hosting the seventh Wikisource Triage meeting on 26th November 2022 at 10 AM UTC / 3:30 PM IST (check your local time) according to the wudele poll.

There are going to be updates about a few technical projects related to Wikisource and we will be sharing more information during the meeting.

As always, you don't have to be a developer to participate in these meetings but the focus of these meetings is to improve the Wikisource infrastructure.

If you are interested in joining the meeting, kindly leave a message on sgill@wikimedia.org and we will add you to the calendar invite.

Meanwhile, feel free to check out the page on Meta-wiki and suggest any other topics for the agenda.

Regards

PMenon-WMF and KLawal-WMF

Sent using MediaWiki message delivery (talk) 12:09, 22 November 2022 (UTC)

Tech News: 2022-48

MediaWiki message delivery 20:03, 28 November 2022 (UTC)

Just a heads up, as a part of T308098, we are shipping some changes to the ProofreadPage Openseadragon API this week which should make it easier for userscripts to hook into Openseadragon and have their changes persist when a user changes the image from landscape to vertical. There are also plans to make Wikimedia OCR respect this API and use images provided by userscripts for OCR at 860526.
The older version of the API should mostly work for now; however, anyone who maintains scripts that use the older API should migrate over to the newer API, since we intend to deprecate and remove the older API once everyone has migrated. A demo of a script built using the newer version of the API is at User:Sohom_data/openseadragon_minimap.js and the documentation for the API has been updated to reflect the changes in the newer version. (I have added an item to Tech News, but it is supposed to be sent out next week, since this version was frozen for translation :( ) -- Sohom Datta (talk) 06:38, 29 November 2022 (UTC)
@Inductiveload: Wasn't this one of the changes you were awaiting? ShakespeareFan00 (talk) 12:34, 29 November 2022 (UTC)

Could anybody who knows how to do it transfer Index:Inscriptions de l'Orkhon déchiffrées.djvu and all its pages to French Wikisource? See also User talk:Olgatr2020. -- Jan Kameníček (talk) 16:42, 29 November 2022 (UTC)