Wikisource:Scriptorium/Archives/2023-05

Warning Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

There are 3 diagrams on pages 30 and 31 of this file that need to be extracted, have the background stripped and uploaded. Can someone help out? Ciridae (talk) 12:22, 19 May 2023 (UTC)

@Ciridae: personally speaking, I think that they would be better as manufactured tables utilising some CSS in the work's index:....css file or use of {{table style}}. Alternatively, create them afresh in a graphics package. Extracting them and putting them as cleaned up graphics adds no real value. Having them as text means they are searchable. Remember we are not doing facsimile copies, we are reproducing an author's work in our medium. — billinghurst sDrewth 23:46, 20 May 2023 (UTC)
This section was archived on a request by: Ciridae (talk) 05:03, 25 May 2023 (UTC)

Tech News: 2023-18

MediaWiki message delivery 01:45, 2 May 2023 (UTC)

OED1, again

There's an old 2013 discussion about the possibility of taking on the first, 1928 edition of the Oxford English Dictionary. At least one thing has changed since then: the 96th year after the publication of OED1, 2024, is now only months away. (But unfortunately the first supplement didn't come out until 1933.) Come to think of it, another thing which seems to have changed in the past decade or so is that access to digital editions of the OED has become even more expensive and limited, on the whole. A high-quality free transcription of OED1 would be a good 80% solution to that problem (at least once the first supplement can be added to it). Hopefully it would impose some welcome competitive pressure on proprietary dictionary-makers, to boot. Are people minded to begin work on an OED1 transcription on English Wikisource, or indeed anywhere else? (I couldn't find any discussion of the issue on English Wiktionary, but maybe I simply missed it.) It would absolutely be no small project, but at least in terms of raw size it would be no bigger than some of the other projects which Wikisource has completed over the past decade. RW Dutton (talk) 04:11, 1 May 2023 (UTC)

I don't have the capacity to actually participate in such a project, though I would love to see it done. But I'd be happy to help with scan-wrangling and setup, not least in the hopes of getting it to use modern standards instead of the ad hoc approach used by our early massive projects. If we get together the interest to tackle it we should also liaise with Wiktionary to figure out the best way for them to reuse our efforts; either by letting them systematically link to it here, or by enabling them to import it afterwards in a useful way. We should also keep Wikipedia in mind, as they often need to cite dictionary and etymology (early attestations) for words for which OED1 would be very convenient. I don't think there's any useful entity to collaborate directly with there though. Xover (talk) 09:26, 1 May 2023 (UTC)
Most of the OED1 has been PD since before Wikisource. It's valuable, but in sheer size I'm pretty sure it dwarfs the Encyclopedia Britannica project, and unlike the plain text of the encyclopedia, it's got an idiosyncratic phonetic system and complex formatting. I'm all for it, but it's a pretty overwhelming project larger than any done before on Wikisource.--Prosfilaes (talk) 09:45, 1 May 2023 (UTC)
Why not have the w:Concise Oxford English Dictionary, first published in 1911? Yann (talk) 12:04, 1 May 2023 (UTC)
It's not an either/or but Wiktionary wants the obscure words and the extensive citations. At this point in time, another PD standard English dictionary transcription doesn't feel really worth it to me, unless it's the big one.--Prosfilaes (talk) 13:15, 1 May 2023 (UTC)
Just my two cents: I have transcribed a couple of dictionaries at esWS (small ones, much smaller than OED1). I HIGHLY suggest that you use the ; and : syntax that is meant for definition lists, and do the formatting exclusively through IndexStyles. You can check the one I am proudest of there, this etymology dictionary. I also used a template for the abbreviations, to make the code cleaner. EDIT: The workflow is also massively improved if you write a post-OCR processing script that does the slow and hard part: adding the syntax, standardizing it, and adding the abbreviation templates, among other things. Ignacio Rodríguez (talk) 23:41, 5 May 2023 (UTC)
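The post-OCR step suggested above could look something like the following minimal sketch. It is not the script used at esWS. The input layout (one entry per line, a capitalised headword followed by a comma), the abbreviation list, and the template name abbr-x are all hypothetical and would need to be adapted to the actual scan and to whatever abbreviation template the project settles on.

```python
import re

# Hypothetical abbreviation list; a real project would build this
# from the dictionary's own key to abbreviations.
ABBREVIATIONS = ["adj.", "n.", "v."]

# Assumed OCR layout: HEADWORD in capitals, a comma, then the definition.
ENTRY = re.compile(r"^(?P<head>[A-Z][A-Z'-]+),\s*(?P<body>.+)$")

# One pass over the text so that already-wrapped abbreviations are not re-wrapped.
ABBR = re.compile(r"(?<!\w)(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")")


def wrap_abbreviations(text):
    """Wrap each known abbreviation in the hypothetical {{abbr-x|...}} template."""
    return ABBR.sub(lambda m: "{{abbr-x|" + m.group(1) + "}}", text)


def to_definition_list(ocr_text):
    """Convert raw OCR lines to ';' (term) and ':' (definition) wikitext lines."""
    out = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines left over from OCR
        match = ENTRY.match(line)
        if match:
            out.append("; " + match.group("head").capitalize())
            out.append(": " + wrap_abbreviations(match.group("body")))
        else:
            # Treat anything else as a further definition line of the current entry.
            out.append(": " + wrap_abbreviations(line))
    return "\n".join(out)


if __name__ == "__main__":
    sample = "ABACUS, n. A counting frame.\n\nABATE, v. To lessen."
    print(to_definition_list(sample))
```

Keeping the output to bare ; and : lines leaves all visual formatting to the Index styles, as recommended above, so the script never needs to emit inline formatting.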

Translation: namespace and Wikidata

We seem to have an issue that the pages/works in our Translation: namespace are having their own wikidata items created as unique versions. Can I emphasise to our community that the Translation: ns is set up as a commodity to allow for translations, as there was that desire by the community, and there was no other real wiki where it could happen. The pages in Translation: ns are not true publications as they miss the requirements and notability for publication. They are dynamic documents with no date or place of publication, no translation authors, no copyright, no authority in translation.

There could be the argument made that these pages should never be listed at Wikidata as they fail the notability. I think that such an approach is a little harsh, so if we are to link them at Wikidata I have been doing so on the version from which the translation has been taken. So for example, a Russian newspaper article that is transcribed at ruWS has the translation here listed against that version, not its own. One can essentially equate this with how Wikipedia articles are all linked to the same item at Wikidata. — billinghurst sDrewth 01:38, 4 May 2023 (UTC)

Translation namespace has many works that are not at other wikisources and thus out of step with our guidance at WS:Translations. billinghurst sDrewth 10:37, 5 May 2023 (UTC)

Tech News: 2023-19

MediaWiki message delivery 00:36, 9 May 2023 (UTC)

Comment

A quick insource: search shows no evident uses of .tipsy onsite. — billinghurst sDrewth 03:36, 9 May 2023 (UTC)

What is the difference between this and WikiBooks

Please help 98.253.79.27 15:35, 11 May 2023 (UTC)

On WikiBooks, they are writing new books. On Wikisource, we are converting previously published books and documents. --EncycloPetey (talk) 16:06, 11 May 2023 (UTC)

I need an Aux item without auto-centering

The ToC for History of the Literature of the Scandinavian North does not list the Index, which I need to add to this page.

However, all of the AuxToC elements I can find auto-center on the page, which I do not want. Can someone help? --EncycloPetey (talk) 01:53, 11 May 2023 (UTC)

@EncycloPetey: You can manually wrap it with the class "wst-aux-content" and then style that from the work's Index styles. That's what I did for The Satyricon of Petronius Arbiter: the toc has manual classes in /17 that are styled by rules in Index:…/styles.css (that, admittedly, I cribbed from CalendulaAsteraceae).
PS. That kind of toc I'd have done with a raw table. It feels a little wrong but it's going to be more robust and easier to do, and it gives you the containers you can hang classes off without adding ugly raw HTML to the page. Xover (talk) 05:51, 11 May 2023 (UTC)
Under normal circumstances, I'd use a table, but the hanging indents and over-right page numbers led me to use basic formatting. I'd have used a template, but there is no way to use {{dent}} and adjust the right-hand margin.
@Xover I'd love to be able to use a manual wrap, but I have no clue how to do that. I can get the light green box to display, but not the notice about not being in the original. --EncycloPetey (talk) 16:01, 11 May 2023 (UTC)
@EncycloPetey: I led you astray. That trick won't work with a straight up div. I converted it to a table layout just to illustrate how it could be done. Can you use that? Alternately I'll have to try to figure out a way to do it with the div approach, but that'll have to wait until my brain functions. Xover (talk) 18:42, 11 May 2023 (UTC)
I'd like to avoid using a multi-page table unless it is absolutely required. My experience is that the syntax to create complex multi-page tables changes periodically, at which time well-meaning editors go through and update syntax, causing sections of the table to no longer transclude in the mainspace. --EncycloPetey (talk) 19:51, 11 May 2023 (UTC)
@EncycloPetey: Note that I'd still recommend using table syntax for this, even in light of their complexity when spanning multiple pages. But I've reverted to the original and then found a way to make the Index entry an auxtoc one. It's not exactly elegant, but I think it's a reasonable tradeoff for a once-off. Does it look acceptable to you? Xover (talk) 05:42, 12 May 2023 (UTC)
I agree and have done many tables of that form and have found that the approach with our index:(workname).css has been really useful in this regard. Having the css classes makes the table so much cleaner. — billinghurst sDrewth 10:04, 12 May 2023 (UTC)
@EncycloPetey: See the work My Life in Two Hemispheres which is that exact form and the implementation Index:My Life in Two Hemispheres, volume 2.djvu/styles.css. Adding the green colour would be the next simple addition. — billinghurst sDrewth 10:13, 12 May 2023 (UTC)

 Comment @EncycloPetey: It is a while since I have done it, however the green colour used to be in our global styles through subheadertemplate for background-color: #E6F2E6;. The respective header colours should all be readily callable through a global class, though I know that Xover has his arguments against global classes. There needs to be a happy medium to allow easy usability. — billinghurst sDrewth 02:45, 12 May 2023 (UTC)

The problem here is that applying both the background colour to the whole line and the "(not in original)" text requires there to be some structure (html element) that we can target and add it to after. In a table we can just tack it onto the table cell (td), but since this toc is using a div that contains (wraps) both the chapter title and the page number there's no structure there to add it to. The fix is to add that structure; the challenge is doing so without adding too much ugly html salad or interfering with the other formatting. Xover (talk) 05:20, 12 May 2023 (UTC)

I've implemented {{yesno}} in Lua so it can use the logic of Module:Yesno rather than re-implement it. Test cases are at Template:Yesno/testcases. Thoughts? —CalendulaAsteraceae (talkcontribs) 09:50, 14 May 2023 (UTC)

@CalendulaAsteraceae: We import both the template and module from enWP. Modifying it locally means we either have to maintain it completely locally, or we need to re-merge every time we resync from upstream (and MW gives us no tools to do that). What's the value proposition to offset that cost? Xover (talk) 16:55, 14 May 2023 (UTC)
@Xover: That is useful context, thank you. Given that context, I don't think it's worthwhile. —CalendulaAsteraceae (talkcontribs) 20:30, 14 May 2023 (UTC)

The header links (previous / next); subsection links, and other navigation links throughout this work are a complete mess. I found that the "main page" for The Mysterious Island pointed (via redirect) to part of Twenty Thousand Leagues Under the Sea. In fact there is no main page for The Mysterious Island. The "title" link, for another work, points to an internal part of the work, and not to the "title".

I may be able to tackle the problem this weekend, but it will likely take several hours, and I don't know whether I will have the time then. Given the importance and probable high traffic of this work, someone might want to go through and correct all the sections, pages, and links before then.

Incidentally, I only discovered the problem after someone noticed that the end of The Mysterious Island wasn't transcluded at all, and added the final pages. So there may be other parts where sections of the works have not actually been transcluded. The work needs a thorough going-over to fix all mainspace pages in all respects. --EncycloPetey (talk) 18:15, 3 May 2023 (UTC)

To be clear, this does not appear to be the fault of any one person's actions. I see at least half-a-dozen experienced editors with edits made among the pages. It is more likely the result of lots of individual small-scale changes without anyone checking that the large-scale transclusion makes sense. --EncycloPetey (talk) 18:47, 3 May 2023 (UTC)

@EncycloPetey: Very much agree, this issue is something of our own making, and it is only more obvious as we mature and start addressing these serious major undertakings. Series works, and these posthumous collections of authors are problematic for our display, and we have not handled any of it brilliantly or uniformly. I separately noted this to a user for Loeb Classical Library and its subpages. I think that it is time that we review our approach to these collated works. I know that I harp on about the use of the Portal: namespace, though to me, for the work identified, the overarching "Works of Jules Verne" may be better as a portal, and each of the individual components could be set up as the works themselves.

I note that we would only take this approach to the volumised works where they are these latter collations that are essentially series of separate and combined volumes. — billinghurst sDrewth 01:47, 4 May 2023 (UTC)

I agree about this being confusing, see the recent discussion about The Complete Works of Lyof N. Tolstoï and The Novels and Other Works of Lyof N. Tolstoï. Among the issues are:
1. Print / Bounded and hence typically index page vs. thematic divisions, especially where the collected work is itself made up of subcollections, and how that aligns with wikidata and previous / next
2. Creation and display of large number of works e.g. poems / letters
3. Handling portions still under copyright, leading to links to scans, copyright renewal tags and transcriptions in Main, note that these are often not tagged with a license violating our copyright policy
4. Headers aren't really designed to handle Collection --> Volume --> Section hierarchy where we might want to indicate per volume Author / editor as well as section / contributor
5. Looseness around licensing / categorization as they are now subpages
MarkLSteadman (talk) 02:32, 4 May 2023 (UTC)
With respect to moving to Portal, it would be good to clarify our policy around metadata / linking / authority control, / redirects and how to classify them in the portal hierarchy. E.g. Works of Jules Verne has LCCN 14001405 and LOC Classification PQ2469 so presumably this would be a child and linked from Portal:French literature? MarkLSteadman (talk) 02:55, 4 May 2023 (UTC)
@EncycloPetey While I appreciate your comment "this is not the fault of any one person's actions", some of these large works aren't initially set up well, and, as you say, can take a fair amount of time to fix. Personally, I found the page Works of Jules Verne so messy, that I never realised that Works of Jules Verne/Volume 5 was where the work actually started, when it was originally entered into the MC as proofread, but requiring the transclusion be split into chapters (I think the entire work was originally transcluded onto two pages). As a general rule, if a work seems "serious", with multiple experienced editors having made changes, then I am quite hesitant (and perhaps for other non-admin Wikisource users also) to clean anything up, besides the express splitting request in the MC, lest I receive an angry rant from someone about how I shouldn't have made what I thought were "improvements"... Perhaps removing the contents page from Works of Jules Verne, so people actually click on Works of Jules Verne/Volume 5 would be a start, or moving the auxTOC's to within the work.
I also could be mistaken, but I didn't think there was a "main page" for the Mysterious Island in Volume 5, it was just a heading, unless it was implied I should have split page Page:Works of Jules Verne - Parke - Vol 6.djvu/23 into sections, and transcluded "s1" as just the heading, onto its own page. Is there a convention for this? Usually, e.g. with the HG Wells series that is running, there are proper title and contents pages at the start of each work, rather than requiring an auxTOC like for Jules Verne.
Also, the Mysterious Island was in two volumes, and when splitting the transclusion, volume 6 hadn't been proofread yet (unless this wasn't what you were talking about).
Regards, TeysaKarlov (talk) 21:50, 5 May 2023 (UTC)
I was not talking about Volume 6; only the parts that have been proofread. --EncycloPetey (talk) 21:58, 5 May 2023 (UTC)

 Comment Compendium works that are split into volumes do not need to be (should not be?!?) reproduced here as subpages of the respective volumes. It just makes things harder than they need to be for us. The volume index page can still be created without making any works subsidiary to a volume hierarchical structure. — billinghurst sDrewth 08:08, 6 May 2023 (UTC)

I think it will be hard to make it a "should not", as the inclination to mirror the index pages is so strong. We see this with magazines, where issue numbers would be perfectly fine being placed in Volume / Issue hierarchies. In this case, with volume introductions, people naturally incline to Volume 2/Introduction. MarkLSteadman (talk) 04:32, 9 May 2023 (UTC)
It is always tricky. This becomes very ugly and very silly quite quickly for no apparent win ...
1. Work of Jules Verne/Volume N/Introduction
2. Work of Jules Verne/Volume N/Title 1/Chapter NN
or
3. Work of Jules Verne/Volume N (one could argue that ToC and introduction can all sit on the /Volume N page)
or
4. Work of Jules Verne/Introduction to Volume N
or
5. Work of Jules Verne/Title 1/Chapter NN
or
6. Title 1 (Work of Jules Verne)/Chapter NN
What are we truly trying to achieve with the hierarchical mess of volumes, which are essentially worthless? Yes, we would need to give good guidance, however, with compendium works there is always time to assist and process. Whereas the volumes of serials are important, that is a whole publishing history, usually of one off editions. We need to separate people's brains from thinking that they are the same just because they both utilise the word volume. — billinghurst sDrewth 04:56, 9 May 2023 (UTC)
Any chance we could start transforming some of your thoughts into prescriptive form, as an early draft of specific guidance on this? Just from the above I see you and Mark have managed to think this issue through to a much greater degree than I have, so speaking just for myself it would be a great help in trying to think about it (easier to have something concrete to agree or disagree with). Nothing fancy, just your thoughts phrased as if they were guidelines: this sort of work should be treated thus, but these other kinds should be done up in this other way.
The only things in this area I have a somewhat formed opinion on are The Satyricon of Petronius Arbiter that was split into vol. 1 and vol. 2 for printing but is obviously a single work (the page numbers even continue across the two volumes), and which should be transcluded transparently as a single work; and things like Johnson's The Plays of William Shakespeare (1765)—that is a cohesive set of volumes with critical commentary on Shakespeare's plays, unlike Portal:The Yale Shakespeare that are individual editions merely published in a series with volume numbers (well, retconned, but...)—that also should live under a single top-level page somehow (not necessarily with a "…/Volume n/…" path component, but possibly). Xover (talk) 06:26, 9 May 2023 (UTC)
My initial thoughts are we have a four-fold split:
1. Volumes in a publisher series, i.e. those works which are published under different titles, generally by different authors at different times, but having the same publisher / editor / presentation. For these works I think some sort of non-mainspace-based organization makes sense, but the exact relationship between listing on the publisher's vs. creating a portal page with the appropriate categories / authority control (is this allowed under our policy, since we then have "author-like" authorities and "work-like" authorities?) / licensing (should the whole collection be tagged with an appropriate license?) is unclear to me, as is the cross-namespace redirecting.
2. Volumes in a serial publication, i.e. those works published under the same title but on a regular cadence. Currently these are generally held in main which makes sense (they have a single name and "creator") but here the main issues are all basically related to handling the varying publication date / editors properly (e.g. indicating the varying editors and dates in the headers, license tagging). It certainly makes sense to go from Volume N to Volume N+1 from within the publication and often the volumes have their own TOCs. So generally Publication Name (listing just the volumes / dates / editors) --> Volumes (listing per volume TOC or sub issues) makes sense with Volumes doing previous / next in order. These were published and sold separately at different times.
3. Volumes in a collective work, i.e. those works which are now more uniform in content than case 1 with the scope being generally pre-defined up front, having indices / general table of contents, etc. These are the works between case 1 and case 4. At the most case 4-like you have works first published by authors such as Essays: First Series and Essays: Second Series, Tales of My Landlord (1st Series)/Volume 1; on the other side we have Sacred Books of the East, The Harvard Classics, etc. where now we have multiple translators or authors, the volumes being dual titled on the title page.
4. Volumes in a single work, i.e. those that only have a single title, continuous pagination / chapter numbering etc. In main space and generally not a problem.
My suggestions for a way forward would be:
a. Getting guidance for 1. and 2. started as proposals for revising the appropriate sections of the documentation to be able to address those in a more prescriptive way (e.g. defining how serial publications should be tagged for licensing, where such collections should live in Portal)
b. Writing up more suggestive guidance for case 3 outlining the various options as laid out in the discussion so far to be added as well
c. For this specific work, maybe write up some proposals on the talk page for the work and then vote?
MarkLSteadman (talk) 16:52, 13 May 2023 (UTC)
FWIW options 1 and 2 above seem the most logical. We can always add redirects. I am glad this issue comes up, as I have had a question about it a few weeks back, but I didn't get an answer. Yann (talk) 08:39, 9 May 2023 (UTC)
Politely disagree. What does a volume level do here? I see no value in such. All published references are not going to be to the compendium, but to the work. All it does is add complexity to the title, for no benefit. — billinghurst sDrewth 11:48, 15 May 2023 (UTC)

Tech News: 2023-20

MediaWiki message delivery 21:45, 15 May 2023 (UTC)

A question about project scope

Does Wikisource consider a text like Five tips for reporting a scam (originally from https://www.usa.gov/features/five-tips-for-reporting-a-scam) to be in project scope accd. to Wikisource:What Wikisource includes? This is just one of numerous such web pages uploaded as PDFs to Wikimedia Commons by one specific user (see c:Special:Contributions/StuckInLagToad) and then added to Wikisource. I'm a Commons admin, and if it weren't for the corresponding Wikisource pages, I'd consider such PDFs as out of scope for Commons. --Rosenzweig (talk) 14:29, 17 May 2023 (UTC)

  • Rosenzweig: I’m fairly active at WS:PD (where scope-related deletion discussions are held), and personally would consider these in scope, although it’s a close call. There is a related discussion about whether publication on government Web-sites is sufficient to be in scope going on right now. TE(æ)A,ea. (talk) 14:57, 17 May 2023 (UTC)
  • Not really, as it is essentially a dynamic webpage without clear authority, and is essentially an extract of part of the website, as it isn't a standalone work. If it was a webpage elsewhere on the web, we wouldn't host it, so the fact that it is a government page makes little difference. It is a grey zone, and in this space I would also be considering the scope of Commons (educational) for pertinence. That said we took a whole lot of NARA posters and like documents, though they did have a bit more of a historical bent, and are not solely transactional. — billinghurst sDrewth 22:51, 17 May 2023 (UTC)

Library back up project

Hi, Some people started this on Commons: c:Commons:Library back up project. The idea is to upload books to preserve them, as Wikimedia is probably going to last much longer than other book hosting websites. Books that are still in copyright are uploaded, deleted just after, and added in the relevant "Undelete" categories. Just FYI as there is obviously a connection to Wikisource. First books added were in Chinese and Japanese languages. I added some more in English, French, and Indian languages. Yann (talk) 20:36, 5 May 2023 (UTC)

Were you aware of the efforts Fae made in terms of mirroring IA-hosted works on Commons? That might be something to continue as well. ShakespeareFan00 (talk) 21:39, 5 May 2023 (UTC)
Main problem with Fae's uploads is that they're mostly pdf despite there being a djvu option available and often the poorer quality scans were selected by the bot, which means that the OCR is dodgy. Beeswaxcandle (talk) 03:45, 6 May 2023 (UTC)
Yes, all files I uploaded from IA were not uploaded by Fae. Yann (talk) 13:06, 6 May 2023 (UTC)
Keep going on English Language works from IA. Areas of interest I have very long-term would be ancient English law reports, but that's a very very long term project, after the 18 months or so it will take to clean up lint-errors. ShakespeareFan00 (talk) 13:11, 6 May 2023 (UTC)
Re: Fae, also the widespread uploading of non-US origin works that are still in copyright in their source country, breaking Commons's copyright policy. MarkLSteadman (talk) 13:14, 6 May 2023 (UTC)
That was something Fae was working on resolving, when certain ill-considered comments at Commons caused their departure. ShakespeareFan00 (talk) 13:17, 6 May 2023 (UTC)
Just mentioning that determining copyright status automatically for English language works can be non-trivial. But in general, manually getting high-quality scanned files directly out of the digital collections from various libraries makes sense but this is likely to create a third or fourth copy of the same mediocre google books scans that are already hosted at Google, HathiTrust, and IA. MarkLSteadman (talk) 13:41, 6 May 2023 (UTC)
There is a difference between uploading files with the wrong license, and uploading files still in copyright for the purpose of long term safekeeping. The latter are deleted after being uploaded, and added in the relevant undelete categories. Yann (talk) 15:31, 6 May 2023 (UTC)
My point was that knowing what undelete category to apply en masse is tricky, unless going with publication date + 140 or something. MarkLSteadman (talk) 02:42, 7 May 2023 (UTC)
Commons uses the publication + 120 years rule, when the author'(s) death date(s) is/are unknown or uncertain. Seeing the life expectancy at the time, this is quite sensible. Yann (talk) 12:25, 7 May 2023 (UTC)
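The arithmetic behind the "publication + 120 years" rule mentioned above can be sketched as follows. This is only an illustration: the assumption that a file becomes eligible on 1 January of the year after the 120-year term has elapsed, and the "Undelete in YYYY" category name format, are both assumptions made for the example and should be checked against current Commons practice.

```python
def undelete_year(publication_year, term_years=120):
    """Year in which a work with unknown author death date could be restored.

    Assumption: the term runs for `term_years` full calendar years after
    publication, so restoration happens in the following year.
    """
    return publication_year + term_years + 1


def undelete_category(publication_year):
    """Build a category name; the 'Undelete in YYYY' format is assumed here."""
    return f"Undelete in {undelete_year(publication_year)}"


if __name__ == "__main__":
    for year in (1926, 1940):
        print(year, "->", undelete_category(year))
```

For batch uploads this gives a conservative default when the author's death date cannot be found; works with a known death date would need the country-specific term instead.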
@ShakespeareFan00: Please tell me if you have a list of files. Yann (talk) 20:54, 15 May 2023 (UTC)
I don't have a specific list, but there were some suggestions on c:Commons:IA books that Fae didn't take up. If I think of some specific areas I'll let you know. ShakespeareFan00 (talk) 21:10, 15 May 2023 (UTC)
I've thought of some volumes that should be hosted on Commons:-
The ones listed as external scans in Template:Ruffhead_volumes are a prime candidate.
ShakespeareFan00 (talk) 17:20, 17 May 2023 (UTC)
@Yann: - I compiled a list of "The English Reports" based on work @Technolalia: did. They are also candidates for the Library Backup project :) If you can add to the list on the portal even better. ShakespeareFan00 (talk) 17:39, 17 May 2023 (UTC)
I will look at the Ruffhead volumes for a start. Where is your list of "The English Reports"? Yann (talk) 19:13, 17 May 2023 (UTC)
Portal:The English Reports (Some are Google Books entries though...) ShakespeareFan00 (talk) 19:22, 17 May 2023 (UTC)

Thanks for the post. I want to highlight that the project aims to systematically upload all old books from libraries. This includes those works near PD but not yet in PD. They can be deleted after the upload and be restored later.

I think the most imminent threat to library preservation is the Russia-Ukraine war. We should prioritize Ukrainian libraries. If Russian bombs hit Ukrainian libraries, the books could all be gone.

According to her, the Russians have damaged or destroyed almost 60 Ukrainian libraries since the beginning of the war.

[18]

Do any of you know about Ukrainian libraries websites with scans? Please provide.

The next priority would be Russian libraries. If the war escalates, Russia could be the target. I know many people hate Russia, but their books are innocent and need preservation too. --維基小霸王 (talk) 04:45, 22 May 2023 (UTC)

This is just a mess. Someone needs to actually sit down and repair the relevant citation template, because it's completely *****d up in rendering here. ShakespeareFan00 (talk) 20:26, 21 May 2023 (UTC)

Why does someone need to do that? Document in WS: ns, and one part of a discussion whether to keep that subset of documents or not. You can just ignore it and move on. Don't come stamping your foot about things that show up in error reports, they are just error reports, they are not our boss, they do not set agenda nor priorities. — billinghurst sDrewth 21:31, 21 May 2023 (UTC)
@ShakespeareFan00: I fixed this instance; it was a pretty simple error. No clue how to stop the bot from doing it in the future though. — Dcsohl (talk) (contribs) 20:23, 22 May 2023 (UTC)

Tech News: 2023-21

16:55, 22 May 2023 (UTC)

Page ns104 pages out of sync with Index-

The scans are - File:The Federalist (Ford ed, 1898).djvu

However there seem to be some extant pages such as Page:The Federalist (Ford).djvu/1 and others?

What is the CORRECT index name, so that pages aren't being created under the "wrong" index? Thanks.ShakespeareFan00 (talk) 06:03, 24 May 2023 (UTC)

Probably because c:File:The_Federalist_(Ford).djvu is a redirect to c:File:The Federalist (Ford ed, 1898).djvu. Every page at Index:The Federalist (Ford).djvu should be redirected. Ignacio Rodríguez (talk) 00:52, 25 May 2023 (UTC)
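If the stray pages are to be redirected as suggested, the list of redirects can be generated mechanically. The sketch below only builds the title-to-wikitext mapping and does not edit the wiki. It assumes page numbering is identical under both file names (plausible, since one Commons file is a redirect to the other), and the page count passed in is a hypothetical value to be replaced by the real one.

```python
def page_redirects(old_file, new_file, page_count):
    """Map each old Page: title to redirect wikitext pointing at the new title.

    Assumes page N of the old file name corresponds to page N of the new one.
    """
    return {
        f"Page:{old_file}/{n}": f"#REDIRECT [[Page:{new_file}/{n}]]"
        for n in range(1, page_count + 1)
    }


if __name__ == "__main__":
    redirects = page_redirects(
        "The Federalist (Ford).djvu",
        "The Federalist (Ford ed, 1898).djvu",
        page_count=3,  # hypothetical; use the actual number of scanned pages
    )
    for title, wikitext in redirects.items():
        print(title, "=>", wikitext)
```

Where an old page already holds proofread text, moving it to the new title would preserve its history better than overwriting it with a redirect, so the mapping is best treated as a checklist rather than a bulk edit.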

What is the practice of adding {{Blocked user}} to a user page? There are some in the Category:Blocked users, but it must be just a tiny fraction of all the blocks (including various single use spamming accounts). I am asking as I have noticed that User:Shāntián Tàiláng (who has also been blocked in several wikis) started adding the template to dozens of user pages of blocked spammers (like User:Elvismartin1515). -- Jan Kameníček (talk) 16:37, 10 May 2023 (UTC)

@Jan.Kamenicek: It typically isn't needed as the bulk of what we block are spambots, and very occasionally an LTA. I would only be using it in a situation where we have blocked someone through community discussion. I politely asked the user to stop adding it, and deleted those labelled pages. We definitely don't need someone just coming in and needlessly (cluelessly?) adding those tags. — billinghurst sDrewth 11:44, 15 May 2023 (UTC)
this is a vindictive practice at other projects, to provide a scarlet letter for editors who will never be unblocked. i.e. "all your user pages belong to us". and we see a practice of exporting vindictiveness across wikis. --Slowking4digitaleffie's ghost 18:30, 25 May 2023 (UTC)

Question

Hello!

A long time ago I asked for help to upload a book from HathiTrust: Index:Brazilian short stories. A user helped by uploading the individual pages. But a few days ago I managed to download and upload the PDF from Google Books: File:Brazilian Short Stories.pdf. Could an admin update the Index page?

By the same author, there's this book translated by Aubrey Stuart. Does anyone have any clue who he was? Since the English-language text was published in Brazil, I want to be sure that it is really PD here.

Thanks, Erick Soares3 (talk) 14:11, 18 May 2023 (UTC)

Yes, as it is published in 1926, it is in the public domain in USA, and therefore OK for Wikisource. Yann (talk) 14:38, 18 May 2023 (UTC)
Yes, but following what you said at Commons, I also need to be sure that the translation is PD in Brazil, since the book was published in Rio de Janeiro. Erick Soares3 (talk) 16:43, 19 May 2023 (UTC)
And after doing some research, there's basically nothing on the translator - I'm not sure if I should just assume that he died long enough ago to be PD in Brazil. Erick Soares3 (talk) 17:14, 19 May 2023 (UTC)
If it is not in the public domain in Brazil, it could be uploaded to Wikisource. If it is in the public domain in Brazil and in USA, it can be uploaded to Commons. Yann (talk) 18:41, 19 May 2023 (UTC)
@Yann: the thing is: there's no reference at all to when the translator died, so I'm not sure if it is a case of assuming that he died 70+ years ago (it would be highly improbable that any heir would complain). Erick Soares3 (talk) 13:35, 20 May 2023 (UTC)
@Erick Soares3: Umm, we don't publish works here on the basis that someone doesn't complain. If the translation is not in the public domain in both the home country and the US, then it cannot be hosted at Commons (their rules); if it was published so as to put it into the public domain in the US, then we can host it. — billinghurst sDrewth 15:32, 20 May 2023 (UTC)
@Billinghurst: my issue is more in the line of what we should do when we don't have enough information about the translator (e.g. when he died). We just don't publish it here? I attempted to research in a Brazilian newspaper archives about him and ended empty-handed. Erick Soares3 (talk) 20:07, 20 May 2023 (UTC)
@Erick Soares3: English Wikisource reproduces English language texts based solely on US copyright provisions, and it would seem that the translated work was published to put it out. With regard to author research, please document what you can on the author's talk page, positive and negative searches. That is what I do and it all helps, especially as reference when populating Wikidata. It also helps us review when we move works to Commons from here. Possibly not for a long time in the case of the identified work. — billinghurst sDrewth 23:50, 20 May 2023 (UTC)

 Situational comment We don't have a good methodology for identifying a more contemporary author where we have no death date and we are hosting a scan of their work. We cannot easily identify when WD may get the requisite death information, to inform us when something may be transferable. @Xover: can you think of a way that we can label/utilise {{do not move to Commons}} and whether the author death date is populated or not? We could possibly run an author capture for all works for which we hold scans, then run a check against a list of authors using Petscan, looking at those items in WD for the presence/absence/test of that date. Guessing that such a check is suitable once a year, in line with when we do our start of year move to Commons clean-up. — billinghurst sDrewth 23:59, 20 May 2023 (UTC)

Hmm. Tricky to automate. {{do not move to Commons}} doesn't know who the author is, unless we put a major effort into connecting all our scans → editions → works → authors at Wikidata. Templates/modules are also bad at catching changes (no death date → death date added). If we had machine-readable author information on all our File:s we could probably bot-script a periodic task that lists all files that currently have an author with a death date in the likely range for pma. 70 expiration. If we trust the |expiry= on {{do not move to Commons}} we could then filter out those on the assumption someone has manually checked it already. But at that point I think we're probably better off just manually checking all files with {{do not move to Commons}} that do not have |expiry= set. A tracking category for that should be trivial to add. Xover (talk) 05:41, 21 May 2023 (UTC)
@Xover: Category:Media not suitable for Commons/not listed (distinct from Category:Media not suitable for Commons/test). —CalendulaAsteraceae (talkcontribs) 04:20, 25 May 2023 (UTC)
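The periodic check discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ScanFile` record, its field names, and the sample data are hypothetical, and it presumes that machine-readable author death years are already available for each file, which the thread notes is not currently the case. It treats a pma. 70 term as lapsing on 1 January of the year after death plus 70.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanFile:
    """Hypothetical record for one file tagged {{do not move to Commons}}."""
    title: str
    author_death_year: Optional[int]  # None when no death date is known
    expiry: Optional[int]             # value of |expiry=, None when unset


def pma_expiry_year(death_year: int, term: int = 70) -> int:
    """First calendar year in which a pma-`term` copyright has lapsed."""
    return death_year + term + 1


def files_to_review(files, check_year, term=70):
    """Return (likely_expired, unknown_author) titles among files lacking |expiry=."""
    likely_expired, unknown = [], []
    for f in files:
        if f.expiry is not None:
            continue  # assumption: |expiry= means someone already checked it
        if f.author_death_year is None:
            unknown.append(f.title)  # needs manual author research
        elif pma_expiry_year(f.author_death_year, term) <= check_year:
            likely_expired.append(f.title)
    return likely_expired, unknown


# Made-up sample data, purely for illustration.
sample = [
    ScanFile("A.djvu", 1950, None),
    ScanFile("B.djvu", 1960, None),
    ScanFile("C.djvu", None, None),
    ScanFile("D.djvu", 1940, 2011),
]
expired, unknown = files_to_review(sample, check_year=2023)
```

The second list is the case raised in this thread: files whose author has no known death date, which would still need to be checked by hand.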

I think that this project is a waste of space on Wikisource. Why host it? As it is, it's useless. I downloaded the zip file, in which the book is in single leaves and various formats. This is all image work. I would not suggest recreating the text with our fonts.

If I upload the single pages to Commons, inserting them as is is problematic because the pages are in landscape layout. One option is to separate the text as an image and place it above the images for a portrait layout.

Can anyone suggest what else can be done? — ineuw (talk) 20:26, 21 May 2023 (UTC)

Here is how a different Book Dash work was done: A Beautiful Day and Zanele Situ: My Story. There are other problematic transcluded works like When I Grow Up. MarkLSteadman (talk) 13:32, 25 May 2023 (UTC)
Ugh. That's rather awkward, yes. I don't think we should add any more works like this, and certainly not encourage adding them. But what to do about the existing ones? Deleting them seems… harsh. But fixing them seems impossible with our current platform functionality (decent webfont support at a minimum, but there's more that would be needed to do it justice).
PS. Just to be clear: I think it'd be awesome if we could host these works. I just don't see any way we could, currently, that isn't doing both the work and our readers an injustice. Xover (talk) 18:10, 25 May 2023 (UTC)

Text image over art image is what I imagined. Something close to A Beautiful Day. — ineuw (talk) 08:06, 27 May 2023 (UTC)

Selection of the U4C Building Committee

The next stage in the Universal Code of Conduct process is establishing a Building Committee to create the charter for the Universal Code of Conduct Coordinating Committee (U4C). The Building Committee has been selected. Read about the members and the work ahead on Meta-wiki.

-- UCoC Project Team, 04:21, 27 May 2023 (UTC)

Policy against just dumping OCR raw in ns104?

It's already an unwritten guideline that contributors shouldn't just dump raw OCR into Page: namespace, but I am not sure if this had been phrased into a formalised guideline/policy.

What are the thoughts of other contributors? ShakespeareFan00 (talk) 08:26, 25 May 2023 (UTC)

I believe you're talking about people marking raw OCR text as "proofread" in Page namespace?
In that case, I agree that it's pretty annoying because that's a lot of rework for someone to go through and recheck, and it breaks our workflow if that person then marks it as "validated". I don't mind if raw OCR text is merely saved in Page namespace as I expect someone will end up proofreading it eventually.
But in cases where someone marks raw OCR text as proofread, I think we should have a policy to revert those edits. And a talk page warning to the person doing that. Ciridae (talk) 08:40, 25 May 2023 (UTC)
@Ciridae: We have fairly clear established standards for using page statuses, and inappropriate page statuses can definitely be reverted and should be taken up with the contributor on their talk page (politely). In farthest consequence we can block people for this, like other non-constructive behaviour, if lesser measures fail to address the problem.
But I'm pretty sure SF00 is referring to filling all the Page: pages in an Index: with raw OCR. We have no policy explicitly prohibiting that, which has led us to have ~1 million such pages (more than 30% of the total number of pages in Page: namespace). And if you don't mind such pages you are, probably, in the minority: I have no empirical data but my experience suggests most contributors do not want to work on texts that are already filled like that. Xover (talk) 18:37, 25 May 2023 (UTC)
There's no policy against it (otherwise we wouldn't have ~1 million of them sitting around), and there's no strong precedent that they are deleted when nominated. Personally I think we should ban this, both because having such pages sitting there is problematic in itself and because it encourages bad practices (dump the raw OCR, transclude it, and then just split; nobody wants to work on such texts so they sit there forever in that state making enWS look like a ghetto). But nobody much has expressed support for such a ban so far. I expect they'll come around when the number of raw OCR dumps exceeds our number of actually Proofread pages, but then I am an eternal optimist. Xover (talk) 18:23, 25 May 2023 (UTC)
I was certainly finding in some of the delinting efforts, that more than a few of the 'unproofread' pages were raw dump from OCR. I generally at least try to do a little cleanup on a New page, even if I save it as un-proofread.
Of course match and split pages that haven't been proofread yet, is a different issue, and those SHOULD be retained as typically, there is some kind of standards being applied, even if it's not a direct scan match :)
ShakespeareFan00 (talk) 18:32, 25 May 2023 (UTC)
the text dumpers will always be with us. it is unclear to me that a policy with warnings and blocks is better than trying to pivot them to our better proofreading practices. a million page backlog is a feature not a bug. --Slowking4digitaleffie's ghost 18:36, 25 May 2023 (UTC)
Nobody is suggesting warning templates and blocks as a primary feature. But so long as we permit this practice we have no basis on which to ask them to behave differently, much less use a stern tone of voice when doing so. If we prohibit this practice we can simply tell them we don't permit raw OCR here and channel their energies into actually proofreading and perhaps gain a productive long-term contributor in the process. A million page backlog of raw OCR already exists: it's called the Internet Archive. Importing it here will only serve to drown the project in crap and make all our contributors leave in disgust. Xover (talk) 18:49, 25 May 2023 (UTC)
actually internet archive has 4 million books,[21] so let’s say 100 million pages. So we are a small percentage of scanned pages available. We ran this experiment, where german and english wikisource were the same size in 2008; and here we are 15 years later, and english has 162 times non-proofread pages, but also 5.9 times proofread and 2.3 times validated. non-proofread is flat over the last year, as proofread and validated increase, i.e. not out of control. i would like to keep up with the french increase in proofreading, but that would suggest recruiting more editors. --Slowking4digitaleffie's ghost 17:49, 27 May 2023 (UTC)
My other concern is that currently there is no way of distinguishing between 'raw' pages on which no effort has been made and those that aren't yet at proofread standard. By saying 'raw' pages aren't acceptable, a non-proofread page should, going forward, have had at least some human (or bot) cleanup on it. I will also note that sometimes a 'raw' OCR dump is one of the first things I replace on a Page before proofreading it, given that OCR technology has improved over the 15 years or so since Wikisource started. ShakespeareFan00 (talk) 19:13, 25 May 2023 (UTC)
In general I agree. Luckily we are getting to the point where we at least have a source / scan because tracking down a high-quality scan of whatever particular version was dumped is a huge pain.... At least the headers and footers should be cleared up and a minimum of effort put into cleaning up the nonsense from OCR like a cat walking on the keyboard. MarkLSteadman (talk) 21:00, 25 May 2023 (UTC)

 Comment The opening statement is very confusing as it doesn't give circumstance. Are we talking about a page that is backed with a scan? Are we talking about the extraction of the text from the pdf/djvu layer? Are we talking about someone doing a paste of text from another source to the Page: with scan? That clarity would help to make comment.

  1. There is no issue with anyone loading a scan-backed page in Page: ns and marking it as not proofread. Zero. I will regularly do it for biographical works as often I want to be able to set up a search on them so I can find individual biographies to reproduce as needed, rather than a p. 1 to end scenario.
  2. There is a problem when anyone marks a page as Proofread without having proofread it.
  3. There is an issue where people transclude pages that do not exist, and
  4. there can be a problem though not always where they transclude pages with not proofread pages.

With the last two dot points, it is incumbent on us to talk politely to that person and explain our processes to them. My exception to the last dot point can be in our biographical works: I can proofread a section of a page and transclude it, however, the whole page itself is not proofread, so has not had its page status changed. Where these things are problematic and not going to be quickly resolved, then those page creations should be deleted. De-lintering pages should not be our primary concern. They are indicator errors, they should not drive perfectly visible and consistent pages where they exist. That a page appears in a de-linter list should not be a source of criticism here where it is displaying fine. Please do not be overly judgemental about people's so-called behaviour unless it is clearly problematic. We experts should not be pretentious of newbies, simply supportive. — billinghurst sDrewth 08:10, 26 May 2023 (UTC)

Thank you for addressing the concern I had. My concern was mostly about 'mass' creation of non-proofread Page: namespace pages (so there should nominally be a scan), where there wasn't at least some attempt to clean up some of the more glaring scan errors or omissions, or even what has been previously described as a 'plain-text' proofread.
I appreciate Wikisource isn't applying the same level of pedantry as Distributed Proofreaders though :)
ShakespeareFan00 (talk) 08:19, 26 May 2023 (UTC)
If you are seeing Page: ns pages showing up in the linter lists, at various stages of proofreading, then you should be asking through a phabricator ticket that the Linter process allows filtering based on page status so you can focus on Status 3 and 4 to fix. You should not be focusing on anything not proofread. It is no different to common typos that escape the view of proofreaders like "bom" for "born" where I focus my efforts on status 3 and 4 pages, and ignore status 1 and 2. Demand usability for our project. — billinghurst sDrewth 09:12, 26 May 2023 (UTC)
We seem to agree where the focus should be. I will certainly consider raising a request for a specific feature as you suggest, unless there is someone here that would like to create a 'Page-status' highlighter as a local user script, that puts a suitable background-color on the table-cells or links?
In respect of proofread Page: with Linter-errors:-
ShakespeareFan00 (talk) 09:40, 26 May 2023 (UTC)
In respect of filtering by 'page status' https://phabricator.wikimedia.org/T337543 , I'm not holding my breath given that the response to my previous ticket regarding limiting the reporting to 'content' namespaces did not generate the desired functionality requested.
ShakespeareFan00 (talk) 09:40, 26 May 2023 (UTC)
@ShakespeareFan00 A Page-status highlighter should be fairly trivial to implement, I can take a look at this over the week. Sohom Datta (talk) 14:40, 29 May 2023 (UTC)
@ShakespeareFan00 User:Sohom_Datta/page-status-highlighter.js is a quick and dirty script that I created to do it. Sohom Datta (talk) 15:40, 29 May 2023 (UTC)
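The idea of restricting de-linting to Status 3 and 4 pages could be sketched as below. This is not the user script linked above nor anything the Linter extension offers; it is a minimal offline illustration that assumes the wikitext of each Page: page is already in hand and that the status is recorded in the page header as a `<pagequality level="N" ... />` tag. The function names and sample pages are hypothetical.

```python
import re

# Assumption: the page status sits in the header as <pagequality level="N" .../>,
# where 3 means proofread and 4 means validated.
PAGEQUALITY_RE = re.compile(r'<pagequality\s+level="([0-4])"')

PROOFREAD, VALIDATED = 3, 4


def page_status(wikitext):
    """Return the page status as an int, or None if no header tag is found."""
    match = PAGEQUALITY_RE.search(wikitext)
    return int(match.group(1)) if match else None


def worth_delinting(pages, wanted=(PROOFREAD, VALIDATED)):
    """Keep only the titles whose status is in `wanted`, in input order."""
    return [title for title, text in pages.items() if page_status(text) in wanted]


# Made-up sample pages, purely for illustration.
sample_pages = {
    "Page:A/1": '<noinclude><pagequality level="1" user="X" /></noinclude>raw ocr',
    "Page:A/2": '<noinclude><pagequality level="3" user="Y" /></noinclude>text',
    "Page:A/3": '<noinclude><pagequality level="4" user="Z" /></noinclude>text',
    "Page:A/4": "text with no header at all",
}
to_fix = worth_delinting(sample_pages)
```

Applied to a list of titles taken from a lint report, a filter like this would leave the not-proofread pages alone, which is the focus suggested earlier in the thread.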

Tech News: 2023-22

MediaWiki message delivery 22:03, 29 May 2023 (UTC)

Bulk De-linting.

Following the strong reactions I have received on my User talk pages concerning efforts to repair and reduce the number of LintErrors remaining in Wikisource, I'm abandoning the current effort until there are some kind of guidelines written on how it should be done responsibly, if at all.

I also have one request requiring admin action: can an admin 'suspend' the AWB permission I have, as I am not sure I actually need that access to continue with normal proofreading/validation efforts? ShakespeareFan00 (talk) 23:50, 29 May 2023 (UTC)

Transcribe Text, Preload + Tesseract + Main namespace questions

  1. Would it be possible for the user to disable the blinking of the Transcribe Text icon, on the toolbar (Vector legacy skin 2010)? I am correcting hundreds of proofread pages and it's distracting, disturbing and interfering. Earlier, I thought that it only blinks when I log in and start editing. Now it's blinking on every page regardless of the status.
  2. Preloading and Tesseract scanning are both slow. I know that these are not caused by computer hardware or my internet speed. Do I have any additional options to speed up the process from its current speed? Is it because of Wikipedia servers? I am looking to explore the reasons before asking my ISP.
  3. Am I permitted to add to each Main namespace page I created, where appropriate, the {{default layout}} template with the "4" value? I have no other way to indicate my intention for those who just want to read the page and are unfamiliar with the displays. — ineuw (talk) 20:15, 2 May 2023 (UTC)
@Ineuw: Please present pages where you are having issues. I know of no blinking icon in my transcription work. Preloading what? How? Where? Examples please. First time load? Everytime load? First contributors are able to choose a default layout for a work where it makes sense for the work. It is not meant to be about personal preference but what best suits the work. unsigned comment by billinghurst (talk) .
From what I've seen, the pulsing blue dot only appears the first time you proofread a page in a new browser. Simply logging out and back in doesn't trigger it, but using a new browser (or using your browser's anonymous mode) does. You can clear it by actually clicking on it, but unfortunately that has the side effect of actually transcribing the page you are looking at—undesirable if you are validating (and depending on your style, maybe undesirable in general). But after you've cleared it it should be gone for good, unless, again, you switch browsers or always use private browsing. If it's not, that sounds like a phab ticket to me. — Dcsohl (talk) (contribs) 14:10, 18 May 2023 (UTC)
@Billinghurst, @Dcsohl: I didn't answer because the dot disappeared and I felt foolish. Now, it's back again, and it pulses when the mouse pointer is near. It seems to appear only on created but untouched/unedited pages. It's not the dot that bothers me, it's the pulsating. It's quite disturbing to my vision. — ineuw (talk) 13:48, 2 June 2023 (UTC)
@Dcsohl: what you say is correct. I recreated my Firefox profile prior to posting. I must have done the same before my original post. I rebuild the profile fairly often because of Firefox synchronization issues. — ineuw (talk) 16:28, 4 June 2023 (UTC)
It was a very poor design decision to put that setting into a browser cookie rather than a bit of information on your account. You shouldn't have to see that every time you erase cookies or switch browsers, but here we are. — Dcsohl (talk) (contribs) 20:02, 6 June 2023 (UTC)
Thanks for the info. I mention it, because it's a lot more serious for anyone disturbed by blinking lights like me. Now that I know what causes it, and how to stop it, I'll manage. — ineuw (talk) 00:43, 9 June 2023 (UTC)

Caved in and reported it when I saw it earlier. — ineuw (talk) 01:42, 9 June 2023 (UTC)

I've put in a patch to link it to a specific user account instead of a browser profile. We do need a pulsating dot (or some kind of indicator) since it draws attention to the tool making it more likely for people who do not know about the tool to use it. Sohom (talk) 06:13, 9 June 2023 (UTC)
Instead of a pulsating blue dot I think a tooltip would be less distracting, particularly one that could be dismissed without activating the tool. I see tooltips like this on lots of sites when they want to call attention to new features. — Dcsohl (talk) (contribs) 16:34, 9 June 2023 (UTC)
@Dcsohl I feel like the popup overlay will be much more distracting/intrusive for users, especially since it covers a part of the page image that you might want to proofread.
Wrt to being able to dismiss it without activating it, I can look into that. Sohom (talk) 01:32, 11 June 2023 (UTC)
It’s true it would be more intrusive, but it was the easiest way I could think of for there to be an [x] that could be used to dismiss it without triggering the transcription process, and if it only happens once for a user across all browsers… anyway, we greatly appreciate you looking into this! — Dcsohl (talk) (contribs) 14:20, 11 June 2023 (UTC)
The [x] idea sounds good, but I modified the patch to close the popup and remove the dot if you click away from the popup, since that seemed the most natural motion that somebody would do if they did not want to activate it. :) Sohom (talk) 18:00, 11 June 2023 (UTC)

I asked to close this issue because the problem was resolved. I remembered the steps after seeing this forgotten page of links. — ineuw (talk) 09:56, 20 June 2023 (UTC)

This section was archived on a request by: — billinghurst sDrewth 00:43, 28 June 2023 (UTC)

ocrtoy-no-text

Scans fail to be displayed in the proofreading extension. I started experiencing the problem a couple days ago, when it often took a long time for the thumbs to be displayed, and I often had to reload the page several times. Now they seem to stop being displayed completely. I only receive a message <ocrtoy-no-text>. -- Jan Kameníček (talk) 11:00, 28 May 2023 (UTC)

@Jan.Kamenicek: I'm seeing broken image loads too today. I suspect an infrastructure issue and am trying to raise the WMF operations people.
PS. The weird error message you're seeing is from my OCR script. It tries to prefetch OCR on page load, and what you're seeing is a non-localized error text that just means the OCR backend produced no text (probably because it too failed to load the page image). Xover (talk) 11:28, 28 May 2023 (UTC)
@Jan.Kamenicek: I don't suppose you can pinpoint when the problem started with any more precision? It can help the server admins figure out where the root of the problem is. Xover (talk) 12:11, 28 May 2023 (UTC)
@Xover: I am sorry, I cannot. It must have been a few days ago when I first noticed the images of scans take longer time to appear, but I thought that it is just due to some problems with connection and so I did not think about it much :-( --Jan Kameníček (talk) 12:56, 28 May 2023 (UTC)
@Jan.Kamenicek: Can you check now and see if it's better? I'm seeing somewhat slow image loads, but no images actually failing to load altogether, and since image loads are usually quite slow here we're at least within reasonable distance of normal. Xover (talk) 13:26, 28 May 2023 (UTC)
@Xover: The image appears, then almost immediately disappears, and after some time appears again. So the work is slowed, but at least possible. --Jan Kameníček (talk) 17:21, 28 May 2023 (UTC)
@Xover: Now it goes well, so hopefully the problem has been solved. Thanks! --Jan Kameníček (talk) 18:19, 28 May 2023 (UTC)
FYI, I have also noticed slow or no thumbnails for PDF files on Commons, so this is most probably not related to WS. Yann (talk) 15:19, 28 May 2023 (UTC)

@Xover: I am experiencing the problem again today. Images load very slowly, sometimes only after a few refreshes, and the message <ocrtoy-no-text> sometimes appears too. --Jan Kameníček (talk) 15:52, 3 June 2023 (UTC)

This should be fixed overall. Let us know if you still have issues loading OCRed images. Sohom (talk) 20:04, 11 June 2023 (UTC)