Wikisource:Scriptorium

Scriptorium

The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help.

The Administrators' noticeboard can be used where appropriate. Some announcements and newsletters are subscribed to Announcements.

Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 470 active users here.

Announcements

Proposals

Bot approval requests

For meta:Global reminder bot - the bot will rarely run here, but this wiki requires explicit authorisation, so putting it here. Please ping me in a response. The bot flag is NOT required. Leaderboard (talk) 09:34, 7 November 2024 (UTC)Reply

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

See also Wikisource:Scan lab

The existing scan is incomplete, so I will be replacing it with a complete version. To support this, please carry out the following page moves.

  • Index page name = Index:Mathematical collections and translations, in two tomes - Salusbury (1661).djvu
  • Page offset = 1 (i.e. /10 moves to /11)
  • Pages to move = "10-456"
  • Reason = "inserted missing pages"

Thanks Chrisguise (talk) 16:28, 4 September 2024 (UTC)

@Chrisguise: Done Xover (talk) 16:47, 4 September 2024 (UTC)
Thanks. I've uploaded the new file. Chrisguise (talk) 18:25, 4 September 2024 (UTC)
Hi, really sorry about this but I missed a couple of variations when I requested the move above. Could you please do the following:—
  • Index page name = Index:Mathematical collections and translations, in two tomes - Salusbury (1661).djvu
  • Page offset = 1 (i.e. /115 moves to /116)
  • Pages to move = "115-274"
  • Page offset = -1 (i.e. /409 moves to /408)
  • Pages to move = "409-454"
  • Reason = "realigned pages"
  • Delete = /705 & /706
  • Reason = "pages not in work"
Thanks Chrisguise (talk) 14:21, 5 September 2024 (UTC)
Hi (again). Could you hold fire on this request. There's something odd going on with the index page. When I click on some pages they show a page image from the new file (gold trim to covers is visible) and some show the original file (plain brown cover edges visible). I've tried purging the Commons page where the file resides and the individual pages on WS, and things seem to be improving (slowly) but everything has clearly not properly updated yet. Chrisguise (talk) 14:42, 5 September 2024 (UTC)Reply
@Xover Whilst there are still one or two pages of the old scan showing, they do not affect the pages needing to be moved. Could you do the two moves and deletion set out above? Thanks, Chrisguise (talk) 15:41, 10 September 2024 (UTC)Reply
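
The order of moves matters for shifts like those requested above: if pages /115 to /274 each move up by one, /274 has to go first, otherwise /115 would land on a still-occupied /116. Below is a minimal Python sketch of that planning step. The helper `plan_moves` is hypothetical and is not part of any Wikisource tool or bot; the thread does not say how the moves were actually carried out.

```python
def plan_moves(first, last, offset):
    """Return (source, target) page-number pairs in a safe order.

    Hypothetical helper for illustration only. A positive offset is applied
    from the last page downwards, a negative offset from the first page
    upwards, so no move lands on a page in the range that has yet to move.
    """
    if offset == 0:
        return []
    if offset > 0:
        pages = range(last, first - 1, -1)
    else:
        pages = range(first, last + 1)
    return [(page, page + offset) for page in pages]


# The two shifts from the request above.
up = plan_moves(115, 274, 1)     # /115 moves to /116, ..., /274 to /275
down = plan_moves(409, 454, -1)  # /409 moves to /408, ..., /454 to /453

print(up[0], up[-1])      # first and last move of the upward shift
print(down[0], down[-1])  # first and last move of the downward shift
```

The sketch only orders moves within one range. Whether the page just outside the range (/275 or /408 here) is free to receive a move is a separate check.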

Recently found that a near-complete scan of the fanzine this story appeared in (and confirming that it was printed with no copyright notice) was on the internet, so the existing partial scan can be moved to the new index.

-ei (talk) 00:38, 26 October 2024 (UTC)

Other discussions

Looking for Indiana/Indianapolis pages to transcribe or proofread

In preparation for the WikiConference North America 2024 editing challenge, I'm looking for any pages within the Indiana or Indianapolis (where the conference will be held) to transcribe or proofread. Can someone point me to outstanding tasks or categories which fall within this scope? OhanaUnitedTalk page 14:33, 1 September 2024 (UTC)Reply

Two other ones, more specifically about Indianapolis: Early Indianapolis, 1919 (27 p.), and Centennial History of Indianapolis, 1920 (72 p.). — Alien333 (what I did | why I did it wrong) 22:34, 1 September 2024 (UTC)
Works by Jacob Piatt Dunn could be of interest. See c:Category:Jacob Piatt Dunn. —Justin (koavf)TCM 19:26, 2 September 2024 (UTC)Reply
Thanks for these ideas. I think I will use Index:A standard history of Lake County, Indiana, and the Calumet region (IA standardhistoryo01howa).pdf as demonstration and tutorial. Can someone proofread a few pages for this book so that I can also demonstrate validation to the audience? @Alien333: I like your suggestion for Early Indianapolis, 1919 as the total number of pages is small enough that it can be tackled by all conference participants. Can you set up that publication? OhanaUnitedTalk page 22:25, 2 September 2024 (UTC)Reply
Done, here it is: Index:Early Indianapolis.djvu. (You may want to familiarize yourself with WS:SG and H:T for formatting.) Cheers, — Alien333 (what I did | why I did it wrong) 23:19, 2 September 2024 (UTC)
The text doesn't seem to be automatically transcribed. Are there additional steps needed to OCR the text? OhanaUnitedTalk page 02:54, 3 September 2024 (UTC)Reply
There's a button on the top right, marked "Transcribe text". — Alien333 (what I did | why I did it wrong) 11:23, 3 September 2024 (UTC)
If you update the file description page with the actual IA identifier I can regenerate the DjVu with an OCR text layer (I have custom tooling for that). Xover (talk) 12:27, 3 September 2024 (UTC)
Done, assuming you only wanted it to be written somewhere. — Alien333 (what I did | why I did it wrong) 12:48, 3 September 2024 (UTC)
@Alien333: Done Xover (talk) 14:11, 3 September 2024 (UTC)
(@Koavf: Early Indianapolis was supposed to be for an editing contest next month. Don't do the next one we set up, will you :) ?)
@OhanaUnited: We're going to have to set up another one, as Koavf already did this one. Fine with Centennial History of Indiana?
@Xover: For curiosity's sake, what do you use for OCR? — Alien333 (what I did | why I did it wrong) 15:55, 3 September 2024 (UTC)
Oh sheesh. What a moron. :/ Sorry guys, I just totally misread this. If you really want, I wouldn't be offended if you deleted my work. What an idiot. —Justin (koavf)TCM 15:57, 3 September 2024 (UTC)Reply
I think A standard history of Lake County, Indiana, and the Calumet region should have a sufficient number of pages for proofreading, and Early Indianapolis for validating. Splitting the tasks across different books actually makes the contest verification process simpler.
@Koavf: it's fine. We will let the editing contest participants focus validation on Early Indianapolis.
@Alien333: I don't see the button that says "transcribe text". Do you need specific userrights to use this tool? Or only on pages with no text transcribed? We are (ok, it's just me at the moment) planning to do an introduction workshop to different sister projects and the newbies are going to ask the same kind of questions as myself. OhanaUnitedTalk page 17:30, 3 September 2024 (UTC)Reply
Once again, I have failed upward. Thanks for your graciousness: I'll be sure to not touch that other work. —Justin (koavf)TCM 17:32, 3 September 2024 (UTC)Reply
Errm, are you sure that when you edit a page in Page: namespace, you don't see anything that looks like what's described there? It's not a very visible button, but normally it should be there. This is supposed to work with all skins, and to need nothing special.
If we're lucky, Xover or someone else will graciously do the OCR beforehand, but this really, really isn't supposed to happen. — Alien333 (what I did | why I did it wrong) 17:40, 3 September 2024 (UTC)
Nope. I don't see anything like that on the top right. I remember seeing this button during demonstration session at Wikimania Singapore last year. I see buttons like page logs, analysis, search and subpages. I'm on Firefox and even tried Chrome (and god forbid, Microsoft Edge). But definitely nothing related to OCR on my end! That's why I thought I didn't have the userrights to do OCR yet. OhanaUnitedTalk page 05:16, 4 September 2024 (UTC)Reply
Would you mind uploading (locally) a screenshot of what the editing window looks like for you in Page: namespace? If the OCR button is missing you might have other problems. — Alien333 (what I did | why I did it wrong) 07:33, 4 September 2024 (UTC)
Here is how it looks on my end. OhanaUnitedTalk page 21:53, 4 September 2024 (UTC)Reply
"facepalm" Aaand in the end it was just a question of enabling the editing toolbar (also called 2010 wikitext editor, in the Editing section of preferences). I'd forgotten that that toolbar was not default. Note: while we're at it, gadgets that are not default but that I think greatly facilitate editing experience and should be recommended to new editors (by label):
  • Preload useful templates such as header, textinfo and author in respective namespaces.
  • Add a toolbar button to check for and insert a paragraph-breaking {{nop}} at the end of the previous page.
  • Running headers: Load running headers from surrounding pages
Alien333 (what I did | why I did it wrong) 22:25, 4 September 2024 (UTC)
Yes. It is now showing on my end. Why isn't the editing toolbar enabled by default? And is there a particular "preferred" OCR? Or pick the one that does the best job for that page?
I think what you described regarding headers and textinfo may be too advanced for complete newbies to the project. The purpose of the tutorial session and the editing challenge is to get existing editors to try their hands on editing in other sister projects. OhanaUnitedTalk page 16:56, 5 September 2024 (UTC)Reply
For normal text, use the Google OCR mode; it's got the best accuracy most of the time. It has a few drawbacks, and in these cases it's better to use Tesseract:
  • It's quite bad at locating columns of text, e.g. it gives "texta textb textc textd" for a column layout like this:
    texta    textb
    textc    textd
    rather than the correct "texta textc textb textd". This is also a problem for TOCs.
  • It always transcribes Small Caps as CAPITALS, which means you have to retype it, whereas Tesseract at least tries to render it in the correct case, so if you've got lots of small caps you might want Tesseract.
Alien333 (what I did | why I did it wrong) 17:38, 5 September 2024 (UTC)
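
The column problem described above comes down to reading order. As a rough illustration, here is a Python sketch with made-up text-block coordinates and an assumed, known gutter position. It calls neither Google OCR nor Tesseract, and it is not how either engine works internally; it only shows why reading straight across a two-column page gives the wrong sequence.

```python
from typing import NamedTuple


class Box(NamedTuple):
    text: str
    x: int  # left edge of the text block, in pixels (made-up values)
    y: int  # top edge of the text block, in pixels (made-up values)


# A two-column page: texta/textc in the left column, textb/textd in the right.
boxes = [
    Box("texta", 10, 10), Box("textb", 200, 10),
    Box("textc", 10, 40), Box("textd", 200, 40),
]


def row_major(boxes):
    """Read straight across the page, ignoring columns."""
    return [b.text for b in sorted(boxes, key=lambda b: (b.y, b.x))]


def column_major(boxes, split_x):
    """Read the left column top to bottom, then the right column.

    split_x is the assumed x position of the gutter between the columns.
    """
    ordered = sorted(boxes, key=lambda b: (b.x >= split_x, b.y, b.x))
    return [b.text for b in ordered]


print(" ".join(row_major(boxes)))          # texta textb textc textd
print(" ".join(column_major(boxes, 100)))  # texta textc textb textd
```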
Just finished the sister projects presentation, which included Wikisource. 3 new editors joined during the session, including one who's fluent in multiple languages and has participated in non-English Wikisource language projects. There will be more activity by new editors between now and Sunday as they participate in the editing challenge. OhanaUnitedTalk page 18:51, 4 October 2024 (UTC)
The first 48 pages of Index:Constitution of the state of Indiana and of the United States (IA constitutionofst00indi 0).pdf have the 1851 constitution of Indiana, with footnotes. We are always looking to back more copies of constitutional documents with scans of published copies. --EncycloPetey (talk) 21:17, 4 October 2024 (UTC)

We have finished the challenge. About 6-8 editors contributed to the Wikisource part of the challenge, and of those contributors, 2-4 are brand new editors. During the tutorial session, I observed Japanese, Korean and Spanish Wikisource being edited as well. Next year's event will take place in New York City, and you will find me bugging you guys again this time next year :) OhanaUnitedTalk page 21:24, 21 October 2024 (UTC)

Why are unregistered users (IP editors) unable to change the page status of index pages?

Hello all,

Before I get to my real question, a little context:

First, I am aware that according to Wikisource policy: "Page status is visible to all users, though can only be changed by users logged into their Wikimedia user account. As such, IP editors cannot change page status."

Second, for those who aren't aware, when what I consider trusted IP editors contribute to the MC, I often put in bot requests for the pages to be marked proofread, see, e.g. Wikisource:Bot requests/Archives/2022, Wikisource:Bot requests/Archives/2023, Wikisource:Bot requests/Archives/2024 and Wikisource:Bot requests. Besides context, these archives also provide a non-exhaustive list of many of the contributions of IP editors, for the sake of judging the extent/value/accuracy of IP editor proofreading. I say non-exhaustive, because many of the works of IP editors have also been marked proofread manually, and it is this "issue", of manually checking pages which have essentially been proofread, which is motivating this post.

Third, I am aware that according to Wikisource policy: "This means that administrators have access and rights above and beyond the range of normal Wikisource contributors. The added rights generally fall into two categories: maintenance and fighting vandalism." and so appreciate that admins are necessarily going to be a little defensive. But I also think this can become excessive, and that maybe a middle ground can be sought elsewhere (i.e. this discussion).

If you have got this far, onto the real question: how do any and all Wikisource users feel about allowing IP editors to mark pages "without text", "problematic" or "proofread", assuming there are no technical issues prohibiting this change?

At this stage, I do not want to excessively argue the issue, as Wikisource seems to descend into unending discussions far too often. But my reasoning generally follows from a) There are a large number of pages marked proofread by non-IP (logged in) users which are very far from perfect, such that IP edits are of a higher quality than a not-small number of the total proofread pages on Wikisource. However, far from perfect non-IP edits are not subjected to a manual checking of each page marked proofread (besides being tagged as not patrolled), but IP edits + bot requests can easily be opposed on such grounds. b) While I accept that allowing IP editors to mark pages "proofread" could allow a user to mark a page as proofread, and then log in to validate, I still believe most validation is essentially good faith, i.e. it is not easy to detect or oppose when a non-IP (logged-in) user actually validates a page, compared to when they just mark the page as validated for the sake of it.

Hopefully I haven't overlooked anything obvious and thus wasted my time in typing this.

And to summarize, opinions on allowing IP editors to mark pages "without text", "problematic" or "proofread" are welcome.

P.S. Although this post is in a location I actually frequent, if you do have any questions for me specifically, please maintain the good practice of pinging me.

Thanks, TeysaKarlov (talk) 05:00, 21 September 2024 (UTC)Reply

@TeysaKarlov: Do you mean Index: pages, or Page: pages? The title mentions index, but your message seems to be about pages. — Alien333 07:00, 21 September 2024 (UTC)
@Alien333. Sorry for any confusion, the title should have read "of an index's pages" or "of indices' pages", or however it should have best been pluralised and made possessive. To clarify, everything refers to Page: pages. TeysaKarlov (talk) 20:40, 21 September 2024 (UTC)Reply
If IPs have the option to mark page status, this allows any change from any status to any other status. This has been a source of misuse in Wikisource's past. --EncycloPetey (talk) 17:06, 21 September 2024 (UTC)Reply
@EncycloPetey Sorry, time got away from me a little with this, but would you be less concerned if IP editors could only change the page status (without text, problematic or proofread) up until any logged in editor marks the page either proofread or validated. For example, once the logged in editor upgrades the page status, that page becomes indefinitely locked to an IP editor, but can still be changed by other logged in users. Or do you also have other concerns? Regards, TeysaKarlov (talk) 07:27, 1 October 2024 (UTC)Reply
As an example: setting a page to "Without text" also blanks the OCR, which causes problems for other IP editors if there was supposed to be text on the page. Under your proposal, the page would remain blank until a logged-in user checks it. This means having to additionally patrol "without text" pages, which are normally not much of a problem. This is just one instance in which it creates a lot of additional work in a small community. We also have IP editors who hop from address to address. If you find such an IP editor has made a mistake, there is no means to alert them unless you get lucky and message them while they are remaining at an IP. These are just two points that come to mind. I would want additional feedback about other common issues experienced editors have seen or foresee. --EncycloPetey (talk) 07:43, 1 October 2024 (UTC)
@EncycloPetey Thanks for the response, but could you perhaps clarify a few things:
"Without text" also blanks the OCR, which causes problems for other IP editors if there was supposed to be text on the page. - Are you saying that IP users cannot press transcribe text? If this is currently the case, is there any reason changing this would cause further trouble?
This means having to additionally patrol "without text" pages, which are normally not much of a problem. - Why do you have to patrol the pages if the OCR is blanked, given that they aren't likely to have been transcluded into mainspace until they are proofread? The only issues I see in this regard is if one IP user messes with what another IP user is doing, and I am not sure how that creates more work for logged in users.
Thanks, TeysaKarlov (talk) 20:58, 1 October 2024 (UTC)Reply
But allowing such a change means that proofread pages that have been transcluded could be blanked by IPs with the touch of a button, whether purposefully or accidentally. There is nothing in the process of proofreading that distinguishes between transcluded pages and untranscluded pages. And if you're an IP, and you come across a blanked page, would you know to use an OCR tool, even if it existed? Would you know to look into the history of the page to see if it had been blanked? Or would you start typing the text again from scratch, or perhaps even skip working on the page? Someone blanking the page might blank raw OCR, or proofread text, or validated text. --EncycloPetey (talk) 21:11, 1 October 2024 (UTC)
@EncycloPetey Given you seem to be especially concerned about without text pages, how does this modified example of an implementation sound: IP editors can mark a page proofread or problematic, but not without text or validated. However, once a logged in user makes any edit on the page (or if you would prefer, even just patrols the page), then the page status is locked to all IP editors, and can only be changed by a logged in user? Regards, TeysaKarlov (talk) 20:30, 2 October 2024 (UTC)Reply
I think it would work if administrators were able to grant IPs temporary rights to change page status: rights granted to a specific IP for a designated length of time, after seeing that the IP is active and contributing to Wikisource. The grant would be time-limited because IPs come and go. We would not want a permanent change that would then have to be tracked. --EncycloPetey (talk) 18:07, 1 October 2024 (UTC)
The main problem with this is that even our most prolific IP editor is moving between several IP addresses as they work on the Monthly Challenge works. They tend to be on the same address for a session, but the length of the session is variable. So, the period for temporary rights is difficult to predict. Beeswaxcandle (talk) 06:17, 2 October 2024 (UTC)Reply
But then the option is: patrol every page individually versus spot an IP who's back at it and flag them as OK. We typically have only two or three such repeat IPs on any given day. --EncycloPetey (talk) 20:40, 2 October 2024 (UTC)
@EncycloPetey, @Beeswaxcandle Out of curiosity, does anyone have any further comments about the above, given that at least one IP user is back proofreading pages in force? Also pinging @Xover, in the hope of finding out whether any of the above is/isn't viable (i.e. allowing IP editors to mark pages without text/problematic/proofread, so long as no logged-in user has made edits to the page, or temporarily flagging IPs as okay in some way or another). Of course, anyone else who can comment on technical viability, feel free to. Thanks, TeysaKarlov (talk) 20:58, 11 October 2024 (UTC)
Has abuse of IP status been a frequent problem, in the past when it's been allowed? If most IP editors are good-faith, how about tools to revert an editing session or set of editing sessions from IPs? A single bad-faith editor will probably use a predictable set of IPs and also have a predictable pattern in their edits. I find it hard to imagine someone abusing IP editing by marking random pages from random IPs, with no machine-recognisable pattern.
Especially if IP marking is allowed, making the interface a bit easier to learn might be a good idea. We could test it on wiki-naive people and see if they understand it immediately. Making it easier to get started would win us more new people, it always does, in pretty much any context. HLHJ (talk) 00:15, 13 October 2024 (UTC)Reply
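
For anyone weighing the technical side, the modified proposal above (IP editors may mark a page "proofread" or "problematic", but not "without text" or "validated", and lose the ability once a logged-in user has edited the page) can be stated as a small rule. The Python sketch below is purely illustrative: the function and constant names are hypothetical, and nothing here corresponds to an existing ProofreadPage setting or API.

```python
# Statuses an IP editor could set under the proposal (assumption taken
# from the discussion above, not from any existing configuration).
IP_ALLOWED_STATUSES = {"proofread", "problematic"}


def ip_may_set_status(new_status, touched_by_logged_in_user):
    """Hypothetical check for the proposed rule.

    Returns True only if the page has not yet been edited by a logged-in
    user and the requested status is one the proposal opens to IP editors.
    """
    if touched_by_logged_in_user:
        return False  # page status is locked to IP editors
    return new_status in IP_ALLOWED_STATUSES


print(ip_may_set_status("proofread", False))     # allowed under the proposal
print(ip_may_set_status("without text", False))  # still reserved for accounts
print(ip_may_set_status("proofread", True))      # locked after a logged-in edit
```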

Accents in translated foreign works

For works translated into English with accents in the original names or phrases, should we keep those accents? Curuwen (talk) 00:42, 28 September 2024 (UTC)Reply

If you are working on an original translation, then it's up to you. Otherwise, you will need to follow the convention used in the edition that you are proofreading. —Beleg Tâl (talk) 00:49, 28 September 2024 (UTC)Reply
For the actual title of a work or an Author's name used in a wikilink, often a redirect is used (e.g. Author:Moliere to Author:Molière, Quran to Qur'an, etc.) MarkLSteadman (talk) 02:41, 30 September 2024 (UTC)
There are key-combo settings, browser extensions and other tools that make them easier to type, if that's useful?
Diacritics are phonemic in many languages; Kuchen and Küchen are not the same, and an English speaker I knew who walked into a German bakery and tried to buy kitchens caused genuine confusion, resolved only when she gave up on the phrasebook and pointed at the cakes she wanted. Proper names, placenames, and phrases can change meaning according to the diacritics; you might accidentally call someone something insulting by leaving them off. Even if the meaning is clear, someone reading a word without the accents will almost certainly mispronounce it (many languages have hilarious malapropism-phrases stereotypically spoken by language learners who don't understand the accents yet). Rendering "Utúlie'n aurë!" as "Utulien aure!" pretty much guarantees that an English speaker will hideously mispronounce it as "ewe-tlee-en our" or "you-tyoo-lee-en ow-ree" or some such; the diacritics cue stress and syllabification. In extreme cases, the lack of accents can actually make a text unintelligible, as the famous poem Lion-Eating Poet in the Stone Den illustrates. HLHJ (talk) 03:29, 13 October 2024 (UTC)
First of all, aurë entuluva! I agree that lack of accents can cause major confusion. I will err on the side of including accents then. Thanks for your examples. Curuwen (talk) 03:43, 13 October 2024 (UTC)Reply
Hantan tyen! Ironically, in a post just a little further up this page, I spelt "naïve" as "naive". The new javascript reply windows unfortunately do not have the special-character menu that a normal editing window has. Good luck with it! HLHJ (talk) 17:21, 13 October 2024 (UTC)Reply
Curuwen, I've found it (unless this is a default and you already have it). Click "preferences" at the top of the screen, and under the "Editing" tab, tick the box labelled "Enable the editing toolbar This is sometimes called the '2010 wikitext editor'." Click "save". Now you should have a "Special characters" drop-down menu at the top of every editing window. Should make it easier. Unless it was already enabled. HLHJ (talk) 01:56, 15 October 2024 (UTC)Reply
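
On the redirect convention mentioned earlier in this thread (Author:Moliere pointing to Author:Molière), the unaccented form of a title can be derived mechanically. The sketch below uses Python's standard `unicodedata` module; the helper name `strip_diacritics` is made up for illustration and is not a Wikisource tool. It also shows the information loss discussed above: Küchen collapses to Kuchen.

```python
import unicodedata


def strip_diacritics(title):
    """Return the title with combining accents removed.

    Hypothetical helper: decomposes each character (NFD) and drops the
    combining marks, leaving the base letters.
    """
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Author:Molière"))  # Author:Moliere
print(strip_diacritics("Küchen"))          # Kuchen
```

This only handles accents. Variants that differ in punctuation, such as Quran and Qur'an, would still need their redirects created by hand.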

Keymapping issue help needed.

For a few months now I have been unable to manually enter the following keys in my editing interface. For { I see ̪ and for } I see ˈ . Also < sometimes but not always gives some other character (it's working OK at the moment so I can't tell what character it was displaying).

This isn't a fault with my keyboard because if I open Notepad the characters display as expected, and it is not specific to a browser because I see the same issue in Edge and Firefox, so perhaps it is some issue with my interface / profile settings.

Can someone advise how I can reset my interface to see if that clears the issue? I'd like to retain any customisations I may have made in the past (but can't remember what they would have been) so is some kind of backup possible first? Thanks Sp1nd01 (talk) 19:26, 7 October 2024 (UTC)Reply

Are you on a Mac? There are language settings on Macs that allow one keyboard to type as if it were a specialist keyboard in another language. I have (for example) sometimes accidentally set my keyboard into Czech mode, which can cause issues like the ones you're describing. How you change the language on a Mac will depend on your current OS and on certain other settings. It may be as easy as clicking on the flag in the top of your monitor window and switching to the flag of your usual language. --EncycloPetey (talk) 19:41, 7 October 2024 (UTC)
I forgot to mention I use Windows 10, but I have just checked the Language settings and Keyboard Settings and the only Language installed is English so I don't think that is causing the problem unless there is some additional setting hidden somewhere. It's not a major problem for me I just have to do additional copying and pasting when I need to use those characters. Thanks Sp1nd01 (talk) 08:12, 8 October 2024 (UTC)Reply
Does this happen on any website, or just Wikisource (how about other Wikis)? I don't think the issue is with your user js, but it's possible it might be some combination of the gadgets or your editing preferences you have enabled. Or do you use the same browser extensions on both Edge and Firefox, or programs like WinCompose or AutoHotKey, which can change your input based on which program you're using? --YodinT 09:08, 8 October 2024 (UTC)Reply
This happens when you use "International Phonetic Alphabet - SIL". Click in the bottom right of the edit field to see the little keyboard icon, or press ctrl-M to return to "Native keyboard". You may also "Disable input tools". // M-le-mot-dit (talk) 09:32, 8 October 2024 (UTC)
This was always just an issue on Wikisource for me, I had not seen it anywhere else.
On editing it was showing the "International Phonetic Alphabet - SIL" option. I had not noticed that option before, so I don't know why it had changed unless I accidentally clicked on it at some time. I have changed it back to "Native keyboard" and all is now working normally again. Many thanks everyone for the assistance! Sp1nd01 (talk) 10:13, 8 October 2024 (UTC)

Three part story

I have a three-part story where each third was printed in the paper each week. It was never published as one piece. Would we stitch together the three parts into one entry, as well as keep the three parts? RAN (talk) 03:26, 9 October 2024 (UTC)

We just had a discussion about handling this kind of publication. Check the archives for "Serialized works in periodicals (voting)". --EncycloPetey (talk) 04:11, 9 October 2024 (UTC)Reply
See also Wikisource:Style_guide#Serialized works in periodicals. --Jan Kameníček (talk) 20:09, 12 October 2024 (UTC)Reply

Redirect

Do we ever have a redirect at Portal:Person when the person is at Author:Person? --RAN (talk) 03:40, 9 October 2024 (UTC)Reply

No. We do not use cross-namespace redirects. --EncycloPetey (talk) 04:11, 9 October 2024 (UTC)

Shortcuts

Today, I've noticed keyboard shortcuts were available ("ctrl-i" for italics and "ctrl-b" for bold). Are there other new ones? // M-le-mot-dit (talk) 09:39, 9 October 2024 (UTC)

ctrl-u yields an underline. Cremastra (talk) 20:15, 9 October 2024 (UTC)

Substitution of images

If we have the original image, say from the Library of Congress, would we display that one, cropped the same way, to replace the poor high-contrast scan in a news article entry? RAN (talk) 19:07, 12 October 2024 (UTC)

Well, imo it can sometimes be done in this way if the only difference is the quality of the scan, i.e. if it is absolutely the same image, cropped in the same way, of the same colours (as the same image reprinted in various books can have small differences in colours caused by different reprinting techniques), etc. Can you give an example? --Jan Kameníček (talk) 19:19, 12 October 2024 (UTC)Reply
BTW: Images reprinted in newspapers usually have a special "newspaper" low-resolution appearance, which should imo be preserved and such images should not be replaced by "better" images e. g. from books. --Jan Kameníček (talk) 19:23, 12 October 2024 (UTC)Reply
  • It has long been the practice to replace lower-quality images with higher-quality originals, where the only reason the lower-quality version is lower-quality is printing constraints. This is the case, for example, where an image (originally in color) is printed in black-and-white in a book (or newspaper) without color printing. See, e.g., The Vampire (Summers). TE(æ)A,ea. (talk) 19:48, 12 October 2024 (UTC)
    I am afraid that the link is exactly the example of a bad practice, because the two pictures have been cropped differently. The picture that was added to our transcription is missing parts of bodies of the women on the right. --Jan Kameníček (talk) 19:53, 12 October 2024 (UTC)Reply
I'd just replaced a very poor overinked impression of a woodblock with a much better impression of the same woodblock from another edition of the book, here. If this is a problem, please let me know. HLHJ (talk) 03:35, 13 October 2024 (UTC)Reply
This is imo absolutely OK, because the bad quality is not connected with this edition as such, but only with this particular specimen. --Jan Kameníček (talk) 10:06, 13 October 2024 (UTC)Reply
Great, thanks. Physical impressions of images (aquatints, etchings, etc.) all tend to deteriorate with number of impressions, and are sometimes altered quite significantly when they get retouched. And I've seen 21st-century reprints which completely mess up the original images, reprinting high-res glossy photo plates at 300dpi, or with glaringly unoverlookable compression artifacts, so that you can't actually see the features described in the captions. I've even seen modern books that managed to overtrim their margins and chop off the edges of the misaligned text. Obviously, where possible, we should start from a good printing job! HLHJ (talk) 17:37, 13 October 2024 (UTC)Reply
As the uploader of the image in question, I am not happy with it either. However, the image is protected in the EU, so options for locating copies were limited, and I used what was available. If a superior copy becomes available, it should be used instead. --EncycloPetey (talk) 20:05, 15 October 2024 (UTC)
One specific example that is routinely replaced is images of signatures. —Justin (koavf)TCM 01:17, 14 October 2024 (UTC)Reply
Not a good practice imo either. A person's signature usually changes a bit with time, and so replacing a signature with its older or newer version is quite misleading too. It is always much better to extract its actual version from the document. --Jan Kameníček (talk) 05:55, 14 October 2024 (UTC)Reply
In the only example I can find, it was swapped out for the svg version of the same signature from the same source from the same date. --RAN (talk) 21:34, 14 October 2024 (UTC)Reply

Regex

I'm not managing to add regexes to my editing window as I documented at Wikisource:Regex. The UI is just absent. Any advice? HLHJ (talk) 21:10, 13 October 2024 (UTC)Reply

I think that installing w:en:User:Ebrahames/Advisor at User:HLHJ/common.js will give you a regex search and replace box. —Justin (koavf)TCM 01:03, 14 October 2024 (UTC)Reply
Thank you! Just a box? I'd really like the ability to write my own regexes and save them and run them with a single click, which is what the Meta:TemplateScript promises... Has anyone got it to work? HLHJ (talk) 02:18, 14 October 2024 (UTC)Reply
Have you switched on the gadget in your preferences? Beeswaxcandle (talk) 06:08, 14 October 2024 (UTC)Reply
...No. Thank you.
I have expanded Wikisource:Regex accordingly; it now has step-by-step instructions for making it work. I will add some less trivial cookbook examples if people think the documentation useful.
To everyone: Regex is like a really advanced search-replace, where you can include patterns like "any number from 1 to 300" or "whatever letters came before the first number in the line" or "all lines containing only one character". Is there any repetitive editing task you'd like to semi-automate? Please, suggest it, and if possible, I'll write up how to do it. HLHJ (talk) 19:31, 14 October 2024 (UTC)Reply
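As a rough illustration of the three patterns mentioned above, here is a minimal sketch in Python. The pattern names and the sample text are invented for the example; the on-wiki regex tools run JavaScript, whose syntax for these particular patterns is the same as far as I know.

```python
import re

# "Any number from 1 to 300": 300, or 100-299, or 1-99 (no leading zeros).
NUMBER_1_TO_300 = re.compile(r"\b(?:300|[12]\d{2}|[1-9]\d?)\b")

# "Whatever came before the first number in the line":
# capture every non-digit character from the line start up to the first digit.
BEFORE_FIRST_NUMBER = re.compile(r"^([^\d\n]*)\d", re.MULTILINE)

# "All lines containing only one character".
SINGLE_CHAR_LINE = re.compile(r"^.$", re.MULTILINE)

sample = "Plate 7\nx\nChapter 12 begins on page 301"

numbers = NUMBER_1_TO_300.findall("0 1 42 300 301 1000")
prefixes = BEFORE_FIRST_NUMBER.findall(sample)
single_char_lines = SINGLE_CHAR_LINE.findall(sample)

print(numbers)
print(prefixes)
print(single_char_lines)
```

The same patterns can be used for replacing rather than finding, for example `SINGLE_CHAR_LINE.sub("", sample)` to blank out stray one-character lines.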

OCR rotated?


I can rotate pages to proofread them, but the OCR always runs on the unrotated page. Could it be made to run with the page oriented as in the display, please? It would save effort on pages where all the text runs vertically. HLHJ (talk) 02:20, 14 October 2024 (UTC)Reply

With the toolbar enabled, in the upper right-hand corner just above the scan page, there are buttons that will zoom and rotate. With the toolbar disabled, there are key combinations that will rotate it -- but I have only ever done this accidentally and have forgotten which key combination does it. It would be handy to remember it.--RaboKarbakian (talk) 10:56, 14 October 2024 (UTC)

Sorry, right answer to a different question. The way I handled this was to upload a rotated page to Commons, and then manually put the address (of the original image, linked under the thumbnail at Commons) into the OCR at the web site. The web site can be found by choosing "Advanced options" or some such when using the OCR button. At the OCR page, they have crop options but not rotate, unfortunately.--RaboKarbakian (talk) 11:00, 14 October 2024 (UTC)
About the forgotten keys, when you click on the facsimile, you may use
  • r rotate cw.
  • R rotate ccw.
  • s pan up
  • w pan down
  • a pan right
  • d pan left
  • f mirror horiz.
M-le-mot-dit (talk) 13:06, 14 October 2024 (UTC)Reply
That is very ingenious. I mean no offense when I call it a kludge. But for many applications, transcribing the text by hand without any OCR would be faster. It seems like changing the software to rotate the image and then send it to the OCR would save a lot of human time.
Actually, why don't we have an OCR that generates markup and Wikisource templates (and links to Commons images auto-uploaded using ws-image-uploader) instead of just plain text? We could feed it the raw images, the image metadata, and the OCR text; we have a good body of training data. The Transkribus software all seems to be under compatible licenses. Shall we ask the WMF for this? If we could double or triple our digitization speed, it would be well worth it.
I have just been formatting a TOC with {{dotted TOC line}} templates, and it was very repetitive and took about an hour, and nearly as much again trying to get the syntax working. And it's still indenting in the wrong direction and I've probably messed it up slightly in several other ways. It's the sort of thing software could do better than I can. HLHJ (talk) 02:56, 15 October 2024 (UTC)Reply
That Table of Contents looked very good! After accomplishing a few of those, I suspect you should be able to regex them. There are a lot of TOCs that are very similar, and a software solution could be found, for sure, because wherever regex is, software can go there and do that. Some TOCs, and indeed many of the other tables (like those found in the scientific journals), are not so cut and dried as most tables of contents. The problem with a software solution is that the art of making tables will be lost and those tables without a software solution might never get authored. I fear a world where the robot overlords are not so much evil rule keepers, but are the only things that can do things and the poor humans cannot break out of their cookie cutter lives and evolve. I like doing tables, personally, and actually no. I don't like doing tables so much as having it done; so grumpy in the middle and such.
Bad OCR vs typing: it was easier to type in patents than it was to fix the OCR, but I am a fast typist. The bad OCR was due to the bad scans, and more current OCR did not help much. My guess is that an OCR output that also contains layout guesses will make Validation more like editing a work and less like perfecting a work. I heard of complaints from Gutenberg that when OCR got really good, it became boring for proofers. They don't do layout at the same time, so it is different there. I like good OCR, mostly because here the layout can be done at the same time. Proofing is a task for nit pickers and without the nits, the pickers get bored.
The kudos on your TOC were real, as it was not even one of the most usual of them. You might consider something from Category:Texts with missing tables to hone your skill and craft your regex and consider software solutions (yes or no). Sorry to ramble on.--RaboKarbakian (talk) 13:42, 15 October 2024 (UTC)Reply
Thank you for the kudos! I will certainly be adding some regex cookbook entries on TOCs and ordinary tables. I've also noticed that some tables are pseudo-tables, with the body actually only being one row, which would make it pretty impossible to read across a row if you are using a screenreader, so I may write something to convert that, or convert the tab- or space- or comma- separated table you can readily get by copy-pasting a pseudotable into Gnumeric and out into plain text again.
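The tab-separated case mentioned above could be handled along these lines. This is only a sketch: `delimited_to_wikitable` is a hypothetical helper name, and it assumes that no cell contains a pipe character or a line break.

```python
def delimited_to_wikitable(text: str, delimiter: str = "\t") -> str:
    """Convert delimiter-separated plain text into MediaWiki table markup.

    Each non-empty input line becomes one table row, so a screenreader
    can read across real rows instead of one giant pseudo-row.
    """
    lines = ["{|"]  # table start
    for raw_line in text.splitlines():
        if not raw_line.strip():
            continue  # skip blank lines
        cells = [cell.strip() for cell in raw_line.split(delimiter)]
        lines.append("|-")  # row separator
        lines.append("| " + " || ".join(cells))  # cells on one line
    lines.append("|}")  # table end
    return "\n".join(lines)


example = "Chapter\tPage\nIntroduction\t1\nThe Voyage\t17"
wikitable = delimited_to_wikitable(example)
print(wikitable)
```

For comma-separated input, pass `delimiter=","`; quoted fields containing commas would need Python's `csv` module instead of a plain `split`.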
On the philosophical front, I am not worried about robots becoming the only ones who can do things, because history.
Spinning enough thread to make one suit of clothes per person per year used to be a full-time job for half the adult population, assisted by a lot of child labour, everywhere where clothing was required. The spinning wheel more than tripled the labour efficiency of spinning. The industrial revolution's backbone was the automation of spinning, which now takes up a comparatively negligible portion of the world's labour.
I am glad to live in a world where I can buy a decent full suit of clothes for less than a day's minimum wages. This is wealth. But I also know a fair number of people who spin (and weave). I even know a fair number who spin with a drop spindle. They don't spin or weave all their own clothes, just special articles, but they know how. I don't, but I can build a drop spindle, or a spinning wheel, or a loom.
Improved accuracy from more sensor readings, especially in the southern hemisphere; more understanding of the planet; and more raw compute power.
They knew how to do weather prediction in the 1700s. But they couldn't collate the data and do the arithmetic fast enough to produce the weather prediction before the weather actually happened, and last week's weather forecast is an academic curiosity. There were elaborate schemes for a huge amphitheater-like room, with serried ranks of calculators (the human sort) surrounding a conductor who could light different areas in different colours to tell them who was ahead and who behind the pace (I've forgotten the name of that plan, but wish there was a Wikipedia article on it).
Now, of course, we have automated sensor networks and very high-powered computer models and better scientific understanding and rapid radio communications via cell mast and satellite; better tools. Yet people still own stormglasses and barometers and read clouds. It has, indeed, become profitable to pay people to study clouds in far more detail. Weather predictions are one of the most insanely profitable things a government can do, which is why even quite incompetent governments do them, and quietly co-operate with their worst enemies to do them.
Chi-squared tests are useful, but doing a reasonably-sized one by hand would literally take a lifetime.
Photography did not eliminate representational art, nor even the tradition of photorealistic painting in oils. The increasing ubiquity of high-res cameras has not, contrary to some very bizarre predictions, lowered the photographic skill shown in the average photo, because Septembers are not actually eternal. The ubiquity of word processors has not destroyed literary merit in written texts. Electric instruments did not kill the luthier's craft. Robots have enabled the crafts of circuit design and 3D printing.
There's a book I came across a while back:
  • Seymour, John (1984). The forgotten crafts (First American ed.). New York. ISBN 0394539567.
It's an abundantly-illustrated tribute to (European; lapstrake boats, but no dougong rooves) things people don't do anymore; and yet the author almost invariably finds that there are still a few people doing them. The exceptions are highly hazardous things no-one would do if they had a choice, and large-scale, expensive industrial processes that have been functionally replaced. We can't make vacuum tubes like we used to, yet, but that's because of formal industrial secrets being lost; and the tube industry is now reviving!
Nor is this restricted to "Western" culture. Japan has a massive industry in preserving traditional crafts in live use.
When we make better tools, we expand our capabilities. Humans actually seem very good at not completely forgetting old skills, especially in societies where more people have leisure, but even in disaster scenarios like the Fall of the Roman Empire, where people really didn't (Europeans did not forget how to make Roman roads; sans Empire, it just became uneconomic). I'd worry about loss of traditions among people who are barely scraping a living, but mostly cultural traditions, things like language and narrative, not technologies. Here, everything we do is recorded ~forever.
I am sure there are some people who would find better OCR tools spoilt the fun. But
  1. They don't have to use them.
  2. There are also people who like the result more than the process, and they will do more if it costs me less time and tedium.
  3. There are people who like a lot of things about the process, but not the bits the better tools could take over.
I'm in that last cat, but also sometimes in the second! HLHJ (talk) 18:31, 16 October 2024 (UTC)Reply
About TOCs, don't use {{dotted TOC line}}, {{dotted TOC page listing}}, and some others, because they make a separate html table for each line, resulting in unnecessarily huge output. Rather, use {{TOC begin}}, {{TOC end}} and the various TOC rows (listed at TOC begin).
Also, never link TOCs to pagespace, but always to mainspace, because it's from the TOC that readers will access, from the root mainspace page, its subpages. I haven't done that, as I'm not actively working on that book and I don't know what titles things should have.
About OCR, training an OCR model is a lot of work. We'd also have to do something very different, as normal OCR engines have much trouble recognizing things that are obvious to humans. Take for example font size. On pages like this one, the OCR engine, which is fed the text page by page, has nothing to compare to, so can't differentiate it from a book just printed in small type. — Alien333 14:55, 15 October 2024 (UTC)
Rather than using either TOC template set, I've been using a plain table and index styles to format my TOCs. That's the option I've found the best so far, both because it keeps the amount of wikicode down (fundamentally, the TOC row series is a wrapper for table row syntax with some classes and styles attached), and because all the formatting can be done in one place -- also, did you know you can do dot leaders in pure CSS? Arcorann (talk) 23:12, 15 October 2024 (UTC)Reply
Last time I checked, dot leaders had been planned by the W3C but had not actually been implemented by browsers. — Alien333 05:38, 16 October 2024 (UTC)
Seems to still be so, as "leader('.')" is filtered out by Firefox as an invalid value for content. — Alien333 05:59, 16 October 2024 (UTC)
This is interesting, but since a newbie cannot be expected to know all this, nor can it be gleaned from the template docs, is there something like Help:Table of contents (formatting)?
I am absolutely not suggesting that we train an OCR from scratch; that would indeed be a lot of work. But the Transkribus OCR is under a wiki-compatible license (as, I trust, are the others?). I'm suggesting that we add markup capabilities: adding {{c|}} around centered text, scanning two columns of text sequentially rather than cross-cutting them, scanning tables into Mediawiki tables, and so on.
It seems like it would be possible to set a character height on the scanned images as standard, cross-page, and that would let a program determine if the text was larger, xx-larger, xxx-smaller, etc. Computers are better at measuring large numbers of similar things quickly and precisely; I'm happy to do the things that require human judgement, and troubleshoot, because those IMO are the interesting bits :). HLHJ (talk) 18:38, 16 October 2024 (UTC)Reply
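To illustrate the cross-page idea above, here is a minimal sketch in Python. The function names, the 10% tolerance and the sample heights are all made up for illustration, and no existing OCR engine is claimed to work this way.

```python
from statistics import median


def book_body_height(line_heights_by_page: list[list[float]]) -> float:
    """Estimate the normal character height from every line in the book,
    so that a page set entirely in small type still has something to be
    compared against."""
    all_heights = [h for page in line_heights_by_page for h in page]
    return median(all_heights)


def classify_size(line_height: float, body_height: float,
                  tolerance: float = 0.1) -> str:
    """Label a line relative to the book-wide body height."""
    ratio = line_height / body_height
    if ratio > 1 + tolerance:
        return "larger"
    if ratio < 1 - tolerance:
        return "smaller"
    return "normal"


# Hypothetical measured heights in pixels; the third page is mostly small type.
pages = [[20, 20, 21, 40], [20, 19, 20], [14, 14, 20]]
body = book_body_height(pages)
labels = [[classify_size(h, body) for h in page] for page in pages]
print(body, labels)
```

A human would still have to decide which template each label maps to, and scans at different resolutions would need normalising first.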
(Sorry for the docs issue, we do have a big one, but eh)
About the best current thing for issues like that is w:Document Layout Analysis, but that only detects the placement of blocks of text.
The question, then, is how do you define centered? You could say it's "equal margins on either side", but then, margins of what? Lines? If so, OCR often breaks up lines, so that wouldn't get you anywhere. Blocks? Then it'd be adding {{c}} to poems, for example. Or even suppose perfect line detection: images are always cropped, not always equally; the original "natural" margins present on either side of the text aren't necessarily equal; in multiple-column text there are multiple centers; &c. So the margins aren't a good guide. It looks like a simple issue, but it is often a headache (even for humans; I for one have often encountered cases where you can't really tell if something is centred or not). — Alien333 19:37, 16 October 2024 (UTC)
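The "equal margins" definition can be sketched as follows, which also shows how fragile it is. This is a toy in Python with invented names and coordinates, not how any existing layout-analysis tool works.

```python
def looks_centered(block_left: float, block_right: float,
                   line_start: float, line_end: float,
                   tolerance: float = 0.05) -> bool:
    """Naive test: a line is 'centered' if its left and right margins
    inside the text block are equal (within a tolerance) and non-trivial."""
    width = block_right - block_left
    left_margin = line_start - block_left
    right_margin = block_right - line_end
    # A full-width line has equal (zero) margins but is not centered.
    if left_margin <= tolerance * width:
        return False
    return abs(left_margin - right_margin) <= tolerance * width


# A heading spanning x=30..70 in a block spanning x=0..100.
clean_scan = looks_centered(0, 100, 30, 70)
# The same heading when the scan's left edge was cropped by 10 units.
cropped_scan = looks_centered(10, 100, 30, 70)
# An ordinary left-aligned short line.
short_line = looks_centered(0, 100, 0, 40)
print(clean_scan, cropped_scan, short_line)
```

The unevenly cropped scan flips the answer for the very same heading, which is the margin problem described above.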
Thank you for the link, I hadn't seen that! Adding c to poems wouldn't be too bad. Easy to replace manually. Just something basic that would do the headers and page numbers would be useful. Or something that let us manually select rectangular areas (as with Commons image annotations, which see little use) and OCR them in sequence, that would do multicolumn text. We could even label areas as images and not OCR those. A modified manually-set-x-by-y-grid version would do tables. HLHJ (talk) 23:19, 16 October 2024 (UTC)Reply
Have you considered making a Phabricator request re: rotating the image on the advanced options page? Arcorann (talk) 23:18, 15 October 2024 (UTC)Reply
That's a good idea, I'll try to get around to it, unless someone else does. :). HLHJ (talk) 14:56, 16 October 2024 (UTC)Reply
It's been done: task T87017. Pinging User:ShakespeareFan00, who made the request. HLHJ (talk) 18:33, 16 October 2024 (UTC)Reply

Preliminary results of the 2024 Wikimedia Foundation Board of Trustees elections


Hello all,

Thank you to everyone who participated in the 2024 Wikimedia Foundation Board of Trustees election. Close to 6000 community members from more than 180 wiki projects have voted.

The following four candidates were the most voted:

  1. Christel Steigenberger
  2. Maciej Artur Nadzikiewicz
  3. Victoria Doronina
  4. Lorenzo Losa

While these candidates have been ranked through the vote, they still need to be appointed to the Board of Trustees. They need to pass a successful background check and meet the qualifications outlined in the Bylaws. New trustees will be appointed at the next Board meeting in December 2024.

Learn more about the results on Meta-Wiki.

Best regards,

The Elections Committee and Board Selection Working Group


MPossoupe_(WMF) 08:26, 14 October 2024 (UTC)Reply

Tech News: 2024-42


MediaWiki message delivery 21:21, 14 October 2024 (UTC)Reply

Bars and manicules and other old timey items

☞

Do we have a page that shows the bars and manicules and other old timey page flourishes, so I can match the closest ones to use in a transcription? They are easier to match by sight rather than by name. Do we have a Help:Flourishes or Help:Visual Elements, or something similar? Is there a collective name for all these types of visual elements? RAN (talk) 01:52, 16 October 2024 (UTC)Reply

I'm not aware of such a page, and it sounds like a good idea! The following pages might be helpful:
CalendulaAsteraceae (talkcontribs) 22:19, 16 October 2024 (UTC)Reply
I'm not exactly sure how they could be organized. But, it'd be an extremely helpful page. I'd say, let's be bold and create it. SnowyCinema (talk) 23:03, 16 October 2024 (UTC)Reply
I just looked these up. They don't look very old timey though. I made a list of things I needed for a while. User:RaboKarbakian/Symbols. There was an extra special challenge ( ) to get the text (and not emojified) astrological symbols.--RaboKarbakian (talk) 01:35, 17 October 2024 (UTC)Reply
It's a shame that Unicode Character charts aren't necessarily license compatible with Wikisource (or possibly in scope, for that matter). ShakespeareFan00 (talk) 20:36, 18 October 2024 (UTC)
IIRC there's also a BIG chart/list of symbols attached to a US GPO style manual, in which various contributors here tried to put the relevant Unicode symbols? ShakespeareFan00 (talk) 13:42, 17 October 2024 (UTC)
ShakespeareFan00, you mean this: U.S. Government Printing Office Style Manual/Signs and Symbols, also, thank you for fixing the sun and moon!--RaboKarbakian (talk) 14:21, 17 October 2024 (UTC)Reply
The "floral heart" in Unicode is termed an "aldus leaf" on Commons, and there is a category of various orientations there. --EncycloPetey (talk) 20:37, 18 October 2024 (UTC)Reply
Side note, but while searching I found both {{manicule}} and {{finger}}. These two probably need to be merged. — Alien333 05:16, 17 October 2024 (UTC)
Good catch! —CalendulaAsteraceae (talkcontribs) 06:03, 17 October 2024 (UTC)Reply
I recently created {{fleuron}}, which may be of interest to this discussion. —Beleg Tâl (talk) 22:33, 22 October 2024 (UTC)Reply
Also of note: Category:Special character templates —Beleg Tâl (talk) 22:34, 22 October 2024 (UTC)

Seeking volunteers to join several of the movement’s committees


Each year, typically from October through December, several of the movement’s committees seek new volunteers.

Read more about the committees on their Meta-wiki pages:

Applications for the committees open on 16 October 2024. Applications for the Affiliations Committee close on 18 November 2024, and applications for the Ombuds commission and the Case Review Committee close on 2 December 2024. Learn how to apply by visiting the appointment page on Meta-wiki. Post to the talk page or email cst@wikimedia.org with any questions you may have.

For the Committee Support team,


-- Keegan (WMF) (talk) 23:09, 16 October 2024 (UTC)Reply


Due to formatting issues and other problems. Help would be appreciated due to its gargantuan size. Booklover09097 (talk) 19:26, 18 October 2024 (UTC)

Can you clarify? —Justin (koavf)TCM 23:38, 18 October 2024 (UTC)Reply
Moved from Wikisource:News/2024-10 where it was placed incorrectly. --Jan Kameníček (talk) 22:00, 18 October 2024 (UTC)
This project, being very close to my heart, begs the question. What are you talking about, and what are the specific issues? — ineuw (talk) 04:20, 24 October 2024 (UTC)Reply

how accurate is transcribe?


Whenever I'm on a page and the text is juttery and clunky, I press transcribe and the text looks pretty good. But how accurate is the transcribe button in relation to the text? Booklover09097 (talk) 09:29, 19 October 2024 (UTC)

That is mostly dependent on the quality of the image. With lower resolutions, it gets garbled.
It also depends on the OCR engine used. For on-site OCR, Google OCR is the best one for character accuracy, although it has trouble with columns.
What you see when you create a page is the OCR that was embedded in the file before upload. The best engine off-site is probably Tesseract. The people who did that might have not used the best OCR (or it was not available at the time).
Even though it is sometimes pretty good, there is no guarantee of accuracy, and editors are expected to check. — Alien333 09:51, 19 October 2024 (UTC)
@Booklover09097: Which book are you talking about? There are a lot of PDF files exported from https://archive.org (IA), which had been optimized for extreme size reduction at the expense of their quality. But thankfully IA typically also has the original high quality images of the scanned book pages for download, and they can be used to create a better quality PDF or DjVu files. --Ssvb (talk) 13:10, 22 October 2024 (UTC)Reply
In my experience with IA and familiarity with OCR technology, five factors influence the quality. Image clarity, scanning method, scanning equipment, scanning software, and optics. Many early documents at IA were scanned manually. With automation, the technical particulars became available on the IA download page.
One additional note. Initial IA scanning equipment used one OCR software for English, and another for accented Latin languages. Since English academic documents reference other languages, look closely before applying Tesseract OCR. It is not always wanted. — ineuw (talk) 05:12, 24 October 2024 (UTC)Reply
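One way to put a number on "how accurate" for a given page is to compare the raw OCR text with the proofread text once the page has been checked. A minimal sketch using Python's standard `difflib` follows; the function name and sample strings are invented, and the result is a similarity score rather than a formal character error rate.

```python
from difflib import SequenceMatcher


def ocr_similarity(ocr_text: str, proofread_text: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two texts.

    1.0 means the OCR output is identical to the proofread page.
    autojunk is disabled so that frequent characters on long pages
    are not ignored by the matcher's speed heuristic.
    """
    matcher = SequenceMatcher(None, ocr_text, proofread_text, autojunk=False)
    return matcher.ratio()


score = ocr_similarity("tbe quick brown fox", "the quick brown fox")
print(round(score, 3))
```

A high score on one page says nothing about the next one, so this does not replace proofreading; it only gives a feel for how much fixing a particular scan needed.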

I added Show Boat as a book to Wikisource


Please help me format it. Thankfully it's in the public domain because it was published 1926. Blahhmosh (talk) 04:18, 21 October 2024 (UTC)Reply

I added a few others. Please proofread. Blahhmosh (talk) 04:35, 21 October 2024 (UTC)Reply
Also, is the Amerika: The Missing Person translation in the public domain? I know the original German version is, but the American one was published 1938 or something. Blahhmosh (talk) 04:50, 21 October 2024 (UTC)
Also, how does "Hunting for Hidden Gold" work? How do you just record the original version as well as the other versions? Blahhmosh (talk) 04:52, 21 October 2024 (UTC)Reply
Note that Wikisource no longer accepts second-hand transcriptions, e.g. from Project Gutenberg. All new works must be proofread to a source text. See Wikisource:What_Wikisource_includes#Second-hand_transcriptions. For example, Show Boat is available to proofread here Index:Show boat - 1926.djvu and An American Tragedy here: Index:An American Tragedy Vol 1.pdf. MarkLSteadman (talk) 05:12, 21 October 2024 (UTC)Reply
I see. How do I submit new PDF files for transcription? Blahhmosh (talk) 07:12, 21 October 2024 (UTC)Reply
You can upload them to Wikimedia Commons, and then follow the procedure described over there. — Alien333 07:30, 21 October 2024 (UTC)
So are digital versions of the books pdf banned? Blahhmosh (talk) 08:02, 21 October 2024 (UTC)Reply
If that's your question (I'm not sure I understood), PDF isn't banned, it's just that it works less well, and has many issues, so DjVu is more convenient. You can perfectly well upload a PDF and work on it, it's your choice. — Alien333 08:10, 21 October 2024 (UTC)
No, I'm saying sometimes the PDF doesn't contain the actual images of the book and instead contains just plain text of the book in a standard font (Times New Roman, Arial, Courier, etc.) that isn't the font used in the original book. Should we use those? Blahhmosh (talk) 13:46, 21 October 2024 (UTC)
No, "source texts" here mean a scan of the physical edition of that book. — Alien333 13:47, 21 October 2024 (UTC)
What if it's the physical edition of that book but you can copy the text of the book? Blahhmosh (talk) 13:54, 21 October 2024 (UTC)Reply
Source text means that it is proofread against a known published version, which is almost always a physical copy (or for recent government works, a PDF release). Although scans are strongly preferred, they are not strictly required per policy; what is required is a clear record of what the source is. Also, given the ubiquity of high-quality digital cameras, it shouldn't be too hard to image the pages (even if not able to process them) so that there is a record for someone in the future. MarkLSteadman (talk) 04:31, 22 October 2024 (UTC)
Re taking a physical work --> typing it up --> releasing it as say a PDF on say Internet Archive --> retranscribing it at WS, that typically counts as secondhand / self-published. Several reasons why they are problematic as they effectively create a new "edition": 1. They can introduce issues with any omissions / decisions / additions causing divergences with the physical work, even if just in things like pagination 2. editor's copyright not being released / verified 3. source text uncertainty, are any divergences in the text caused by lack of record in exactly which is the source edition, which may itself introduce copyright concerns. 4. They typically end up being duplicative of a future proofread against the scans version of the text anyways (which then can be used for referencing page numbers, authority control, source text comparison etc.), and will be deleted. MarkLSteadman (talk) 05:02, 22 October 2024 (UTC)Reply

Tech News: 2024-43


MediaWiki message delivery 20:52, 21 October 2024 (UTC)Reply


Hello everyone, I previously wrote on the 27th September to advise that the Wikidata item sitelink will change places in the sidebar menu, moving from the General section into the In Other Projects section. The scheduled rollout date of 04.10.2024 was delayed due to a necessary request for Mobile/MinervaNeue skin. I am happy to inform that the global rollout can now proceed and will occur later today, 22.10.2024 at 15:00 UTC-2. Please let us know if you notice any problems or bugs after this change. There should be no need for null-edits or purging cache for the changes to occur. Kind regards, -Danny Benjafield (WMDE) 11:29, 22 October 2024 (UTC)Reply

I messed up a title


When I made the page Index:The Last Post (1928), it should've been Index:The Last Post (1928).pdf. What do I do? Blahhmosh (talk) 18:33, 22 October 2024 (UTC)Reply

I've moved it for you. It's easiest if an admin does this because we can suppress the automatic redirect that would happen if you moved it. Beeswaxcandle (talk) 18:42, 22 October 2024 (UTC)Reply
I see. Also, based on the nature of the .pdf file, is it valid for Wikisource? Blahhmosh (talk) 18:45, 22 October 2024 (UTC)Reply
It is, again, a second-hand transcription, as it says on page 3: This ebook is the product of [...] Standard Ebooks, based on a transcription by Faded Page Canada, so no. What would be valid in this case, for example, would be to go get the original page scans from Google Books (also mentioned on page 3). — Alien333 18:50, 22 October 2024 (UTC)

Wikisource: We preserve publishers typos


I think that any person, bot or other software reply mechanism here should never say the words "as published" until current policy is at least softened; as it is a lie.

Table of exceptions
  • We preserve publishers typos (exceptions: None)
  • As published (exceptions: paragraph indentation; type family; images that start and end chapters; long s; margins)

Feel free to add to either list.

Perhaps there are more. While it sounds (and reads as) so 'leet to say "as the publisher", it is simply not true, and over the border, which makes it a lie. A simple softening of the policy, so that the occasional editor cannot drop in, validate a page that has one image on it and then ravage the style sheet, would perhaps give you back that 'leet feeling you get when you utter that lie. Without the softening of policy on those points, it is simply a lie.--RaboKarbakian (talk) 20:21, 22 October 2024 (UTC)

Also, I beg of you. Please find for me an English text from between 1650 and 1750 that is not using a serif type face!!--RaboKarbakian (talk) 20:23, 22 October 2024 (UTC)Reply
There's nothing like holding an actual book from the 17th century. However, that's quite different from holding it as published, which no one has done in centuries. Good color PDF scans can preserve some of the qualities of an old book, but miss out on a lot of others. It seems quite weird to say "type family" as opposed to "type face"; you think you can just replace Caslon or Baskerville with Times New Roman? Given that Caslon was the old-school and Baskerville was the new wave when they were competing, how does even replacing one with the other qualify as "as the publisher"?
We are not making digital facsimiles. If you want a digital facsimile, use the PDF. I don't see your second column of exceptions as basically changing anything as to the truthhood or falsehood of "as the publisher".--Prosfilaes (talk) 21:38, 22 October 2024 (UTC)Reply
You are completely correct about that, if we are talking about a text file -- which is the one format the exporter does not do! I am asking simply that the policy be changed to be more "in general" and not so "against". I am also not insisting that the policy be changed so that everything on that list has to be reflected. I would prefer the occasional editor to be a little less enabled. My style sheet just said "serif" because it is not a facsimile.--RaboKarbakian (talk) 22:40, 22 October 2024 (UTC)
What did I say about a text file? (Which, by the way, drops stuff that's integral to most works, like italics.) Serif/sans-serif is an irrelevant distinction. As you say, a work published in 1750 is going to be in a serif typeface. But you're ignoring many of the other features a work published in 1750 would have, e.g. [16], like the font size, the very different looking fonts, the signatures and tail word. You're also ignoring choices of publishers that are distinct choices, like which serif font to use, in exchange for removing the default font of the reader.
I'm against making working on pages more complex; I've found PGDP's total separation of proofing from formatting to simplify things a lot, and the more formatting we add, the worse it is going to get. I'm also against making more per-project idiosyncrasies. I see preserving publisher typos more as making a standard, undisputable format for pages, and not terribly important in and of itself.--Prosfilaes (talk) 20:32, 25 October 2024 (UTC)
(For images that start and end chapters, I always add them, and have trouble understanding why apparently no one does.)
Leaving long s apart as that's another debate, to me what you listed is perfectly compatible with the fact that we transcribe the work as published, not the physical book as published.
After type families, paragraph indentation, and margins, the same arguments would also lead us to replicate the relative height and width of letters, the width of the page, heck, even the color of the paper. At that point, it'd be much more reasonable to get a 600 or more DPI scan, feed it to the OCR, which respects layout as it places the text on the page, and at this resolution if we take the right engine it's going to be near-perfect.
Also, what would you mean by a softening of policy, which would make the occasional editor a little less enabled? You said above that you do not think policy should make those things mandatory, but then, do what else? — Alien333 06:49, 23 October 2024 (UTC)
(struck above after seeing recent messages) There is, indeed, a fairly long list of things we don't do, but the list of things we do is long too; it's not only typos. A non-exhaustive list: images, page layout, TOC dot leaders, text styling in general ({{sc}}, {{sm}}, {{bl}}, ...), {{di}}s, {{***}}s, &c. — Alien333 16:03, 23 October 2024 (UTC)
love the passion about transcription; don't like the "as it is a lie." in any project, there will be compromises between verisimilitude and usability, and calling compromises lies is unhelpful. --Slowking4digitaleffie's ghost 13:21, 23 October 2024 (UTC)Reply
Slowking4, or should I say "font-family:UnifrakturMaguntia"? Personally, I miss the rants of Rama's revenge; and perhaps I am just filling in for the lack of those. That said, I was indenting paragraphs when in elementary school and I was not the only one doing that. The indents help for reading. They are a challenge in markup tho', no doubt. What I did try to say was that "As the publisher published it" is a lie, because it is really just "all of the publisher's typos" and anything else might get an editor harassed because there is policy against it. So, I am suggesting that if we are to continue on as is, that we stop lying and simply proclaim "we preserve publishers' typos and misspellings".
And I don't want to be misunderstood that new policy should insist upon that list; I want the option without the potential harassment. 'Tis a huge and capable layout engine; policy wants it to be used like a bulldozer to go the 20 feet to get the mail.--RaboKarbakian (talk) 15:52, 23 October 2024 (UTC)Reply
@RaboKarbakian: From the technical perspective, a configurable layout with switchable paragraph indentations (on/off) and switchable typos preservation (on/off) in the browser is very realistic and relatively easily doable. For example, see the Wikisource:Scriptorium/Archives/2024-08#Dynamic_Layouts_and_Template:SIC_/_Template:Errata_possible_interaction discussion topic. Currently we don't have these features in the browser because the consensus of the Wikisource contributors is firmly against having them. Exporting to EPUB/PDF is another part of the puzzle, because right now there's only one non-configurable way to do that as well. But this is again not set in stone and it's the community's desire to preserve the status quo that is the decisive factor. --Ssvb (talk) 11:21, 24 October 2024 (UTC)Reply
Ssvb: Too many images, tables, and formulas appear in the middle of paragraphs for indentation to be considered "easy" or "realistic" technically. You would still have to leave a mark where the "new paragraph" does not indent, and that puts the "technical doable" into the same problems that people have with this. Automatic paragraph indentation is confusing to see. Heck, sentences get interrupted by images and the like. I just cannot agree with the "easily doable" part.--RaboKarbakian (talk) 12:16, 24 October 2024 (UTC)
(Also, not all paragraph starts are indented, see for example this.) — Alien333 12:18, 24 October 2024 (UTC)
This doesn't seem difficult to solve: just needs one template that marks up non-indented paragraphs; this could just add a css class that does nothing if the CSS displays it as paragraphs with gaps between, but prevents any indentation if the reader selects "original paragraph indentation mode". In fact, we already have {{No indent}} and {{Nodent}} for just these situations. Pretty sure there's also a template that prevents the gap between paragraphs for continuations of the same paragraph (e.g. that have been interrupted by poems, tables, etc.) – but I can't find it at the moment! --YodinT 16:26, 24 October 2024 (UTC)Reply
Putting every non-indented paragraph in a template would make for quite a lot of work, wouldn't it?
Also, indentation is not always the same, depends on the period, publisher, etc, so we'd have to add something to the index styles too, which would make more stuff to do.
The most problematic part would probably be updating all we've done so far to make it compatible with the changes. — Alien333 16:29, 24 October 2024 (UTC)
Most books I've come across that have indented paragraphs only have a few exceptions to that rule as far as I've seen, so not a huge amount of work while proofreading to add {{ni}} in those cases. And I think the idea would be to make this opt-in, so in most cases (including all the books currently transcribed), they'd just display as they currently are. If an editor wants to give readers the options to view it with the original paragraph indentation (or other options, like long-s, original margins, etc. etc.), the editor could add those options to the Index CSS, and just add the {{ni}} exceptions as they were proofreading (again, not too much more work, and entirely their choice). Editors could choose to go back through the works they've already done, and add indentation options, etc. if they wanted, but again this would be completely optional, just allowing those who want to to do so, but no obligation for editors who aren't interested in this. And, as mentioned below, if editors added this option, readers could still choose whether to view it either as it currently is (i.e. modern paragraph spacing, no long-s – this could be the default option for logged-out users), or with something closer to the original typography (could even give more granular toggles, so font style as one option, page margins another, etc.). --YodinT 17:09, 24 October 2024 (UTC)Reply

A dissenting opinion here: I'm personally more interested in producing an accurate digital version of the text itself than this approach, but I think it's both technically feasible, and also not a terrible idea to allow editors to create essentially "vectorised facsimiles" (i.e. the precise fonts and typography used, page margins, etc. etc.) – if this was provided for as a separate stylesheet for example (so /styles.css for the normal web edition, and /facimilie.css for this), it would be straightforward for a parser to let the reader choose which version they wanted to see (another option could be annotated versions; again all using the same Page:s). This would let editors produce whichever version they wanted (facsimile, text, or annotated), without having to revert/ban/tell editors that it has to be done in a certain way, and producing standardised texts that the majority of readers would find useful regardless of which approach the editor uses. --YodinT 14:16, 23 October 2024 (UTC)

Yodin When I export my highly stylized works to epub, most of the style goes away, and it is usually a good experience to read these things there. Having the exporter export to text would allow picky readers to impose their own style to it, or not. Exporting to text would also (hold your horses here!!): preserve the publisher's typos, which is what we do here (by consensus). The 18th century Arabian Nights I have been working on--there is a late 19th century version that is so much more readable: so that to me, having the earlier one "modernized" and streamlined for reading is silly. Having it look like the museum piece that it is is kind of nice in a documentation sense.
A howto for setting your personal browser's style would settle most concerns, without the need for multiple style sheets.
Also, the long-s option. Really, people should be required to log in to turn them into s. That way, we get the email addys for getting the donations.--RaboKarbakian (talk) 15:52, 23 October 2024 (UTC)Reply
There's already a howto at Help:Layout. But such a howto for setting your personal browser's style is beyond the abilities of the vast majority of the Wikisource users. Moreover, many of the existing wiki templates would benefit from becoming a bit more CSS-aware to enable such customization. --Ssvb (talk) 11:41, 24 October 2024 (UTC)
Would be great to have something along the lines of French Wikisource, which has a tab at the top of the page, next to "Page | Source | Discussion" that allows readers to automatically switch between original spelling and modernised spelling (e.g. this page), and even a toggle to highlight the changes that have been made. In our case it could be things like original typography (long S etc.) instead; could even have an option to toggle between original typos and SIC corrected spellings. --YodinT 13:28, 24 October 2024 (UTC)Reply
Yodin At French wikisource, the "Source" just links to the Index page, and this wiki has the same link. "modern" is also dated, like tomorrow it will be different things that "modern" describes; so in some ways, modernization is an editorialization of the spelling and its punctuation and such of that time it was transcribed. I really really like "As it was published", which was probably thoroughly modern at its time.--RaboKarbakian (talk) 15:33, 24 October 2024 (UTC)Reply
Yep, the modernisation option is next to Source (the Index: link) and the talk page (Discussion) tab at the top of the page. It absolutely is editorialisation, but follows predictable rules (this isn't the same in English), and they update the "modernisation" algorithm when there's spelling reform. But the main thing is that it still completely preserves the original "as it was published" version of the text as well, and just allows an automatic option for people who want to read the texts using current spelling conventions. That's what I'd like to see here: an option for readers to easily choose whether they want to see the long-s, original fonts, etc. etc., and original as-is typos, or switch these off. Handling annotations the same way (rather than copy-pasting the text, and adding hyperlinks/footnotes to this copy, which will be extremely difficult to sync with the original if further proofreading/validation improves the quality of the original text) – it seems to me it would be much easier to use templates to markup annotations in Page: space, and switch them off by default – but that's another discussion! --YodinT 16:03, 24 October 2024 (UTC)Reply
Yodin I was wrong and I would strike my paragraph except that I enjoyed the rant about "modern". Also, that French module is very cool. If we use it here, maybe I might still be around for the "Post-modernization" module!! As it is for me here {{ls}} never displays long s; no matter the preference toggle, no matter the namespace; so I find myself being very firmly on the other side of "No options, this way" pasting the long s so that I can see it that way. I think that in the page namespace it always displayed the s, and that was also not helpful for editing. Also, once, I used one of the wikimedia fonts (via @font in the stylesheet) and since then, my browser displays the wrong font size, always; well, not at first (with a vanilla configuration) but at second; just like something is grabbing it and using its configuration instead of mine. I think these and (many, many) other problems are all related, but the long s one did me in. Another thing, I really hate using those words "I was wrong" just so you know.--RaboKarbakian (talk) 19:51, 24 October 2024 (UTC)Reply
Generally I support the as-it-was-published attitude, but I am sceptical we will be able to reach an agreement or change the en.ws approach towards all the mentioned subtopics within one discussion. Maybe we should discuss individual problems like indentation, long s, fonts, etc. one by one. BTW: I do miss paragraph indentation here very much, and do not like the modern inter-paragraph spacing that replaces it at all. --Jan Kameníček (talk) 15:50, 24 October 2024 (UTC)Reply
Jan Kameníček: For group projects, especially those that beginners have been directed to start with, the simpler the better. Individual projects or those having just a few contributors should not have to suffer policy intended (heh, I typoed "indented" first here) for beginners. Another thing, How and where to discuss things where capable and interested hackers might be that can enable things. Poor CalendulaAsteraceae will be coding until the post modern module is needed and maybe still won't be done with everything that is wanted. Also, some of the best coders I know have little interest in policy discussions and might even run from anything using the word "consensus". Phab tickets seem to sit there; although it might just be the tickets I look at. I'ma gonna call what we have now .--RaboKarbakian (talk) 19:51, 24 October 2024 (UTC)Reply

Where to start and end a book's file

I've noticed that traditionally, files of books are assembled from cover to cover. I'm considering deviating from this tradition by creating a djvu file of this hathitrust book that starts from the first page that contains transcribable material and ends at the last page containing such. So I wonder if there is any purpose to having empty pages at the start and end besides to act as a placeholder. Is there a need to preserve a book's integrity that necessitates having the entire book? Prospectprospekt (talk) 23:43, 22 October 2024 (UTC)Reply

Having all pages from cover to cover shows that nothing was omitted. If you start chopping off pages, then how can the readers be sure that you haven't arbitrarily decided to delete something important? Such as the toc or preface or errata notes or anything else. Some books include advertisement pages of questionable value, but if you remove them, then this would look suspicious. --Ssvb (talk) 05:54, 23 October 2024 (UTC)Reply
To me, the empty pages in themselves don't have an important interest (though I'd leave them anyway for integrity), but it's much easier to check if something else has been removed in violation of WS:NPOV if they have been left there. If the empty pages have been left, verification is as simple as the number of pages, but if we remove them, it gets much more complicated, and it's not even just subtracting the number of empty pages at the beginning and end, as some may also remove the back of plates, so in any way you have to check all the pages. — Alien333 06:59, 23 October 2024 (UTC)
Agree, empty pages should not be removed from the scans of the book. Besides the reasons above I will add some more: 1) the scans uploaded to Commons do not serve only Wikisource, but anybody, and I can imagine that somebody might like to create an exact facsimile of the original publication including the cover etc., and so they would miss the omitted pages then. 2) Although we do not transcribe e.g. the library tags attached to books, for somebody it might be useful to know in which library this particular specimen was stored, so we should not cut it off from the scan. --Jan Kameníček (talk) 09:20, 23 October 2024 (UTC)
  • Prospectprospekt: That file is somewhat unusual. The actual, printed item is /3 to /52; /1, /2, /53, and /54 are all a cover which was added to the pamphlet by the library which owns the item. In this case, properly, those four pages should be excluded; but I generally do not exclude them because I do not think it is necessary to do so. Jan Kameníček, does it change your opinion to know that the covers (of this work) were not original to the publication? TE(æ)A,ea. (talk) 17:43, 23 October 2024 (UTC)Reply
    Well, there still remains my "library argument", which, I admit, is not too strong, so although I personally would not remove these pages, I would not object too much if somebody else would. --Jan Kameníček (talk) 17:50, 23 October 2024 (UTC)Reply

Manual news article aggregation (manual indexing) versus automatic news article aggregation (automatic indexing)

See: Jersey Journal (manually curated, always missing entries) versus The Washington Post (newspaper) (automatically curated, always complete) to see the difference. I have identified at least 6 different ways that news articles are manually aggregated in different formats from calendars to various table formats to lists by year. Is there a hard rule that prevents us from having both manual and automatic curation? The best analogy would be Commons which has Commons:Category:Abraham Lincoln for automatic aggregation and Commons:Abraham Lincoln for manual aggregation. I don't see why we cannot have both methods to satisfy both needs. We could have Portal:Jersey Journal or Periodical:Jersey Journal for automatic indexing and a link to Jersey Journal, just like is done at Commons; or, we could have The Jersey Journal (manual) versus Jersey Journal (automatic) and a link between the two. A third option would be a hybrid where both appear on the same page like here: New York Tribune. RAN (talk) 17:12, 23 October 2024 (UTC)

Page access request

Hello, I have a small request. I've been addressing some specific priority syntax errors here on Wikisource, and have dropped two error types down to near zero: the Tidy Font Bug (78 remain), and Misnested tags (42 remain). 77 and 41 of these are on Full protected pages, and I wondered if I could have access to these Tidy font and these misnested pages for a brief time to address these issues. I have 2 years of experience on Wikipedia with handling these (and other) tracked syntax errors in a respectful and knowledgeable manner, and currently have a temporary adminship (Sept-Dec) on Wikivoyage, where I addressed 99.99% of their 30k syntax errors in 5k edits (Aug-Sept). I've asked Xover and Encyclopety on their talk pages about the possibility of my accessing these few pages, but neither has been very active here since my messages, and neither has replied, so I figured the next step was to ask here since it had been a few days. I am happy to discuss or answer any questions admin may have. Thanks, and hope you have a great weekend. Zinnober9 (talk) 19:54, 25 October 2024 (UTC)

Crossposted to WS:AN since no reply here after a week, and only an admin could grant this request. Zinnober9 (talk) 05:41, 3 November 2024 (UTC)Reply

Qid

Is there any place we can add a Wikidata qid so a bot adds the portal or the news article to the corresponding Wikidata entry? RAN (talk) 03:53, 26 October 2024 (UTC)Reply

Wouldn't that be adding a sitelink to the wikisource page to the item? — Alien333 08:15, 26 October 2024 (UTC)
  • Yes, we could have: {{author | firstname = Tirey Lafayette | lastname = Ford | last_initial = Fo | '''wikidata = Q7809200''' | description = American lawyer and politician; California Attorney General, District Attorney from California }}

That way a bot at the Wikidata end could add the Wikisource link automatically, and they would be paired at both projects.

What I'm saying is that is already done, you just have to add it at the other end, at WD. When an item has for example a sitelink to one of our authors, {{author}} picks it up, takes the data, and puts a link to the item. — Alien333 14:39, 26 October 2024 (UTC)

I'm kind of new to wikipedia and need some help

I would like to know how to better edit this code and what exactly it is for:

{{header
| title = {{subst:}}
| author =
| section =
| previous =
| next =
| year =
| notes =
}}

WikiEducationalVol (talk) 03:19, 27 October 2024 (UTC)Reply

Hi @WikiEducationalVol: It seems that you might be confusing us for Wikipedia. We are actually not Wikipedia—we are their sister site Wikisource.
You recently submitted an article about "Adolescence", which I deleted because it is not within our project scope. We host a collection of transcriptions of already-existing texts, mostly old books and government documents. But your article about the definition of adolescence might be better suited for Wikipedia itself or maybe even Wikibooks. Try those communities next. SnowyCinema (talk) 04:45, 27 October 2024 (UTC)Reply
to answer your question, this is the header template for completed transcribed works, for example Beethoven (Rolland). we have a works namespace rather than article namespace. it is backed by a side by side transcription stitched together at an index page, for example Index:Rolland_-_Beethoven,_tr._Hull,_1927.pdf. --Slowking4digitaleffie's ghost 00:09, 28 October 2024 (UTC)Reply
@WikiEducationalVol: If you're used to the Wikipedia way of thinking, try thinking of this template as a sort of little infobox for each book or article.
  • The title parameter is for the full title of the work as originally published. (This might be different from the title of the page in some instances!)
  • The author parameter is for the author of the work.
  • The year parameter is for the year the work was originally published.
  • The section parameter is for the title of the chapter (if you are working on a chapter subpage; don't fill this out on the main page).
  • The previous parameter holds a wikilink to the previous chapter, so you can go back to the previous chapter if you want.
  • Similarly, the next parameter holds a wikilink to the next chapter, so you can skip ahead.
  • Lastly, the notes parameter holds any other info you might want the reader to know. This might be a brief summary of the work, or a comment describing how the formatting of this version differs from that of the original.
Duckmather (talk) 21:06, 29 October 2024 (UTC)Reply

Scans are now migrated to the talk page

Scans are now migrated to the talk page, when did that start? See: Talk:The_Indianapolis_News/1937/4_American_Pilots_Quit_Spanish_War_as_Loyalists_Fail_to_Pay
Also, is every news article here at Wikisource supposed to get an entry at Wikidata? RAN (talk) 20:24, 28 October 2024 (UTC)Reply

@Jan.Kamenicek: Any comments? SnowyCinema (talk) 01:51, 29 October 2024 (UTC)Reply
  • If you do not want the scan to appear on the text page anymore, the best thing would be to create a Wikidata entry for each news article and the image will appear there and we will link to the text here from Wikidata, the link will then appear in the upper right corner. See for instance: Wikidata:Q86172138 --RAN (talk) 05:02, 29 October 2024 (UTC)Reply
    Main namespace is supposed to contain the transcribed text, sometimes accompanied by original illustrations of the text. The best place for scans is the page namespace. It is redundant to have both the transcribed text and the scan in the mainspace page. It is not being done with other works and there is no reason why it should be done with news articles. Such practice is not supported by any of our rules or help pages. For example Help:Digitising texts and images for Wikisource#Images and illustrations contradicts this approach by stating that images should be extracted from the work and uploaded as separate files, not like here. Proper work with scans is described at Help:Proofread, work with .jpg scans is described at Help:Index pages#Using individual image files. I have moved some thumbs of images of a few of such scans to the talk pages so that they are not lost if anybody wanted to use them for proper scanbacking.
    As for Wikidata entries, they are not required but their creation is certainly supported. --Jan Kameníček (talk) 11:23, 29 October 2024 (UTC)Reply
    BTW: One more thing should be said, and that is general appreciation for the work with transcribing interesting and useful news articles. --Jan Kameníček (talk) 11:30, 29 October 2024 (UTC)Reply

Tech News: 2024-44

MediaWiki message delivery 20:56, 28 October 2024 (UTC)

Final Reminder: Join us in Making Wiki Loves Ramadan Success

Dear all,

We’re thrilled to announce the Wiki Loves Ramadan event, a global initiative to celebrate Ramadan by enhancing Wikipedia and its sister projects with valuable content related to this special time of year. As we organize this event globally, we need your valuable input to make it a memorable experience for the community.

Last Call to Participate in Our Survey: To ensure that Wiki Loves Ramadan is inclusive and impactful, we kindly request you to complete our community engagement survey. Your feedback will shape the event’s focus and guide our organizing strategies to better meet community needs.

Please take a few minutes to share your thoughts. Your input will truly make a difference!

Volunteer Opportunity: Join the Wiki Loves Ramadan Team! We’re seeking dedicated volunteers for key team roles essential to the success of this initiative. If you’re interested in volunteer roles, we invite you to apply.

  • Application Link: Apply Here
  • Application Deadline: October 31, 2024

Explore Open Positions: For a detailed list of roles and their responsibilities, please refer to the position descriptions here: Position Descriptions

Thank you for being part of this journey. We look forward to working together to make Wiki Loves Ramadan a success!


Warm regards,
The Wiki Loves Ramadan Organizing Team 05:11, 29 October 2024 (UTC)

Commision/Commission bug

If you look at Van Cise exhibits to the Commission on Industrial Relations regarding Colorado coal miner's strike and click the "Source" button, you will get to Index:Van Cise exhibits to the Commission on Industrial Relations regarding Colorado coal miner's strike.djvu, which doesn't exist. The actual index is at Index:Van Cise exhibits to the Commision on Industrial Relations regarding Colorado coal miner's strike.djvu (note the missing "s" in "Commision" [sic]). There's a similar problem on some of the pages. For example, clicking the up arrow on Page:Van Cise exhibits to the Commision on Industrial Relations regarding Colorado coal miner's strike.djvu/1 also leads you to the nonexistent "Commission" index page.

I see two ways out of this:

  • Move the file, the index page, and all individual pages to use the "Commission" spelling, and make sure that no pages are still using the old "Commision" spelling.
  • Use the "Commision" spelling, and somehow fix all the redlinks (maybe they're due to ProofreadPage going wonky somehow?).

Duckmather (talk) 21:01, 29 October 2024 (UTC)Reply
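
For anyone weighing the first option, the list of moves it implies can be sketched roughly as below. This is only an illustration in Python: the function name `move_plan` and the page count passed to it are assumptions, not anything taken from ProofreadPage or from the actual index, and the sketch only builds (old title, new title) pairs without touching the wiki.

```python
def move_plan(old_base: str, new_base: str, page_count: int) -> list[tuple[str, str]]:
    """Build (old_title, new_title) pairs for an index and its Page: subpages.

    old_base / new_base are file names without a namespace prefix;
    page_count is the number of pages in the scan (an assumed input here).
    """
    # The Index: page itself moves first.
    moves = [(f"Index:{old_base}", f"Index:{new_base}")]
    # Each Page: subpage keeps its number and only changes its base name.
    for n in range(1, page_count + 1):
        moves.append((f"Page:{old_base}/{n}", f"Page:{new_base}/{n}"))
    return moves


if __name__ == "__main__":
    old = ("Van Cise exhibits to the Commision on Industrial Relations "
           "regarding Colorado coal miner's strike.djvu")
    new = old.replace("Commision", "Commission")
    # The page count of 3 is a placeholder for the example only.
    for source, target in move_plan(old, new, 3):
        print(source, "->", target)
```

A list like this could be reviewed by hand before any admin or bot acts on it, which also makes it easy to confirm afterwards that nothing still sits under the old spelling.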

The problem was caused in Commons about 2 years ago when User:Armbrust moved the file to the new name without taking care of our index page. I have moved the index and all the individual pages to the new title so now it should be fixed. --Jan Kameníček (talk) 22:12, 29 October 2024 (UTC)Reply
@Jan Kameníček: Thanks! Now I can get back to validating it. Duckmather (talk) 02:27, 30 October 2024 (UTC)Reply

Android app for Wikisource

Hi, is there an Android app for Wikisource? How does it work? I have been advised that there is no infrastructure for push notifications for Android apps for sister wikis and I would be interested to know more. Related: phab:T378545. Thanks! Gryllida (talk) 23:14, 29 October 2024 (UTC)Reply

ſ to {{ls}}

  • purpose: I want to use a bot to replace the ſ with {{ls}}
  • scope: Arabian Nights Entertainments (1706)
  • programming language or tools: I've never done a wiki bot before so I'm not sure yet, I'm open to ideas.
  • degree of human interaction involved: semi-automated I think?

Eievie (talk) 05:57, 31 October 2024 (UTC)Reply
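
As a starting point for the request above, here is a minimal sketch in Python of the text transformation itself, with a preview step for the semi-automated review. It deliberately leaves out fetching and saving pages (a framework such as Pywikibot could handle that part); the function names `replace_long_s` and `preview_changes` are made up for illustration.

```python
LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def replace_long_s(wikitext: str) -> str:
    """Replace every literal long s with a call to the {{ls}} template."""
    return wikitext.replace(LONG_S, "{{ls}}")


def preview_changes(wikitext: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for each line that would change,
    so a human can confirm them before anything is saved."""
    pairs = []
    for line in wikitext.splitlines():
        new_line = replace_long_s(line)
        if new_line != line:
            pairs.append((line, new_line))
    return pairs


if __name__ == "__main__":
    sample = "The Perſian Tales\nno long s on this line"
    for before, after in preview_changes(sample):
        print(before, "->", after)
```

Note that this replaces every ſ without exception; if some occurrences should stay literal (inside file names or links, for instance), the preview step is where a reviewer would catch them.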

Help with handwritten letter

I'm working on Index:T. C. E. Laugesen to Carl Laugesen, am mostly done but would appreciate someone validating my work and helping decipher a word on page 3 I couldn't work out. —CalendulaAsteraceae (talkcontribs) 09:04, 31 October 2024 (UTC)Reply

Zyephyrus

Unfortunately, bad news arrived: Zyephyrus, our long-term contributor and admin, passed away last September 8th. -- Jan Kameníček (talk) 16:57, 2 November 2024 (UTC)Reply

More in French Wikisource. --Jan Kameníček (talk) 19:04, 2 November 2024 (UTC)Reply
Very sorry to hear this. I remember them being kind and encouraging when I joined Wikisource (as well as having a wonderful username!). Rest in peace Zyephyrus. --YodinT 13:26, 3 November 2024 (UTC)Reply

Portals in headers

The portals were traditionally listed in the portal parameter and divided by slashes. Now CalendulaAsteraceae started replacing this with individual portal1, portal2... parameters, see e. g. here, and plans to stop splitting portals at the slashes in the long run completely. As this is going to influence a really large number of pages, I think it should be discussed first, and so I am posting it here. Jan Kameníček (talk) 12:01, 3 November 2024 (UTC)Reply

Very long run; I don't actually want to take that project on anytime soon because (as you mention) it would be a lot of work. —CalendulaAsteraceae (talkcontribs) 12:04, 3 November 2024 (UTC)Reply
Well, you have already taken some steps, and the discussion should have preceded them.
As for the replacement itself, in my opinion it is not only unnecessary, but also unnecessarily more complicated for contributors. Slashes work well and are easy and quick to write. --Jan Kameníček (talk) 12:07, 3 November 2024 (UTC)Reply
A possible advantage of this would be to allow the roughly one thousand portals that include slashes in their names to be used, though I don't know if that's a major loss. (After all, at this point there are probably more pages that use / to include multiple portals than portals that are incompatible with that.) In case anyone else is interested in the technical side of it, it's with this edit at module:plain sister. — Alien333 14:08, 5 November 2024 (UTC)
True. It is definitely not necessary to deprecate it, it can stay optional, but should not replace the older way. And unless there is a reason in specific cases, like this one, it should not be replaced en masse by a bot. --Jan Kameníček (talk) 15:38, 5 November 2024 (UTC)
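
To make the trade-off in this thread concrete, here is a rough Python sketch of how the two styles could coexist: the `portal` parameter is split at slashes, while numbered `portal1`, `portal2`, ... parameters are taken verbatim, so a portal name containing a slash only works through the numbered form. This is not the actual logic of Module:Plain sister, which is written in Lua and may behave differently; the function name and the example portal names are hypothetical.

```python
def portals_from_params(params: dict[str, str]) -> list[str]:
    """Collect portal names from header parameters.

    'portal' is treated as a slash-separated list (the traditional style);
    'portal1', 'portal2', ... are taken verbatim, slashes included.
    """
    portals = []
    # Traditional style: split at every slash and drop empty pieces.
    for piece in params.get("portal", "").split("/"):
        piece = piece.strip()
        if piece:
            portals.append(piece)
    # Numbered style: read portal1, portal2, ... until the first gap.
    n = 1
    while f"portal{n}" in params:
        value = params[f"portal{n}"].strip()
        if value:
            portals.append(value)
        n += 1
    return portals


if __name__ == "__main__":
    # Hypothetical portal names, for illustration only.
    print(portals_from_params({"portal": "Example A/Example B"}))
    print(portals_from_params({"portal1": "Example/With slash"}))
```

Under this arrangement nothing already using slashes would need to change, and the numbered parameters would only be needed for the slash-containing names.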

Tech News: 2024-45

MediaWiki message delivery 20:50, 4 November 2024 (UTC)

Switching to the Vector 2022 skin: the final date

Hello everyone, I'm reaching out on behalf of the Wikimedia Foundation Web team responsible for the MediaWiki skins. I'd like to revisit the topic of making Vector 2022 the default here on English Wikisource. I did post a message about this in March, but we didn't finalize it back then.

What happened in the meantime? We built dark mode and different options for font sizes, and made Vector 2022 the default on most wikis, including all other Wikisources. With the not-so-new V22 skin being the default, existing and coming features, like dark mode and temporary accounts respectively, will become available for logged-out users here.

If you're curious about the details on why we need to deploy the skin soon, here's more information
  • Due to releases of new features only available in the Vector 2022 skin, our technical ability to support both skins as the default is coming to an end. Keeping more than one skin as the default across different wikis indefinitely is impossible. This is about the architecture of our skins. As the Foundation or the movement in general, we don't have the capability to develop and maintain software working with different skins as default. This means that the longer we keep multiple skins as the default, the higher the likelihood of bugs, regressions, and other things breaking that we do not have the resources to support or fix.  
  • Vector 2022 has been the default on almost all wikis for more than a year. In this time, the skin was proven to provide improvements to readers while also evolving. After we built and deployed on most wikis, we added new features, such as the Appearance menu with the dark mode functionality. We will keep working on this skin, and deployment doesn't mean that existing issues will not be addressed. For example, as part of our work on the Accessibility for Reading project, we built out dark mode, changed the width of the main page back to full (T357706), and solved issues of wide tables overlapping the right-column menus (T330527).
  • Vector legacy's code is not compatible with some of the existing, coming, or future software. Keeping this skin as the default would exclude most users from these improvements. Important examples of features not supported by Vector legacy are: the enriched table of contents on talk pages, dark mode, and also temporary account holder experience which, due to legal reasons, we will have to enable. In other words, the only skin supporting features for temporary account holders (like banners informing "hey, you're using a temp account") is Vector 2022. If you are curious about temporary accounts, read our latest blog post.

So, we will deploy Vector 2022 here in three weeks, in the week of November 25. If you think there are any remaining significant technical issues, let us know. We will talk and may make some changes, most likely after the deployment. Thank you! SGrabarczuk (WMF) (talk) 15:46, 6 November 2024 (UTC)Reply

To any admins passing by: Could someone take a look at MediaWiki talk:Gadget-Preload Page Images.js? (With V10, since the last Codex change, the green border is broken, so the arrow shifts down, but it's still a noticeable change, whereas in V22 it will be plainly indistinguishable, so it'd be nice to fix it.)
@SGrabarczuk (WMF): Why would dark mode and temporary accounts need V22? I already use dark mode on V10, and if we have a banner for IPs editing I don't see why we couldn't have a banner for temp editing.
I can only think of one significant technical issue, and that is paragraph spacing, also mentioned in March without an answer.
On one hand, why? What is the supposed advantage of spacing paragraphs further from each other?
On the other hand, here at ws we often need to make text fit into fixed boxes, and making the height of text that different across skins is a bad idea. Out of my hat, the most common issue I can think of is {{overfloat image}}s that make some kind of border around multiline text that does not already override paragraph spacing, e.g. Page:Salomé- a tragedy in one act.djvu/7, Page:Poems Tree.djvu/9, Page:Poems Jackson.djvu/7, &c. — Alien 3 3 3 18:49, 6 November 2024 (UTC)
  • As someone who has seen Vector 2022 in action, I don’t know how you can say this. The use of Vector 2022 is not possible here; it makes Wikipedia much worse than it is, and at Wikisource it is completely untenable. There is no reason to make potential contributors make an account and change their setting configurations to be able to edit here without great difficulty. We have a lot of highly specialized formatting here, and if recent “fixes” are anything to go by, whoever makes technical changes thinks of Wikisource last in making them. Our site was rendered practically unusable because of an “accessibility” change recently, and it took days to get that patched—and it was only partially patched, at that. You mention “new features” for your shiny new toy, but I’m not sure why they’re necessary (or even not harmful here on Wikisource); the big push towards “dark mode” mirrors the tech industry’s general push towards AI, in that it is being done without consideration of the actual userbase (who, of course, has no need for such a feature). Your list of “[i]mportant … features” showcases the lack of connection to our community (despite your evident desire to force this unwanted and harmful change upon us): tables of contents are usually produced manually here, with templates; dark mode is a fad, and in any case would clash with any of the many texts here with images; and “temporary accounts” are a terrible idea that I can’t even imagine a justification for. I’ve only heard of them now, but I do remember the suggestion from a few years back; this change will make vandalism significantly worse without any demonstrable benefits whatsoever. Luckily, we don’t have much vandalism here, (and we have good administrators to deal with it,) but it seems (to me, at least) obvious that changes should not be made which will encourage and facilitate vandalism while making the prevention of vandalism harder (and in many cases fruitless).
Of course, you’ve saved the best for last: changes will happen “most likely after the deployment.” You people, who do no good to Wikisource, Wikipedia, or any other project that actually drives traffic (beyond the moral good of writing articles, transcribing texts, &c.) see fit to make changes—without our consent—to the detriment of our work, and when problems inevitably arise force their solutions on the people you so ungraciously “helped” in the first place. I shouldn’t have bothered writing this, but your attitude in “suggesting” this change was enough to encourage me to write this quick statement down. TE(æ)A,ea. (talk) 22:51, 6 November 2024 (UTC)Reply
  • Just to be clear SGrabarczuk: If you think there are any remaining significant technical issues, let us know. We will talk and may make some changes, most likely after the deployment. – are you saying that you're planning to deploy to a live production website with over half a million views per day, without having addressed any of the issues that prevented you from deploying in April, without carrying out any user testing, and with plans only to possibly fix any breaking changes after carrying this out? What on earth is your deployment process (please link if you have one)? And what is the WMF policy about pushing changes on some communities that have serious unaddressed concerns, but not others (such as de.Wikipedia) – again, please link this. Very concerned that you're rushing this through without realising that it will greatly impact the website. --YodinT 11:25, 7 November 2024 (UTC)Reply

Translations


After I do a few translations, am I supposed to create an author page for myself and list the entries I translated? RAN (talk) 01:11, 7 November 2024 (UTC)

As far as I know, they should just be marked as translated by Wikisource.
I think you should use {{translation header}}, which does this automatically. — Alien 3 3 3 06:06, 7 November 2024 (UTC)
Exactly. Wikisource translations are created in a similar way as Wikipedia articles, anyone can later edit them and change/improve the translation, so the translations are marked just as translated "by Wikisource". BTW: Before starting such translations, take a close look at WS:T#Wikisource original translations, especially the part stating that "A scan supported original language work must be present on the appropriate language wiki, where the original language version is complete at least as far as the English translation." --Jan Kameníček (talk) 17:27, 7 November 2024 (UTC)Reply

IA Upload Status?


Since the Internet Archive came back online, this seems not to be working: there are no recent uploads, and it is apparently unable to find the metadata from the Internet Archive, even though the metadata seems to be available (e.g. [23] returns a JSON response). MarkLSteadman (talk) 14:39, 7 November 2024 (UTC)