From Wikisource
Warning: Please do not post any new comments on this page.
This is a discussion archive first created on 01 January 2021, although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

Putting a letter "inside" another letter

I'm in a situation where (I guess?) I have a logo that says TC, but in the original logo the T is larger and the C is inside it. See Page:A Dog's Love (1914).webm/1. Is there some trick to emulate this on Wikisource, or do we just not do it at all? (Not that it's incredibly important to the transcription... I mean, it's just the letters "TC" that could be found anywhere.) PseudoSkull (talk) 17:23, 1 January 2021 (UTC)

Logos are done as images rather than textually re-created. Just rip the image from the scan. Beeswaxcandle (talk) 17:32, 1 January 2021 (UTC)[]

Page doesn't display in read mode

After my last edit THIS PAGE no longer displays the read view. Already purged it and was curious what causes this and how can I deal with it myself when I come across such issues in the future.— Ineuw (talk) 18:43, 1 January 2021 (UTC)[]

The problem is in {{sc|{{nop}} at the top. What do you want to achieve by that? --Jan Kameníček (talk) 19:04, 1 January 2021 (UTC)[]
@Ineuw: Use {{nopt}} instead of {{nop}} for tables. And you'll often run into trouble if you try to use {{sc}} on whole blocks of stuff. Since you have a {{ts}} on every table cell there, I suggest you move the small-caps formatting there instead ({{ts|sc}}). --Xover (talk) 19:19, 1 January 2021 (UTC)
I just did not see any small caps in the original, so that is why I asked about the purpose. But now I can see that Ineuw removed the template and the page works. --Jan Kameníček (talk) 19:44, 1 January 2021 (UTC)[]
I feel terrible, so let me start by wishing you and everyone else a healthy and happy new year. This was a Logitech keyboard and mouse error and Windows 10, which I am currently dealing with. It has nothing to do with Wikisource, or with the Firefox browser, which I originally suspected. Logitech's problems also affected my main editing tool, AutoHotkey, which triggered the macro generating {{sc}}, and I didn't notice it. Thanks to everyone for the help.— Ineuw (talk) 19:55, 1 January 2021 (UTC)
@Ineuw: No problem, funny things sometimes happen to all of us :-) Happy new year to you too! --Jan Kameníček (talk) 20:11, 1 January 2021 (UTC)[]

Source file with Corrigendum published later

I'm considering importing the Singapore Arms and Flag and National Anthem Rules 2003, in which some images were later omitted by a corrigendum. So how should I deal with the corrigendum (i.e. follow the original version or the version amended by the corrigendum)? Many thanks. 廣九直通車 (talk) 09:20, 4 January 2021 (UTC)

@廣九直通車: I think you should create the two documents (Rules 2003 and Corrigendum) as two separate documents, following the source documents, and make a note in the header of each about the presence of the other. Creating versions of laws at every stage of amendment often turns into a mess, so I'd steer clear of synthesising an "as-amended" version. Inductiveloadtalk/contribs 10:49, 4 January 2021 (UTC)

Tag: Bad markup

I've just spotted that a recent edit I made here has been tagged with bad markup. I'm not sure what I have done wrong, but assume it's to do with my adding two images. Can anyone advise what problem I have caused? I may have created more of these before without noticing them, sorry for that. Sp1nd01 (talk) 23:25, 9 January 2021 (UTC)[]

I'm not seeing any warning message, but I do see you've introduced a paragraph break that isn't present in the original. --EncycloPetey (talk) 00:21, 10 January 2021 (UTC)[]
@Sp1nd01: This is because you have used the align="right" HTML attribute in the table. The use of the align attribute is deprecated in the HTML standards, and the better modern markup is to use something like CSS: style="float:right;" (the equivalent for align=center is margin:auto;). It's unlikely to cause major issues on export, because the export tool actually removes them and replaces them with CSS. The original motivation was that tables were coming out as text-align:center; for align=center, but it turned out to be an issue with the "fix" in the export tool, rather than programs not handling align. There's a patch on the way. phab:T270807.
Also, use of the tags <big>, <small>, <center>, <tt>, <font> sets off the filter, as these tags are also deprecated. (These will also trigger a linter error.) Deprecated attributes are not (yet) a linter error; phab:T173944 is a task to add them as such.
I will write some documentation for this, as it is indeed not clearly stated.
This markup will not cause major harm in the short (or mid) term, but over the longer term, we will need to clean it up one day as phab:T26529 progresses. Inductiveloadtalk/contribs 00:40, 10 January 2021 (UTC)[]
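As a rough illustration of the replacement described above, here is a minimal sketch in wikitext table syntax. The cell contents are placeholders and are not taken from the page under discussion; only align="right", style="float:right;" and margin:auto; come from the explanation.

```wikitext
<!-- Deprecated: presentational HTML attribute (this is what set off the tag) -->
{| align="right"
| Placeholder cell
|}

<!-- Preferred: the same float expressed as CSS -->
{| style="float:right;"
| Placeholder cell
|}

<!-- Preferred replacement for align="center" -->
{| style="margin:auto;"
| Placeholder cell
|}
```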
I tried using {{img float}} with the signature in the cap parameter. It seems to work well, but if you disagree, feel free to revert. --Jan Kameníček (talk) 00:59, 10 January 2021 (UTC)[]
Thanks for the detailed explanation, it's useful to know the reason. I wasn't aware of these HTML details, and to be honest I usually just try to copy whatever style (if any) had previously been used on a book for consistency, so I'm sure there are going to be other pages with the same problems in that volume. I do like the img float solution, and I didn't know that an image could be added for the caption, that's nice to know. If it's OK I will use that approach for any future images I add, and if I can track down any other examples I will change them too. Sp1nd01 (talk) 10:52, 10 January 2021 (UTC)

Fostered content...

Page:An Index of Prohibited Books (1840).djvu/73

I fail to see where the 'fostered' content is coming from. Suggestions? ShakespeareFan00 (talk) 18:16, 11 January 2021 (UTC)[]

@ShakespeareFan00: Why are you using a table for this anyway? This seems to work OK:


* {{di|T}}he quick brown fox
* Jumped over the
* Lazy dog
Not sure about the fostered content yet.
Looks like your fostered content was because you had an extra table header in the page header field (but it was scrolled off the bottom): Special:Diff/10829638. Inductiveloadtalk/contribs 18:34, 11 January 2021 (UTC)[]

Too many short chapters

This volume has 75 chapters, and many adjacent chapters are only one page long. Is it permitted to name the main namespace page "Chapters 36 to 39"? I am using anchors to access these embedded chapters from the TOC and elsewhere.— Ineuw (talk) 05:05, 17 January 2021 (UTC)

I would keep them as a subpage per chapter. By doing so, you're keeping better to the author's/editor's intent. I'm dealing with a similar situation on Anecdotes of Great Musicians, which has 300 short items, some only a couple of paragraphs long. Beeswaxcandle (talk) 06:32, 17 January 2021 (UTC)[]
That is what I am planning but the main ns page name should also indicate the chapter numbers subbed beneath it. These short chapters are 1st level TOC entries.— Ineuw (talk) 11:48, 17 January 2021 (UTC)[]

Uploading of an original translation

Dear Sir/Madam,

I am a newcomer to Wikisource and need some guidance on how to go about submitting a text. I recently contributed to the transcription of a Renaissance Italian text from the original manuscript on Wikisource, "Peregrinaggio di tre giovani figliuoli del re di Serendippo.djvu/1" by Christoforo Armeno (1557), with the purpose of creating a source text for translations. The translation into English was recently completed and I would like to upload it to Wikisource. For your information, translations into various European languages have been created since the original was published in 1557, but they vary a great deal in how closely they adhere to the original text and in whether they are literary translations. The English translation I would like to upload is close to the original Italian version and is original, so it is not subject to copyright restrictions. I intend to make it available to Wikisource under a Creative Commons Attribution-ShareAlike 3.0 Unported License (“CC BY-SA”) and the GNU Free Documentation License (“GFDL”) (unversioned, with no invariant sections, front-cover texts, or back-cover texts).

Since I have never uploaded a text to Wikisource, I am looking for guidance as to the various steps to take to complete a successful upload.

Your guidance is appreciated.

Regards, MvRwiki1944 (talk) 00:33, 19 January 2021 (UTC)

Mark van Roode Wikisource username: MvRwiki1944

@MvRwiki1944: Welcome to Wikisource. It sounds like your translation will be a great fit and a welcome addition. The steps are:
There are some (sparse) details at Wikisource:Translations. Let me know if there's anything you are not clear on. Inductiveloadtalk/contribs 11:50, 19 January 2021 (UTC)[]


Please can someone advise on Wikisource's policy regarding the variety of English spelling used? I was reading The Bronze Ring, written by a Scottish author, and noticed that the text uses American spelling. Is this something that should be changed? Regards, DesertPipeline (talk) 11:22, 19 January 2021 (UTC)

@DesertPipeline: We use the text as it was published in the edition being reproduced. I will note that we have a variety of UK editions and US editions, especially noting that many of the scanned works come out of US libraries and can therefore have a bias to US editions for main works. If you don't know which was used in the edition being reproduced, then do nothing. — billinghurst sDrewth 11:39, 19 January 2021 (UTC)
It depends on the edition the text comes from. In this case, the text comes from Project Gutenberg, so the source document is unknown and they may have made editorial changes. Ideally, we'd be looking to transition this to a known edition with scans to prove it. For example Index:Lange - The Blue Fairy Book.djvu, which was published in London and New York and has UK spellings, e.g. "colours". Inductiveloadtalk/contribs 11:42, 19 January 2021 (UTC)[]
Thank you both for the responses; I'll leave it alone then and let someone with more experience in these matters handle it :) DesertPipeline (talk) 12:03, 19 January 2021 (UTC)[]
@DesertPipeline: you're more than welcome to proofread the work against the scan if you'd like to help, and I can assist with any queries you may have. "Letting someone with more experience in these matters handle it" effectively means that probably no-one will touch it for years and years unless it becomes a WS:POTM or someone else really wants to move it over, because our backlog of unsourced or PG copy-dumps is... large. Inductiveloadtalk/contribs 12:15, 19 January 2021 (UTC)
Is the scan available somewhere for me to view? I can possibly do that tomorrow if so. DesertPipeline (talk) 16:48, 19 January 2021 (UTC)[]
@DesertPipeline: Indeed it is! Index:Lange - The Blue Fairy Book.djvu, along with some of the tricksier formatting done already. The Bronze Ring starts at Page:Lange - The Blue Fairy Book.djvu/29. Inductiveloadtalk/contribs 16:52, 19 January 2021 (UTC)[]
Hmm, the second link is red for some reason. Also I notice that link was sent earlier – not sure how I missed it :) I'm not sure what the filetype .djvu is though. Something like pdf? DesertPipeline (talk) 16:58, 19 January 2021 (UTC)[]
Strange, it's blue now... did you create the page while I wasn't looking or was that just a bug? :) DesertPipeline (talk) 16:59, 19 January 2021 (UTC)[]
@DesertPipeline: I just created that page for you as a demo. DjVu is quite like PDF, it's a container of images and text in a single file. It has a few benefits over PDF for the purposes of scanned documents, but the difference isn't really important for now. You don't really need to worry about it—the point is that there is a Page:XXX page for each page of the book. Inductiveloadtalk/contribs 17:02, 19 January 2021 (UTC)[]
Very well, thank you :) I'll write this down on a note on my desktop so I don't forget to do it tomorrow. See you around, and thanks again! DesertPipeline (talk) 17:07, 19 January 2021 (UTC)[]
Good luck, and remember you can leave me a message here (use {{ping|Inductiveload}}) or on my talk page if you get stuck. Some basic instructions are at H:Formatting conventions, but some things aren't covered clearly (yet). Inductiveloadtalk/contribs 17:09, 19 January 2021 (UTC)

Moving/copying a source from the wrong wiki

I've recently found a source that has (it seems to me) been uploaded to the wrong place. Here is the URL:,_of_Aberdeen,_on_Thursday,_April_1,_1813.pdf

It's a short pamphlet (in English) that has been loaded to the main wikisource domain/namespace/whatever instead of English wikisource. Is it possible to move it across to English wikisource, retaining its edit history and leaving a redirect in the old place? Is it actually necessary to do that, and is it perfectly OK where it is?

My intention is to transclude these pages into a single one, then put a link to it at

Chuntuk (talk) 20:56, 15 January 2021 (UTC)[]

I imported the pages but now realise the file is also at mulWS, not Commons. Could a Commons transwiki importer please import,_of_Aberdeen,_on_Thursday,_April_1,_1813.pdf to Commons? Inductiveloadtalk/contribs 21:58, 15 January 2021 (UTC)[]
The move has now been done by Billinghurst. See Talk:Melancholy loss of the whale-fishing ship Oscar, of Aberdeen, on Thursday, April 1, 1813. Inductiveloadtalk/contribs 16:57, 24 January 2021 (UTC)[]
@Inductiveload: There is now a feature, "FileImporter", that anyone can use at any wiki to move files to Commons. It has no special requirements, just an account that can upload to Commons. Details at mw:Help:Extension:FileImporter. It works WAY better than transwiki, as it grabs the whole underlying construct of the file and moves it. — billinghurst sDrewth 22:25, 24 January 2021 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Inductiveloadtalk/contribs 16:57, 24 January 2021 (UTC)

Page:Motion Pictures 1912 to 1939 (IA Motionpict19121939librrich0010).djvu/10

There's a problem in which I can't get the Template:rule in the right place on the TOC. I tried. Anyone have any solutions to get a line in between those two row items? PseudoSkull (talk) 16:25, 23 January 2021 (UTC)[]

Issue resolved by User:ShakespeareFan00 PseudoSkull (talk) 17:37, 23 January 2021 (UTC)[]

Hanging indent paragraphs spanning multiple pages?

Page:History of Woman Suffrage Volume 5.djvu/804 How can I join hanging indent paragraphs spanning several pages so that they show up properly in the main namespace? — Ineuw (talk) 20:35, 25 January 2021 (UTC)[]

@Ineuw: yes, you can, with {{hi/s}}, {{hi/m}} and {{hi/e}}. The /m goes in the header of the last n-1 pages. Inductiveloadtalk/contribs 20:49, 25 January 2021 (UTC)[]
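A minimal sketch of how the three templates might be laid out for a single entry that runs over three pages. Only the position of {{hi/m}} is stated in the reply above; the placement of {{hi/s}} and {{hi/e}} shown here is an assumption based on how split start/end templates are usually combined with the header and footer fields of the Page: namespace. The entry text is a placeholder.

```wikitext
<!-- Page 1, body: open the hanging indent -->
{{hi/s}}First part of the long entry ...
<!-- Page 1, footer field: closes the block in the Page: view only -->
{{hi/e}}

<!-- Page 2, header field: reopen as a continuation -->
{{hi/m}}
<!-- Page 2, body -->
... middle part of the entry ...
<!-- Page 2, footer field -->
{{hi/e}}

<!-- Page 3, header field -->
{{hi/m}}
<!-- Page 3, body: the entry ends here, so it is closed in the body -->
... last part of the entry.
{{hi/e}}
```

Because header and footer fields are not transcluded, the main namespace would see only the opening {{hi/s}} on the first page and the closing {{hi/e}} on the last one, which is what joins the pieces into one paragraph.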
@Inductiveload: Many thanks. Is there also a simple way to reduce the font-size of the paragraphs as well? — Ineuw (talk) 21:14, 25 January 2021 (UTC)[]
@Ineuw: Actually, for that page, {{plainlist}} with a hanging-indent parameter might work better. Firstly, it'll collapse the items so there are no gaps; secondly, you only need one pair of templates per page; and thirdly, it's semantically more correct, as this is actually a list. Inductiveloadtalk/contribs 21:22, 25 January 2021 (UTC)
@Inductiveload: Thanks again. This index list is 52 pages long. It cannot be enclosed in a single {{plainlist/s}}{{plainlist/e}} pair. So this raises the question of how to do it from beginning to end in a generally preferred way that makes the main namespace display correctly.— Ineuw (talk) 02:27, 26 January 2021 (UTC)
P.S: {{plainlist/s}} does not indicate how to add parameters.— Ineuw (talk) 15:55, 26 January 2021 (UTC)[]
@Ineuw: why can't it go inside a single plainlist? The split templates work just like any other, using the headers and footers. Unordered lists have no limit on items. However, you would probably have a list per letter, since the heading isn't a list item:
* A Item 1...
* A Item 2...
* B Item 1...
Plainlist/s has the same parameters as plainlist. Inductiveloadtalk/contribs 16:11, 26 January 2021 (UTC)[]
Much thanks for the example. I never used parameters with start and end templates before, so I was confused. — Ineuw (talk) 16:22, 26 January 2021 (UTC)[]

Motion Pictures, 1912-1939/Main/AC transclusion problem

Page:Motion Pictures 1912 to 1939 (IA Motionpict19121939librrich0010).djvu/18 is not separating from p. 17 properly. It is supposed to say:

"ACTION FOR SLANDER. Presented by Alexander Korda. Released through United Artists. 1937. 8 reels, sd. From the novel by Mary Borden."

Instead it says:

"ACTION FOR SLANDER. Presented by Alexander Korda. Released through United Artists. 1937. 8 reels, sd. From
the novel by Mary Borden."

Can anyone fix this? PseudoSkull (talk) 17:43, 27 January 2021 (UTC)[]

@PseudoSkull: It turns out you shouldn't put empty lines between the list items, or it breaks the list into hundreds of single-item lists. You can either remove the gaps, or replace with <!----> (a comment, with or without content). Inductiveloadtalk/contribs 18:34, 27 January 2021 (UTC)[]
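The three variants can be sketched as follows. The second list item is a placeholder, not text from the catalogue page.

```wikitext
<!-- Breaks: the blank line ends the first list and starts a second one -->
* ACTION FOR SLANDER. Presented by Alexander Korda. ...

* Placeholder next entry ...

<!-- Works: no gap between the items -->
* ACTION FOR SLANDER. Presented by Alexander Korda. ...
* Placeholder next entry ...

<!-- Also works: a comment keeps some visual separation in the source -->
* ACTION FOR SLANDER. Presented by Alexander Korda. ...
<!---->
* Placeholder next entry ...
```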

Why can't I make Template:Dotted TOC page listing indent?

After I discovered the existence of {{Dotted TOC page listing}}, the template definitely helped me with my works on Hong Kong laws. However, some of them require indentation of some dotted TOC listings (such as Page:Prevention of Child Pornography Ordinance (Cap. 579).pdf/1, sections 138A and 153Q to 153R). How can I achieve this? I tried everything from the most common : to methods like {{Indent}}, but nothing works. Assistance will be appreciated. 廣九直通車 (talk) 13:25, 13 February 2021 (UTC)

@廣九直通車: The only way I found is to make one separate entry for "14. Section added" and another for "138A. Use, procurement…". The problem is that Dotted TOC page listing does not allow the dots to be switched off, and TOC page listing is a completely different kind of template, not suitable for combining with Dotted TOC page listing within one list. So I tried a workaround, replacing the dots in line 14 with spaces; the result can be seen in my sandbox. It is not ideal; it would be better if the dots in line 14 could simply be switched off. --Jan Kameníček (talk) 14:03, 13 February 2021 (UTC)
@廣九直通車: I really cannot emphasise enough how strongly I recommend to not use {{Dotted TOC page listing}}. From a technical perspective it is a very bad solution, that creates problems (like the one you bring up here), and is really not needed. The dot leaders are purely stylistic, often exist in a very paper-page specific form, and are better omitted even if they could be added in a technically sound way. This is an issue that's up to each contributor to decide, of course, but I really strongly urge everyone to not use {{Dotted TOC page listing}} except in some hypothetical special case where having the dot leaders is extra special important. --Xover (talk) 14:25, 13 February 2021 (UTC)[]
Well, they are not only stylistic, their main purpose is to help the eye keep the line, and until somebody finds a better way of generating dot lines (e. g. something as easy as generating dynamic dot lines in MS Word documents), this template is the best thing we have in this regard. However, I admit that they are used also for stylistic reasons, but this stylistics is imo very important too. Without dots we would have to help the eye e.g. by creating bordered tables or something, which would interfere with the style of Contents pages of books (especially old books) too much. --Jan Kameníček (talk) 17:35, 13 February 2021 (UTC)[]
{{dtpl}} is specifically annoying because it produces a whole, separate, table for every single row. As well as being semantically kind of messed up, this also means it generally doesn't work brilliantly on export.
In this case, I'd say {{TOC begin}} probably produces the easiest result:


{{TOC begin}}
{{TOC row 1-dot-1|spaces=0
 | 1.
 | Short title and commencement
 | A1387}}
{{TOC row 1-dot-1|spaces=0
 | 2. 
 | Interpretation
 | A1387}}
{{TOC row c|3|'''Amendments to Crimes Ordinance'''}}
{{TOC row 1-1-1
 | 14.
 |Section added}}
{{TOC row 1-dot-1|spaces=0
 | 138A. Use, procurement or offer of persons under 18 for making pornography or for live pornographic performances
 | A1405}}
{{TOC row 1-dot-1|spaces=0 
 | 15.
 | Conviction for offence other than that charged
 | A1407}}
{{TOC end}}
The dot markup is still "messy" at the HTML level, but it's functional in-browser and removed on export (since very very few readers can deal with it). It might be worth investigating if we can export them only in PDF (since the PDF renderer probably can handle them).
{{TOC begin}} also handles things like vertical alignment and setting text wrapping by default. Inductiveloadtalk/contribs 19:38, 13 February 2021 (UTC)[]
After some cleanup, I found Inductiveload's solution is very suitable for the case: I just replaced the space between the indented TOC section number and the subtitle with {{Gap|1em}}. Definitely feels good!
By the way, what's the issue with the HTML? I think when Billinghurst previously told me not to use <center></center> here, he mentioned similar reasons. 廣九直通車 (talk) 08:37, 14 February 2021 (UTC)
@廣九直通車: Great work! But is there any particular reason you have not marked the pages as Proofread? See Help:Page status for details. --Xover (talk) 10:06, 14 February 2021 (UTC)[]
Aren't the pages for other users to proofread? I'm only the guy who imports them into Wikisource... 廣九直通車 (talk) 11:36, 14 February 2021 (UTC)
@廣九直通車: "Importing" isn't really a concept in our model. Transcribing and formatting a page is what we call "proofreading". Proofreading can happen iteratively and collaboratively, but once a page is presumed to be "finished" (and thus ready to be transcluded for presentation) it should be marked as "Proofread". At that point it should be double-checked by a second person, and once that's done it is marked as "Validated". When you mark a page as "Not proofread" you're saying there is more work to be done before it's ready, and it should generally not be transcluded for presentation in mainspace. By my cursory look you have finished proofreading the pages and should mark them as "Proofread". I could be wrong, of course, as I only took a quick look. --Xover (talk) 13:44, 14 February 2021 (UTC)[]
@Jan.Kamenicek: I'll concede your point about dot leaders sometimes being used to help the eye track, but most of the time (in my experience) they are used purely stylistically and sometimes to the outright detriment of readability. And in either case I think the technical disadvantages of {{dtpl}} outweigh any benefit of the dot leaders in all but the most exceptional of cases. I very much wish the draft specification for dot leaders in CSS would materialise soon, but absent that we have no good ways to reproduce them, only various degrees of bad ways, and {{dtpl}} is the worst of the bunch. Please avoid using it whenever possible (I absolutely guarantee that at some point down the line, some poor schmuck is going to have to go through and redo every single page we use {{dtpl}} on, and it's already a bear of a task with what we have so far). --Xover (talk) 13:51, 14 February 2021 (UTC)[]
OK, many thanks for everyone's assistance!廣九直通車 (talk) 13:59, 14 February 2021 (UTC)[]

Bug in Text Downloading function

I found that the blue "Download" button on the upper-right-hand corner cannot process Chinese text, instead outputting replacement characters. Can somebody report the stuff to Phabricator? Many thanks.廣九直通車 (talk) 10:33, 17 February 2021 (UTC)[]

@廣九直通車: have you got a specific example? Inductiveloadtalk/contribs 10:35, 17 February 2021 (UTC)[]
@Inductiveload: Please try to download any of the transcribed pages in Category:Laws of Hong Kong (like this), or, more precisely, try to download any text hosted on Chinese Wikisource (like this). I actually suspect that the problem is not limited to English Wikisource. 廣九直通車 (talk) 10:39, 17 February 2021 (UTC)
Ah, I see, it's only in the PDFs - I was looking at the EPUBs. Inductiveloadtalk/contribs 10:41, 17 February 2021 (UTC)[]
Also, it seems that the bug affects Japanese and Korean as well, as reflected in those (totally scrambled) downloaded Chinese texts. Even the Japanese and Korean disambiguation are affected.廣九直通車 (talk) 10:42, 17 February 2021 (UTC)[]
@廣九直通車: reported at phab:T274997. CJK is generally a whole "thing", so it's unsurprising that Japanese and Korean are also affected. Inductiveloadtalk/contribs 10:50, 17 February 2021 (UTC)[]

Missing page 97 for Volume 24 of EB1911

As of Feb. 12, 2021: gives page 96. gives page 98. Suslindisambiguator (talk) 23:28, 12 February 2021 (UTC)[]

@Suslindisambiguator: The pages on the index are fine. The page numbers on "Page:" subpages are reference numbers to the source file's page numbers. 廣九直通車 (talk) 11:38, 14 February 2021 (UTC)
Not really: index 97->98; 98->99; and 99->97. The index needs some cleanup and I'm not sure how to do it. Languageseeker (talk) 03:51, 25 February 2021 (UTC)

Poem tag and page wraps

How do we handle <poem>, when the verse in the source document is across two or more pages? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:25, 25 February 2021 (UTC)[]

You simply finish the page with </poem> and start the new page with <poem> again. Only when a page break coincides with a stanza break do you finish the page with
Stanza one.
<br />

For details see Help:Poetry --Jan Kameníček (talk) 11:58, 25 February 2021 (UTC)[]
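A minimal sketch of the simple case described above, where a page break falls in the middle of a stanza. The verse lines are placeholders; the stanza-break case is not shown here.

```wikitext
<!-- Page 1, body: the stanza is cut off by the page break -->
<poem>
First line of the stanza,
Second line of the stanza,
</poem>

<!-- Page 2, body: reopen the tag and carry on -->
<poem>
Third line of the stanza,
Fourth line of the stanza.
</poem>
```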

Alternatively, don't use poem tags at all, but use <br /> between lines and block center/s & /e. This is what many of us do because of the problems of alignment of poems across pages when using poem tags. Beeswaxcandle (talk) 03:39, 26 February 2021 (UTC)[]
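The alternative could look roughly like this for a stanza split over two pages. Putting the closing template in the footer field of the first page and the opening template in the header field of the second is an assumption here, following the usual pattern for split start/end templates; the verse lines are placeholders.

```wikitext
<!-- Page 1, body -->
{{block center/s}}
First line of the stanza,<br />
Second line of the stanza,<br />
<!-- Page 1, footer field -->
{{block center/e}}

<!-- Page 2, header field -->
{{block center/s}}
<!-- Page 2, body -->
Third line of the stanza,<br />
Fourth line of the stanza.
{{block center/e}}
```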

Tooltips lost in PDF Export and in Kobo epub Reader

I noticed that Richard II (1921) Yale has the notes in tooltips. When exported to a PDF, the relevant text is underlined, but there is no pop-up note. When exported to an EPUB, the comments render in Calibre, but not on a Kobo. Languageseeker (talk) 13:59, 25 February 2021 (UTC)

Whether or not PDFs can support tooltips rather depends on whether the PDF format supports such a thing. @Samwilson: is there any hope of this working in PDFs?
For EPUB, which is basically HTML, Calibre uses the Chrome engine internally, so it behaves more or less just like a browser. Kobo uses a much less capable rendering engine, based on RMSDK which presumably doesn't handle the title attribute. Koreader also does not handle the title attribute. On a touchscreen, it's tricky to handle in general, because there's no "hover" concept without a mouse, and it's down to the program in question to look for a title attribute on long press.
Something that might be possible is on-the-fly conversion of elements with tooltips to references in the export process.
There are more details about what does and does not currently work at Help:Preparing for export. Inductiveloadtalk/contribs 15:29, 25 February 2021 (UTC)[]
Thanks for your reply. It seems that things are a bit broken with Kobo. Would it be possible to automatically convert tooltips to footnotes when generating an EPUB? Languageseeker (talk) 04:06, 26 February 2021 (UTC)
For PDFs, could we transform tooltips into comments? Languageseeker (talk) 06:24, 26 February 2021 (UTC)[]
@Languageseeker, @Inductiveload: This is mainly about the {{tooltip}} template, isn't it, and maybe {{SIC}}? I sort of feel like it shouldn't be up to the exporter to do the translation to footnotes; it'd be better to implement tooltips on-wiki in a better way, because viewing the wiki on a touch screen is as annoying as on an ereader, as far as not being able to hover goes. Maybe re-implementing them as an 'annotation' reference group would be better? That'd work in print as well, and it'd give ereaders the required structure to be able to display them as popups (which my Kobo does). Open to suggestions though! :) — Sam Wilson 09:05, 26 February 2021 (UTC)
Personally speaking, they are web page features only, along the lines of Wikisource:Annotations. The exported works should not have our annotations, and they should be #noprint. — billinghurst sDrewth 12:07, 26 February 2021 (UTC)

Two Enhancements to Search

Would it be possible to create two enhancements to search that would make this site much more usable to those in university?

  1. Add a citation box to the main page of a series. For example, The Czechoslovak Review would have a box that says Volume:|_| Number: |_| Page: |_| in the top right corner. You can either type in Volume:|3| Number: |4| Page: |97| or select the volume and number from a drop-down menu, and that would take you immediately to The_Czechoslovak_Review/Volume_3/Joža_Úprka#97, or to Page:The_Czechoslovak_Review,_vol3,_1919.djvu/131 if there is no proofread version.
  2. Allow users to perform a search within a category. For example, in Category:Periodicals, Current affairs, there would be a search box that searches only the texts in the category. You could also perform an advanced search. The search would look through both proofread text and OCR text for the words. For instance, you could search for "soldiers" from June 12, 1912 until June 16, 1914 in Category:Periodicals, Current affairs.

Languageseeker (talk) 14:50, 25 February 2021 (UTC)[]

  • For searching within categories, use incategory:. The problem with the first proposal would be finding what article is on that page; this would likely be a manual entry for each work. As a drop-down menu (showing volumes, issues, and then articles), it is much more feasible, and can likely be done automatically. TE(æ)A,ea. (talk) 03:04, 26 February 2021 (UTC).[]
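As a hedged illustration of the incategory: keyword, the queries below are written as wikitext links to Special:Search. The category names and search words are only examples drawn from this thread, and the link labels are made up. Note that incategory: matches only pages that carry the category themselves.

```wikitext
<!-- Full-text search for "soldiers", limited to pages in Category:Periodicals -->
[[Special:Search/soldiers incategory:Periodicals|Search periodicals for "soldiers"]]

<!-- Category names containing spaces are put in double quotes -->
[[Special:Search/parasol incategory:"Women authors"|Search women authors for "parasol"]]
```

The same strings (for example soldiers incategory:Periodicals) can be typed directly into the search box.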
Thank you for your reply. I think that it's important to be able to jump to the precise page number because there are works with tens of thousands of pages where the citation is given as a specific page number. For example, the Federal Register routinely runs to over 50,000 pages per year with thousands of entries. A standard reference would be 78 FR 51713. A citation tool should be able to take this and jump to the right place in the Federal Register. Perhaps a lookup table could be generated as part of the transclusion process, recording that this page is found in Consumer Product Safety Commission: Notices: Meetings; Sunshine Act.
The incategory: keyword is a good start, but I know that when I'm teaching undergrads, we use databases that allow us to search specific subsets of works. The wiki category is the best equivalent. However, strong search capabilities are essential. I'm thinking of something more along the lines of the search box in Popular Science Monthly, with advanced options. For example, imagine a student wants to find all mentions of parasol by women authors between 1789 and 1815. They would be able to go to the category "Women authors", go to the search box, type in "parasol", select the advanced option and limit it by date. Languageseeker (talk) 04:05, 26 February 2021 (UTC)

Comment: I have added a search box that will search within the publication of The Czechoslovak Review. We can make it more configurable, though at this time it hasn't been necessary. Otherwise, Search doesn't work the way you want, as the search data is not recorded that way, nor is the work set up that way. Search itself does not understand dates, which need to be a specially indexed component for any search engine. You can read more about WMF's search at mw:Help:CirrusSearch. These searches can also be set up within the Index: namespace, per Index:Men-at-the-Bar.djvu, though we would not typically set up a main ns work that searches in Page: ns for works. Main ns searches should and will only retrieve transcluded work. — billinghurst sDrewth 12:22, 26 February 2021 (UTC)
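For readers wondering what such a box can look like, here is a minimal sketch using the InputBox extension with a title prefix. It is not necessarily the markup that was actually added to The Czechoslovak Review; the label and placeholder text are invented for illustration.

```wikitext
<inputbox>
type=fulltext
prefix=The Czechoslovak Review/
searchbuttonlabel=Search this periodical
placeholder=Search The Czechoslovak Review
width=30
</inputbox>
```

The prefix line restricts results to pages whose titles begin with that string, so only subpages of the work are searched.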

Comment 2. Re incategory: searches, that is going to be a little problematic with how we do things, and maybe it is something that we should rethink now that search is more expanded. We typically only categorise the top level of a work, so the incategory: search parameter will only search that page, not the subpages of the work, which are not categorised, where you identified your interest. There are means around that, though each of them has its consequences, and definite amounts of work to achieve. — billinghurst sDrewth 12:35, 26 February 2021 (UTC)
I see. It seems to me more of a limitation with CirrusSearch than anything else. To perform the kind of research searches required in universities would require a new search engine. Would it be appropriate to post a request on Phabricator? Languageseeker (talk) 16:09, 26 February 2021 (UTC)
Phabricator is probably not (yet) the right place for this. What you describe requires a base of structured data, which we store on Wikidata. But we don't have very good tools for managing that data, so what's there is and will continue to be very spotty for the foreseeable future. First step, thus, is improving our tooling for adding and maintaining that data. Once we have a reasonable coverage on Wikidata, the next is allowing search and navigation based on it. That's going to require a custom search facility that understands the information model (i.e. it knows that works have authors, were published on a given date, can be collective works, are split into volumes and issues that in turn contain articles, etc.). This kind of engine is not useful outside Wikisource (i.e. Wikipedia has no direct use for it) and is a relatively complex bit of software, so it's not something that the (often volunteer) developers working on MediaWiki can just add. It's a tall order, so it needs to start with coming up with a reasonable specification and description of how this will interact across projects and how the technology will interact with the community. Then it needs wider discussion with those impacted (the various language Wikisource projects, Wikidata, and Commons). And then we can start looking for ways to actually get it implemented (probably starting by trying to find a volunteer willing to do the programming).
Don't get me wrong, I think it's a really good idea and something I wish we had; but the realism of getting there any time soon is questionable. --Xover (talk) 16:36, 26 February 2021 (UTC)[]
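To make the idea of a search facility that "understands the information model" a little more concrete, here is a minimal sketch of the kind of query described earlier in this thread (mentions of "parasol" by women authors between 1789 and 1815). Everything in it is hypothetical: the `Work` fields, the sample records, and the `search` function are invented for illustration, and real metadata would have to come from Wikidata.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical record combining a text with structured metadata."""
    title: str
    author: str
    author_gender: str  # assumed field; would be looked up on Wikidata
    year: int
    text: str


def search(works, term, gender=None, year_from=None, year_to=None):
    """Return works containing `term`, optionally filtered by metadata."""
    results = []
    for work in works:
        if term.lower() not in work.text.lower():
            continue
        if gender is not None and work.author_gender != gender:
            continue
        if year_from is not None and work.year < year_from:
            continue
        if year_to is not None and work.year > year_to:
            continue
        results.append(work)
    return results


# Invented sample data, used only to exercise the filter.
sample = [
    Work("Sample work A", "Author A", "female", 1801, "She opened her parasol."),
    Work("Sample work B", "Author B", "male", 1801, "He carried a parasol."),
    Work("Sample work C", "Author C", "female", 1850, "A parasol lay on the bench."),
]

for hit in search(sample, "parasol", gender="female", year_from=1789, year_to=1815):
    print(hit.title)
```

The filtering itself is trivial; as the comment above says, the hard part is having reliable structured data to filter on.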
Can we start the conversation now? I think that this is the perfect time because the pandemic has devastated university budgets and they are looking for alternatives to costly databases. Most university libraries spend millions of dollars a year on databases that mostly use texts in the public domain. We can offer them a free alternative. I'm sure that if we reach out to them, some will be willing to provide technical support and grants. What about the various MARC standards as the basis of our dataset? It's the universal standard for library catalogs and will make it super easy to batch import information from university libraries to Wikidata. We could also sell Wikisource to libraries as a way to become ADA compliant. Imagine a student with a disability requests an electronic version of an article from 1893. Then the university can pay a student to proofread the article on Wikisource and Wikisource will generate the electronic version. It's a win-win. I know that the National Library of Scotland and other libraries have participated here. Maybe we can reach out to them and see what they want. Languageseeker (talk) 18:26, 26 February 2021 (UTC)
The conversation not only could and should start now, but it is actually overdue. I am just trying to manage the expectations of what it is realistic to achieve in any relatively close time frame for a chronically resource-starved all volunteer project. We need much better Wikidata integration for all sorts of things, and what you propose here would just be one exceptionally good showcase for such functionality. The fact is, for what we have coverage for, our metadata is actually generally better than most libraries and archives' databases; we just don't do a good job of making them structured and reusable. Bulk imports of data to Wikidata are happening at an absurd pace already so that's not really a problem; it's connecting the bibliographic related work we do here to Wikidata that's the major gap. --Xover (talk) 19:23, 26 February 2021 (UTC)[]
Glad you agree. I looked through Wikidata and, as far as I could tell, there is no MARC parser or exporter. I think that this would be the first step towards working with any library. It will enable us to import the data from libraries into Wikidata and export it out. The MARC standard is available online [1] and open source implementations exist [2]. This is a huge undertaking and will probably require fundraising, but I think that it's the first critical step towards making Wikisource a true online library. It also probably makes sense to reach out to the Open Library for possible collaboration in software development. I'm a new user here so I haven't earned my stripes, but I would love to work with you on this project. Languageseeker (talk) 20:13, 26 February 2021 (UTC)

Bot to replace long s with {{ls}}

I've noticed that there are numerous texts where there is a long s "ſ". I was wondering if it would be possible to create a bot to replace all long s "ſ" with {{ls}}. This would enable users to toggle the view between "ſ" and "s". It would also help to make the texts more compatible, with no drawbacks. Perhaps we could limit the bot to just validated texts in case the {{ls}} makes proofreading more challenging. Languageseeker (talk) 17:36, 22 February 2021 (UTC)
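For illustration only, the text substitution being proposed amounts to something like the sketch below. It is not an actual bot: the function names are made up, and it ignores everything a real bot would have to handle (fetching and saving pages, approval, and skipping works where the literal glyph is intentional).

```python
LONG_S = "\u017f"  # the character "ſ"


def replace_long_s(wikitext):
    """Replace every literal long s with the {{ls}} template call."""
    return wikitext.replace(LONG_S, "{{ls}}")


def restore_long_s(wikitext):
    """Inverse operation: turn {{ls}} back into a literal long s."""
    return wikitext.replace("{{ls}}", LONG_S)


sample = "common \u017fen\u017fe"
converted = replace_long_s(sample)
print(converted)
print(restore_long_s(converted) == sample)
```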

  • Whether or not to reproduce long s is not a settled matter and is currently treated as a matter of discretion for the initial proofreader of a work. In general, we don't mess with their formatting provided that it's consistent with policy and they've completed the work with that formatting.
  • {{ls}} is certainly not drawback-free. The switching script is a user script that's incompatible with one of our gadgets.
So I don't think this would be appropriate.
Also, since this would be a substantial change, if you want to pursue this, it should be at WS:S, not this subpage. BethNaught (talk) 19:11, 22 February 2021 (UTC)
Gosh is that still used? I thought it was turned off years ago, I didn't realise it was still advertised at {{ls}}. It might be fixable now I know kind of what I'm doing :-s. Inductiveloadtalk/contribs 19:24, 22 February 2021 (UTC)
I think that it's an important script. Can you fix it please? Languageseeker (talk) 21:45, 22 February 2021 (UTC)
OK, I have fixed it up a bit and it seems to work. Whatever was the problem with the alt-index gadget seems to be resolved. Perhaps this should be a Gadget? Inductiveloadtalk/contribs 09:53, 23 February 2021 (UTC)[]
I am against the forced use of {{ls}}. We have works where the long-s has some significance and works where its orthographic reproduction is entirely superfluous. As for proofreading, my experience is that any differences from standard modern English are a challenge for proofreaders. This includes ligatures such as æ, diacritics such as in é or ï, and archaic hyphenations as in "to-day". We have no mechanism for turning these off. Long-s is just one of many issues that proofreaders have to train their eyes for. --EncycloPetey (talk) 16:40, 23 February 2021 (UTC)[]
I am for using long s but paste a ſ when proofreading. I think there should be an annotated copy -sans ſ- produced when the book is transcluded for those who find it difficult to read with ſ. Old style script is actually part of the charm of reading old books so should be preserved, I think. Zoeannl (talk) 21:49, 23 February 2021 (UTC)[]
Thank you for all the thoughtful feedback. @EncycloPetey I'm not advocating for forcing proofreaders to use the long s, but I think that if a text already has a long s, it might be useful to have a bot that will automatically replace it with {{ls}} in the posted text. Then, readers can turn it on or off. Some enjoy the long s, others find it distracting. Languageseeker (talk) 02:24, 24 February 2021 (UTC)[]

@BethNaught: I would disagree that this is not a settled matter, I believe that it is a well-settled matter. The discussion was had, and the decision was to use a standard s, or to have {{long s}}, so that users could have their long s in page: and we had a modern characterisation in main. The documentation is pretty certain about the approach. For many years we used to fix this up through patrolling and stopping the use of long s in works, though it seems that today's patrollers have not been as stringent.

With regards to bots, where I see (stumble upon) works with long s in use, I do in fact replace them using my bot. I don't go hunting them specifically. — billinghurst sDrewth 03:41, 24 February 2021 (UTC)[]

@Billinghurst: Believe it or not, I did search the archives before making my assertion, and I could find no such definitive discussion about never showing long s in mainspace. Could you actually point us to it, rather than merely asserting it exists? Indeed, while I found several discussions in the past decade, none were definitive. For example: this one, where a variety of opinions were expressed, including that long s should be displayed in the primary version; this one, similarly; this one where you yourself describe long s as a preference.
Additionally, where is the "pretty certain" documentation? The style guide does not mention long s; Wikisource:Style guide/Orthography does talk about long s, and while it discourages literal long s, there is no language forbidding it nor mandating {{long s}}.
It seems to me that on a project where a lot of norms are uncodified, and latitude is given for contributors' editorial discretion, to an extent practice is policy. And given that we have a featured text with literal long s, you can't rely on historical patrolling practice to counter modern patrolling practice.
I disagree that it is appropriate for you to mass-replace long s with a bot, at least in a completed work where it is consistently used. BethNaught (talk) 10:46, 24 February 2021 (UTC)[]
@BethNaught: Orthography says don't use long s. It is not wanted in main namespace. Template:Long s was designed to allow for those who wanted long s in the page namespace to represent the work there, yet displaying properly in main ns. So _orthography_ is meant to allow users to use long s template, or use the standard letter s and not have to reproduce long s in the text as printed.

No, the guidance doesn't ban it, and as I said elsewhere in the past couple of days, there are works where it is specifically added to be long s as a more modern work posing as an older work, for example Manners and customs of ye Englyshe; so banning it would defeat those needing to display as required. Similarly there is some German use, and also old text reproduced in works where we display it as reproduced. The conversations about the use of the ligatures and long s were also contained in search discussions: as long s texts do not match plain English searches, works are lost. — billinghurst sDrewth 11:26, 24 February 2021 (UTC)

So you can't point to an actual discussion deciding in favour of your position? I'm looking for the authority of the community, not the dictum of an elder statesman. BethNaught (talk) 12:15, 24 February 2021 (UTC)[]
  • I strongly oppose this proposal. In addition, if this is a request for a bot to accomplish that task, it needs approval elsewhere. TE(æ)A,ea. (talk) 14:42, 24 February 2021 (UTC).[]
Comment By the way, the internal CirrusSearch search engine and Google, Bing and Yandex treat "ſenſe" the same as "sense" (and vice versa). On the other hand, Yahoo and DuckDuckGo do not "normalise" long-s. So ſ at least doesn't totally torpedo searchability any more. Inductiveloadtalk/contribs 09:22, 25 February 2021 (UTC)
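As a rough illustration of what "normalising" long s means for matching, the sketch below uses Unicode compatibility normalisation (NFKC) plus case folding from the Python standard library. This is an assumption about one way to do it, not a description of how CirrusSearch or any of the named search engines actually work.

```python
import unicodedata


def normalise(query):
    """Fold compatibility characters such as the long s, then fold case."""
    return unicodedata.normalize("NFKC", query).casefold()


long_s_word = "\u017fen\u017fe"  # "ſenſe"

# A plain string comparison treats the two spellings as different...
print(long_s_word == "sense")
# ...while comparing the normalised forms treats them as the same word.
print(normalise(long_s_word) == normalise("Sense"))
```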
I have also noticed that search engines can cope with long s well and so I tend to agree with not preventing its usage in our main namespace. It is historical typography and it looks good in historical documents. So I would not forbid users to enter a typographical version according to their preference. If somebody wanted, they could also be allowed to enter both typographical versions (e.g. one of them as annotated) and we may think of some ways of enabling users to switch between the two versions. --Jan Kameníček (talk) 10:20, 25 February 2021 (UTC)[]
@Jan.Kamenicek: +1. There actually is such a system, unofficially, at least: pages using {{ls}} can be toggled using this script:
It also allows you to toggle external links' blue color. Inductiveloadtalk/contribs 12:42, 25 February 2021 (UTC)[]
@Inductiveload: Yes, I know, but that is not what I meant. I can do it once it has been explained to me, but my parents would not do it no matter how well you explained it to them. Similar tools make real sense only if they are accessible to ordinary users. What is more, similar scripts can be utilized only by logged-in users, while the vast majority of our readers do not log in. --Jan Kameníček (talk) 13:03, 25 February 2021 (UTC)
Sure, but with approval, this can be made a default gadget. Inductiveloadtalk/contribs 13:09, 25 February 2021 (UTC)[]
I think it would be much more user friendly to have a button to toggle long s on and off on the page than to have to make custom css. Perhaps, we could place a slider button on the top right corner of the page. Languageseeker (talk) 13:20, 25 February 2021 (UTC)[]
Yes, something like that would be great. --Jan Kameníček (talk) 14:49, 25 February 2021 (UTC)[]
"it looks good in historical documents" — no it doesn't. It looks awful and causes difficulties in reading fluently—principally because the common fonts in use don't distinguish it effectively from "f". The glyph had its hey-day in the Tudor and Stuart periods and fell into disfavour during the end of the Stuart period and was mainly used through the 18th and 19th centuries by publishers who wanted to pretend antiquity. My approach to this is to restrict use of the to works printed in the Tudor and Stuart periods up to and including the First Commonwealth. I use a plain "s" for works printed after the Restoration. Beeswaxcandle (talk) 03:59, 26 February 2021 (UTC)[]
I'm not a huge fan of the long-s except in facsimile reprints. But I will say that as w:long s details, use of the long-s in English plummeted between 1790 and 1810; its use in the 18th century was pretty standard.--Prosfilaes (talk) 03:26, 27 February 2021 (UTC)[]

UniversalLanguageSelector font list?


Is there anywhere that I can view a list of the fonts included in the UniversalLanguageSelector extension? Perhaps I am just being a dunce, but I can't find any obvious place that lists them.

The reason I ask is – the dictionary I'm transcribing is from a time before the IPA was standardised, and it uses a few bizarre characters that don't seem to be rendered very well with the default fonts that are used (at least on my computer). The ones that are causing the most issue are the two culprits in the infobox. The diaeresis and the macron are supposed to be centred above the ꜵ. Would anyone happen to know if any of the fonts in the UniversalLanguageSelector would render this character properly, and if not, if a suitable font could be added?— 🐗 Griceylipper (✉️) 22:22, 26 February 2021 (UTC)[]

@Griceylipper: All fonts included with ULS. But your two ao-ligatures seem to render just fine in my browser (Safari on macOS) so I'm not really sure any ULS shenanigans are necessary. --Xover (talk) 07:49, 27 February 2021 (UTC)[]
I see the error on Firefox and IE on Windows. Languageseeker (talk) 15:14, 27 February 2021 (UTC)
[Screenshot: ao ligatures not rendering correctly]
Thanks for the list @Xover:. This is what I see on Windows 10 with Chrome. Charis SIL seems to render the ao ligature properly, but there seem to be characters missing in it as well such as ꬶ (renders fine for me, but another font is being substituted for this character.) Though I can live with that.
If Charis SIL is the best compromise out of all the fonts included in ULS, would it be appropriate to blanket apply this font to the whole work? Or is that not advisable?— 🐗 Griceylipper (✉️) 19:25, 27 February 2021 (UTC)[]

Undo a Move

I moved Index:Milton - Milton's Paradise Lost, tra il 1882 e il 1891.djvu to a different title and it seems to have broken everything. Is there any way to undo the move? Languageseeker (talk) 06:48, 27 February 2021 (UTC)

@Languageseeker: Done. For the future, perhaps temper enthusiasm with a pinch of caution until you gain more experience with the software and our community? :)
Most things can be fairly easily fixed so mistakes are not a big deal, but we do have some relatively unique software and community features that can sometimes make assumptions based on experience elsewhere a little iffy. In this case the issue was that the way the software connects the scanned file at Commons with the Index: and Page: pages here is through the page name. If a file is named "File:The Book.djvu" then the Index: must be at "Index:The Book.djvu" in order to work. We also cannot easily rename the File / Index / Page: pages for various technical reasons and so we tend to treat those as mostly opaque (rather than human readable) strings. We prefer nice logical and accurate file names if they can be had at the time of upload (or shortly after, before dependent pages are created), but we live quite happily with all but the most misleading filenames in most instances. --Xover (talk) 07:43, 27 February 2021 (UTC)[]
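The naming rule described above (the Index: page must carry exactly the same name as the File: at Commons) can be expressed as a small check. This is just a sketch of the rule as stated in this comment; the function names are invented and nothing here talks to the wiki.

```python
def expected_index_title(file_title):
    """Return the Index: title that corresponds to a File: title."""
    prefix = "File:"
    if not file_title.startswith(prefix):
        raise ValueError(f"not a File: title: {file_title!r}")
    return "Index:" + file_title[len(prefix):]


def index_matches_file(index_title, file_title):
    """True if the Index: page name lines up with the scan's file name."""
    return index_title == expected_index_title(file_title)


print(expected_index_title("File:The Book.djvu"))
print(index_matches_file("Index:The Book (renamed).djvu", "File:The Book.djvu"))
```

Moving only the Index: page, as happened here, makes the second check fail, which is why the scan stopped being found.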
@Xover: Thank you! I'll be more cautious in the future. I didn't want to rename the book in Commons because I didn't upload it, but it seems that that would have been the place to do it. Languageseeker (talk) 15:01, 27 February 2021 (UTC)[]

Table Trouble during proofreading

How can I achieve the kind of custom table style used in paragraph (c) of Page:Administration of Justice (Protection) Act 2016.pdf/35? I'm still confused by Template:Table style. Many thanks for any solutions. 廣九直通車 (talk) 07:47, 28 February 2021 (UTC)

Xover has beaten me to it. A slightly simpler way of doing the same thing is:

"175 If the document or electronic record is required to be produced in or delivered to a court of justice Ditto Ditto Ditto Imprisonment for 6 months, or fine*, or both Ditto";

Beeswaxcandle (talk) 08:54, 28 February 2021 (UTC)[]

@廣九直通車: (e/c) I had a go at it, see if it's what you had in mind.
Tables in HTML and CSS are complicated, table syntax in MediaWiki is complicated, and {{ts}} is complicated. And you need to wrap your head around all three to be able to effectively work with tables here. Add the sometimes really quirky table formatting in the works we reproduce and I would be very much surprised if most people didn't struggle with this at least some of the time (I know I do).
For this particular table the key was to turn off borders for the table overall, and then to apply a single border to each of the cells (except the last one). {{ts}} is just a shortcut for inserting style="border-top: none; text-align: left;" and similar. Each of the obscure little keywords documented for the template equates to one CSS keyword with a specific value (text-align: left, for example, is al).
And since {{ts}} inserts a style attribute, you need to use it where such attributes are valid; most often on the main HTML <table> element (created by {|), a <tr> (table row, created with |-), or a <td> (table cell, created with various combinations of | and ||). Inside a table row each || separates the cells, but you can also put attributes directly on those cells with || attributes | cell content. Attributes can be a {{ts}} (it outputs a style="…", recall) or colspan="…" or rowspan="…".
In any case, the point is that you need to keep four different models of the concept of a "table", with attendant syntax and details, in your head at once so it's really more surprising anyone ever figures it out. --Xover (talk) 09:08, 28 February 2021 (UTC)[]
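To show how the pieces described above fit together, here is a small sketch that generates MediaWiki table markup with borders turned off on the table and a single border on every cell except the last. It writes plain style="…" attributes rather than {{ts}} keywords, since only the al keyword is mentioned in this thread. The choice of a right-hand border, the border style, and the function names are assumptions for illustration; the sample cell text is abbreviated from the row quoted earlier.

```python
def build_row(cells, border="1px solid black"):
    """Return wikitext for one table row.

    Every cell except the last gets a single (right-hand) border; the
    attributes sit before the cell content, separated from it by "|".
    """
    parts = []
    for position, content in enumerate(cells):
        style = "text-align: left;"
        if position < len(cells) - 1:
            style += f" border-right: {border};"
        parts.append(f'style="{style}" | {content}')
    # "|-" starts the row, "|" starts the first cell, "||" separates cells.
    return "|-\n| " + " || ".join(parts)


def build_table(rows):
    """Wrap rows in a table whose own borders are switched off."""
    lines = ['{| style="border: none; border-collapse: collapse;"']
    lines.extend(build_row(row) for row in rows)
    lines.append("|}")
    return "\n".join(lines)


print(build_table([["175", "If the document is required by a court", "Ditto"]]))
```

Cell text containing a literal "|" would need escaping, which this sketch does not attempt.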

Joining lines

Is there a way to easily join lines together? I'm working on a few texts with hanging indents and if I don't join the lines together, then the indent breaks. Languageseeker (talk) 11:42, 28 February 2021 (UTC)[]

@Languageseeker: See Wikisource:Tools and scripts#PageCleanUp --Jan Kameníček (talk) 13:07, 28 February 2021 (UTC)[]
@Jan.Kamenicek: Awesome, thank you so much. It's even better than I hoped for. Languageseeker (talk) 14:37, 1 March 2021 (UTC)
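For anyone curious about what "joining lines" involves, the basic operation can be sketched as below. This is not the code of the PageCleanUp script, only a minimal stand-alone illustration with an invented function name; it treats blank lines as paragraph breaks and does not try to rejoin words hyphenated across lines.

```python
import re


def join_lines(text):
    """Join the lines inside each paragraph; blank lines separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        " ".join(line.strip() for line in paragraph.splitlines() if line.strip())
        for paragraph in paragraphs
    ]
    return "\n\n".join(joined)


raw = "First line of an entry\ncontinued on the next line\n\nSecond entry\nalso continued"
print(join_lines(raw))
```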

Corruption in main namespace page?

There is some corruption on the top of this main namespace page and I have no clue what caused it. Can someone please look at it. Thanks.— Ineuw (talk) 07:30, 14 February 2021 (UTC)[]

@Ineuw: I excluded the empty page and now it looks OK. --Jan Kameníček (talk) 08:09, 14 February 2021 (UTC)[]
@Jan.Kamenicek: Thanks, but the page error is still showing this morning, even though I deleted and recreated the page. This is what it looks like on the recreated page. I copied the contents to this Sandbox where it shows OK. The error disappears when I am logged out. So, I deleted the browser cookies of all wikis, logged back in and the problem re-appeared.— Ineuw (talk) 15:21, 14 February 2021 (UTC)[]
I am not experiencing this problem in my browsers, either logged in or logged out, trying Chrome, Firefox and Edge, so it must be something related to your settings… Could Xover give some advice? --Jan Kameníček (talk) 17:29, 14 February 2021 (UTC)
Since the last post, it has also disappeared while I was logged in. I mostly use Firefox, and then Vivaldi as a Chromium substitute, but looked there too late. Thanks. — Ineuw (talk) 18:20, 14 February 2021 (UTC)[]
(e/c) Not happening for me either. However, there was something screwy with the pagelist command, which I've corrected in accordance with the policy on these. Beeswaxcandle (talk) 18:22, 14 February 2021 (UTC)[]
@Beeswaxcandle: Is there a policy that limits pagelist layout to numbers but not alpha letters? What about roman letters? I have done many index pages with indicating the chapters, sections, images, etc. I need it! Especially, when another policy bars me from creating wiki links to the page namespace in a book's table of contents.
In my opinion, if something is technically feasible it should be allowed. Otherwise remove the feasibility. Community policies barring the use of features is the same as telling developers not to use one programming language vs. another. Let's see how that works for Wikimedia. If I am not allowed to organize and identify the data my way to ease my work, then I don't see how I can contribute at the level I have been contributing.— Ineuw (talk) 19:07, 14 February 2021 (UTC)
@Ineuw, @Beeswaxcandle:, there is nothing technically wrong with that syntax AFAICT, it's a string like any other. It would be good to be able to have an extra "label" that doesn't affect the numbering for the purpose of splitting up the page list and making it easier to navigate during proofreading: so I created phab:T274740. Inductiveloadtalk/contribs 20:18, 14 February 2021 (UTC)[]
Any page that is part of a numbered flow needs to be labelled as that number. Whatever is put into the pagelist command is an anchor when the page is transcluded into the mainspace. If a page is labelled as something other than its number, then it cannot be linked to in a standardised way from other works. Pages that are interpolated (images, plates, and the like) can be labelled with what they are. In terms of roman numerals as opposed to arabic, yes, these can be done. It is because the pagelist command accepts strings, that the policy at Help:Index pages#Pages was developed to ensure that we have a standardised approach to implementing the pagelist command. Beeswaxcandle (talk) 08:40, 16 February 2021 (UTC)[]
@Beeswaxcandle: Yes, I understand that, which is why I'm suggesting an extra "label" that doesn't affect the numbering. Something like this:
[Image: ProofreadPage pagelist with extra label]
Such a thing is a pretty trivial gadget, but would be easier if it could be part of the tag markup. Inductiveloadtalk/contribs 09:10, 16 February 2021 (UTC)[]
Such labels would also assist with the situation of multiple works in a single pagelist? Currently some of these end up as separate pagelists, which confuses some gadgets. 09:21, 16 February 2021 (UTC)
@ Yes. My personal use case is delineating issues in a periodical or other collective work, which are otherwise really hard to work out. E.g. find the start of the December issue start in Index:Notes and Queries - Series 5 - Volume 10.djvu, or things like Index:Parliamentary Papers - 1857 Sess. 2 - Volume 43.pdf. Inductiveloadtalk/contribs 09:28, 16 February 2021 (UTC)[]
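The "extra label" idea can be illustrated independently of ProofreadPage: section labels are kept in a separate list keyed by scan position, so grouping pages for navigation never touches the page labels that serve as anchors. The data and function name below are hypothetical, and this is not how the pagelist tag works today.

```python
from bisect import bisect_right


def group_by_section(page_labels, sections):
    """Group page labels under section labels without changing them.

    page_labels: labels in scan order (index 0 is scan position 1).
    sections: (first scan position, section label) pairs, sorted by position.
    """
    starts = [start for start, _ in sections]
    grouped = {name: [] for _, name in sections}
    for position, label in enumerate(page_labels, start=1):
        index = bisect_right(starts, position) - 1
        if index >= 0:
            grouped[sections[index][1]].append(label)
    return grouped


# Invented example: a cover, two roman-numbered pages, then two issues.
labels = ["Cover", "i", "ii", "1", "2", "3"]
sections = [(1, "Front matter"), (4, "November issue"), (6, "December issue")]
for name, pages in group_by_section(labels, sections).items():
    print(name, pages)
```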

Comment It has long been possible and acceptable to have multiple <pagelist> where it adds value to the Index, and it doesn't break transclusions. I did it at index:Men-at-the-Bar.djvu years ago, as it is a work that has been dipped into and out of, and it has no ToC or easy way to work out where you are in the work. It wouldn't be normal to do it where there is a ToC, as it becomes superfluous. — billinghurst sDrewth 11:50, 16 February 2021 (UTC)

@Billinghurst: Thanks for the example of multiple pagelists. But, please keep in mind that what is superfluous because of your knowledge and experience, may not apply to others.— Ineuw (talk) 18:41, 17 February 2021 (UTC)[]
Hey what.? If a transcluded ToC on an index page says chapter 1 starts at 53, then it starts at 53. Not typically a hard concept for anyone transcribing or transcluding. I didn't say don't do it, I just said generally superfluous if there is a ToC. — billinghurst sDrewth 06:27, 18 February 2021 (UTC)[]
I suspect we are talking about two different things. I find your solution of using multiple pagelists in a single index page to be an excellent solution for my problem and I am working on it now.— Ineuw (talk) 07:47, 18 February 2021 (UTC)[]

Comment/plug: The User:Inductiveload/index preview script can help here since you can see a page preview without leaving the index page. Actually, so can User:Inductiveload/popups reloaded.js, but that's still WIP. Inductiveloadtalk/contribs 08:01, 18 February 2021 (UTC)

Corruption in main namespace page? Redux

I have a simple question. All Index files' pagelists have characters in them. In fact, all Index pages I edited have text in the pagelist, and I never heard about this being an issue. The problem only cropped up in one Index page. I am continuing to indicate pages as before, and no problems. Could it be that the page is damaged? Otherwise, what changed? — Ineuw (talk) 01:02, 3 March 2021 (UTC)[]

the book Golden Treasury of English Songs and Lyrics

Hi, I have this book. It is missing the first 22 pages, including the introduction. Pages 455 to 466 were also missing, but I found them: someone had put them back in the book in the wrong place. As far as I can tell all the other pages are intact. It was first printed in 1861; I think the one I have is from 1903?? Just FYI, on updating the file. Jere L Wilson

Hello. Do you mean Index:Golden Treasury of English Songs and Lyrics.djvu? The file looks OK. --Jan Kameníček (talk) 21:45, 6 March 2021 (UTC)[]

Texas Independence Referendum act

I would like to add the Texas Independence Referendum Act, which can be found here, to Wikisource. How do I start? Do I upload it to Commons? How do I create an index page? Also, the lines are numbered; should I keep the numbering or not? Thanks in advance, -Gifnk dlm 2020 (talk) 14:41, 10 March 2021 (UTC)

@Gifnk dlm 2020: I left our welcome template, which contains all the relevant help messages. In short, upload the PDF to Commons, and it is best to utilise their c:template:book. Then you can simply create the Index: page here; the filename and index name need to have the same page name, and if {{book}} is used properly the data will populate over. We can help once the file is uploaded to Commons. — billinghurst sDrewth 09:54, 11 March 2021 (UTC)
@Billinghurst:, thank you very much! What should I call the PDF file? Texas Independence Referendum act? Or is there a convention for naming that sort of document? Thanks in advance, -Gifnk dlm 2020 (talk) 19:29, 12 March 2021 (UTC)
@Gifnk dlm 2020: Typically follow c:Commons:naming conventions, though the one proviso that we would emphasise is that we are typically working with editions of works, so there can be multiple versions of some works, so additional identifying information can be of value, eg. year, place, etc. — billinghurst sDrewth 03:46, 13 March 2021 (UTC)[]

Increasing the resolution of a PDF in edit mode

I've been transcribing Frank Leslie's Illustrated Newspaper (starting with Volume 18), and because it has large pages and small type, I've found that the resolution of the preview images in edit mode is not sufficient to read the words even zoomed in (not a problem with the PDF). I suspect this is also the cause of the issues with the OCR, as neither the automatically-generated OCR, the OCR gadget, nor the Google OCR gadget does a remotely reasonable job of picking up the text on the page (and they get wildly different unreasonable results). I've tried putting very large numbers in the "Scan resolution in edit mode" field of the index, but this hasn't had any noticeable impact. I had the same problem with converting pages of the PDF to images for image extraction (Preview on Mac FWIW), and for those had to resort to zooming in on the PDF and taking a screenshot. — CalendulaAsteraceae (talk) 07:56, 11 March 2021 (UTC)[]

@CalendulaAsteraceae: Looking at the file (714 × 1,045) and File:Frank Leslie's Illustrated Newspaper Vol. 11–12.pdf (606 × 904) the problem is that the source is not high quality and you cannot increase it past the file quality. For images, I would be heading over to and peering into the "View Contents" of the zip, and you should be able to get good access to high quality jp2 images or jpg if you prefer. It may be possible to rederive the file at a higher quality as those images are (1,193px x 1,747px) though that is not an insignificant task. — billinghurst sDrewth 09:51, 11 March 2021 (UTC)[]
Thank you! —CalendulaAsteraceae (talk) 03:01, 12 March 2021 (UTC)[]
@CalendulaAsteraceae: There is also a user script that pulls the images from Internet Archive or Google Books directly, when Wikisource is in the edit mode on a Page: (assuming the PDF/DjVu has certain detectable links set up). The script is User:Inductiveload/jump_to_file.js. However, the script is not fool-proof and in this instance a manual offset would need to be applied, something which @Inductiveload: was working on adding to the script concerned. ShakespeareFan00 (talk) 10:45, 11 March 2021 (UTC)
@ShakespeareFan00, @CalendulaAsteraceae: it should now allow you to set an offset (on a per-index basis), as well as turn high-res loading on and off.
For Index:Frank Leslie's Illustrated Newspaper Vol. 18.pdf, the offset appears to be 192.
It can be installed like this:
mw.loader.load('// to file/load.js&action=raw&ctype=text/javascript');
More docs at User:Inductiveload/jump_to_file. Have fun! Inductiveloadtalk/contribs 17:13, 11 March 2021 (UTC)[]
Thank you, that works really well! —CalendulaAsteraceae (talk) 03:01, 12 March 2021 (UTC)[]

A minor gadget/edit toolbar distraction

On activation of both OCR Gadgets, with the first edit page refresh, the Tesseract OCR icon is separated from the Google OCR icon by the "Insert template" icon. Afterwards, the two OCR icons are side by side, and it's very distracting. Would it be a major correction to keep the two separated by anything? — Ineuw (talk) 23:28, 13 March 2021 (UTC)

Asking for suggestions for a minor javascript correction with an image this time

Can someone suggest how I can correct this issue for myself? I copied User:Ineuw/Gadget-ocr.js to my namespace, hoping to modify the script because, on the first edit, the [Template button] is positioned between the two OCR buttons (which is very helpful) and afterwards it's positioned to the left as in this image. (Having both side by side is distracting). Half my thanks in advance, many thanks for a successful solution.— Ineuw (talk) 19:29, 16 March 2021 (UTC)

Index from Image Files and Hathi Trust

@Xover, @Inductiveload, @ShakespeareFan00: Today, I experimented with creating an Index from a set of images for three reasons:

  1. There is the possibility of adding JP2 support that will enable the usage of image files from IA and other sources. However, the Wikimedia Foundation has made it clear that it will not support images in zip files. Therefore, Wikisource will need to create Index files from individual images to leverage the advantages of JP2 files.
  2. Books can be downloaded from Hathi Trust as individual images. If uploaded to the IA, then IA converts them from the original files to JP2 and then to PDF, incurring serious quality losses. One file went from 2 GB in JPEG files to a 100 MB PDF.
  3. Understanding the problems of Index files created from images can help prepare this site for the future.

For the purpose of review, I created Index:A dictionarie of the French and English tongues based on 2 GB of images of a book from 1611 and Index:Shen of the Sea based on 137 MB of images of a book from 1925.

Here are my major findings:

Hathi Trust

  1. Images for a book are stored in a mixture of JPEG and PNG files. The images stored on Hathi Trust have no extension, and the extension must be added based on the MIME type of the file. Therefore, most books downloaded from Hathi Trust will contain a mixture of image formats.
  2. Hathi Trust does not return an error when a request goes beyond the last page of a book. It just keeps on sending the back cover.
  3. You need a program like trid to add an extension based on the MIME type.
  4. Images can be downloaded sequentially from Hathi Trust.
  5. Hathi Trust throttles downloads, so either automatic retrying or a 4-second delay between requests is needed.
  6. Images downloaded from Hathi Trust must be renamed into a sequential order during download.
  7. Pattypan makes it easy to upload the image files to Commons and generates a sequence of images that can be used to generate the list of Pages for the Index file.
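
The download steps in this list could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `fetch` is a hypothetical callable standing in for the actual Hathi Trust request (which is not shown here), the file name pattern is made up, and the "stop when the same image arrives twice in a row" rule is one possible way of noticing that the back cover is being repeated.

```python
import hashlib
import time
from typing import Callable, Iterator, Tuple

# Assumed mapping for the two formats mentioned above; extend as needed.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png"}


def download_pages(
    fetch: Callable[[int], Tuple[bytes, str]],
    prefix: str,
    delay: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (filename, data) for pages 1, 2, 3, ... until the image repeats.

    `fetch(seq)` is a hypothetical callable returning (image bytes, MIME type)
    for page `seq`; the real request is deliberately left out.
    """
    previous_digest = None
    seq = 1
    while True:
        data, mime_type = fetch(seq)
        digest = hashlib.sha256(data).hexdigest()
        if digest == previous_digest:
            # Same image twice in a row: assume we ran past the back cover.
            break
        # Sequential, zero-padded name with the extension taken from the MIME type.
        yield f"{prefix}_{seq:04d}{EXTENSIONS[mime_type]}", data
        previous_digest = digest
        seq += 1
        sleep(delay)  # crude throttle between requests
```

A book that really does contain two byte-identical consecutive page images would be cut short by this check, so the page count should still be verified by hand.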

Advantages of an Index File generated from Images

  1. Much faster loading time.
  2. Easier to add in missing pages.
  3. Images can be cropped directly.
  4. Text is easier to read due to higher quality images.
  5. Easier to add missing pages or replace damaged pages.
  6. No need to rely on the IA upload tool or try to figure out why your file won't upload to Commons, because individual files of 400 KB to 2.5 MB are no problem for Commons and Pattypan makes batch uploads trivial.

Disadvantages of Index generated from Images

  1. No OCR layer
  2. Match and Split doesn't work
  3. index preview.js does not work
  4. Preview Pagelist does not work
  5. Uploading 1,000 images to Commons takes a long time.

Suggested Changes

  1. Add a third parameter page_sequence for the Pages category on Index ns
    1. Currently, when adding pages from images the code is [[Image|Page Name]]. This creates Page:Image. This is out of sync with the way Pages are created from DJVU or PDF files; namely, Page:Index_name/Sequence.
      1. Example, the same book Index:Shen of the Sea creates Page:Mdp.39015056023214 001.jpeg, Page:Mdp.39015056023214_028.png, Page:Mdp.39015056023214 280.jpg.
      2. They would make more sense as Page:Shen of the Sea/1 Page:Shen of the Sea/28 Page:Shen of the Sea/280
    2. The current approach to creating Pages from separate image files will always break Match and Split because of the possibility of mixed format images.
    3. The new approach should be [[Image|Sequence|Page Name]] creating Page:Index_name/Sequence.
    4. For Index created from Images with two parameter index ([[Image|Page Name]]), a bot should exist to automatically add a numerical sequence starting with 1. This should also be done on the creation of an Index.
  2. Fix index preview.js and Preview Pagelist to handle Index ns with images from individual files.
    1. Having a sequence number for images can help.
  3. Automatically run OCR for all the Images and create a text layer.
  4. The Scans section on Index ns does not make sense because of the possibility of multiple MIME types. Instead of asking the User to manually enter the file type, the Index ns should automatically list all MIME types present.
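
To make the proposed naming concrete, here is a small Python sketch of the mapping from image file names to Page:Index_name/Sequence titles. It is not how Proofread Page works today; the function name is invented for illustration, and it assumes every file name ends in a run of digits followed by an extension, as in the Shen of the Sea examples above.

```python
import re
from typing import Dict, List


def sequence_page_names(index_name: str, image_names: List[str]) -> Dict[str, str]:
    """Map each image file name to a proposed 'Page:Index_name/Sequence' title.

    Assumes every file name ends in digits followed by an extension, e.g.
    'Mdp.39015056023214 001.jpeg'.
    """

    def trailing_number(name: str) -> int:
        match = re.search(r"(\d+)\.[A-Za-z0-9]+$", name)
        if match is None:
            raise ValueError(f"no trailing number in {name!r}")
        return int(match.group(1))

    return {
        name: f"Page:{index_name}/{trailing_number(name)}" for name in image_names
    }
```

Because the sequence comes from the number and not from the extension, a mix of .jpeg, .png and .jpg files maps onto one continuous range of page titles.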

Here is a page created with text generated by Google OCR and an Image generated by the Crop tool.

I've probably forgotten a few things, so please ask questions. Languageseeker (talk) 03:03, 21 March 2021 (UTC)[]

@Languageseeker: While I appreciate your enthusiasm and zeal here, you're misunderstanding the problems and prescribing inapt solutions to the wrong problems. The above is a reasonably accurate summary of the status quo (which we're painfully aware of), and "being able to use images from Commons for an Index:" is roughly a description of the desired goal (which we have previously articulated). But getting from here to there is going to take sustained effort from the community in specifying the solution, followed by advocacy and recruitment to find the developers able to do the work, and then a significant amount of developer resources over a fairly long stretch of time (including ongoing maintenance). And due to the existing platform infrastructure, into which any solution for our needs is going to have to fit, this will not be green field development: it will involve not just "a developer" hacking together some new functionality in Proofread Page, but multiple developers from multiple teams at the WMF working on multiple components of the technology stack. It is also very probable that the features we need do not all actually exist in the stack and will need to be developed more or less from scratch (and without breaking anything in the process). And because these do not yet exist out of the box we are actually going to need some developer assistance in just coming up with a specification that is even remotely implementable. Meanwhile, we can't even get minimal ongoing maintenance of our core software components (except by the kindness of volunteers in their very limited spare time) or bug fixes for which there is a patch provided applied. So, to put it succinctly, this problem only looks simple if you ignore all the hard parts.
I wouldn't for all the world want to put a damper on your enthusiasm, but right now you're flailing about all over the place without the background to direct the energy constructively (hint: it's not in getting Commons to ban duplicate scans in PDF and DjVu because you think the issue has any similarities with VHS vs. Betamax). Slow down. Learn. Discuss. And then figure out how to direct your energies where they can do the most good. There is no simple short term solution that none of us have been able to come up with: there are only hard long-term solutions that will take all of us pulling together in a sustained effort. --Xover (talk) 09:04, 21 March 2021 (UTC)[]
@Languageseeker: Re Hathi Trust: FYI you can get the number of pages in the book from the Data API (along with the image data itself). You need a free UoM Friend account to get an API key. You can also find it in the HTML for a book (<span data-slot="total-seq">254</span>), but the Data API is tidier.
The file extension is easy to work out from the returned mime-type of the image data. Generally PNG is bitonal and JPG is coloured. If you do make a DJVU from the images, you should use this information as it cuts the filesize by an order of magnitude. Inductiveloadtalk/contribs 13:51, 21 March 2021 (UTC)[]
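
As a rough illustration of the HTML route mentioned above, the page count can be pulled out of the quoted span with a regular expression. This Python sketch assumes the markup looks exactly like the snippet quoted in this thread; the function name is made up, and the Data API remains the tidier option.

```python
import re
from typing import Optional

# Pattern copied from the span quoted above; the page markup may change.
TOTAL_SEQ = re.compile(r'<span data-slot="total-seq">(\d+)</span>')


def total_pages(html: str) -> Optional[int]:
    """Return the page count found in the book's HTML, or None if absent."""
    match = TOTAL_SEQ.search(html)
    return int(match.group(1)) if match else None
```

Knowing the total up front means a download loop can stop at the last page instead of collecting repeated back covers.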
@Xover: Thank you both for your thoughtful replies. I agree that I probably do need to learn more and that many of these changes are far more complicated. Do you think it would be possible to change the Pages generated from an image based index to Page:Index_name/Sequence instead of Page:Image Name or would that also be a deep restructuring of the platform infrastructure?
@Inductiveload: Thanks for the advice. I actually discovered that trid [3] can do this as well. For me, the question is what Source do I put down on an Index ns if I use the JPG/PNG file from Hathi Trust. Languageseeker (talk) 14:07, 21 March 2021 (UTC)
@Languageseeker: Without knowing the code intimately, so caveat etc.… I think that would be a relatively contained change only in the Proofread Page extension. I don't want to speculate about the complexity / how much work that change would be without knowing the code, but it doesn't obviously require any major surgeries anywhere. But a better question is why do it? What does it gain us? The individual page names could essentially be random strings for all it matters: it's the Index that ties it together and the software knowing what the sequence of pages is. So long as we can use the next/previous buttons on each page, and transclude sequences of pages (from/to) using <pages … /> what does the page naming matter?
The Match & Split tool doesn't support non-multipage formats, but if that's what you want to use why not pursue making it support that? It's not really the mixture of image formats that's the problem there, but rather that it assumes it's a single multi-page format file. But the source code is available and I have access to the relevant server if need be. Of the two JS tools one is developed by a long-time enWS contributor, and the other by a more recent contributor as a student Google Summer of Code project, and both of them are active and responsive to queries. Both tools should technically be able to work with an image-based Index, albeit possibly with code that is too hacky to want to implement in production (I don't think there's a clean API in place for the necessary information yet). If you're having trouble using one of those tools for a specific project that's the level at which you'll want to pursue it. --Xover (talk) 15:34, 21 March 2021 (UTC)[]
@Xover: The major reason would be to harmonize the way we make Pages for PDF and DJVU indexes with the way we do so for images. For PDF and DJVU, the system uses Page:Index_name/Sequence and for Images Page:Image Name. This means that every piece of code has to take into account these two systems. Also, for tools such as Match and Split, you would need to query the list of images and then match to individual images instead of a sequential range of pages. Languageseeker (talk) 16:41, 21 March 2021 (UTC)

Appendix 2 of Katherine Mayo's book, "General Washington's Dilemma"

Help requested to upload to Wikisource please

I wish to upload Appendix 2 of Katherine Mayo’s book, “General Washington’s Dilemma” to Wikisource, since it is missing from the New York (1938) edition and only available in the London versions. I wish to upload it here: but would far rather upload the text, with references and Wikilinks, which has been prepared by me here (full explanations are there too): I think this would be far more useful. The text commences: “The following is a faithful copy...”. Copyright approval has been obtained here: see (Nthep (talk) 15:07, 17 March 2021 (UTC)). I am essentially looking for some kind editor who would do this for me since I’m too old to have good IT skills and this is likely to be well beyond me. Help would be massively appreciated. Once done, there are three main pages where this will need to be linked, the main one being,_2nd_Baronet but I should be able to copy the outcome to the other pages. Many thanks in advance to the person willing to help me out here. If this really can only be done with either jpeg or pdf documents, then I will have to ask my daughter to do this since she has a copy of the correct edition of the book. Arbil44 (talk) 18:24, 18 March 2021 (UTC)

Thank you for your willingness to contribute to Wikisource. Do you have a full copy of the book? I know that you say that only the Appendix is different, but there can be subtle differences between editions. Also, we prefer to have our works scan backed. Is there any place where we can find a scan of the book? Languageseeker (talk) 19:26, 18 March 2021 (UTC)
Thank you Languageseeker. I have the book (well my daughter does now) and it is the London: Jonathan Cape, 1938 edition, pp.263-268. The New York Harcourt, Brace edition does not have an Appendix 2, and that is the reason I would like it uploaded to Wikisource here. I have six scanned pages of Appendix 2, in jpeg. format. Would that be acceptable? If you were able to send me an internal email, I could then send these scanned images to you? That said, I wish you could use my Sandbox 4 edition! It is a faithful copy and I put a great deal of work into it, including lots of Wikilinks, and of course the references needed (2 of them) as well. However, if that must go to waste, the important thing is to get it up on Wikisource please. Arbil44 (talk) 01:32, 19 March 2021 (UTC)[]
Just adding some information here from my sandbox 4.
The following is a faithful copy of Appendix 2 of General Washington's Dilemma by Katherine Mayo. This appendix appears in the London: Jonathan Cape, 1938 edition, pp.263-268, and in the New York/London: Kennikat Press 1970 edition, pp 263-268, but not in the New York, Harcourt, Brace & Co., 1938 edition, which is also online here:[a 1]
All references to The Hon. R. Fulke Greville, of the First Foot Guards, are now known to refer to Lieutenant and Captain The Hon. Henry Greville of the 2nd Foot Guards (now known as the Coldstream Guards).[a 2] Arbil44 (talk) 01:39, 19 March 2021 (UTC)
  1. Mayo, Katherine (1938). General Washington's Dilemma. New York: Harcourt, Brace and Company. 
  2. "The Thirteen Officers and Their Regiments". The Journal of Lancaster County's Historical Society 120 (3): 100. 2019. OCLC 2297909. 
@Arbil44: None of your proofreading would go to waste if we scan-backed this work; it would be placed next to the scan images. Otherwise no-one else can validate this book, as it appears to be not otherwise available online. For more information, Help:Beginner's guide to proofreading provides a quick intro to how this normally works.
But you can email me the complete scan images of the book (ideally including covers and blank pages—a scan of only part of the book is not very ideal at all) at my username at and I'll make them into a file that we can use to scan back your edition of this book and set it up at Wikisource. A Google Drive/Dropbox link or similar is fine too if the images are too big to email.
Thanks for contributing to Wikisource and I look forward to helping you realise your goal. Inductiveloadtalk/contribs 22:14, 20 March 2021 (UTC)[]
Sorry, I'm a bit of a tech idiot and I don't really understand what you have said, but I have uploaded the Appendix 2 of the Mayo book (six pages) and you can find them here: [4]. However, the entire book is available online here: [5] with the exception of the Appendix 2. I would simply say that my daughter now has my copy of the book and I couldn't possibly ask her to scan the entire book, when it is already available at HathiTrust. If that has to happen I'm afraid I will probably have to abandon this quest! That will be a great pity as two of the most important letters written regarding The Asgill Affair are the only element comprising Appendix 2. Thanks for your email address, but now I have uploaded the 6 pages to Wikimedia I imagine you will be able to find them there? I don't know who does the proofreading, but I have typed up the entire Appendix 2 here: Please remember that my notes regarding the editions which do and don't have an appendix are important, as are my notes regarding the different spellings of Asgill's name and the totally incorrect particulars of Greville's name. Arbil44 (talk) 00:44, 21 March 2021 (UTC)
Book added at Index:General_Washington’s_Dilemma_(1938). I'll try to match and split asap. Languageseeker (talk) 01:34, 21 March 2021 (UTC)
Thank you. I think the page numbers are labelled incorrectly. I had some trouble with this during the upload process. My apologies for that. Arbil44 (talk) 08:31, 21 March 2021 (UTC)[]
Please note: This appendix appears in the London: Jonathan Cape, 1938 edition, pp.263-268, and in the New York/London: Kennikat Press 1970 edition, pp 263-268, but NOT in the New York, Harcourt, Brace & Co., 1938 edition, but all other pages of that edition appear in the online edition.Arbil44 (talk) 09:28, 21 March 2021 (UTC)[]
All the various issues (where the appendix is and is not - who Asgylle and Asgyle really is - and the bad transcription by Earl Spencer with all details regarding Greville) are all covered in my Sandbox 4. Arbil44 (talk) 09:38, 21 March 2021 (UTC)[]

Languageseeker please could you correct the publisher's name, because Harcourt Brace is wrong. That edition is the one which does not have an Appendix 2. Arbil44 (talk) 09:09, 23 March 2021 (UTC)[]

small caps names within italic text

Esme Shepherd (talk) 16:34, 20 March 2021 (UTC) I have been formatting many dramatic instructions that are in italics except the character names, which are in small caps. The usual form is therefore 'text in italics' Name 'text in italics', there being spaces between the text blocks and the name. Mostly, this works fine but sometimes the result is 'text in italics'Name'text in italics' without spaces. I don't know why this is so, and all I can do is format it as 'text in italics ' Name ' text in italics' to provide the spaces. Is there any rationale that differentiates these cases or is it just random?[]

@Esme Shepherd: Can you give a link to a page where it happens? --Jan Kameníček (talk) 17:46, 20 March 2021 (UTC)[]

Esme Shepherd Esme Shepherd (talk) 10:17, 21 March 2021 (UTC) It isn't easy to spot them retrospectively, so I will post you the next one I find. The Exeunt at the bottom needs separating from the character name.[]

Pardon if I have misunderstood, but there is an optical effect created by the slope of italic text. The example given 'looks' fine to me, there is a space. CYGNIS INSIGNIS 12:53, 22 March 2021 (UTC)[]

Esme Shepherd (talk)Yes, I have put a space where one is not usually required. It may have something to do with the italics but compare the following page 'Re-enter Leonora' (no space here). There is also a longer passage on without spaces, where a character name is preceded by an italic t. Also sometimes, the word following the character name needs a space before it. I haven't located an example of this yet.

Esme Shepherd (talk) 19:33, 22 March 2021 (UTC)Okay, I think we are working towards eliminating the problem. It was just a puzzle and annoyance. I still don't understand why, but at least I can overcome it! Thank you.[]

@Esme Shepherd: There is no need to add any extra spaces, I have removed the extra space that you added to Page:Dramas 1.pdf/284 and the result is as expected. Such extra spaces should definitely not be inserted. If you do not see the space, it can be caused by the effect described above by Cygnis insignis. This effect can be stronger in some browsers than in others, but it is not the reason to add any extra space which does not belong there. --Jan Kameníček (talk) 21:19, 22 March 2021 (UTC)[]

Esme Shepherd (talk) 19:31, 23 March 2021 (UTC)All spaces on proofread pages have now been removed and the rest will soon follow. The closing up still appears sometimes on transcluded pages and doesn't look good, but the spaces are confirmed by 'copy and paste', so I'm happy with that.[]

Importing from PGDP

The following discussion is closed:
We stepped into this one with perhaps more enthusiasm than sense, so best let this lie for now.

I was wondering if there is a way to import a project from PGDP to Wikisource. The works on PGDP have images and corrected text with formatting. I know that the formatting will need to be wikified, is there a tool to do this? Or is this a request best posted on Phabricator? Languageseeker (talk) 16:07, 26 February 2021 (UTC)[]

@Languageseeker: Please don't. Project Gutenberg texts are not generally of any particular edition of a work (amalgamations of multiple editions in some cases), and their transcribers sometimes "innovate" in various ways (modernised or americanised spellings, for example). Works here should generally start with uploading a scan and then proofreading against that scan; and the raw OCR in the scan will usually be a lot better for that than the PG text. If you don't care about the fidelity of the text, why not just read it on PG directly? --Xover (talk) 16:23, 26 February 2021 (UTC)[]
Totally with you on account of Project Gutenberg. I would advocate for a removal of all texts from Project Gutenberg. However, it's not Project Gutenberg, but Distributed Proofreaders that feed into Project Gutenberg. They have proofread texts with formatting and the original images. So, we would get a proofread text that we could compare to and validate them against the original image. See, for example, [6] (login required) Languageseeker (talk) 17:31, 26 February 2021 (UTC)[]
@Languageseeker: My apologies. I have obviously not been entirely clear on the distinction between PG and DP. Having a quick look at their guidelines it appears at least mostly compatible with our practices, so they could certainly be one source of text for us (provided what they actually output matches the guidelines, which I haven't checked). We'd need to find some technical way to import page by page to a scan hosted here so we can run our own Proofreading (just with a better starting point than OCR) and to make sure our texts are validatable to that scan for our readers. Possibly a mechanism akin to Help:Match and split, and it would probably require DP to have something API-ish that we could consume, but overall it should be feasible. --Xover (talk) 18:53, 26 February 2021 (UTC)[]
@Xover: Created a phabricator task. Hope it gets done. Languageseeker (talk) 20:27, 26 February 2021 (UTC)[]
@Languageseeker: this is an interesting idea, but an importer would almost certainly be done as an external tool that constructs a matching DJVU file from page images, feeds data in over the MediaWiki API, and then uploads the pages. I do wonder if it can be fully automated. The biggest worry so far, after logging in and sniffing around a bit, is that I cannot find page images for the "complete" works, nor a reliable link to something like the IA.
Also, I'm rather jealous of their velocity, even with such a huge number of review stages, they're clocking 140 works a month.
The other problem is that they do not format works to our level, for example "--" instead of "—", capitals, not small caps (they do mark this up), no centering, no sizing, etc etc.
On the subject of Phabricator, I've recently been wishing for a way to track enWS tasks, since they often have dependencies. Does anyone know if we can use Phabricator for that? Can we ask for a project? For example "move {{header}} to module logic". Inductiveloadtalk/contribs 20:53, 26 February 2021 (UTC)[]
(e/c)At present, the instructions for Match&Split specifically exclude DP works. However, IF a DP work is based on a single edition and the other criteria are met, then the Match & Split tool is fine. Certainly some of Laverock's contributions were done this way and the EB11 project is also utilising a version of the process. We would still require the normal enWS validation process. Beeswaxcandle (talk) 21:00, 26 February 2021 (UTC)[]
@Beeswaxcandle:, DP provides a file split by page, so you can in theory do better than M&S. However, you do need to figure out where the scan came from (hopefully the IA) and work out the offset (their page 1 is not the front cover) or construct a scan from the DP images, if present. The bigger challenge will be to write a parser for their markup, because it'd be a shame to junk it all. Inductiveloadtalk/contribs 21:30, 26 February 2021 (UTC)[]
So, I made a really shonky script to import a DP page-by-page text file: User:Inductiveload/dp_reformat.js using the magic of regex. It seems to have worked OK: Index:The ways of war - Kettle - 1917.pdf. However, the biggest issue I see is that once DP "archives" a project, the links to the marked-up source are removed from public view, as well as the page images. I'm unsure of why they do this, but it makes it all-but impossible to do a perfect match/split on the work, even if you can hook it up with a matching edition's scan. Inductiveloadtalk/contribs 10:52, 1 March 2021 (UTC)
The long and short is that DP does not appear willing to share their archived projects. So, making a tool that is specific to DP makes little sense. However, I still think that it makes sense to create a tool that would allow us to import OCR from a different source or replace the image files. So, I made a different phabricator ticket. Languageseeker (talk) 14:36, 1 March 2021 (UTC)[]
@Languageseeker: as long as you can massage the text into "Match and Split" format, you can already drive mass page uploads directly though the normal Wikisource interface. For the case of the User:Inductiveload/dp_reformat.js, this script will (attempt to) transform raw DP text into split-ready text with as much wikiformatting as it can. I will add some quick docs at User:Inductiveload/dp_reformat. It might not work for every type of DP project (since AIUI, different projects have different formatting standards). Inductiveloadtalk/contribs 15:06, 1 March 2021 (UTC)[]
@Inductiveload: Your script is utterly amazing. I'm astonished. I've used it on several books and it is great. I do have one bug report and one suggestion:
  • Bug: If the offset is negative, you need to type the number first before you can insert a minus sign.
  • Request: Can you make the Index Menu a drop down menu so that the tool could be used on non-English Wikisource pages? For example, for French it would be "Livre" and "Page"
Also, is it possible to redo the match and split if the results are incorrect? I started one for Index:The American encyclopedia of history, biography and travel (IA americanencyclop00blak).pdf and it turns out they removed several blank pages from inside of the book, so I would need to rerun the script. In the past, when I tried to do this, it failed silently.
BTW: Everything from [7] upwards is still available on PGDP. It might be good to do a collective project to add these to Wikisource before the files are archived. Languageseeker (talk) 18:20, 2 March 2021 (UTC)[]
@Languageseeker: I looked at the offset and I think it's a bug in OOUI (phab:T276265).
Re the Index drop down, the namespace names "Index" and "Page" are canonical, so the script should just work at other Wikisources. E.g. s:it:Index:Peregrinaggio di tre giovani figliuoli del re di Serendippo.djvu works, even though the local namespace is "Indice". Let me know if it does not.
As for fixing a bad split, this is best fixed by a bot and admin, otherwise the redirects make a mess. Let me know the range to be moved and the offset and I'll sort it for you.
I am working on salvaging the texts at F2 levels (~1600). Inductiveloadtalk/contribs 19:26, 2 March 2021 (UTC)[]
Haha, you're awesome. Thanks for salvaging those texts. The ones that are posted to PG are archived first, so it probably makes sense to salvage those first. It's such a rich source.
For the problem with the merge, starting with Page:The American encyclopedia of history, biography and travel (IA americanencyclop00blak).pdf/22, the text should be moved +2 pages. So 22 has the text for 24. Languageseeker (talk) 21:15, 2 March 2021 (UTC)[]
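
For anyone doing a similar fix by bot, the order of the moves matters: with a positive offset the highest page has to be moved first, otherwise each move lands on a page that still holds text waiting to be moved. A minimal Python sketch of that ordering follows; the function name and the `last` page are hypothetical, since the end of the affected range is not given here.

```python
from typing import List, Tuple


def shift_moves(first: int, last: int, offset: int) -> List[Tuple[int, int]]:
    """Return (source, target) page numbers for shifting text by `offset`.

    The pairs are ordered so that no move overwrites a page whose text has
    not yet been moved: highest page first for a positive offset, lowest
    page first otherwise.
    """
    pages = range(first, last + 1)
    ordered = reversed(pages) if offset > 0 else pages
    return [(page, page + offset) for page in ordered]
```

For example, `shift_moves(22, 25, 2)` lists the move from 25 to 27 first and the move from 22 to 24 last.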
I tried splitting a French book and it mostly works, except that Modèle:Nop and Modèle:Ch don't work. Languageseeker (talk) 21:15, 2 March 2021 (UTC)
@Languageseeker: Move underway for the misaligned pages. In future please be cautious before splitting that the alignment is correct. It is annoying, I know, but if they mess with the pages, thems the breaks.
Re the French templates, I guess it is possible to handle the other subdomains, as long as you know what to map each formatting element to. E.g. I think is their "nop". But it will take a little bit of a fiddle to do so. You can also do the replacements in a text editor if you know what you want to replace with.
Also, I wonder where to put the text files - they total over 950MB when uncompressed! Maybe the IA? Inductiveloadtalk/contribs 23:18, 2 March 2021 (UTC)[]
Thanks. I checked a few pages in the beginning and this one tricked me.
The IA might be a good place, or you can batch upload them to Commons. It might be good to store the original OCR for the future. You never know.
A few more markup for your script: [ae] = æ; [oe] = œ; {{...}} = … Languageseeker (talk) 02:19, 3 March 2021 (UTC)[]
@Languageseeker: Commons doesn't accept random zips/txt files, though. Anyway fill your boots:
Re the OCR, I'm not really sure about that, as long as we have the scan, we can OCR to our hearts' contents.
{{...}} is actually a WS template, it's designed to prevent a line break in the middle of ". . ." Inductiveloadtalk/contribs 09:09, 3 March 2021 (UTC)[]
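
The substitutions mentioned in this thread are simple enough to try in a text editor or a few lines of Python. The sketch below is not taken from User:Inductiveload/dp_reformat.js; it only covers "[ae]", "[oe]" and the "--" dash noted earlier, and it leaves {{...}} alone because that is already valid Wikisource markup.

```python
import re

# Only the substitutions discussed in this thread (an illustrative subset).
REPLACEMENTS = [
    (re.compile(r"\[ae\]"), "æ"),
    (re.compile(r"\[oe\]"), "œ"),
    # Exactly two hyphens, not part of a longer run such as "----".
    (re.compile(r"(?<!-)--(?!-)"), "—"),
]


def wikify(text: str) -> str:
    """Apply each replacement in turn and return the converted text."""
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text
```

Longer runs of hyphens are deliberately left untouched, on the assumption that they mark something other than a dash and need a separate rule.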
Given DP don't have a publicly available archive of their completed projects, and have stated that is intentional, I'm not sure harvesting everything that is available and posting it ... to a publicly available archive ... is a great way to make friends! Nickw25 (talk) 08:24, 4 March 2021 (UTC)

All, it's important to note the position of the DP in relation to the activity in this space. In summary, DP have stated a view that WS should not use their in-progress texts in line with their community wishes. This is stated by the DP General Manager in the forum discussion on this topic at their site, which can be easily located. It was stated on the first phab ticket above that DP pointed the enquirer to in-progress texts rather than archived ones. That was never stated publicly there. I don't know if it was stated privately or not, although it is no longer the position, if it ever was. It's also stated by their administrators in the same forum that the bulk harvesting of texts (presumably the same referenced above) was so disruptive it caused their server to become unresponsive for a period. Maybe unintentional, although destabilising other projects' servers to harvest information they don't want harvested cannot be the standard for a WMF project. Given this, I'd think it's reasonable for WS volunteers to refrain from harvesting and bulk importing content from DP given their community wishes. Disclaimer that I'm a volunteer at DP as well, and have been for many years on and off. To be clear, I'm a standard volunteer there, as here, and have no more knowledge other than what has been publicly posted on their forums. Nickw25 (talk) 02:35, 6 March 2021 (UTC)

@Nickw25: I'm somewhat familiar with the situation. Inductiveload archived the project pages and concatenated texts because DP removes the images and text from DP soon after they get posted to PG. It did not cause the server crash. We asked if it was possible to just obtain the concatenated text afterwards and they said no. I'm not sure what the exact issue is. It seems to be a moral/philosophical issue rather than a legal one. They asserted no copyright claim, just a statement that downloading texts is a subversive activity that disrupts the core mission of the site. I'm sure that millions of authors who have their texts lapse into the public domain would like to restrict copying their work with a similar argument, but that is not how the PD works. If the text is an exact mechanical reproduction of a text in the PD, then the reproduction is in the PD. You cannot copyright a PD work. Languageseeker (talk) 02:55, 6 March 2021 (UTC)
The issue is not the public domain status of the material. If a physical library has a work that is in the public domain you're not entitled to photocopy it if the library says their policy is no photocopying; you are still subject to the terms and conditions of entrance to the library and any conditions they put on your access to the resource. Given those files require being a member to access, DP is entitled to set the terms and conditions for said access. I've been around there for a long time, and I understand why they have arrived at the conclusion they have -- they've arrived at it many times before this request. Either way, most websites have a fairly dim view of that kind of scraping. While I think it is a shame they aren't open to sharing their outputs more widely, nobody is being forced to volunteer there. Unfortunately such aggressive tactics don't win friends and influence people; they just result in more technical barriers being implemented and tighter T&C's than would have otherwise existed. Frankly, harvesting everything like that is a form of information colonialism in my opinion; it comes from a position of deep entitlement, shows no respect for the community that is there and damages the reputation of WMF projects along the way. If you disagree that strongly, those energies might be better directed to improving WS to learn from the attributes of DP that make it so effective despite not necessarily having all that many more volunteers than WS. Nickw25 (talk) 04:44, 6 March 2021 (UTC)[]
@Nickw25: I read carefully through the Code of Conduct and there is nothing that prohibits the copying of texts to other sites. It only states "Volunteers or guests must not intentionally harm or subvert DP processes or systems." If downloading a concatenated file harms or subverts their website, why do they have that option on every project page? If they want to write a stricter code of conduct, then I would welcome that. I think that volunteers should be allowed to know exactly what terms they are agreeing to. My intention was never to go over there to scrape all their files and I never did. I cannot speak for anybody else. I was hoping to be able to gain access on a case-by-case basis and not a massive batch import. For example, if we decided to add "Waterless Cooking for Better Meals, Better Health," then I hoped that we could ask DP for the concatenated text file and receive it. We might do so because the current version on PG is not disability friendly with dark text on a blue background. Even with the files on the IA, it would take weeks if not months of continuous work to match and split them all. Languageseeker (talk) 05:25, 6 March 2021 (UTC)[]
I agree that their code of conduct could be clearer around their expectations, I suspect that that will now be followed up. The ability to download is a little ambiguous. DP has been around since before Wikipedia; when it was put there I doubt anyone thought to use it for anything other than DP things. That said, in reality, if you quietly downloaded a few projects that were still active that you had some kind of 'personal use' for, it would have very very likely gone under the radar, especially if you were cautious to remove non public-domain elements (i.e. volunteer annotations). My interpretation of their code of conduct is that doing that at scale however, would be frowned upon as I interpret harvesting as subverting systems (they were made for human use). A bit like using the photocopier at work for personal use, the occasional page is OK but don't make a habit of it. One of those tricky to navigate grey zones, and like most small organisations, it takes getting to know the culture + what is actually written to figure out how things work--and the internet can make that more difficult.
Anyway, looking at Waterless Cooking; copyright logistics on that one aside, provided you can source a scan-set (DP do make their scans publicly available when they archive a project) my question would be what is stopping WS from working back from the final PG file? The PG file has the page numbering embedded, and the text could easily be matched up, copied, and pasted into WS (or a tool could be developed to extract based on the embedded page numbering). The transcriber's note acknowledges a couple of silent corrections, which is what we'd have to keep an eye out for when touching it up.
I would say, many of the 'reasons against' PG on WS seem outdated, and mainly affect earlier titles in PG's collection. Everything that goes to PG from DP for many years is expected to be from a single edition, doesn't get modernised, generally has a list of corrections made, and at worst has a few silent corrections to obvious errors that WS needs to keep an eye out for. I'd say DP processed files at PG would be the most reusable items from PG for WS given how DP work. It's certainly better than starting with raw OCR! I'd add the DP team have a point that the post processed file on PG might be better, even if a bit more tedious to work with. I've done 4.5k pages in their formatting rounds, and previously Post Processed a handful of titles there. I assure you things do slip through the 3 proofing (transcription) rounds. The PP and SR processes are quite likely to pick up on a number of those. Nickw25 (talk) 10:16, 7 March 2021 (UTC)[]
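As a rough illustration of the "extract based on the embedded page numbering" idea mentioned above, here is a minimal Python sketch. It assumes, purely for illustration, that page boundaries appear in the plain text as markers of the form `[Pg 12]` placed before each page's text; the marker format in any given PG file may differ, and the function name is hypothetical.

```python
import re

# Assumed marker format, e.g. "[Pg 12]"; adjust to whatever the file really uses.
PAGE_MARKER = re.compile(r"\[Pg (\d+)\]")


def split_by_page_markers(text):
    """Return a dict mapping page number -> text that follows that page's marker.

    Anything before the first marker (title page, licence header, etc.)
    is ignored in this sketch.
    """
    # With one capturing group, re.split returns:
    # [text_before_first_marker, number_1, text_1, number_2, text_2, ...]
    parts = PAGE_MARKER.split(text)
    pages = {}
    for i in range(1, len(parts), 2):
        pages[int(parts[i])] = parts[i + 1].strip()
    return pages


sample = "Front matter\n[Pg 1]\nfirst page\n[Pg 2]\nsecond page\n"
for number, body in split_by_page_markers(sample).items():
    print(number, repr(body))
```

Each page's text would still need to be checked against the scan, since the final PG file may contain the silent corrections noted above.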
@Nickw25: I agree entirely, and I think you put the issue forcefully and cogently. In this particular case we've treated DP like some faceless entity rather than as a community with a culture and values of their own. While we may not have done anything wrong in any formal sense, we've acted hastily, and quite possibly brashly, without sufficient concern for a kindred project and its community. In comparable circumstances there would be outrage on the Wikimedia side.
There are probably some irreconcilable philosophical differences between our two projects (I suspect we're in The Cathedral and the Bazaar territory) that may make close cooperation effectively impossible. But we share similar enough interests and goals that we really should be cooperating wherever possible, and coexisting in an environment of mutual respect. That probably means we need to start by building some cross-cultural understanding so that we have a basis for dialogue, which in turn might let us identify what we actually agree on (which is probably a lot more than what we disagree on). --Xover (talk) 09:14, 6 March 2021 (UTC)[]
@Xover: I wouldn't entirely agree with this statement. The original proposal was to write a parser to allow for the importation of PGDP text files on a case-by-case basis. Then, we found out that DP remove the text files and images about three months after posting to PG. The immediate question became would it make sense to write a dedicated parser if we cannot get access to the files. So, I asked my friend, who is a DP volunteer, if she could find out if we could get access to the files. The basic answer was absolutely no under any circumstance; DP never has and never will share its in process files with anyone. PG is all you get. At that point, I believe that Inductiveload independently decided to archive the text files so that we could have more time. Since I was worried about copyright, I asked my friend to find out if the files are under copyright. DP would not give her a direct answer.
I don't think that time would help at all. DP simply believes that its sole purpose is to create ebooks for PG. When it comes to user contributions, the idea is that they belong to DP and, more specifically, the site administrators. They characterized wikisource as a project that produces inferior ebooks that will never meet their standard of quality. I'm not happy about the outcome or how the conversation went, but they left no room for dialogue. Languageseeker (talk) 02:31, 7 March 2021 (UTC)[]
I'd agree Xover that I hope at some point in the future there can at least be some mutual understanding. I'd never say never languageseeker. Anyway, a few points on all the discussion above:
  • Above Languageseeker stated DP said they won't provide access under any circumstances. As such I think we just need to draw a line under it, respect their decision and wishes and move on, even if we disagree. As such I think it is important we don't bulk import items that were harvested. To do so risks diminishing WS's reputation in the DP community; but also at Project Gutenberg (where there is volunteer crossover) and potentially more broadly--it's a small world (even if most of us can't go anywhere right now!).
  • From what I can tell DP shut the conversation down only after it became apparent that the bulk harvesting of texts had occurred against their wishes. I'd suggest that is really what crossed the line with them. It's an unfortunate series of events, although, regardless of who did what or how we got there, it was unsurprisingly interpreted as related and determined as a bad faith move on their part. I suspect they don't want to hear from anyone claiming to be from WS for a while.
  • As background, to understand the culture of DP, you need to consider them akin to a small local all-volunteer community group in your neighborhood and approach them as such. Anyone who has spent some time perusing their forums would be aware they are a very small organisation, overseen with a traditional governance structure (i.e. a board), run on a shoe-string budget of a couple of hundred dollars a month, and kept going by a very small group of people that have basically made it their FT job. There is no WMF on standby to take care of a whole bunch of foundationals or step in if required; and that no doubt contributes to their independent mindset. PG is similar. Even so, the community votes in the Board and has a say in the leadership and strategic direction. While some folks there might like them to be more open, it is clear, in WikiSpeak, that the Community Consensus on DP is they produce for PG and they are happy with that scope for now. There has been a lot of community upheaval at various points in their 20 year history. At the moment their community is settled and calm. As such, I suspect DP's leadership group are equally prioritising community harmony - as the leader of their community should. One good way to do that is keep the mission clear! As with small community groups, change doesn't come quickly, and rarely happens because an outsider turns up. Such change takes many months, if not years, generally has a strong and trusted advocate within the organisation, and is quite complex and doesn't always succeed. It certainly doesn't happen inside of a week. Nor need it: DP and WS have both been doing their thing for the better part of 2 decades, there is no reason for haste. Nickw25 (talk) 10:16, 7 March 2021 (UTC)[]
They shut down the conversation before the harvesting happened. It was precisely their unwillingness to share the archived works that led to the bulk harvesting. I don't want to be specific because the conversations were private, but they claim ownership of the work of their volunteers and will not share any archived material. In the end, my own analysis is that they are afraid that Wikisource will destroy their community by attracting away volunteers. They have spent decades building the community and they don't want to lose it. Which I completely understand and accept. They want to be left alone and I'm leaving them alone. I don't think that wikisource should approach them for help in the future. Languageseeker (talk) 02:32, 8 March 2021 (UTC)[]
Appreciate you've come to your conclusions, although I would not agree that, after both being around for this long, they are threatened by WS. At any rate, the issue is now not what happened, the legalese or interpretation of their T&C's, what misunderstandings took place or why they may not like WS. In the long term what WS should not do is engage DP the way that occurred on this occasion. We must expect genuine collaborations to take time, be based on trust + mutual benefit and start from a place of curiosity, rather than WS just wanting something.
The central issue now is the ethics of using that material. I've previously stated my personal expectations are WMF projects uphold the highest standards. You state they have asserted ownership and they've otherwise stated WS does not have permission to use. The raw download files, as is, contain material that is quite arguably not in the public domain. If I'm not mistaken your edit history shows you continue to import their files in bulk as recently as a few hours ago? I'm curious do you intend to continue given all the above? Given the potential damage to WS's reputation and that DP are unwilling collaborators in this instance, where is the community consensus to proceed, especially given this course of action does not align with WMF values as far as I can tell? Those values include statements like we will be "caring neighbors" and "humbly learn from our mistakes". I'd respectfully submit it is time we do those two things in relation to this matter. There is much work to do on WS -- why persist with something that has the potential to be so damaging to our community? It's not a race to see who can get the most books, and the way we do things is just as, if not more, important than what we do. Nickw25 (talk) 11:13, 8 March 2021 (UTC)[]
@Nickw25: I think it is not overstepping if I suggest that the community at Wikisource now acknowledge that we've acted a bit like a bull in a china shop here, and that this can beneficially be conveyed to the DP community. We seem to have several philosophical differences that make not just cooperation, but even just dialogue, a challenge. But, ultimately, I think both projects share a fundamental goal of making these works available; and both projects value accuracy and quality in the texts we produce. That, to me, seems like enough of a foundation to build on. Let's try to at least keep the lines of communication open and eyes open for any issue where our interests might align (increasingly draconian copyright, as one obvious shared concern). If someone that volunteers on both projects feels like acting as a bit of an informal liaison I think that would probably be a very good idea. --Xover (talk) 15:58, 8 March 2021 (UTC)[]
I am sorry for my actions in over-eagerly downloading the text. I have removed the public IA item. I didn't intend to cause such a problem, it was a "I wonder if I can" exercise that got out of hand. It was rude and thoughtless to proceed at such a rate. Inductiveloadtalk/contribs 13:36, 8 March 2021 (UTC)[]
I apologize as well. I never meant to cause the community any harm. I truly hoped that we could establish some form of collaboration, but they merely want to be left alone. Apologies were extended to the senior leadership of PGDP who consider the matter resolved. Languageseeker (talk) 13:49, 8 March 2021 (UTC)[]
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 14:12, 30 March 2021 (UTC)[]

Project to Match and Split OCR from Distributed Proofreaders

The following discussion is closed:
We stepped into this one with perhaps more enthusiasm than sense, so best let this lie for now.

Distributed Proofreaders has several thousand projects to proofread and correct texts against a single-edition work. Most of their projects derive their scans from IA or Google. However, they archive their scans after posting to PG. The project is to download the proofread texts, match them with their appropriate scans, and post them here. Inductiveload has created an awesome tool that makes it possible to preserve much of the formatting and easily get the text into a format ready for match-split.

Here are the requirements

  1. Install User:Inductiveload/dp_reformat.js

Project Instructions

  1. Pick a project from one that Inductiveload archived at [8].
  2. The "Concatenated Text File" is in the zip file and the Project Description is in the matching HTML.
  3. See if the Project Description gives the original location for the scans. If it doesn't, you'll need to manually match the work.
  4. Create an index file for the work and make sure to create page numbers.
  5. Go to the Sandbox. Select Edit and paste in the text from the text file inside of the zip that you downloaded from Distributed Proofreaders.
  6. Select Reformat DP text in the Tool section of the left side bar.
  7. You will need to paste in the Index name for the work and calculate the offset. Distributed Proofreaders deletes the first few pages, so you will need to do a bit of math to get the correct number.
  8. Either select Show Preview or Publish Changes. Verify the pages. Distributed Proofreaders sometimes remove blank pages from within the work and you will need to check a few pages to make sure that the offset is correct.
    1. If there are multiple offsets, you will need to do the merge and split in stages.
  9. A tab called "Split" will appear next to discussion. Select discussion. If you want to verify that your split started, visit [9]
  10. Once you have done this, record that you have imported the work at User:Languageseeker/PGDP.
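The "bit of math" for the offset in steps 7 and 8 can be sketched as follows. This is only an illustration: it assumes the offset is defined as the Index page number minus the page number in the concatenated text, which may not match the sign convention the reformat tool expects, and the function name is hypothetical. The input is a handful of page pairs that you have checked by eye against the scan.

```python
def offset_stages(checks):
    """Group hand-verified (text_page, index_page) pairs into offset stages.

    Returns a list of (first_checked_text_page, offset) tuples. A new stage
    starts wherever the offset differs from the previous check, for example
    because a blank page was dropped from the text.
    """
    stages = []
    for text_page, index_page in sorted(checks):
        offset = index_page - text_page  # assumed convention: index minus text
        if not stages or stages[-1][1] != offset:
            stages.append((text_page, offset))
    return stages


# Hypothetical spot checks: text page 1 is on Index page 7, and so on.
checks = [(1, 7), (50, 56), (120, 127)]
print(offset_stages(checks))
```

If more than one stage comes back, the offset changed somewhere between two of the checked pages, so a few more spot checks in that range are needed to find where the next stage of the split should begin.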

Disclaimer, this is a personal project and has no official sanction. Just asking for help from the community. Languageseeker (talk) 23:25, 2 March 2021 (UTC)[]

Follow-up note, Inductiveload's script will work on non-English wikisources, but the markup will need to be manually updated.

Are the downloaded texts coming from Gutenberg? We have had poor success with their "proofread" texts. They often mix-and-match editions, have modernizations, or other problems that make them incompatible with Wikisource standards. --EncycloPetey (talk) 00:09, 3 March 2021 (UTC)[]
No, Distributed Proofreaders takes a single source, usually from the IA or Google Books, proofreads the books against the source images, and then processes them and posts them to PG. They are strict about matching the source text to the image prior to posting to PG. Before they post to PG, during post-processing, they sometimes correct errata and deviate from the source text. The files on Distributed Proofreaders match the source text. You can take a look at the couple of books that I matched-and-split from Distributed Proofreaders as an example. The major thing that is lost in the Distributed Proofreaders sources is the header and footer, but this is easier to add in than proofreading the text. Languageseeker (talk) 01:08, 3 March 2021 (UTC)[]
Just commenting on the concerns about PG. Many (most?) of their texts now come from DP, who expect them to be a single edition. Most of DP's final output will list the changes made, and many also state if further silent changes were made. PG has come a ways from the days of refusing to acknowledge which edition something was prepared from (as I understood they did). There is certainly a subset of projects from PG that are lower risk from a WS perspective. They'd have been processed at DP (identifiable in the PG credits line), have an easily identifiable scan set (sometimes referenced in the PG text, otherwise can be traced back via the DP project comments, or a bit of investigative work) and have a transcriber's note that silent changes were not made. Nickw25 (talk) 02:41, 6 March 2021 (UTC)[]

Also posting here to link to my comments above, that DP do not want WS using their in-flight works for this purpose. See that post for more detail. Nickw25 (talk) 02:45, 6 March 2021 (UTC)[]

This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 14:12, 30 March 2021 (UTC)[]

3 versions of Pratt's History of Music

There is an unsourced and incomplete version in the Main The History of Music and two transcription projects: Index:Pratt - The history of music (1907).djvu and Index:Pratt - The history of music (1907 Preface Variant).djvu. If The History of Music could be moved to The History of Music (Pratt, unsourced), I can take it from the redirect. Thanks.--RaboKarbakian (talk) 17:08, 7 March 2021 (UTC)[]

Pratt - The history of music (1907).djvu is the original printing; Pratt - The history of music (1907 Preface Variant).djvu is a later reprint of the 1907 edition with a new preface, list of deaths after the appendix, and the removal of blank pages. The transcription is probably sourced from Pratt - The history of music (1907).djvu, but, outside of the images, I'm not sure how much value it contains as it was last edited in 2007 and the text in Pratt - The history of music (1907).djvu comes from a good source. Languageseeker (talk) 17:36, 7 March 2021 (UTC)[]
They have a {{versions}} here. I was told to ask an admin to move things due to clean-up of redirects being easier with some tools they have. There is a broken, unfinished version in Main that needs moving. So, I ask here for that.--RaboKarbakian (talk) 15:53, 8 March 2021 (UTC)[]
@RaboKarbakian: Just transcribe, and when you have the pages, just transclude into the existing pages. No point in moving an unsourced, unfinished work, we just overwrite with something verifiable. — billinghurst sDrewth 09:58, 11 March 2021 (UTC)[]
@Billinghurst: I would have suggested to delete the 2007 work that has nothing to transcribe as I understand the definition of that word here. But the way of the sourcerers, usually, is to move it from the Main namespace. I might be completely confused, so perhaps you can provide a link to where it is to be transcribed at and I will work at it....--RaboKarbakian (talk) 14:21, 11 March 2021 (UTC)[]
@Billinghurst, @RaboKarbakian: Seconded, there is a sourced copy with good text to replace the unsourced copy. Languageseeker (talk) 14:30, 11 March 2021 (UTC)[]
I am presuming that the name is okay, and that the chapter structure is okay, so just transclude in place. There is no requirement to delete, just replace. Nothing is gained by deleting; and nothing is gained by moving an incomplete work that will never be completed. — billinghurst sDrewth 07:53, 12 March 2021 (UTC)[]
@RaboKarbakian: Do you still need assistance with this or has it been resolved? --Xover (talk) 14:07, 30 March 2021 (UTC)[]
@Xover: as far as I am concerned, I left this to billinghurst's competent management....--RaboKarbakian (talk) 14:23, 30 March 2021 (UTC)[]

Tom Urton

The following discussion is closed:

I happened to read one of your letters from Tom Urton, who lives in Norton, England with great interest. I live in Santa Rosa, Sonoma Co, California, and my great-grandmother was Kate Elizabeth Urton. It is not a common name. A few years ago we went to Norton and to St James Church, but we could not go inside. I am a genealogist and have a lot of information about the Urtons, but I am stuck about 1470. Tom, I hope you see this note! I sent a letter to you last December, but it was returned the first week of March and said the address was incorrect. I used the address you posted with your note on Wikisource. I am afraid to write my address/email here because it would be available to everyone in the world. I hope you will write again so that I can see your address or an address that you use. I live in Spring Lake Village, Santa Rosa, CA 95409. Suzanne (Tharp) Guerra

This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 13:49, 30 March 2021 (UTC)[]

Index:A discovery that the moon has a vast population of human beings.djvu

The following discussion is closed:
Request appears to have been otherwise resolved. @ShakespeareFan00: If you still need this doctored in some way feel free to hit me up on my talk page.

Can someone trim this down? It seems there are some additional clippings in it, that aren't part of the original. ShakespeareFan00 (talk) 22:21, 25 March 2021 (UTC)[]

Why? Just mark them as no text and move on. Creating work for others where there is no value, and an easy solution. — billinghurst sDrewth 14:50, 27 March 2021 (UTC)[]
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 11:22, 30 March 2021 (UTC)[]

Help: Need to move Index and its Pages because of a spelling error

The following discussion is closed:
All pages, file, and index migrated to new name.

Years ago, I copied and pasted OCR text for the Index name, and I missed a typo in the name. The scanned "g" in Egypt ended up as the letter "q". Index:On the Desert - Recent Events in Eqypt.djvu. I would like to save rather than delete and re-install, but don't know how to move the index and pages. — Ineuw (talk) 06:36, 27 March 2021 (UTC)[]

Doing... --Xover (talk) 13:20, 27 March 2021 (UTC)[]
Done. File rename requested at Commons, and a temporary redirect established. @Ineuw: --Xover (talk) 14:33, 27 March 2021 (UTC)[]
I have moved the file at Commons, though I wonder why we even bothered. There is no need, and it creates a lot of work for next to no value. — billinghurst sDrewth 14:48, 27 March 2021 (UTC)[]
@Xover: Many many thanks for taking the time out and correcting my error. I should be able to do it by now. What tools do you use to rename the pages? — Ineuw (talk) 19:02, 27 March 2021 (UTC)[]
@Ineuw: Pywikibot has built-in support for moving pages, you just need to massage up a list of from->to page names. --Xover (talk) 19:04, 27 March 2021 (UTC)[]
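A minimal sketch of "massaging up" such a from->to list in Python follows. It only builds the title pairs and makes no Pywikibot calls; the function name, the page range and the optional offset are illustrative assumptions, and the corrected index name is inferred from the typo described above.

```python
def page_move_pairs(old_index, new_index, first, last, offset=0):
    """Build (from_title, to_title) pairs for Page: namespace moves.

    old_index / new_index are file names without the "Index:" prefix.
    offset is added to the page number on the target side, for cases
    where the new scan has a different number of leading pages.
    """
    pairs = []
    for n in range(first, last + 1):
        pairs.append((f"Page:{old_index}/{n}", f"Page:{new_index}/{n + offset}"))
    return pairs


# Hypothetical page range; the real range depends on the scan.
pairs = page_move_pairs(
    "On the Desert - Recent Events in Eqypt.djvu",
    "On the Desert - Recent Events in Egypt.djvu",
    first=1,
    last=3,
)
for source, target in pairs:
    print(source, "->", target)
```

The resulting pairs could then be handed to whichever page-moving tool is used. Bulk moves in the Page: namespace are best left to, or coordinated with, an admin.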
@Ineuw: I also have a handy script for exactly this purpose: User:Inductiveload/Scripts/Page shifter. Inductiveloadtalk/contribs 20:32, 27 March 2021 (UTC)[]
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 11:16, 30 March 2021 (UTC)[]

Index:The star in the window (1918).djvu has problematic pages, need to be replaced

The following discussion is closed:
All pages migrated to a new index.

I need some help, because pages 286 through 293 of The Star in the Window are missing for some reason, and are just a bunch of copies of pages 284 and 285. This would be the entire content of Chapter 31.

The pages on the DJVU need to be replaced without damaging the Index page. This DJVU came from Internet Archive, but there is another version at Google Books, which is located here: File:The Star in the Window (Grosset & Dunlap).pdf. The pages 286-293 are correctly shown there, and the actual book content between the two versions is exactly the same (trust me).

I have no idea how to do this, so could someone please replace the problematic pages at Index:The star in the window (1918).djvu with the same correct pages from File:The Star in the Window (Grosset & Dunlap).pdf? Thank you! PseudoSkull (talk) 17:16, 27 March 2021 (UTC)[]

Pages will be proofread as normal from the other scan but are marked as problematic for now until this is fixed. PseudoSkull (talk) 17:19, 27 March 2021 (UTC)[]
I'll upload the Harvard copy of the Stokes version later. Languageseeker (talk) 20:53, 27 March 2021 (UTC)[]
New version at Index:The Star in the Window.pdf Languageseeker (talk) 03:13, 28 March 2021 (UTC)[]
@Inductiveload, @Xover: Could one of you move the text from Index:The star in the window (1918).djvu to Index:The Star in the Window.pdf. The offset is -1. Languageseeker (talk) 15:05, 28 March 2021 (UTC)[]
Yes, please. I have tried myself. PseudoSkull (talk) 15:23, 28 March 2021 (UTC)[]
I also made this request at Commons to convert that to a DjVu and keep the index we already made. If migrating the page contents is what we must do instead I can do that much. But it will take a damn long time... PseudoSkull (talk) 15:25, 28 March 2021 (UTC)[]
Migration in progress. PseudoSkull (talk) 16:04, 28 March 2021 (UTC)[]
I aborted my migration due to User talk:PseudoSkull#Please do not bulk move Page: namespace pages. @Xover, @Inductiveload, @Billinghurst, @Jan.Kamenicek: Any of you can migrate the rest in whatever way you wish, since you are the ones with more efficient tools. My bot got to Page:The_star_in_the_window_(1918).djvu/89. None of the pages in the transclusion (except in the front matter) have been migrated to the better scan. PseudoSkull (talk) 22:18, 28 March 2021 (UTC)[]
The migration needs to be from Index:The star in the window (1918).djvu to Index:The Star in the Window.pdf. PseudoSkull (talk) 22:20, 28 March 2021 (UTC)[]
More info: Only Chapter 1 up to Chapter 30 need to be changed to reflect the correct scan, since I'm proofreading the rest of it on the correct scan. PseudoSkull (talk) 23:29, 28 March 2021 (UTC)[]
@PseudoSkull: Running; pages are being moved, and the transclusions are being updated. As has been mentioned previously, please do not do titles as hard links
| title = [[The Star in the Window (Stokes)|The Star in the Window]]
please do relative links like
| title      = [[../|The Star in the Window]]
Thanks. — billinghurst sDrewth 02:28, 29 March 2021 (UTC)[]
@Billinghurst: Ah yes, thank you for doing that. And I apologize for repeating a previous mistake, I must have copied from elsewhere and not have seen that error. I will try and do differently in the future. PseudoSkull (talk) 02:32, 29 March 2021 (UTC)[]
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 11:15, 30 March 2021 (UTC)[]

Paragraphs with a left margin across page-breaks

I have come across numerous instances where a paragraph that is left indented runs from one page to the next. There does not seem to be any way of running this paragraph together on transclusion. Am I correct in this? see, for example: and the subsequent page. Esme Shepherd (talk) 10:23, 31 March 2021 (UTC)[]

In this case, use the "split" templates {{left margin/s}} and {{left margin/e}}, just like the {{block center}} equivalents. These are undocumented, so it's hardly surprising you didn't find them! Inductiveloadtalk/contribs 11:16, 31 March 2021 (UTC)[]

Thank you, that's brilliant! I had experimented with this, but I must have had the formulation wrong! Esme Shepherd (talk) 09:55, 1 April 2021 (UTC)[]

Black Beauty, versions and translations

No big deal as I put them on Author:Anna Sewell, but there is a translation and a film of Black Beauty (silent though). There are other versions with different illustrators, but I haven't compiled that list.

Maybe Black Beauty could be moved to Black Beauty (first edition)? Or not. Just let me know.--RaboKarbakian (talk) 16:17, 31 March 2021 (UTC)[]

Another perfectly good option is to tell me that I was given bad guidance and allow me to move it myself. I am annoyed to be here asking, which often means that I am being annoying.--RaboKarbakian (talk) 17:18, 31 March 2021 (UTC)[]
Labeling something as "first edition" can be problematic, as there can be a first book edition, first magazine edition, first paperback edition, first edition in the US, first edition in the UK, etc. It is much better to use the date and/or publisher and/or place of publication to identify the edition rather than an edition number. --EncycloPetey (talk) 17:37, 31 March 2021 (UTC)[]
And the items you added to Author:Anna Sewell were not written by Sewell, so they should not be placed on her Author page. Nor should you link in an Author page to a Wikipedia article about the work she didn't write from a title that would be expected to link to the work itself on Wikisource. --EncycloPetey (talk) 17:45, 31 March 2021 (UTC)[]
@RaboKarbakian: First, you only need to ask for help in moving pages if you are uncertain about the correct page names etc. or there are a large number of pages involved (think "more than about five" as a rule of thumb). In the latter case both because with a large number of pages any messes are also going to be large and because it is much easier to get an admin to do the move than to go back and clean up redirects etc. Black Beauty is a case where it's a good idea to ask for help for both those reasons. So I think the advice you were given that led you here in this case was very good.
Because… I'm not sure any move is warranted here. We generally don't preemptively create versions or disambiguation pages (yes, there are exceptions), and so far as I can see we currently only have one work with that name and one edition of that work. Once we have an edition of Black Beauty Retold in Words of one Syllable ready to transclude we might want to look at what to do, which might be a disambiguation page or might be to put the latter at Black Beauty Retold in Words of one Syllable. If it were concluded to use Black Beauty for both, the original would live at Black Beauty (1877) and the other at Black Beauty (1905) (or, often, disambiguating using the author's last name because we usually don't have multiple editions of the same work). --Xover (talk) 18:10, 31 March 2021 (UTC)[]
@EncycloPetey: Everything you said was true, although, I have been treating "one syllable" works as translations (as others are here). Everything you removed from Anna Sewell belongs at Black Beauty which is the reason I am here.--RaboKarbakian (talk) 18:29, 31 March 2021 (UTC)[]
@Xover: The Main space name is interesting, in that it is a pain. At commons, another sourcerer was naming cats: Title (YYYY, publisher) which I started to follow there, leaving Title open for all editions, or Title (Author) for problem titles, or Title (YYYY, Author) for the prolific and revisionists. Whatever name you (all) think works will be just fine.--RaboKarbakian (talk) 18:29, 31 March 2021 (UTC)[]
@EncycloPetey: What author page linking? I am confused.--RaboKarbakian (talk) 18:31, 31 March 2021 (UTC)[]
I am referring to the links you incorrectly added as part of the discussion you started. You placed links on a page where they should not have been placed. If you need to keep notes, you can place them in your User space. Or you could place them on the Talk page for Black Beauty. But please do not place works by one author onto the Author page of a different author. --EncycloPetey (talk) 18:42, 31 March 2021 (UTC)[]

Transclusion not wiki-formatting heading

Page:CTSS programmer's guide.djvu/53 is fine by itself, but not when transcluded on Compatible time-sharing system: A programmer's guide

Thanks, Phillipedison1891 (talk) 15:03, 1 April 2021 (UTC)[]

Nevermind, was able to fix it. Phillipedison1891 (talk) 15:04, 1 April 2021 (UTC)[]

Index:Hans Holbein the younger (Volume 2).djvu

Was looking through this, and found some 'bonus' images and other ephemera in the scans.

I've marked the file as problematic, so that a further discussion can be had here.

The images look like they are of Holbein (or similar-era) paintings (so PD-art). If they can be identified it would be reasonable to retain them.

However the copyright status of the ephemera is unclear. Do I mark the ephemera for blanking given the unclear status? (It's also not clear if they are contemporaneous with the rest of the book.)

Example : News clipping of unknown date Page:Hans_Holbein_the_younger_(Volume_2).djvu/30 ? ShakespeareFan00 (talk) 22:46, 18 March 2021 (UTC)[]

@ShakespeareFan00: If it's not part of the book as published then mark the pages as without text. If they are additionally of unclear or dubious copyright status then flag the specific pages and I can excise them. Just looking at the index it wasn't clear to me which pages this was concerning. --Xover (talk) 13:48, 30 March 2021 (UTC)[]
I think you are both overthinking this. Like I did with my failed 9 page djvu file. Look at the publication date. — Ineuw (talk) 00:24, 3 April 2021 (UTC)[]

Setting up Merge and Splits for The Complete Works of Geoffrey Chaucer

I just uploaded the scans for all 7 volumes of Author:Geoffrey_Chaucer#Collected_works. The first 6 volumes have text from Gutenberg done by PGDP. For that reason, they have page numbers. Is there any way to merge-and-split these texts? Languageseeker (talk) 01:45, 2 April 2021 (UTC)[]

Missing page images of a linked djvu file?

I created this eight page article as a .djvu file which displays correctly in my desktop DjVu app. But here, the page images are not showing, but the text layer is re-created with the OCR. — Ineuw (talk) 04:18, 2 April 2021 (UTC)[]

Page images are missing and OCR error

Installed this 9 page document. The page images are missing, but the OCR succeeded, except on the last page on which OCR generates an error. Whenever someone has the time, please look at what's wrong. Thanks. — Ineuw (talk) 13:09, 2 April 2021 (UTC)[]

IOError: (invalid url?)
@Ineuw: That file claims to have a resolution of 19,204 × 26,458 pixels (about 10x what's typical), but still only 9.31 MB. I'll dig a bit, but my initial guess is that this file is broken in some way. --Xover (talk) 13:33, 2 April 2021 (UTC)[]
Uhm. How did you extract the 9 pages? And for that matter, why? You can proofread and transclude only those 9 pages even if the file and index contain many more. --Xover (talk) 13:35, 2 April 2021 (UTC)[]
Definitely a funky file. It's got indirect chunks, looks to put the text layer in annotation blocks, and claims to be an insane resolution. What tool created this file? I'll try to generate a DjVu of the whole volume, but it'll have to wait until later today or tomorrow. --Xover (talk) 13:42, 2 April 2021 (UTC)[]
@Xover: Please don't waste your time. I will try it again in a different way to learn how to do it. These were made from 9 JP2 pages converted to PNG then uploaded to Convertio to convert to 9 separate djvu pages (I have no offline djvu conversion tool), which was stitched together with djvm in Windows. Go ahead and laugh. :-) — Ineuw (talk) 22:19, 2 April 2021 (UTC)[]
@Ineuw: Regardless of how roundabout that process sounds (happy to provide guidance, but tl;dr if you have djvm you should have c44, which would convert a JPG input directly to DJVU): why not just upload the entire document, which even comes with the OCR? And then it allows proofreading of the rest of The World's Work v. 14 by others. Inductiveloadtalk/contribs 22:59, 2 April 2021 (UTC)[]
@Inductiveload: You are absolutely right. There is no excuse for my approach, except that I was exploring (playing) to see the end results. The djvudump displayed everything that's wrong. So, went back to the drawing board, found c44.exe, as well as the scripts posted on the Wikimedia Commons. About uploading the complete volume. I try not to upload books which I have no interest to proofread, so this seemed to be an alternative and a teaching moment. Only because it's 9 pages.— Ineuw (talk) 00:18, 3 April 2021 (UTC)[]
@Inductiveload, @Xover: I converted the .jpg page images with c44 and then assembled them with djvm. It's about 20% of the previous uploads, but the same problem exists. The text comes through but not the page image. Could you please look at it. — Ineuw (talk) 23:06, 4 April 2021 (UTC)[]
@Ineuw: A big part of the issue is that you tagged the images with Internet Archive identifier : worldswork14gard on Commons which prevents the IA tool from uploading the file. Languageseeker (talk) 23:27, 4 April 2021 (UTC)[]
Thanks for explaining. This is not working out for me. The volume is 700 pages and is not worth uploading in my opinion. So, I will delete it here, and ask for a deletion at the commons.— Ineuw (talk) 23:59, 4 April 2021 (UTC)[]
@Ineuw: I checked the new version of the file and it looked just fine, including showing the images in the Page: namespace. If you're still seeing broken images it is probably a caching issue or similar. The only thing wrong with your new version is that it doesn't have a text layer in the file itself (let me know if you want instructions for adding one: it's complicated and inconvenient, but entirely doable). --Xover (talk) 00:48, 5 April 2021 (UTC)[]
@Languageseeker: And just what in the world does that have to do with anything? --Xover (talk) 00:48, 5 April 2021 (UTC)[]
@Xover: The IA tool checks if there is a file tagged with {{IA|worldswork14gard}} on Commons. Even if it's an image, then the IA tool will not allow you to upload the file stating that the file already exists. I tried uploading the entire file with the IA tool and the images that Ineuw uploaded to Commons and tagged with the IA link prevented the uploading of the actual book. Languageseeker (talk) 00:55, 5 April 2021 (UTC)[]
@Languageseeker: Yes, that is roughly how the ia-upload tool works. However, as ia-upload was involved nowhere in Ineuw's problem, why are you bringing it up at all, much less framing it as a causal factor for the problems they were having? --Xover (talk) 10:06, 5 April 2021 (UTC)[]
@Ineuw: It's actually easier to manage a single 700 page volume than managing an extracted article. You don't need to proofread the entire thing, just the part that interests you. Languageseeker (talk) 00:46, 5 April 2021 (UTC)[]

@Languageseeker: Thanks for the correction on the commons; I will do that with future uploads.— Ineuw (talk) 00:50, 5 April 2021 (UTC)[]

@Ineuw: If an uploaded image, DjVu, or PDF comes from IA then the file's information page should definitely contain the IA identifier or another link to IA. ia-upload was designed to avoid duplicate uploads based on an assumption that most works available on IA were not, and probably never would be, uploaded to Commons. That assumption has been turned inaccurate over the last couple of months thanks to a way overzealous bulk upload of as many of IA's PDFs (mostly low-quality, and with awkward autogenerated filenames and the raw IA bibliographic metadata) as the bot could get their paws on (mostly constrained by copyright). This state of affairs most likely means that the ia-upload duplicate checking in its current form is no longer feasible, and will either have to be removed or rewritten to work in a significantly different way. At which point the problem Languageseeker is talking about, and that affects one single specialised uploader tool, will disappear, but we will still need good information about the source of media files on Commons. --Xover (talk) 10:06, 5 April 2021 (UTC)[]
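For reference, the c44-then-djvm route mentioned in this thread can be sketched as a small command builder. This is only an illustration, not the exact procedure anyone above used: the function name and file names are hypothetical, and it assumes DjVuLibre's `c44` (one JPG in, one single-page DjVu out) and `djvm -c` (bundle pages into one document) are on the path. The sketch only builds the argument lists; nothing is executed.

```python
from pathlib import Path


def djvu_build_commands(jpg_pages, output_djvu, decibel=None):
    """Return argument lists: one c44 call per JPG, then one djvm -c call.

    Hypothetical helper for illustration. Nothing is run here; each list
    could be passed to subprocess.run() where DjVuLibre is installed.
    """
    commands = []
    page_files = []
    for jpg in jpg_pages:
        # page001.jpg -> page001.djvu, next to the input file
        page = str(Path(jpg).with_suffix(".djvu"))
        cmd = ["c44"]
        if decibel is not None:
            # optional quality target, as in "c44 -decibel 50 ..."
            cmd += ["-decibel", str(decibel)]
        cmd += [str(jpg), page]
        commands.append(cmd)
        page_files.append(page)
    # djvm -c creates a multi-page document from the single-page files
    commands.append(["djvm", "-c", str(output_djvu), *page_files])
    return commands


if __name__ == "__main__":
    for cmd in djvu_build_commands(["page001.jpg", "page002.jpg"], "article.djvu"):
        print(" ".join(cmd))
```

A file built this way has no text layer, which matches what Xover observed about the second upload.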

Uploading Large PDFs to Commons

I don't seem to have a lot of luck uploading large PDFs to Commons. I've tried Chunked Uploader and it does not work. Does anybody have any suggestion? For example, I want to create a PDF for [10]. Languageseeker (talk) 20:29, 2 April 2021 (UTC)[]

@Languageseeker: I use just Upload Wizard and imo it should be able to handle this file too. --Jan Kameníček (talk) 22:22, 2 April 2021 (UTC)[]
Upload Wizard refuses documents over 100MB.--Prosfilaes (talk) 23:16, 2 April 2021 (UTC)[]
@Prosfilaes: that's the Basic Upload - the Wizard goes up to 2GB, I think (it uses chunked uploading). Inductiveloadtalk/contribs 23:27, 2 April 2021 (UTC)[]
Actually, it should be up to 4GB. --Jan Kameníček (talk) 08:30, 3 April 2021 (UTC)[]
@Languageseeker: I've been running into phab:T278104 on and off for a while with API uploading and the upload wizard, perhaps it's that?
On the other hand, this document produces a 55 MB DJVU from the 494MB of Hathi images, so perhaps that's a better way forward? If you must have a PDF and you want to crush it down, JBIG2 encoding the PNGs produces a PDF around 12MB, but I don't have tools to combine the JPGs with the PNGs as a PDF so only the PNGs are JBIG2'd, and I don't have tools to write the OCR into PDFs. Also the PDF is mind-expandingly slow to render compared to the DjVu. Inductiveloadtalk/contribs 23:27, 2 April 2021 (UTC)[]
@Inductiveload: Yep, that's the exact error that I'm getting. I'll just wait until that bug gets fixed. I'm trying to preserve the image quality because of the illustrations. Thanks for your help. Languageseeker (talk) 00:48, 3 April 2021 (UTC)[]
@Languageseeker: The illustrations are already pretty damaged by Google's compression, so IMO it's not particularly critical (especially as it's way easier to extract the images from the existing JPGs at Hathi rather than from a PDF that another user wouldn't know has or hasn't re-encoded the image). As was said by Nemo_bis in phab:T277921, Commons isn't attempting to compete with Hathi/IA for storage of endless terabytes of "raw" (not that it really is raw, see below) scan images. Because what's the point?
Even then, the 36 JPGs in this file total 35MB, so, on top of the ~12MB of lossless JBIG2-encoded bitonal images, you could still produce a PDF under 50MB, without a byte of data loss from the Hathi scan (except in the Google watermarks). But the PDF will render like molasses, because JBIG2 is very slow to decode. So I'd still suggest going for DjVu, and if the image quality from the default c44 encoder settings is not good enough for whatever reason, you can set that manually. For example:
$ c44 -decibel 50 mdp.39015011058198.0765.jpg page765.djvu
$ ddjvu page765.djvu -format=pnm page765_from_djvu.pnm
$ compare -metric PSNR mdp.39015011058198.0765.jpg page765_from_djvu.pnm diff.png
Which is kind of what you expect since we asked for 50. 50dB of PSNR is really rather good (way over JPG quality=90). In fact, since 255 is ~48dB it's essentially perfect (below the quantization error of the actual 8-bit image, but since the two aren't quite identical I'm obviously missing something). This is the difference map between the input JPG and the 50dB c44 encoding. White means identical.
Which is all kind of moot, because although the Hathi JPGs may be set at Q=95, they're encoding substantial compression noise, probably from before the data ever reached HT, which implies that 95 is far from representative of the paper-to-user Q factor and using a Q=95 level of compression is mostly just a waste of bits:
JPG compression from a Hathi Trust file.png
Striving to store data that's already totally swamped by compression noise is not particularly useful (in the context of Wikisource), IMO. Sure, reducing compression damage at each step is a nice goal, but once the data is trashed to n dB (where n << 50), what are you hoping to achieve by worrying about further lossless encoding? You have to ask yourself what exactly you are trying to achieve, or it's going to turn into a classic w:XY problem. Inductiveloadtalk/contribs 16:41, 3 April 2021 (UTC)[]
I'm trying to make sure that users do not have to go through Help:Image_extraction to crop an image from a file. I know that Google scans are of an inferior quality, but they are often all we have. There is nothing wrong with lossless compression, but lossy compression alters the image. As you know, getting images from Hathi Trust is difficult. So why make users go through extra work?
Yes, DJVU can compress more, but DJVU is no longer being actively developed. It's one major bug away from following the fate of Lilypond phab:T257066. If the security team discovers a major security bug in the DJVU viewer, who will fix it? What about if the code becomes incompatible with the latest release of Debian? As for JBIG2, it's dangerous to use because it can alter the image, see JBIG2.
I'm not asking to import the entire IA or Hathi Trust, but I want to make sure that the images are of the highest quality because the quality of monitors is continuously improving. A higher quality image will last longer. If the scans come from IA, I don't care because I know that we can pull the scans at any time. For Hathi Trust, I'm not so sure because it already imposes restrictions. Downloading from Hathi Trust at this moment places Wikisource in National Portrait Gallery and Wikimedia Foundation copyright dispute territory. Languageseeker (talk) 00:16, 4 April 2021 (UTC)[]
FYI, not that I'm saying JBIG2 is ideal (due to the insane decode time making them truly miserable to use on all but the most monstrous CPUs), but jbig2 operates in lossless mode by default (it's lossy if you set -s).
And even if you do just use PNG, remember to make them bitonal first, because the Hathi PNGs are only not bitonal due to the Google watermark. That will save you hundreds of MBs per file. Inductiveloadtalk/contribs 07:11, 4 April 2021 (UTC)[]
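The decibel figures quoted in this thread can be checked with a few lines of arithmetic. The sketch below assumes the usual PSNR definition for 8-bit images (peak value 255, error measured as mean squared error); whether `compare -metric PSNR` normalises in exactly this way is not confirmed here, and the function names are made up for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr_db(mean_sq_err, peak=255.0):
    """PSNR in dB for a given mean squared error (infinite if images are identical)."""
    if mean_sq_err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mean_sq_err)


def mse_for_psnr(target_db, peak=255.0):
    """Mean squared error that corresponds to a target PSNR in dB."""
    return peak ** 2 / 10 ** (target_db / 10.0)


if __name__ == "__main__":
    # An MSE of one grey level squared gives 20*log10(255), the "~48dB" figure.
    print(round(psnr_db(1.0), 2))
    # A 50 dB target allows a mean squared error below one grey level squared.
    print(round(mse_for_psnr(50), 5))
```

Under this definition, 50 dB corresponds to a mean squared error of roughly 0.65, so the average per-pixel error is less than one 8-bit step.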

Transcriptions of an audio work

Hello Wikisource editors, we have been publishing (in Apple Podcasts, and the like) and also donating to Wikimedia Commons a podcast series, under the standard Creative Commons Attribution-Share Alike 4.0 International. We are considering creating a Wikisource page with the transcription of those podcast episodes. It seems that Wikisource welcomes transcripts of audio (WS:SCOPE), but more guidance, especially to confirm whether this contribution is within the scope of Wikisource, would be much appreciated. JCPod (talk) 19:52, 3 April 2021 (UTC)[]

Wikisource:What Wikisource includes should give you an idea of what we include. For works published after 1925, the work should meet our equivalent of "notable". Podcasts generally do not meet that criterion, as they do not pass through peer review or editorial controls. --EncycloPetey (talk) 19:58, 3 April 2021 (UTC)[]
@JCPod: while it's pretty unlikely a modern "self-published" work like a podcast meets WS:WWI, I think it sounds like something Wikibooks would allow, since it's essentially a book? I don't speak for them, but you could ask at wikibooks:Wikibooks:Reading room. Inductiveloadtalk/contribs 20:27, 3 April 2021 (UTC)[]
Thank you both for your prompt responses. JCPod (talk) 20:58, 3 April 2021 (UTC)[]

Comment: Aside from this specific case example, we need to better look at how we handle transcriptions of audio works, especially progressive transcriptions. Are we going to work in the Index: / Page: ns from a file at Commons, and look to go through the double process of validating? How would we get the snippets of sound into files, etc.? We have done something with video, and I think that it is time we looked to better formulate these media types. Needs guidance in Help: namespaces for video and audio files. PseudoSkull would be our current lead exponent. — billinghurst sDrewth 00:13, 4 April 2021 (UTC)[]

Author creation requested

Can anyone help to create the author page for Bruneian sultan Hassanal Bolkiah? I'm working on his Syariah Penal Code Order, 2013, and other emergency enactments solely made by him. In particular, I'm not sure how to deal with all of those authority control scribble-scrabbles. Many thanks.廣九直通車 (talk) 13:54, 5 April 2021 (UTC)[]

@廣九直通車: Done. See Author:Hassanal Bolkiah. I am uncertain about the best copyright tag to use, so I've stuck EdictGov there for now. --Xover (talk) 19:45, 5 April 2021 (UTC)[]

Looking for help with some Hebrew characters in a French Champollion book about hieroglyphs


I'm active on the French Wikisource, and I'm working on a book by Jean-François Champollion about hieroglyphs. In this book, there is THIS PAGE with a text in Hebrew characters. As I'm not good with the Hebrew language or with Hebrew characters, I'm looking for some help to correct the page. Any help would be welcome. Thanks Lorlam (talk) 18:50, 5 April 2021 (UTC)[]

@Ineuw: Is this something you are able to help out with? --Xover (talk) 19:38, 5 April 2021 (UTC)[]
Done. — Ineuw (talk) 19:52, 5 April 2021 (UTC)[]
Many thanks for your help — Ineuw :-) --Lorlam (talk) 21:19, 5 April 2021 (UTC)[]

Paginated text without scan

Following from this lengthy discussion and others on English Wikipedia w:en:Talk:Sir Charles Asgill, 2nd Baronet#General Washington's Dilemma by Katherine Mayo, Anne User:Arbil44 has transcribed a hard-to-find historical letter at w:en:user:Arbil44/New_sandbox4. It's well out of copyright. Anne has retained the original pagination and headers. Would someone be able to help copy this across to Wikisource with the appropriate page structure? Or advise me how to do it? (For example, without a scan, do we still use the Index: namespace to assemble the pages?)

Note, I don’t want to ask Anne to go back and add a scan, I get the sense that she has become somewhat frustrated in her interactions with Wikipedia and I don’t want to make things worse. So I’m hoping we can accept this as a non-scan-backed text as it is.

Pelagic (talk) 01:34, 6 April 2021 (UTC)[]

@Pelagic: I made the 6 pages into an index: Index:General Washington's Dilemma - Mayo - 1938 - Appendix 2.djvu. I'm not quite sure how it should be transcluded to mainspace, as it's just a fragment of a complete work. Inductiveloadtalk/contribs 01:38, 7 April 2021 (UTC)[]
OMG, I didn't see that Anne had already posted above. Great news that she was able to provide a scan! Many thanks for your help on this, Inductiveload.

Index:UN Treaty Series - vol 1.pdf, etc

This work and subsequent volumes of the United Nations Treaty Series are in English and French. I see the first volume proofing only English, so I would like to ask if separate indexes would have to be made in French Wikisource to proofread the French portions. If so, I am making more indexes here to encourage proofreading, but I do not have a reliable OCR.--Jusjih (talk) 05:01, 6 April 2021 (UTC)[]

@Jusjih: frWS would need separate index pages, yes. But they should mostly be able to just copy the data we have here if we have ones they don't already have. And, of course, they can use the same File: on Commons.
What's your problem with OCR? --Xover (talk) 07:35, 6 April 2021 (UTC)[]
Thanks. I wonder if reliable OCR is available online.--Jusjih (talk) 18:04, 6 April 2021 (UTC)[]
Perhaps this site already has OCR when creating page namespace? I just added some well formatted covers of the United Nations Treaty Series, but we will have to mark the year published since Volume 401.--Jusjih (talk) 00:47, 7 April 2021 (UTC)[]

Transcribing directly from webpages (Highway Code)

Hi, I believe the current Highway Code, published by the British government's Department for Transport, falls under the CC-BY-compatible Open Government Licence and thus would be eligible for inclusion (we already have a 1931 edition and parts of the 2008 Traffic Signs Manual). But how would I go about copying it here? I know scans are preferred for verifiability - would it be appropriate to print the webpages to PDF and upload them to Commons, or is a URL sufficient attribution? If so, how do I create the relevant pages without a scan? --Wodgester (talk) 17:01, 7 April 2021 (UTC)[]

I would "print" web pages into PDFs, upload them to Commons saying that the source web pages have been converted to PDFs, create indexes here, then proofread the pages.--Jusjih (talk) 20:49, 8 April 2021 (UTC)[]
Thanks for the help @Jusjih! I've started an index. --Wodgester (talk) 16:17, 9 April 2021 (UTC)[]
You are very welcome and I see the PDF well describing the tools used.--Jusjih (talk) 01:48, 10 April 2021 (UTC)[]

How to Browse? A Lot of Confusion for a Beginner User

From the Navigation Sidebar => Help - at the bottom of the Help page

I clicked: "Do you need assistance? Post a request!"

My request is to reduce the confusion for a beginner user to browse, and to improve the browsing experience overall.

I came looking for books about time travel. My objective is to browse by Fiction => Genre, and I expect to be able to choose "science fiction" in a list of genres, and narrow my search further to "time travel" in a subsequent sub-genre list to return a list of all the science fiction books in Wikisource that revolve around time travel. I also want to browse Science => Physics and find time travel in a list to return a list of all science books that discuss time travel, but I did not get that far.

The most impactful improvement would be an enhanced method to create, assign and search Categories. Here are some comments I had as I browsed:

In Help:Beginner's guide to navigation => Browsing

  • Browse by Authors - this is fine, and there is a Navigation Sidebar link to click for Authors - intuitive, consistent and useful.
  • Browse by Subjects - this is confusing. There are portals, and there are categories. Neither term appears in the Glossary of Terms on the Help page. Neither term appears on the Navigation Sidebar. There is "Subject Index" on the Navigation Sidebar that returns a Portals list. My first thought was, "What is a Portal?" Nothing in the portals list says Fiction, Popular Fiction, Genre, or Science Fiction. At a glance, this looked fruitless. Clicking on "Index" at the bottom exposes a hierarchy of portals, and with some exploration, there is a sub-genre for time travel with three books. This is a paltry list, and I do not know if the list just reflects a small collection of books in Wikisource, poor use of categorization, or a restriction that a book can belong to only one portal, reflecting its dominant sub-genre, when several could be appropriate.

In Search

  • I next tried to use the Search function for a browsing tool - I searched at the top for "science fiction", and I had three directions I could take - Portals for Science Fiction and also for Science Fiction Films, and for pages containing "science fiction". Are there any Categories for science fiction? I did not see any in the search results.
  • I used the Search function again for a narrower search of "time travel" - nothing bubbled up in the near-in results except for Wells' The Time Machine. The vast majority of results were for time, or travel, which were all unrelated to my search objective.

Portals and Categories

My presumption is that Portals are hierarchical, that a book belongs to only one portal at the bottom of the hierarchy, and that a portal may contain many books (or perhaps as few as one). I frankly do not like the Lib. of Congress classification system, but it is well described and freely reproducible, so why not, I guess. It works fine.

My assumption is that a book may have several categories. I would think that categories are analogous to "tagging" in metadata. A book should probably include categories for each child portal in its classification hierarchy, and I see that is done. It would be useful if a book had a variety of topical categories that the contributor assigns, but I don't see that is done here. That would be useful!

What is the basis for Categories? How are they chosen / assigned? What categorization is automatic if any? Is there a "pick-list" of Categories? Unsigned comment by Bccrowe (talk).

Did you see the "Highlights" block on the Main page, with a line on "General literature" and a direct link to "Science fiction"? --EncycloPetey (talk) 21:26, 18 April 2021 (UTC)[]

Please stop tinkering with the mediawiki OCR!

The title says it all. It's dead in the water again.— Ineuw (talk) 21:11, 18 April 2021 (UTC)[]

Formatting of dot-separated left and right align

Hello! I'm a bit new here, and so I'm still getting used to the text formatting templates. Can someone let me know what to do in the case of a page like this? I assume it's a bad practice to hardcode the dots in (see example 1), but I do not know of a way to replicate this with a template. So, should it just be aligned to left and the page number to float right (see example 2)? Is there a more elegant way to do this?

Example 1:

Input: 1. Section Title .......... 1
Output: 1. Section Title .......... 1

Example 2:

Input: 1. Section Title {{float right|1}}
Output: 1. Section Title 1

Thanks, Tol | Talk | Contribs 01:37, 20 April 2021 (UTC)[]

There are currently two schools of practice here with respect to dot leaders: a) replicate them, using the various Dotted TOC templates; b) omit them and use a table format instead. Both are accepted practices. I lean to the second (e.g. Page:At the Fall of Port Arthur.djvu/15). An example of the first style is at Page:Ballantyne--The Pirate City.djvu/11. Beeswaxcandle (talk) 04:04, 20 April 2021 (UTC)[]
@Beeswaxcandle: Thank you! Tol | Talk | Contribs 04:06, 20 April 2021 (UTC)[]
IMO, if you're going to use dot-leaders, it's better to use {{TOC begin}} and {{TOC row 1-dot-1}}/{{TOC row 2dot-1}} rather than {{dotted TOC page listing}}, because the latter uses a complete table for every single row and this 1) exports badly, 2) is semantically highly suspect and 3) massively inflates the HTML output. {{TOC begin}} produces a single HTML table (though the markup within the row isn't very "tidy", but I don't think it can be in the current state of CSS). Inductiveloadtalk/contribs 06:19, 20 April 2021 (UTC)[]
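As a rough illustration of the single-table approach described above, the following sketch generates the wikitext for a short contents list. The template names come from this thread, but the parameter order shown for {{TOC row 2dot-1}} (number, title, page) and the closing {{TOC end}} are assumptions that should be checked against the template documentation; the helper name is hypothetical.

```python
def toc_wikitext(entries):
    """Build one {{TOC begin}} ... {{TOC end}} block from (number, title, page) tuples.

    Assumption: {{TOC row 2dot-1}} takes three positional parameters in the
    order number, title (followed by the dot leader), page.
    """
    lines = ["{{TOC begin}}"]
    for number, title, page in entries:
        # one template call per row, all inside a single table
        lines.append("{{TOC row 2dot-1|" + "|".join([number, title, page]) + "}}")
    lines.append("{{TOC end}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(toc_wikitext([("1.", "Section Title", "1"), ("2.", "Another Section", "14")]))
```

Because every row sits inside the one table opened by {{TOC begin}}, the output avoids the table-per-row markup that {{dotted TOC page listing}} produces.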

Locked myself out of My Facebook Account

I don't have access to the phone number to get the login codes but still have access to my Gmail account

There is some misunderstanding. This is Wikisource and we are not able to help you with your FB account. --Jan Kameníček (talk) 13:56, 21 April 2021 (UTC)[]

Template:PD-EdictGov and HK Commission of Inquiry Reports

I recently found the two Blair-Kerr reports commissioned by the Hong Kong government on corruption published in 1973, which resulted in the establishment of the Hong Kong ICAC. Are these reports OK for English Wikisource under Template:PD-EdictGov? I know both reports won't be accepted on Commons; can they be accepted here? Many thanks.廣九直通車 (talk) 14:22, 24 April 2021 (UTC)[]

TOC with braces and dotted cells

Please help to format TOC with braces and dotted cells: Page:Works of Thomas Carlyle - Volume 01.djvu/286. Thanks. Ratte (talk) 18:21, 24 April 2021 (UTC)[]

I've made a start for you by way of example. I don't usually bother with dotted cells, so haven't attempted to reproduce those. Beeswaxcandle (talk) 18:37, 24 April 2021 (UTC)[]
Thank you for your help! Ratte (talk) 18:56, 24 April 2021 (UTC)[]

DjVu: two missing pages

Index:Works of Thomas Carlyle - Volume 17.djvu: pages 214, 215 are missing. Could someone please add them to file from here or here? Thanks in advance! Ratte (talk) 19:29, 24 April 2021 (UTC)[]

Ratte, you don't need to add them to the file, simply uploading those two pages from the same edition separately can be suitable. Create the file, index page and just transclude them. There is no technical issue in transcluding works from different index pages to the same page. — billinghurst sDrewth 12:48, 25 April 2021 (UTC)[]
But this is not a fix for the source file, as the red message here dictates („Source file must be fixed before proofreading“). And the source file will remain incomplete. I have doubts about the correctness of such an approach. Ratte (talk) 13:06, 25 April 2021 (UTC)[]
I suggested somewhere recently that languageseeker replace this file with a different source (NYPL, Robarts), after consulting you that it was okay, and indicated that I've had bad experiences with scans from the current source. CYGNIS INSIGNIS 14:07, 25 April 2021 (UTC)[]
@Languageseeker, @Ratte: probably best to coordinate with yourselves on what to do with these volumes. CYGNIS INSIGNIS 14:46, 25 April 2021 (UTC)[]
Ratte, it is an acceptable solution; that it is not a fix for the source file is a different situation. Been done on multiple occasions. That dropdown is generic, and guidance only. There are many ways to resolve issues. If you are solely focused on a fix for the file, then please drop the request into the appropriate section in WS:S. — billinghurst sDrewth 23:38, 25 April 2021 (UTC)[]
@Ratte: Done
@Billinghurst: For DjVu files, both Inductiveload and myself can make these kinds of repairs fairly easily (we just both happened to be a bit busy right now). Since fixing the files in place is a far simpler solution than having a work transcluded from multiple indexes I would generally recommend trying that approach first. For PDF files that cannot be manipulated quite as easily as DjVu files the equation may fall out differently. And in either case, as you say, the requests are best put in the Repairs and moves section on the Scriptorium so the right people will notice them, and so we can keep track of them. Xover (talk) 09:37, 26 April 2021 (UTC)[]
There is next to zero difficulty in transcribing two different indices. Both ways work, either are functional. The fixation on perfect File: is a fixation, it has no necessity. — billinghurst sDrewth 09:51, 26 April 2021 (UTC)[]
Both ways work, sure; but all else being equal, fixing things at the source is generally the better approach. In particular, for most users that's going to be easier and less confusing, and it won't create extra complexity in keeping track of extra indexes, files, and special transclusion rules that we will have to maintain indefinitely. So long as we have people available that are able and willing to patch files in this way, my strong recommendation is that we try that first and only fall back to multiple indexes and other workarounds when we have to. Xover (talk) 11:05, 26 April 2021 (UTC)
@Xover: you are awesome, thank you! Ratte (talk) 12:16, 26 April 2021 (UTC)[]

Dotted TOC line template: An additional column for author?

Could somebody show me how to add another column, for the authors of magazine articles, to the {{dotted TOC line}} template? See here: Page:Pacific Monthly volumes 9 and 10.djvu/13 -Pete (talk) 16:26, 26 April 2021 (UTC)[]

You likely would need to create a new template for that. --EncycloPetey (talk) 17:33, 26 April 2021 (UTC)[]
Ah, OK thanks. I don't have any strong preference for this particular template -- is there another way of approaching these pages that you'd recommend using existing templates? Would it be better, for instance, to render the whole page as one big table, rather than a bunch of individual templates for each line? -Pete (talk) 18:45, 26 April 2021 (UTC)[]
Typically, I have simply used a table for complicated ToCs like this one. If the ToC is on a single page or two pages, that might be the simplest approach. I am not familiar enough with the template alternative options to comment on those. --EncycloPetey (talk) 18:50, 26 April 2021 (UTC)[]

Songs and Sonnets (Coleman) table of contents

Why is the multi-page table of contents broken? I genuinely have no idea. I thought I carefully prepared the headers/footers for this and can't fix it... Can anyone help please? PseudoSkull (talk) 02:36, 30 April 2021 (UTC)[]

Inspecting the page I find that in the instances where it shows a "|-", it combines two <td>s into a single <tr> Hmm... PseudoSkull (talk) 02:39, 30 April 2021 (UTC)[]
@PseudoSkull: It's {{TOC row ragged}}: it wraps its contents in <div>…</div>, so when transcluded you end up with <tr><td><div></div><span><span class="pagenum ws-pagenum" ></span></span></td></tr>. That is, a <div>…</div> followed by a <span>…</span>. Since the div—by nature of being a block-level element—is followed by a new line, and the span—despite being otherwise empty—has line-height, you end up with a blank line separating each page of the toc.
This is one reason why I really don't recommend using any of the TOC templates, in favour of just using plain table markup. You'll run into weird edge cases there too, but they are rarer, much more obvious when they occur, and generally easier to fix. --Xover (talk) 18:28, 30 April 2021 (UTC)[]
@Xover: this was actually a missing new line in the template: diff. Though the ragged template is probably not ideal here since the right column isn't ragged.
I know what you mean re the templates, but on the other hand, manually formatting a table in the general case is actually quite a lot of direct formatting, once you have taken into account the text alignment, page position, wrapping, vertical alignment, padding and so on (most of which need setting on every single cell). Inductiveloadtalk/contribs 19:31, 30 April 2021 (UTC)[]
Yeah, I don't say that because raw wikitables are a perfect solution: it's a pragmatic far-lesser-of-two-evils call, and the upshot of having to do all that direct formatting is the direct control it gives you. --Xover (talk) 19:52, 30 April 2021 (UTC)[]

Proofing, Formatting, and Linking in The Pilgrim's Progress

I'm brand new to wikisource. The book that has my interest and that I have chosen as a first project is the original Pilgrim's Progress published in the 1600s. I have done several pages, but would like an experienced eye to look over what I have done to make sure I am doing it right before I get too far into it.


  1. Am I using formatting correctly (for centered text and major font size changes)?
  2. Is it appropriate to link to the wikisource KJV bible as I have done in the title page and for the footnote on page 1?
  3. This footnote was originally a sidenote. Was I wrong in changing the format, and if so, how should it be encoded?
  4. Are my comments on the relative discussion pages appropriate?

--Bountonw (talk) 01:37, 2 May 2021 (UTC)[]

A few notes:
  • There's a few cases where text should be centered but isn't, like on Page 8.
  • The w:long S should be kept in the transcription, using template:ls.
  • The first letter on page 13 should use template:di. On page 21, it should also use an image (see the template's documentation).
  • Bible links can be done with Template:Bibleverse.
Glad to see a new editor! Mcrsftdog (talk) 16:50, 2 May 2021 (UTC)[]

Welcome! On point four, comments on discussion pages often go unnoticed, central discussion pages like this one draw more eyes. I would replace the 'long s' with an 's', because given a choice most readers would prefer a clean transcript. However it is done, using the template is preferred if that character is transcribed, but that is so the labour of the proofreader to display them can be avoided. CYGNIS INSIGNIS 18:25, 2 May 2021 (UTC)[]

I've had a look now. On your query 3, I think what you have done is appropriate, in fact an improvement on the sidenotes they replace. On query 2 the KJV is a reasonable assumption for the quote, but another caution: if we get more than one edition of that Authorized Version then linking the relevant part of the text will present a difficulty. 18:35, 2 May 2021 (UTC)

TOC linking for a single line with multiple parts

Does anyone have a good idea for how to link the last two entries at Lippincott's Monthly Magazine/Volume 46 (namely, "Book-Talk" and "New Books")? Both "sections" actually have 6 parts (one per issue). I'm planning to omit the "issue" tier of the naming structure, since the TOC isn't done like that, so the links are probably going to be [[Lippincott's Monthly Magazine/Volume 46/Book-Talk (1)]], etc.

My problem is: where does one physically put the links? Linking the page numbers is pretty non-standard and non-discoverable since people probably assume that would lead to the Page NS.

This is an issue for Lippincott's in general, but lots of periodicals do the same and combine recurring segments into a single TOC line when they have a per-volume TOC. Inductiveloadtalk/contribs 18:18, 4 May 2021 (UTC)

Also, there were per-issue TOCs at the time (at the BL, for example), but apparently the covers were either removed for binding or the bound volumes never had them. Inductiveloadtalk/contribs 20:37, 4 May 2021 (UTC)[]

When I was thinking about transcluding similar series in The New Monthly I was thinking about linking to something like Book-Talk (Lippincott) from the Main TOC which then can link to the (1), (2) via an AUX-TOC to have them all together and linked, possibly with a volume or year anchor if there are a large number of them. MarkLSteadman (talk) 22:29, 4 May 2021 (UTC)[]
I probably would put them in a Portal (assuming they're in all issues, there could be ~600 of them in the first series). Going via another page would break basic export expectations, so it'd at least need an {{hidden export TOC}} to compensate.
Another option I thought of was to (re)construct the in-order issue TOCs as an AuxTOC or similar. Inductiveloadtalk/contribs 09:26, 5 May 2021 (UTC)
@Inductiveload: This is why PSM is like it is. The other option is just transclude them all together into one section per volume, the only issue that causes is that you can only have one _/SOURCE\_ tab and the page numbering will not flow, otherwise it works well blending things. — billinghurst sDrewth 12:46, 5 May 2021 (UTC)[]

Moving existing pages after an index has been renamed

Index:Narrative of Henry Box Brown.pdf has been renamed, but it has a number of pages, which appear to have had some proofreading done, that have not been moved to this new index. It starts with Page:Narrative of Henry Box Brown - who escaped from slavery enclosed in a box three feet long and two wide and two and a half high (IA narrativeofhenry00brow).pdf/10 and goes up to page 94. Could someone with the ability to batch a move to the new index name do so? Thanks. —CalendulaAsteraceae (talk) 09:03, 5 May 2021 (UTC)[]

@CalendulaAsteraceae: Done (it was actually p4–94). Inductiveloadtalk/contribs 09:20, 5 May 2021 (UTC)
@CalendulaAsteraceae: Please just ask an admin to move in the Index/Page namespaces. Far easier to just do it all and less chance of mistakes. — billinghurst sDrewth 12:41, 5 May 2021 (UTC)[]
@Billinghurst: Will do! Where's the appropriate place to do that? — CalendulaAsteraceae (talk) 09:35, 9 May 2021 (UTC)[]
Any of those public spaces where admins are sitting, so here, or WS:S or WS:AN. Wherever you feel most comfortable, we aren't fussed, just best to make it clear with a good subject line. Or use {{helpme}} on the index talk:. — billinghurst sDrewth 10:13, 9 May 2021 (UTC)[]

Rename source file

When uploading the Declaration of Change of Titles (General Adaptation) Notice 1997 as File:Bc540715cb2-2570-3c-scan.pdf, I forgot to rename the source file. Can someone help to change File:Bc540715cb2-2570-3c-scan.pdf into File:Declaration of Change of Titles (General Adaptation) Notice 1997.pdf? Many thanks. 廣九直通車 (talk) 05:21, 11 May 2021 (UTC)

@廣九直通車: Done. Xover (talk) 05:46, 11 May 2021 (UTC)

Default layout, can't get it working.

I've seen the usage of the {{default layout}} template on Immigration Restriction Act 1901 which neatly places text in the middle of the page and allows margin text to work.

I've tried replicating that using this page in my Sandbox, but it completely ignores the default layout template, placing the margin text over on the left sidebar. Does anyone know if I'm missing anything? Supertrinko (talk) 21:45, 11 May 2021 (UTC)[]

IIRC the layouts only work in the Mainspace: and don't apply in User: space. The quick way to tell if a namespace allows them is to look for a "display options" box on the left hand bar. For me it appears under the "navigation" box. Beeswaxcandle (talk) 22:07, 11 May 2021 (UTC)[]
Oh that's helpful to know, thanks very much! Yup, I only see the display options field under the mainspace. Supertrinko (talk) 22:55, 11 May 2021 (UTC)[]

Disambiguation for John Ward

This doesn’t seem to be working: {{similar|Author:John Ward}}. It redirects to only one of the John Wards. I’m not sure how it is supposed to work. Cheers, Zoeannl (talk) 10:06, 21 April 2021 (UTC)

Done. Author:John Ward had a redirect on it. — billinghurst sDrewth 12:46, 21 April 2021 (UTC)
@Billinghurst: Thanks for this. I have another disambiguation to do. I’ve figured the format now but how do I redirect links to Author:Richard Jones which should be Author:Richard Jones (1564-1602). Cheers, Zoeannl (talk) 22:58, 30 April 2021 (UTC)[]
Hi. How do you move? (Use the tab or drop-down at the top of the page.) Or how do you find and fix the links that point to the page? (See Special:WhatLinksHere/Author:Richard Jones.) Call me as you need; generally I will teach rather than do. — billinghurst sDrewth 23:42, 30 April 2021 (UTC)
@Billinghurst: I’ve replaced the links I could find. The wikidata page d:Q18672102 should be updated? and some pages Special:WhatLinksHere/Author:Richard Jones links to, I couldn’t see a link to Author:Richard Jones? Cheers, Zoeannl (talk) 05:24, 13 May 2021 (UTC)[]
Hi Zoeannl. We move existing author pages, rather than create new pages, and that will update the wikilinks at Wikidata, and maintain the history on the page both public-facing and in the page metadata. Then we would edit/replace the redirect with a disambiguation page. I have done that. I will undertake the disambiguation of the links next. — billinghurst sDrewth 05:42, 13 May 2021 (UTC)[]
Oh you have already done them, there is just the legacy of the transclusion, and they will catch up themselves in time as the cache update gets to them, or until they are push purged. — billinghurst sDrewth 05:48, 13 May 2021 (UTC)[]

Index:Tales of the long bow.pdf

Pardon, this has probably come up before. I attempted to use the IA upload tool, which indicated that it would generate a djvu from jp2, but the result was not evident to me. So I directly uploaded the pdf from InternetArchive, which was a scan of a copy in an Indian library, but no text layer appears. There is a djvu layer text file and .xml at IA, I checked the former to see that the quality of ocr was okay. CYGNIS INSIGNIS 12:40, 14 May 2021 (UTC)[]

Looks like you uploaded the original PDF from the DLI which doesn't have a text layer. The IA-generated PDF is ..._text.pdf. Note that the IA PDF is JP2-encoded so takes aeons to render compared to the DLI (which is CCITT encoded).
The IA-Upload failed at the Commons upload step due to phab:T268400 (logs), but the (large at 65MB) DJVU did generate: Inductiveloadtalk/contribs 13:02, 14 May 2021 (UTC)[]

Using modern editions of medieval texts

Can I use modern editions of medieval texts (Old English period, 650-1100 AD, so very much within the public domain!) without infringing copyright? Obviously the editors of these texts have done painstaking work to digitise the manuscripts, so I don't know whether the modern versions are now their copyright.

Also the Old English section of this wiki is really quite inconsistent. If anyone could point me to a WikiProject page or any key editors of the OE section that would be appreciated.

There is an entire corpus of Open Source Anglo-Saxon poetry out there which can be copied onto Wikisource, and I have access to a number of modern editions of OE prose texts which may or may not be public domain worthy.

Edit / P.S. different editors have different opinions on how to read certain parts of certain manuscripts so to make Wikisource a sort of palimpsest for the various readings would be useful.

Rho9998 (talk) 12:55, 14 May 2021 (UTC)[]

@Rho9998: I think have just copied from others, so they're not a great source. For example, it looks like the Exeter Book was cribbed from a 1995 online version which we have some of: The Exeter Book (Jebson). In this case, all the modern content is almost certainly copyright.
In general, it's preferred if the texts have a verifiable source, which is often a scan of a book. For example
I don't really think there is such a WikiProject (or if there is, it's kept very quiet!). A major problem with OE/Saxon works is they are very often parallel texts and we have, up to now, never come up with a really satisfactory way to handle such texts on the web (it's easier in books where the recto/verso page layout is obvious).
In general, digitisation doesn't create a new copyright (this is called w:Sweat of the brow doctrine, which generally doesn't hold water in the US) Inductiveloadtalk/contribs 13:30, 14 May 2021 (UTC)[]

@Inductiveload: Thank you for your answer. That answers a few questions and inspires a few. Firstly does that mean that the modern English parts of Wikisource's copy of Jebson's edition of the Exeter Book breach copyright? Secondly, isn't it fine to create a Wikibook for each Old English text (e.g. the Beowulf MS is digitised and available online (verifiable source =British Library Digitised Manuscripts)), rather than for a book for each modern edition? And if that's okay would the contributor(s) have to make decisions about how to read the MS' handwriting etc. themselves or could they copy an edition's readings, then citing the editor somewhere? Feel free to reply on my talk page if it would clog up this page. Rho9998 (talk) 14:18, 14 May 2021 (UTC)[]

@Rho9998: For the first part, yes, any modern English parts with some level of creative input are indeed copyright. I'm not sure how much modern English there actually is in the original. The titles probably don't qualify for copyright on their own if that's what you are wondering.
As long as the text is in scope for Wikisource you can add it. Functionally, that means one of:
  • From before 1926
  • From 1926 to now, and actually published (mostly to avoid copy-pasting of blogs and things) and either public domain or freely licensed.
    • Some non-published works are OK if they're some kind of "historically interesting", and there's quite some leeway for that, especially if the contributed text is of decent quality rather than a drive-by copydump.
  • Your own translation of one of these
So, you certainly can create a Wikisource "version", as we'd term it, for each MS.
I'm not really sure about the MS handwriting thing. If the reading is unambiguous, then I suppose there is no creative input in it and it doesn't get its own copyright. But, yes, it would still be a good idea to say where you got the reading from to help others in future. If there is creative input, it probably gets a copyright of its own (and then a grey area in the middle where you can argue over the level of input). This is probably worth asking at WS:CV with a specific example of an MS and a modern text. Inductiveloadtalk/contribs 14:51, 14 May 2021 (UTC)[]
Note that transcribing Beowulf from the manuscript is not going to get you the full text; it got burned in a fire and has been flaking a bit since then, so the best editions are based on the first transcription, modified by the known problems of that transcription. I'd think there's better editions pre-1926 of almost everything in Old English.--Prosfilaes (talk) 22:30, 18 May 2021 (UTC)[]

Dotted line lists

Hi there,

Wanting some guidance on how best to present the list given in Some Account of New Zealand/Chapter 11, I've kinda used the {{dotted TOC page listing}} template to make it work, however this looks weird with some lines that get split into multiple lines if there's a dash.

Is there a better template or way to present lists in this format? Any help would be appreciated. Supertrinko (talk) 09:33, 18 May 2021 (UTC)[]

If you wrap the contents of each line with {{nobr}} then it won’t break (and you won’t have to replace all the spaces with &nbsp;). — Dcsohl (talk) 15:17, 18 May 2021 (UTC)[]
Perfect, I see it's been redirected to {{nowrap}}, which does the same job, I've used that. Supertrinko (talk) 23:42, 18 May 2021 (UTC)[]
@Supertrinko: [Aside from the fact I am not a fan of the dot leaders.] I would be more inclined to wrap it in a centred block and set a max-width (set in em). No one really wants to read a two column spanning table at 100% width on a computer. — billinghurst sDrewth 03:18, 19 May 2021 (UTC)[]

Help to publish a book from Internet archive to Wikisource.

Can we publish the given book from internet archive to Wikisource. Book link 👇 (talk) 12:41, 19 May 2021 (UTC)[]

@TewariKamal: Probably not, that work appears to be in copyright in the US (published 1968, ©1967 with a copyright notice -- 1964 through 1977). I am surprised that IA has it uploaded, not certain how they are getting away with it, not certain what release or loophole they have found. — billinghurst sDrewth 12:57, 19 May 2021 (UTC)[]
I see a 1963 edition published in India (Arthur Llewellyn Basham, 1914-1986, UK historian) and complying with copyright, so that would make it 2068 in US or (based on w:Copyright law of India) it may be 60+PMT which is still 2047 outside of US, so still no, and I still don't see how IA is hosting that work either. — billinghurst sDrewth 13:06, 19 May 2021 (UTC)[]
The copyright is probably not being enforced (though that is no excuse for anyone to be hosting it, including us). PseudoSkull (talk) 05:39, 23 May 2021 (UTC)[]

The subtitle at Page:Sheila and Others (1920).djvu/104

Talking about the part that says "Abel X*his mark* Goodfriend". How do I deal with this type of subtitle/note at Wikisource, where there is xx-smaller text above and below the letter? Do I treat it as a regular footnote (with the <ref> tag)? Or do I do something else with it?

One thought I'll volunteer is that upon selecting that text, I would like at least for "X" to take precedence over the word "his". So when you select the text, I hope it doesn't say "his X mark", but rather "X his mark".

Also I've never seen any kind of subtitle in a book like this in my life, what is it even called? Is it some kind of Canadian typographical thing? (The author is Canadian) PseudoSkull (talk) 23:03, 19 May 2021 (UTC)[]

@PseudoSkull: One way is to allow flexbox to the rescue:


Abel <span style="display:inline-flex; flex-direction:column; align-items: center; vertical-align: middle; line-height:1.1;">
<span style="order:2;">X</span>
<span style="font-size:80%; order:1;">his</span>
<span style="font-size:80%; order:3;">mark</span>
</span> Goodfriend

Abel X his mark Goodfriend

Copy pastes as "X his mark" (and will probably render as such on export, since not many e-readers handle flexbox). Vertical alignment is a hair off where you might want it, but all in all not bad, I think. Inductiveloadtalk/contribs 23:56, 19 May 2021 (UTC)[]
@PseudoSkull: I would have put it in a tooltip or footnote, instead of trying to reproduce a typographic effect that was awkward even on paper, in this specific case. Xover (talk) 05:09, 20 May 2021 (UTC)[]

Download pdf on Erotica book isn't downloading all book pages

Hi everyone! I hope you're having a nice day. If anyone has an interest in this beautiful book - the Erotica book "download" button (to download a pdf) doesn't download all the pages, but only 11 pages, not any content pages. Have a nice day. --The Eloquent Peasant (talk) 13:33, 25 May 2021 (UTC)[]

@The Eloquent Peasant: the use of {{AuxTOC}} overrides the main TOC. One solution is to wrap the main TOC in {{export TOC}}. There's a bit more detail (that I just added this case to) at Help:Preparing_for_export#Listing_pages_for_export. Inductiveloadtalk/contribs 13:47, 25 May 2021 (UTC)[]
@Inductiveload: You fixed it! Thank you! I love that book! --The Eloquent Peasant (talk) 14:00, 25 May 2021 (UTC)[]

delete Index:Aladdin-1890s.djvu

Please delete Index:Aladdin-1890s.djvu and all of the pages. I found a much better scan, see Index:Aladdin and the Wonderful Lamp-1875.pdf. Thank you!--RaboKarbakian (talk) 19:44, 26 May 2021 (UTC)[]

A problem with table

Please help with «235|-» (The Works of Thomas Carlyle/Volume 2, table of contents). Ratte (talk) 16:31, 31 May 2021 (UTC)[]

@Ratte: You needed a {{nopt}} at the top to ensure the markup for the second page really starts on a new line. Help:Page_breaks#Tables_across_page_breaks has a bit more background. Inductiveloadtalk/contribs 16:37, 31 May 2021 (UTC)[]
Thank you! Ratte (talk) 16:45, 31 May 2021 (UTC)[]
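
The "235|-" symptom above can be illustrated with a small Python model. This is only a sketch under stated assumptions: it assumes transcluded page bodies are joined with a space rather than a newline, and it stands in for {{nopt}} with a plain forced line break (the real template emits its own markup). The page contents and all names here are hypothetical.

```python
def join_pages(bodies, separator=" "):
    """Join Page: bodies the way transclusion is assumed to: with a space, not a newline."""
    return separator.join(bodies)


def stray_row_markers(wikitext):
    """Return lines that contain a row marker '|-' anywhere except at the line start."""
    return [
        line
        for line in wikitext.splitlines()
        if "|-" in line and not line.startswith("|-")
    ]


# Hypothetical two-page table of contents; the second page opens with a new row.
page_1 = "{|\n|-\n| Chapter I\n| 235"
page_2 = "|-\n| Chapter II\n| 260\n|}"

# Stand-in for {{nopt}}: modelled here only as a forced line break.
NOPT_STAND_IN = "\n"

broken = join_pages([page_1, page_2])
fixed = join_pages([page_1, NOPT_STAND_IN + page_2])

print(stray_row_markers(broken))  # ['| 235 |-']
print(stray_row_markers(fixed))   # []
```

In the first case the row marker ends up on the same line as the last cell of the previous page, so it is treated as cell text; in the second it starts its own line and is read as table markup again.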

Parser functions...

Template:LR sidenote/sandbox & Template:LR sidenote/sandbox/dummy

The test cases are here:-

Template:LR sidenote/testcases

Having had to abandon an entire morning's effort because of the issue shown in the above I'm not best pleased.

Can someone PLEASE explain slowly, why in the testcases the Expected and Actual results do not match, when all that's being dealt with in the relevant template parameters is simple text or numerical values?

The sandbox is a very simplified version of some template logic I was attempting to use elsewhere to make some other templates considerably more powerful, and I very nearly had the approach working. ShakespeareFan00 (talk) 17:43, 31 May 2021 (UTC) []

The issue is suspected to be parser function related, so I've reverted the sandbox to be a mirror of the main template. ShakespeareFan00 (talk) 18:46, 31 May 2021 (UTC)

Logic simplification?

As part of attempts to resolve an issue elsewhere, I wrote: Template:Right sidenote/sandbox/CSSline with a view to providing a standardised way of generating a CSS attribute, if and only if a non-standard value was used.

I provide a partial-specification and test-cases here: - Template:Right sidenote/sandbox/CSSline

I then used it to re-implement portions of {{Right sidenote}} in a sandbox with the goal of moving the default behaviour to a CSS class, only generating inline styles, when a non-standard value was provided, and having a situation where a template calling {{right sidenote}} did not need to know the default value for the attributes.

The relevant implementations from which CSSline is called are: {{Right sidenote/sandbox}}, {{LR sidenote/sandbox}}, {{RL sidenote/sandbox}}.

I have concerns that the implementation here is overly complex (it uses 3 parser functions) and would appreciate another contributor using the testcases and partial-specification provided, advising on how these could be eliminated whilst retaining the robust handling this template was intended to provide. ShakespeareFan00 (talk) 05:34, 2 June 2021 (UTC)[]

All caps plus different size

Hello, I'm on this page. And, I am not sure how to do the "Synopsis of Events between the Battle..." line. Kindly suggest. Lightbluerain (Talk | contribs) 03:37, 3 June 2021 (UTC)[]

@Lightbluerain: The text in question is set in a small-caps typeface, and it looks like it's centered too. For this we have the {{c}} and {{sc}} templates. The construct above it is a "horizontal rule", for which we use the {{rule}} template. Xover (talk) 04:02, 3 June 2021 (UTC)[]
@Xover:, Thanks a lot. Lightbluerain (Talk | contribs) 15:44, 3 June 2021 (UTC)[]

Some problems about transcribing

I'm trying to work on Dictionary of the Foochow Dialect, but I am not sure about the format. Take the page Page:Dictionary of the Foochow Dialect.pdf/1902 as example, is that acceptable? How should I improve the format? --TongcyDai (talk) 12:50, 3 June 2021 (UTC)[]

@TongcyDai: I have made you {{DFD index}}, which should help.
I suggest to put each "section" as a new list, otherwise the columns will be extremely long and a reader will have to scroll up and down the entire index at each column break. Inductiveloadtalk/contribs 14:28, 3 June 2021 (UTC)[]
Thank you very much!!--TongcyDai (talk) 14:40, 3 June 2021 (UTC)[]
@Inductiveload: For Page:Dictionary_of_the_Foochow_Dialect.pdf/1901, the top part is divided into 9 parts, but we don't have Template:Rh/9, how should we deal with that? --TongcyDai (talk) 14:42, 3 June 2021 (UTC)[]
@TongcyDai: We have it now! Inductiveloadtalk/contribs 14:49, 3 June 2021 (UTC)[]
@Inductiveload: Really appreciate it! And I just found that the first character of the radical is slightly more to the left than other characters (please see Page:Dictionary of the Foochow Dialect.pdf/1901). Is the template able to show the difference? --TongcyDai (talk) 14:54, 3 June 2021 (UTC)[]
@TongcyDai: Yep, set n=0 and it should work now. Inductiveloadtalk/contribs 15:05, 3 June 2021 (UTC)
@Inductiveload: Thank you!!! Can you please help make a template for the main part of the dictionary, like, Page:Dictionary of the Foochow Dialect.pdf/177? --TongcyDai (talk) 15:10, 3 June 2021 (UTC)[]
@TongcyDai: I see you found it, but for the record: {{DFD entry}} (I used 一二 for the doc's chars, feel free to fix!). Fundamentally it is a table, as being too smart with CSS will kill the export possibilities. I suggest a new table for each "pronunciation" (i.e. end the tables where the gaps are). Then, when you transclude, transclude each table to its own page, which will make navigation easier for readers. With >1000 pages, even one page per letter gets very, very long. Inductiveloadtalk/contribs 15:59, 3 June 2021 (UTC)
@Inductiveload: Thank you soooo much!! I've just finished transcluding Page:Dictionary of the Foochow Dialect.pdf/29, hopefully I'm doing right! --TongcyDai (talk) 17:54, 3 June 2021 (UTC)[]

Chronological Table and Index of the Statutes/Chronological Table

Index:Chronological Table and Index of the Statutes.djvu contains a lengthy table representing various statutes, the years they were passed and any noted repeals up to the date it was published.

It would not be wise to transclude this as a very large single table, so it was split by Monarch, and for later portions by Regnal Year. See Chronological Table and Index of the Statutes/Chronological Table

Each of the table "Page:"s has a header, which remains the same across the whole table.

The combined table for each Monarch/Regnal year, should have the same header as the individual Page:s.

The current transclusion of the combined tables uses {{Page}}, a template which is deprecated with good reason, and because the transclusion is direct it fails to fully respect dynamic layouts or Indexstyles. This is undesirable.

Ideally what I would like to do is to continue using the existing "sections" via a <pages> tag instead.

What would be a "recommended" way of placing the header for each combined table, because placing a <pages> inside table syntax to achieve this is not advised? ShakespeareFan00 (talk) 16:57, 5 June 2021 (UTC)[]

Request for a script... Section tag Linter...

Wikisource uses labelled "sections" to aid transclusion of a portion or portions of a page.

Would it be possible to have some kind of script/linter that checks if those tags are matched up or if duplicated naming exists within a page? ShakespeareFan00 (talk) 10:48, 6 June 2021 (UTC)[]

Your request doesn't make sense. Sections are free labelled, so how would you make any comparison? Plus you can have multiple sections of the same name; they are independent. There is no logic to follow. Even if we did follow some logic, there is no guarantee that there will even be text in the section. This is where the user needs to have vigilance and check their work as they go. We do run checks on overall components of works (link on each Index: page) to see if pages are missing.

It is also why I don't do s1, s2, s3, ... labelling and I match the labelling to the subpagename; it becomes very obvious immediately if things are not correct. — billinghurst sDrewth 12:15, 6 June 2021 (UTC)

If sections are generally free labelled, then I will clarify what I am looking for. What counts as "correct", with respect to the use case I had in mind, is a pair:
<section begin\=\"([^"]+)\" \/>
<section end\=\"([^"]+)\" \/>

where $1 and $2 (the inner groups matched for the section name) are the same for a 'begin' and 'end' tag pair encountered sequentially.

There may be any kind of text apart from other section tags in between a 'begin' and 'end' tag.

The following would be 'error' conditions:-

  • 2 'begin' tags placed without an intervening 'end' tag (with a name equivalent to the first begin).
  • 'begin' tag does not match next 'end' tag encountered sequentially.
  • 'end' tag encountered without corresponding 'begin' tag, or 'end' tag encountered before corresponding 'begin' tag.

If such a script is impossible then fair enough.

ShakespeareFan00 (talk) 07:10, 7 June 2021 (UTC)[]
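
The checks listed above can be sketched in Python. This is a minimal illustration, not an existing Wikisource tool: the function name `lint_sections` is hypothetical, only the double-quoted `<section begin="..." />` form quoted above is recognised, and the tolerance for extra whitespace is an assumption. Repeated section names are deliberately not flagged, since multiple sections of the same name are legitimate.

```python
import re

# Matches <section begin="name" /> and <section end="name" />.
# Allowing flexible whitespace is an assumption; the regexes quoted above are stricter.
SECTION_TAG = re.compile(r'<section\s+(begin|end)\s*=\s*"([^"]+)"\s*/>')


def lint_sections(wikitext):
    """Return a list of (line_number, message) problems found in the wikitext."""
    problems = []
    open_name = None  # name of the section currently open, if any
    open_line = None  # line on which that section was opened
    for line_no, line in enumerate(wikitext.splitlines(), start=1):
        for match in SECTION_TAG.finditer(line):
            kind, name = match.group(1), match.group(2)
            if kind == "begin":
                if open_name is not None:
                    problems.append(
                        (line_no, f'begin "{name}" while "{open_name}" is still open')
                    )
                open_name, open_line = name, line_no
            else:
                if open_name is None:
                    problems.append((line_no, f'end "{name}" without a begin'))
                elif name != open_name:
                    problems.append(
                        (line_no, f'end "{name}" does not match begin "{open_name}"')
                    )
                # Any end tag closes whatever was open, matched or not.
                open_name = open_line = None
    if open_name is not None:
        problems.append((open_line, f'begin "{open_name}" is never closed'))
    return problems
```

Called on the text of a single page, an empty list means every 'begin' was followed by its own 'end'; each reported tuple gives the line number to inspect.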

Page:Chronological_Table_and_Index_of_the_Statutes.djvu/363 and ongoing..

The following contains a complex index of statutes.

It would be appreciated if someone could read User:ShakespeareFan00/Statute_index and comment on which of the 3 example layouts I've implemented would be the best option.

I have 3 example layouts in my userspace (none of them is ideal.) and I would appreciate the views of other contributors BEFORE I try to implement a consistent approach across the entire index.

The pages are:

If the advice is to provide a basic list and not worry about matching the formatting in the scan, then I would prefer to use Example3 for simplicity, even though it doesn't match entirely with the nominal spec I tried to write. ShakespeareFan00 (talk) 06:52, 7 June 2021 (UTC)

Edits not patrolled yet

Hello, When I check my watchlist here, I see a red exclamatory sign before my edits. The watchlist says that that means that my edits are not patrolled yet. I don't see such things on Wikipedia. Does it mean that every editor here has to be in the Auto-patrolled user group? Or, that's only because I am new here and that red exclamatory sign would go after a particular number of edits made? Lightbluerain (Talk | contribs) 03:21, 4 June 2021 (UTC)[]

(See WS:APD) Autopatrolled status is done differently here at enWS than at other places. It is given when a user demonstrates that they have a good knowledge/understanding of our policies and style guide. There is no minimum number of edits. We also allow any user to patrol an edit, rather than restricting such to a special group of patrollers. Beeswaxcandle (talk) 04:52, 4 June 2021 (UTC)[]
Is patrolling actually paid attention to? If I filter recent changes to unpatrolled changes only, then I see there have been 50 unpatrolled changes made in the last 2 hours... and if I switch it to manually-patrolled changes only, then I can see that only 50 changes in the last week have been patrolled. It doesn’t seem like being unpatrolled means much. — Dcsohl (talk) 21:41, 8 June 2021 (UTC)[]

Diary of the times of Charles II Vol. I.

I have been validating proofed pages on Diary of the times of Charles II Vol. I, and have come across a page I believe might be in error. However, I don't know enough about coding here to correct this. Please see the top of this page: Page:Diary of the times of Charles II Vol. I.djvu/187. Maile66 (talk) 13:24, 14 June 2021 (UTC)[]

@Maile66: I think that is fixed now, but ping @Chrisguise: is this going to work? CYGNIS INSIGNIS 13:45, 14 June 2021 (UTC)[]
Yes, it looks correct. Thanks. Maile66 (talk) 13:48, 14 June 2021 (UTC)[]
Cool, now I want to know why it happened; I think the ref system threw that syntax out. CYGNIS INSIGNIS 13:51, 14 June 2021 (UTC)[]
This was because ## s1 ## was not on its own line in this edit. Inductiveloadtalk/contribs 14:44, 14 June 2021 (UTC)[]
That should work, however, see testing of that [11] CYGNIS INSIGNIS 17:50, 14 June 2021 (UTC)[]

Missing pages require placeholders

Can someone tell me how I can insert two placeholder pages into this file?— Ineuw (talk) 20:27, 15 June 2021 (UTC)[]

@Ineuw: If you use File:Generic placeholder page.djvu, you can use djvm in "insert" mode to splice it into the file:
djvm -i "Uncle...djvu" "Generic placeholder page.djvu" 137
djvm -i "Uncle...djvu" "Generic placeholder page.djvu" 138
Then upload over the old file and adjust the pagelist. Inductiveloadtalk/contribs 20:35, 15 June 2021 (UTC)[]
Thanks for the reminder. I thought that it's done with some online SQL or Python voodoo. Offline Djvu is not a problem. And my blank page insert is named "blank.djvu". :-))))) — Ineuw (talk) 20:59, 15 June 2021 (UTC)[]
@Inductiveload: I found a replacement file which includes the missing text; it's a single leaf with the two pages. Downloading it now to double check it. It is also available as an 1896, 2 volume edition of 636 pages, published by Houghton & Mifflin which seems to be a clean scan. — Ineuw (talk) 04:13, 17 June 2021 (UTC)[]

Transcluding a header at the top of a page

Hello! I'm having trouble transcluding a header at The Complete Lojban Language (2016)/Chapter 2. Section 2.8 (you can search for "2.8 The basic structure of longer utterances") should display as a header, but it initially displayed as follows:

such as le selbri [ku] (see Section 2.10 (p. 23)). === 2.8 The basic structure of longer utterances ===

So, I realised that I needed to add a {{nop}} to the previous page. I did so, but now it just looks like this:

such as le selbri [ku] (see Section 2.10 (p. 23)).

=== 2.8 The basic structure of longer utterances ===

How do I get the header to show up properly? Thank you! (Please ping me on reply.) Tol | talk | contribs 05:07, 18 June 2021 (UTC)[]

@Tol: You needed the {{nop}} at the start of the second page. Because the pages are glued together, what you actually had was {{nop}}===2.8....===. The equals has to be on a new line for MediaWiki to see it as a heading. Inductiveloadtalk/contribs 05:16, 18 June 2021 (UTC)[]
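The point about line starts can be sketched in a few lines. This is only an illustration: it models the joining of pages as plain string concatenation, which is a simplification, and the pattern `HEADING` is an approximation of the heading rule rather than MediaWiki's actual parser.

```python
import re

# Approximation: a level-3 heading has to occupy a whole line, === text ===
HEADING = re.compile(r"^===[^=].*===\s*$", re.MULTILINE)


def has_heading(text):
    """True if some line of the text is, on its own, a level-3 heading."""
    return HEADING.search(text) is not None


page_one = "such as le selbri [ku] (see Section 2.10 (p. 23))."
page_two = "=== 2.8 The basic structure of longer utterances ===\nBody text."

# {{nop}} at the end of the first page: the heading is glued onto the same line.
nop_on_first = page_one + "\n{{nop}}" + page_two

# {{nop}} on its own line at the start of the second page: the heading keeps its own line.
nop_on_second = page_one + "{{nop}}\n" + page_two

print(has_heading(nop_on_first))   # False
print(has_heading(nop_on_second))  # True
```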
@Tol: Also note that while heading markup can be made to work, for this and various other reasons Wikisource does not use plain wikimarkup heading syntax in the works we reproduce. For the headings in this work we'd use something like {{xl|'''2.8 The basic structure …'''}}. Xover (talk) 05:24, 18 June 2021 (UTC)[]
@Inductiveload: Thank you; that makes sense. @Xover: I see; but then how would one link to the section headings? Tol | talk | contribs 16:47, 18 June 2021 (UTC)[]
@Tol: {{anchor}} and {{anchor+}}. We don't preemptively create link targets for every heading. Xover (talk) 17:03, 18 June 2021 (UTC)[]
@Xover: Ah; thanks. I think I still prefer headings, as I won't have to go back and add an anchor each time I want to link to one, but I'll try using that format in other works. Tol | talk | contribs 17:15, 18 June 2021 (UTC)[]
@Tol: the page numbers can be linked, without the need for wiki headings and throwing out anchors, a good enough solution that uses the work's own structure. CYGNIS INSIGNIS 00:10, 19 June 2021 (UTC)[]
@Cygnis insignis: How would that be done? I could link directly to the page (in the Page namespace), but I don't know how I would link to a transcluded page. Tol | talk | contribs 01:21, 19 June 2021 (UTC)[]
The_Complete_Lojban_Language_(2016)/Chapter_2#21 will link to the number (small, blue, on the left) of the transcluded page. Near enough? CYGNIS INSIGNIS 01:40, 19 June 2021 (UTC)[]
@Tol: What’s the licensing on this work? I don’t see any indication in the scan or Index: of its license. — Dcsohl (talk) 02:53, 19 June 2021 (UTC)[]
@Dcsohl: See the banner at the bottom of the mainspace page. (Or, see Section 1.8). Tol | talk | contribs 04:06, 19 June 2021 (UTC)[]
@Tol: Thanks - knowing what I know of Lojban, I figured the license was good, I just wanted to be sure it was there, and I just missed it. — Dcsohl (talk) 14:36, 19 June 2021 (UTC)[]
You're welcome! Tol | talk | contribs 15:48, 19 June 2021 (UTC)[]

Check a sci-fi story import

Hello, I did a copy/paste import of a CC-By-SA licensed short story. Can someone please look at this and tell me if something is incorrect or out of place?

Also, do digital imports go through proofreading like OCR texts, or is this just good now? Thanks. Blue Rasberry (talk) 22:48, 19 June 2021 (UTC)[]

@Bluerasberry: Digital files do not need to go through the proofread process, though we would fall back to sourcing per {{textinfo}} on the talk page. Also note that there is a specific d:Wikidata:Badge to mark digital copies, that flows through to enWS (done it) and you will see it mentioned at Help:Text status. — billinghurst sDrewth 01:05, 20 June 2021 (UTC)[]
@Billinghurst: Thanks, I have never added a badge to anything, but now I see how that works. Yes, I see it in Wikidata at Inside the Clock Tower (Q107297325), and see that badges go there. Blue Rasberry (talk) 01:13, 21 June 2021 (UTC)[]

Text between section tags is not displayed

The sectioned text between the end tags of this page does not transclude to the end of this page. This issue exists at the end of sectioned chapters, but the sectioned beginning is always good. Can someone please check what I am missing? — Ineuw (talk) 13:13, 25 June 2021 (UTC)[]

@Ineuw: The section had two <section end=E209 /> tags, when it should be one "begin" and one "end". Inductiveloadtalk/contribs 14:13, 25 June 2021 (UTC)[]
Many thanks again.— Ineuw (talk) 16:17, 25 June 2021 (UTC)[]

Index:Buchan - The Thirty-Nine Steps

Index:Buchan - The Thirty-Nine Steps (Grosset Dunlap, 1915).djvu - needs a small correction, and I don't know how to do it. I validated the individual Contents chapter heading pages. However, one of the pages is in error on the Contents list. Chapter X actually links to Page 202, but on the right-hand Contents listing it says "200". Thanks for your help. Maile66 (talk) 03:24, 29 June 2021 (UTC)[]

@Maile66: that was noticed and corrected in the page link, it seems; a sic template usually implies that a typo should be ignored. The last index I did had the same problem, I do nothing when it happens. CYGNIS INSIGNIS 15:03, 29 June 2021 (UTC)[]
Good to know. Thanks. Maile66 (talk) 17:29, 29 June 2021 (UTC)[]

Cross-page pseudo-table

On this and the following two pages, there is a sort of table, with very complicated formatting. Would it be better to represent it as an image? I’ve extracted some text on the bottom of the pages, which would work with or without a table, but I am not sure about the rest of the text. TE(æ)A,ea. (talk) 00:36, 30 June 2021 (UTC)[]

I would do it as an image. Trying to do them as tables usually ends up with a whole lot of time and effort for a large amount of compromise. I would ask how important the text is; if it is highly relevant for search engines, then look for a way to integrate the text. — billinghurst sDrewth 14:13, 30 June 2021 (UTC)[]

Biographical query...

Index talk:Medicine and the church; being a series of studies on the relationship between the practice of medicine and the church's ministry to the sick (IA medicinechurchbe00rhodiala).pdf

The author concerned is A. W. Robinson, D.D. The author I linked provisionally is Author:Arthur William Robinson, but I can't at present confirm it's the same person as listed in the ToC as "Arthur W. Robinson, D.D., Vicar of All Hallows Barking, Examining Chaplain to the Bishop of London, and Rural Dean of the East City of London.". Is anyone here able to confirm I have the correct person? ShakespeareFan00 (talk) 19:32, 30 June 2021 (UTC)[]

An entry here looks like very strong evidence, though. ShakespeareFan00 (talk) 19:41, 30 June 2021 (UTC)[]
The other name I can't pin down is Ellis Roberts. ShakespeareFan00 (talk) 19:36, 30 June 2021 (UTC)[]

How to deal with lines/dotted lines?

How to properly format lines and dotted lines in indices, like this page or pages 14-15 of this file? In the former case you simply have the page ended up like this, while in cases like the latter one, the solid lines are not transcluded.廣九直通車 (talk) 09:07, 28 June 2021 (UTC)[]

Some people want to put in dot leaders, I don't bother. Take your choice. How does it affect the readability of the work? — billinghurst sDrewth 14:18, 30 June 2021 (UTC)[]
I mean, if the transcluded dots are left intact, they'll soon pop out of the page borders like this case, and if the lines are not transcluded, I'm also not sure how long I should extend the lines, such that they are kept in the page borders. That's the problem.廣九直通車 (talk) 05:48, 2 July 2021 (UTC)[]

Automatic new line

Hello, why does the OCR text give an automatic new line as in here (see in edit source)? This makes it count like a new paragraph which it is not. Is there any easier way to get rid of those? Or, we have to do that manually? Lightbluerain (Talk | contribs) 03:52, 6 July 2021 (UTC)[]

It just happens randomly. All line breaks that do not end a paragraph should be removed anyway. They're only an artefact of the printing process. Beeswaxcandle (talk) 04:41, 6 July 2021 (UTC)[]
@Lightbluerain: I've seen this before, and actually suspect it's a Mediawiki bug. The actual source of that part of the page is "well-walled town; and the great</p><p>object of the besiegers"... it's like there is an invisible zero-width space or something, but even deleting and retyping the section of text doesn't fix it.
What does fix it, though, is just deleting all the newlines from the last paragraph (like you do when proofreading anyhow). If the last paragraph is 'all on the same line,' it's not an issue. Jarnsax (talk) 04:45, 6 July 2021 (UTC)[]
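For anyone who prefers to do this clean-up outside the editor, here is a small sketch of the idea. It assumes paragraphs are separated by blank lines; the function name `unwrap_paragraphs` is illustrative, and words hyphenated across a line break are not rejoined.

```python
import re


def unwrap_paragraphs(text):
    """Join the lines inside each paragraph with single spaces.

    Paragraphs are taken to be separated by one or more blank lines,
    and those paragraph breaks are kept.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


ocr_text = "well-walled town; and the great\nobject of the besiegers\n\nNext paragraph\ncontinues."
print(unwrap_paragraphs(ocr_text))
```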
Alright, thanks. Lightbluerain (Talk | contribs) 06:42, 6 July 2021 (UTC)[]

Need inputs

Hello, did I format this page correctly? Kindly give inputs especially on the last part the footnote part. Lightbluerain (Talk | contribs) 19:20, 24 June 2021 (UTC)[]

@Lightbluerain: Hi! welcome and thanks for the edits! Some notes:
  • The first word shouldn't be capitalised because it continues the sentence from the previous page
  • Format references using <ref>The reference content</ref>, where the asterisk or dagger appears in the text. See H:REF for more. You do not need to (and actually should not) use the original marker (* or †), the automatic sequential numbering is better for collecting all the references at the end of the final page.
I can make that edit for you as a demo, or you can do it, let me know which you prefer.
Other than that, it looks sensible. Wikisource has a very steep learning curve, so 1) you are doing really well and 2) if you're not sure, that is totally normal and we'll be happy to help. Just ask, here, or in IRC (link at the top of the page). Inductiveloadtalk/contribs 19:41, 24 June 2021 (UTC)[]
@Inductiveload:, do you mean that even if the original text has * or similar markers for footnotes, Wikisource is supposed to use the numbering marker only ? If so, how can we (Wikisource people) say that we "digitized the original text", while, in reality, we made some changes for our own purposes? Lightbluerain (Talk | contribs) 11:38, 25 June 2021 (UTC)[]
And, thanks for the first point. I, actually, missed the first word. Thanks. Lightbluerain (Talk | contribs) 11:41, 25 June 2021 (UTC)[]
@Lightbluerain: Because using "*" makes sense when the book uses footnotes (i.e. at the end of each page) but we use endnotes because the final document is presented as a single continuous document for each section (usually a chapter). The original work can unambiguously use a "*" on each page, but we cannot, because there could be many "*"'s in a chapter.
We make a few concessions to the practicalities of the reflowable, continuous formatting of HTML (and ebook formats for that matter), as well as not making books too onerous to proofread. Not enforcing the numbering on what used to be, but no longer are, per-page elements is one of those concessions. Other concessions can include not manually formatting paragraph indentations, not replicating most fonts in body text, removing hyphens at line-breaks, not reproducing ligatures like "ct", optionally not using long-S, not hard-coding columns when it was just a space-saving device, etc.
Wikisource does not attempt to produce perfect facsimiles of the original works: for that, you can either use the original scan as-is, or there are dedicated formats like TEI XML which produce "perfect" transcriptions that capture the work more exactly (and take a commensurately larger amount of effort to encode). Wikisource does aim to produce useful works that can be, more or less, read as intended (for example: did Creasy actually care the footnote was labelled *? Probably not, he just needed it to be unambiguously linked to that footnote, and the choice of the glyph "*" was likely just convention), and where possible also provide easy access to the original scan so that interested users can refer to the original material for things like "what was the actual footnote number here?". Inductiveloadtalk/contribs 11:59, 25 June 2021 (UTC)[]
Alright, thanks a lot. Lightbluerain (Talk | contribs) 16:58, 25 June 2021 (UTC)[]
@Inductiveload:, please check the page now. Lightbluerain (Talk | contribs) 17:02, 25 June 2021 (UTC)[]
@Inductiveload:, because the final document is presented as a single continuous document for each section (usually a chapter) does this mean that I need to use some specific template to show that a chapter ends or starts here for the digitized document to be rendered correctly? Lightbluerain (Talk | contribs) 17:05, 25 June 2021 (UTC)[]
@Lightbluerain: Thanks! It now looks good to me.
What it means is that normally we would transclude the pages to a separate wiki page per chapter. In this case, 15 decisive battles of the world (New York)/Chapter 1 and so on. For a practical example, see Waylaid by Wireless. This means:
  • Each page is not unreasonably long and hard to scroll up and down (especially on a mobile device)
  • It's easy to find a specific chapter.
  • When the text is exported to PDF or EPUB, each chapter starts on a new page and gets an entry in the document's built-in table of contents. Inductiveloadtalk/contribs 17:12, 25 June 2021 (UTC)[]
Alright, thanks. Lightbluerain (Talk | contribs) 17:49, 25 June 2021 (UTC)[]
@Inductiveload:, am I supposed to do this page the same way? Lightbluerain (Talk | contribs) 17:54, 25 June 2021 (UTC)[]
@Lightbluerain: yes, that should be the same. You also do not need to try to format those footnotes on the same line.
FYI, using &nbsp; is not the way to do right alignment - it will go badly wrong if the text wraps on a smaller screen: phab:F34527379. If you really need right alignment (and here you do not), use {{right}} or {{float right}}. Inductiveloadtalk/contribs 18:37, 25 June 2021 (UTC)[]
@Inductiveload:, Sure. And, I used &nbsp; because right template didn't work there since the second footnote line also had another word there which was not right-aligned. Lightbluerain (Talk | contribs) 18:04, 26 June 2021 (UTC)[]
Anyway, thanks a lot! Lightbluerain (Talk | contribs) 18:16, 26 June 2021 (UTC)[]
@Lightbluerain: If I may interject here with one last suggestion … you should put the <references/> in the Footer section of the page. Here's why: when you put together the entire chapter, you are going to transclude a bunch of pages, and thus you will have a lot of <ref> tags spanning across a whole bunch of pages. Each <references/> usage is going to display all of the notes from the entire chapter, not just the ones from the page it’s on. (You also don’t want all the footnotes appearing mid-paragraph between the word "Conqueror" and the word "to" on the following page!)
The Footer section of each page is special - it only appears when users are looking at this particular page, but not when they are viewing the chapter as a whole. (In technical terminology, it is not transcluded with the body of the page.) So if you put <references/> in the Footer of each page (for viewers of the Page space), and then another <references/> at the very end of the chapter (for readers of the chapter as a single file), it will have the desired effect. — Dcsohl (talk) 03:24, 27 June 2021 (UTC)[]
@Dcsohl:, Alright, I'll put that in the footer from now on. Thanks. Lightbluerain (Talk | contribs) 16:41, 27 June 2021 (UTC)[]
@Dcsohl:, What about this page? I don't think it would go with <references /> in the footer. Lightbluerain (Talk | contribs) 17:57, 8 July 2021 (UTC)[]
@Lightbluerain: Why not? Xover (talk) 19:22, 8 July 2021 (UTC)[]
@Xover:, the poem below the footnotes is not in the footer in the photo I think. But, I saw you took that all as footnote. Lightbluerain (Talk | contribs) 17:02, 9 July 2021 (UTC)[]
@Lightbluerain: the poem seems part of the second footnote: the main text ends at "roving bands of" … CYGNIS INSIGNIS 20:11, 9 July 2021 (UTC)[]
Alright, thanks. Lightbluerain (Talk | contribs) 16:31, 10 July 2021 (UTC)[]

Need tool: check range/set of pages for redlinks

Stevenliuyi is coming to the end of proofreading a large work of 1100 pages, Eminent Chinese of the Ch'ing Period, in two volumes vol. 1 and vol. 2. I'm (much more) slowly progressing through validation.

The encyclopedia is cross-referenced by person names, where those names may be either Mongol or Mandarin or even European, with possible accented characters, and with the Mandarin using the atrocious Wade-Giles phonetization. Stevenliuyi has duly created nearly all the subpages using those names.

Within the texts there are internal within-work links for mentioned names. And those links change from redlinks to blue as Stevenliuyi finishes more and more subpages. But not all of the redlinks.

Getting the name correct, to make the link work, is hard given all the different accented letters used. An example is p. 239 where

ECCP lkpl|Lin Tse-hsü

needed to be

ECCP lkpl|Lin Tsê-hsü

Often I can see and fix redlinks as I (slowly) validate. Sometimes Steve finds broken links in vol. 1 as he works on vol. 2. But unless we make a third pass through all the pages, how will we catch and correct *all* the redlinks?

Is there a tool here at Wikisource to find any redlinks within a range or set of pages? Shenme (talk) 03:28, 29 June 2021 (UTC)[]
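One possible approach, sketched here as an illustration and not as an existing Wikisource tool: the MediaWiki Action API can list the links on a page through `generator=links`, and with `formatversion=2` each link target that does not exist carries `"missing": true`. The function names, the User-Agent string and the sample data are assumptions made for the sketch.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikisource.org/w/api.php"


def missing_titles(response):
    """Extract the titles flagged as missing from a formatversion=2 query response."""
    pages = response.get("query", {}).get("pages", [])
    return sorted(page["title"] for page in pages if page.get("missing"))


def redlinks_on(page_title):
    """Fetch the links on one page and return those whose targets do not exist."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "links",
        "gpllimit": "max",
        "titles": page_title,
    }
    found = set()
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        # Hypothetical User-Agent; replace with something identifying yourself.
        request = urllib.request.Request(url, headers={"User-Agent": "redlink-sketch/0.1"})
        with urllib.request.urlopen(request) as reply:
            data = json.load(reply)
        found.update(missing_titles(data))
        if "continue" not in data:
            return sorted(found)
        # Follow the API's continuation parameters until the list is complete.
        params.update(data["continue"])


# Offline illustration with a hand-written response (not real API output).
sample = {
    "query": {
        "pages": [
            {"title": "Eminent Chinese of the Ch'ing Period/Lin Tse-hsü", "missing": True},
            {"pageid": 1, "title": "Eminent Chinese of the Ch'ing Period/Lin Tsê-hsü"},
        ]
    }
}
print(missing_titles(sample))
```

Calling `redlinks_on` once per page title in the range (for instance, every Page: title of a volume) would give a per-page list of redlinks to work through.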

@Shenme: If we are talking internal self-references, then I have a script that I used for DNB that identified redlinks. I haven't looked at it for ages, but pretty certain that I can adapt it suitably. — billinghurst sDrewth 14:16, 30 June 2021 (UTC)[]
@Shenme: I have put a list of redlinks in these pages to User:Shenme/EC redlinks. billinghurst sDrewth 07:37, 10 July 2021 (UTC)[]
Thank you! I checked your list quickly and one of those
Page:Eminent Chinese Of The Ch’ing Period - Hummel - 1943 - Vol. 1.pdf/357 refers to missing Eminent Chinese of the Ch'ing Period/Li Tsung-wan
was one I messaged Steve about, just last night. So your list looks good.
I'll make a first pass over the list and collect the ones that are mis-capitalizations, so we can fix those all together (page renames, etc.) Most of the others look like as described, e.g. "Tulisen" -> "Tulišen", to be fixed in text.
Neat, one of these looks like a goof in the original text! Page:Eminent Chinese Of The Ch’ing Period - Hummel - 1943 - Vol. 1.pdf/145 has text "... serve on the staff of Mien-yü (see under Yung-yen)" but then shortly below indicates there should be a separate page for Mien-yü, but there isn't one. Text says "[qq. v.]" but it's a goof.
This will really help. Thank you. Shenme (talk) 19:49, 10 July 2021 (UTC)[]
Again, thank you. And for doing Vol 2. also. Fixed just under 50 within-work redlinks. Found at least three places the original work referred to pages that they never got around to doing. Such a huge work - making the internal links all work is great! Shenme (talk) 23:06, 10 July 2021 (UTC)[]

Book download

I tried to download His Last Bow to my Kindle, but when it downloaded, it showed a table of contents, the preface, and no chapters (the links of the table of contents linked to the website). How do I download the full thing? (Is this even the right forum for this?) 01:05, 4 July 2021 (UTC)[]

Whoops, forgot to sign in! MEisSCAMMER (talk) 01:12, 4 July 2021 (UTC)[]
@MEisSCAMMER: Unfortunately, this particular text is a very old one that is not set up correctly. I've tried to tweak it so that you now should be able to get a complete download of all the chapters that are there, but there may still be other issues that make the resulting ebook suboptimal. Xover (talk) 08:10, 4 July 2021 (UTC)[]
Thanks! It worked! MEisSCAMMER (talk) 17:06, 4 July 2021 (UTC)[]
I recently moved the pages to be subpages of the work. — billinghurst sDrewth 07:40, 10 July 2021 (UTC)[]

Red-linked templates (with high usage in content pages).

Is there a list of these?

The criteria being:-

  • Template which is red-linked from Main, Page or Translation namespace.
  • Has over 25 links from those Namespaces.
  • Does not exist (currently) as a Template on English Wikisource.

High usage suggests a 'missing' or renamed template, whereas low usage suggests a typo (a different but related issue).

ShakespeareFan00 (talk) 11:56, 10 July 2021 (UTC)[]

@ShakespeareFan00: Special:WantedTemplates. But please don't start creating redlinked templates. These are either typos for an existing template (which should be fixed) or they're from some other problem (import, deleted without cleanup, etc.) that should be handled separately. We already have far too many templates, and most of them are essentially unmaintained, so we definitely want to think twice before creating more. Xover (talk) 17:00, 10 July 2021 (UTC)[]

Center align in footnotes

Hello, look at this K2 in the end. Did I do this correctly? Lightbluerain (Talk | contribs) 05:57, 15 July 2021 (UTC)[]

Hi. That mark is one used by the printer and binder, it is not part of the text; those are usually removed in transcripts here. CYGNIS INSIGNIS 07:21, 15 July 2021 (UTC)[]
@Lightbluerain: As Cygnis says, printer's marks like this are not usually reproduced on English Wikisource. We care primarily about the parts of a book that its author was involved in, and anything that's simply an artefact of the printing process is either left out or approximated.
But if you do want to reproduce these the technical way to do it is to place it inside the footer text field. Roughly, everything in the main ("Page body") text box will be included when we combine all the pages for presentation (what we call "transcluding" them, as a hopelessly obscure and technical term), but anything in the header and footer text boxes will be left out.
The special case that makes that rule of thumb complicated is the footnotes, because when you look at an individual page in the Page: namespace (such as Page:15 decisive battles of the world (New York).djvu/231) it looks like the footnotes are in the footer. The key to keep in mind there is that the actual notes—the <ref>…</ref> bits—are in the body, not the footer. Only the {{smallrefs}} template is in the footer, and it only gathers up and displays the preceding footnotes. Once all the pages are combined for presentation, there will be a separate {{smallrefs}} template that collects all the notes from the combined pages for display.
It took me a loong time to wrap my head around this model, but once it clicks most issues like this become fairly obvious and can be reasoned about. Xover (talk) 07:51, 15 July 2021 (UTC)[]
Alright, thanks. Lightbluerain (Talk | contribs) 16:14, 16 July 2021 (UTC)[]

OTRS Text Permission Problem

I recently found The Religion of God, a text that had its permission verified by OTRS by Bookofjude in 2008, and at that time contained an old OTRS link. However, the {{PermissionOTRS}} on that page now doesn't have the OTRS ticket number, and the page is categorized under Category:Items missing OTRS ticket ID. Can any OTRS users here help to find and insert that ticket number? Regards.廣九直通車 (talk) 08:21, 18 July 2021 (UTC)[]

Template:UNTS volume title and Page:UN Treaty Series - vol 649.pdf/1

Should Template:UNTS volume title be improved to make a better replica of each cover of the United Nations Treaty Series? The scan behind Page:UN Treaty Series - vol 649.pdf/1 has a footnote that I have not seen in any earlier volumes. Thanks.--Jusjih (talk) 04:09, 29 July 2021 (UTC)[]

@Jusjih: There's no reason why not. I just made the template to cover what I could see at the time. In this case, though, it looks like some kind of printing mark that we often don't really worry about, so I would say it's not really needed to replicate anyway. Inductiveloadtalk/contribs 08:06, 29 July 2021 (UTC)[]
Thanks so much. Is there any way to make our replicated texts bolder in Template:UNTS volume title to better match the underlying image of Page:UN Treaty Series - vol 649.pdf/1, etc? Especially "United Nations • Nations Unies", "New York" and the year of publication since Volume 401 in our replicas seem not bold enough.--Jusjih (talk) 02:57, 30 July 2021 (UTC)[]
@Jusjih: Done; let me know if there are any cases where it's incorrect. Inductiveloadtalk/contribs 03:08, 30 July 2021 (UTC)[]