Wikisource:Scriptorium/Help

The Scriptorium is Wikisource's community discussion page. This subpage is especially designated for requests for help from more experienced Wikisourcers. Feel free to ask questions or leave comments. You may join any current discussion or start a new one. Project members can often be found in the #wikisource IRC channel (a web client is available).

This page is automatically archived by Wikisource-bot

Have you seen our help pages and FAQs?


Javascript errors on a particular work

In Index:Atalanta - Vol. 2.djvu, for example on p. 21, I'm getting an error in my browser's Javascript console that "TypeError: success is not a function. (In 'success()', 'success' is an instance of Object)" (triggered inside ProofreadPage's image zooming code) and the OCR text layer in the DjVu doesn't get loaded into the edit field. On other works I've tested I do not observe this.

Can anyone confirm they see the same? Or, conversely, that they do not observe this? --Xover (talk) 10:00, 17 October 2019 (UTC)

Latest Firefox on Windows 10. Not sure how to access the Javascript console, but I do have the problems I think you're describing -- OCR is not loaded (and I take it there is OCR in the original), and I can't zoom on the page image. -Pete (talk) 20:06, 17 October 2019 (UTC)
I ran into this again on a different file too, so I've filed a bug on it in Phabricator. @Tpt: I had trouble tracing this backwards so if someone actually familiar with that code could take a look it would help get us closer to whatever is the root cause here. Based on the code in ext.proofreadpage.page.edit.js I really don't see how success can end up as anything but undefined or function. --Xover (talk) 11:03, 10 November 2019 (UTC)
Update: it looks like tpt and samwilson just checked in a fix for this javascript error (unknown if that's sufficient / the root cause for the missing text layer) so that should disappear with the next Mediawiki deployment (happens every two-ish weeks I think; look for version "1.35.0-wmf.8" in the Tech News posted at WS:S). --Xover (talk) 14:31, 14 November 2019 (UTC)

Mathematical expressions in PoTM

In the Proofread of the Month (Index:Optics.djvu) all remaining pages that haven't been proofread have complex mathematical expressions on them (for example this one). What templates would one use to proofread these pages? DraconicDark (talk) 15:54, 26 October 2019 (UTC)

@DraconicDark: See Help:Math and w:Help:Math. --Xover (talk) 16:19, 26 October 2019 (UTC)
@DraconicDark: Pardon late note. I just noticed your actual question had never been directly answered. For future reference: was {{Missing math formula}} what you were looking for? 114.78.171.144 07:09, 10 November 2019 (UTC)

Need help fixing Alice's Adventures in Wonderland

The scan that we're using for the 1907 edition of Alice's Adventures in Wonderland (with illustrations by Charles Robinson) is missing 2 pages, which were likely removed from the original book before it was scanned. Efforts to locate another scan of the book have been fruitless. However, we do have a copy of the color plate that was found on Flickr (File:Alice's Adventures in Wonderland - Carroll, Robinson - S205 - The whole pack rose up in the air.jpg), and the missing text is known (and saved in an HTML comment [1]). In order to complete the transcription, it seems the best course of action would be to add 2 blank pages to the DjVu file (after page 203 (or 173 according to the book pagination)) that could be used to hold the missing text and color plate, so that they could be transcluded into the main namespace. It would be best if the blank pages were totally white (not yellow like an actual page), to make it clear that they are only placeholders for the missing pages. Kaldari (talk) 17:20, 30 October 2019 (UTC)

I've linked this discussion from WS:S#Repairs (and moves) for better visibility from editors with the appropriate tooling —Beleg Tâl (talk) 17:37, 30 October 2019 (UTC)
I have the tooling to do this, but I'm not sure when I'll be able to get it done (might not be until next week some time). But we'll also need someone to shift the existing pages to compensate for the inserted ones, and I don't have the tooling for that. Since this work is complete and transcluded, once I upload the modified version the mainspace work will be broken until the pages are shifted, so that needs to be coordinated. --Xover (talk) 07:57, 31 October 2019 (UTC)
@Xover: I can work on shifting the pages after the new file is uploaded. Kaldari (talk) 23:05, 31 October 2019 (UTC)
@Xover: I've recreated the two missing pages from images found on Flickr and Scribd. These should be inserted to become pages 204 and 205 of the new DjVu file. Kaldari (talk) 00:37, 6 November 2019 (UTC)
@Kaldari: Oh, excellent work! I'll try to get the file patched sometime today or tomorrow. --Xover (talk) 09:35, 6 November 2019 (UTC)
@Kaldari: Done. Apologies for the delay. --Xover (talk) 14:41, 6 November 2019 (UTC)
@Xover: Thanks! I've shifted all the page content. Now it just needs those pages revalidated: Index:Carroll - Alice's Adventures in Wonderland.djvu. Kaldari (talk) 17:01, 6 November 2019 (UTC)
@Kaldari: Done. --Xover (talk) 17:38, 6 November 2019 (UTC)

Line breaks for poetry

Hello, I am transcribing Ghalib's divan and am wondering how best to represent the text in type. I am having trouble getting the line breaks for the poetry to be consistent. Line breaks in this type of poetic text (ghazal) are traditionally represented horizontally across the page with a caesura in the middle, rather than vertically down the page as in English texts.

For example,

The text should not display like this:

اٴے شہنشاہِ آسماں اور رنگ

اٴے جہاندارِ آفتاب آ ثار

It should display like this:

اٴے شہنشاہِ آسماں اور رنگ /اٴے جہاندارِ آفتاب آ ثار

(without the forward slash).

How can I format this caesura to be consistent?


https://wikisource.org/wiki/Page:Diwan-e-Ghalib_-_Urdu_(1922).djvu/17

Thank you! (Unsigned comment by Blackpeartree (talk).)

@Blackpeartree: For Old English poetry we use the template {{cesura}}; perhaps this or something similar could be used for your work as well? —Beleg Tâl (talk) 03:31, 8 November 2019 (UTC)

What to do with a table that's split between multiple pages

In Index:Sophocles (Classical Writers).djvu there is a table on this page that appears to be split between that page and the next, with about half of the table on each page. How does one proofread these pages so that the full table is accurately transcribed into the main namespace? DraconicDark (talk) 21:38, 9 November 2019 (UTC)

See Help:Page_breaks#Tables_across_page_breaks. Mpaa (talk) 23:15, 9 November 2019 (UTC)

Seeking advice for making better DJVU files

I created a DJVU file for an issue of The New Northwest. Here is its index page (which links to the file): Index:The New Northwest, October 27, 1871.djvu As you can see, the file has a bit depth of two (black and white, no shades of grey). It's readable, but not very pretty.

I'm following the Linux command line instructions here: Help:DjVu_files#Method_1_-_page_at_a_time_with_DjVuLibre, using the high-res JP2 scans available at the Historic Oregon Newspapers archive. There is an automatic conversion from greyscale to B/W taking place at the PNG to PBM conversion step. (The PBM format only supports B/W.)

I've tried using PGM as an intermediate format instead of PBM, but I end up with blank pages.

How can I get more reasonable-looking (i.e., greyscale) page scans into the DJVU format? -Pete (talk) 00:20, 18 November 2019 (UTC)

@Peteforsyth: You want to use JPEG as the intermediate format: DjVuLibre supports JPEG directly, it's just JPEG 2000 (.jp2) it doesn't handle. Convert the files with GraphicsMagick using gm mogrify -format jpg '*.jp2' (you may need to adjust the commandline syntax for operating system differences). Then use c44 page-1.jpg page-1.djvu to turn each page into a DjVu, and djvm -i wholework.djvu page-1.djvu (and … page-2.djvu, … page-3.djvu, etc.) to add each page to a multipage DjVu file.
Or you can ask me to do them and I'll get you DjVus with an OCR text layer generated with Tesseract (which may or may not be of sufficient quality to be worthwhile). I've got some custom tooling for this that is… not user friendly, let's put it that way… which converts Tesseract's hOCR output into DjVuLibre's sexpr format and adds it to the DjVu. I've generated a new DjVu using this method and uploaded it over File:The New Northwest, October 27, 1871.djvu so you can take a look.
PS. The mechanical process of scanning and assembling a DjVu does not create a new independent copyright (this is Bridgeman Art Library v. Corel Corp., aka. the "sweat of the brow doctrine"), so there are no rights for you to release under CC0. Commons uses the {{PD-scan}} wrapper template to indicate this. Instead, the file should be tagged with the licensing that applies for the original work, which I am guessing is public domain in the US due to pre-1924 publication? If so, I would suggest using {{PD-scan|PD-old-70-expired}} (which also asserts it is public domain in pma. 70 countries) or {{PD-scan|PD-US-expired|country=US}} (which only asserts it is PD in the US; the |country=US bit is to suppress the warning that Commons requires works to be PD in both the US and the country of origin, since it doesn't apply when works are first published in the US). For works published more than 120 years ago Commons accepts an assumption that the author(s) died more than 70 years ago and that pma. 70 copyrights have thus expired now, in the absence of specific evidence of the author(s)'s vital dates, so this work can be safely tagged with the former too. --Xover (talk) 04:30, 18 November 2019 (UTC)
Why not use PGM as the intermediate format? It won't work if you're trying to use cjb2, but c44 will take pgm or jpeg, and there's no reason not to use the lossless format as input.--Prosfilaes (talk) 20:12, 18 November 2019 (UTC)

Ah, thank you both -- very helpful. I appreciate your offer to make these for me, @Xover:, but I'm also trying to learn myself -- so I may do a few more small files like this just to get the hang of it.

  • Glad to know that PBM, PGM, and JPG files work in DJVU, but JP2 and PNGs don't. I've updated the Help: page linked above to state this.
  • You're quite right about copyright, stupid mistake. I can only guess that with all the "work" I had to do to produce the files, I momentarily felt that it must add up to something significant :) (Either that, or I just clicked the wrong box.) I've corrected the tag.
  • I did generate a PGM-based version of the DJVU, and uploaded that, per @Prosfilaes: instructions. -Pete (talk) 03:29, 19 November 2019 (UTC)
Hmm, even though my file is more than twice the size of Xover's, it is pretty much illegibly blurry. I'll revert the file...maybe I need to manually specify a higher resolution when creating the PGM? -Pete (talk) 03:34, 19 November 2019 (UTC)
On second thought -- neither file is truly legible. Open to suggestions about how to improve this. I tried with the setting "-dpi 300" which should be plenty, but it produced a file of exactly the same size (which I imagine is more or less identical -- I need to get a DJVU viewer for this computer to be sure.) -Pete (talk) 00:21, 20 November 2019 (UTC)
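
A minimal sketch of the conversion steps discussed in this thread, written in Python only to show the order of the commands. It builds the command lines and does not run them. The helper name djvu_commands and the file names are hypothetical. Using gm convert on one file at a time, and djvm -c to bundle all pages in a single call (instead of repeated djvm -i), are assumptions made for illustration rather than the exact commands quoted above. The optional -dpi value is the c44 setting Pete mentions trying.

```python
from pathlib import Path


def djvu_commands(scans, output="wholework.djvu", dpi=None):
    """Return argv lists that would turn JP2 scans into one multipage DjVu.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = []
    page_files = []
    for scan in scans:
        scan = Path(scan)
        jpeg = scan.with_suffix(".jpg")   # intermediate greyscale JPEG
        djvu = scan.with_suffix(".djvu")  # single-page DjVu
        # Step 1: JPEG 2000 -> JPEG with GraphicsMagick.
        commands.append(["gm", "convert", str(scan), str(jpeg)])
        # Step 2: JPEG -> single-page DjVu with c44 (optionally stating the DPI).
        c44 = ["c44"]
        if dpi is not None:
            c44 += ["-dpi", str(dpi)]
        c44 += [str(jpeg), str(djvu)]
        commands.append(c44)
        page_files.append(str(djvu))
    # Step 3: bundle every single-page file into one multipage document.
    commands.append(["djvm", "-c", output, *page_files])
    return commands


if __name__ == "__main__":
    for argv in djvu_commands(["page-1.jp2", "page-2.jp2"], dpi=300):
        print(" ".join(argv))
```

Each returned list can be passed to subprocess.run once GraphicsMagick and DjVuLibre are installed. Swapping the ".jpg" suffix for ".pgm" would give the lossless intermediate that Prosfilaes suggests.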

Transclusion issue

Hi all, I'm having trouble transcluding some pages over at Mrs. Beeton's Book of Household Management/Analytical Index, which pulls in a large number of long pages (mostly consisting of {{TOCstyle}}). I suspect this is a problem with the {{TOCstyle}} template, but it's unclear what exactly is going on. Any help/suggestions? (Also feel free to comment here or on the talk page.) -- Mathmitch7 (talk) 22:13, 26 November 2019 (UTC)

@Mathmitch7: Too template intensive. View the source, and see the "NewPP limit report" at the end of the source page: #boom, post-expand include size. Both the tabular nature of {{TOCstyle}} and {{ditto}} are killing you. — billinghurst sDrewth 11:05, 1 December 2019 (UTC)
Perhaps someone can suggest a way of doing this that is less template intensive? I was previously told to use {{ditto}} by other contributors here. Would sectionalising it by initial letter be a possible solution? ShakespeareFan00 (talk) 10:26, 6 December 2019 (UTC)
We have been through this over and over and over. These templates were set up as experimental and they stopped being developed. They are complex table of contents templates, and they are being used here for an extensively long index. View the source and tell me that what is being delivered is okay. How many concatenated tables is reasonable? KISS. What are the expectations here? — billinghurst sDrewth 00:13, 7 December 2019 (UTC)

Incrementing Roman numerals different from the djvu numbers.

In the index namespace, how can I define a djvu number "12" as the roman numeral "vi" and then have it automatically increment from that page on? — Ineuw (talk) 09:39, 1 December 2019 (UTC)

Is this [2] what you needed? --Jan Kameníček (talk) 10:30, 1 December 2019 (UTC)
Generally you should be able to define the first page, as if 12=6 then 7=1. Those are the page numbers, they don't magically start at somewhere like 6. — billinghurst sDrewth 10:44, 1 December 2019 (UTC)
Thanks to all. — Ineuw (talk) 11:56, 1 December 2019 (UTC)
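
The arithmetic behind the replies above can be checked with a short Python sketch: if scan 12 carries the numeral vi, then scan 7 is page i, and every later scan increments from there. The function names and the last scan number (20) are hypothetical, and the pagelist line quoted in the comment is an assumption about the usual ProofreadPage syntax that should be checked against the extension's documentation.

```python
# Value/numeral pairs in descending order, lowercase as in front matter.
ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(number):
    """Convert a positive integer to a lowercase Roman numeral."""
    if number < 1:
        raise ValueError("Roman numerals start at 1")
    parts = []
    for value, numeral in ROMAN:
        count, number = divmod(number, value)
        parts.append(numeral * count)
    return "".join(parts)


def roman_labels(anchor_scan, anchor_number, last_scan):
    """Map scan positions to Roman labels, given one known scan/number pair."""
    # Work back to the scan that holds page i (12=6 implies 7=1).
    first_scan = anchor_scan - (anchor_number - 1)
    return {
        scan: to_roman(scan - first_scan + 1)
        for scan in range(first_scan, last_scan + 1)
    }


if __name__ == "__main__":
    labels = roman_labels(anchor_scan=12, anchor_number=6, last_scan=20)
    # Assumed equivalent index markup: <pagelist 7=1 7to20=roman />
    for scan, label in labels.items():
        print(scan, label)
```

Working back to the first numbered page keeps the index definition to a single starting point, which is the approach billinghurst describes.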