Wikisource:Scan Lab

Scan Lab
Shortcut:
WS:LAB

A central resource for assistance with creation, downloading, uploading, processing and other operations on scans of texts.

Times have changed, but it still can be hard to put 600 pages in the right order!
Instructions

If you need help with a scan, add your request in the relevant section below as a new sub-section. If you can, include all the details someone will need to work on the request without further questioning. You can use {{ping project|Scan Lab}} to send an immediate notification to all subscribed Scan Lab members. Once you have been answered, ping only that user when you reply with {{re|Their username}} (do not ping the whole project on every comment).

If your request has been completed, you should acknowledge that your issue is resolved and close the section with {{section resolved|1=~~~~}}.

Participants

Add your name to Module:Mass notification/groups/Scan Lab to be notified via {{ping project|Scan Lab}}. Also add your name below with details of any particular tasks you can help with.

Participant Can help with Instructions
Inductiveload
  • General scan tasks: scraping/download, batch uploads, scan repair
  • Splitting/combining scan images/photos from a scanner or camera into a scan file (with ScanTailor)
Xover
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Mpaa
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)

Requests for downloading scans

Instructions

If you would like scans that already exist online to be transferred to Wikisource, leave a message here. This includes batch transfers from the Internet or Hathi Trust for multi-volume works. Please include the necessary bibliographic information (author, country, and date of first publication) so that scans can be uploaded to Commons with proper information and license templates. A suggested file name on Commons can also be helpful.

How to Analyze People on Sight

Can someone please upload

Jane Austen Juvenilia Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) The scans of the manuscripts of Austen's Juvenilia are available here and here. They're both in the PD, but I have absolutely no clue as to how to download them. The images are higher resolution than the ones on the BL website, but they're in the Zoomify Flash format. Languageseeker (talk) 02:58, 2 February 2022 (UTC)

  • Languageseeker: I know Volume the Second is in the public domain; it’s already been transcribed here. Are we sure that Volume the Third is in the public domain? It could easily fall into a copyright trap, so I just want to make sure. TE(æ)A,ea. (talk) 22:59, 8 February 2022 (UTC)
    • @TE(æ)A,ea. The British Library has it listed as "Public Domain in most countries other than the UK." Languageseeker (talk) 23:07, 8 February 2022 (UTC)
      So it looks like it was definitely published in 1951 (which would imply copyright expiry in 2001 in the UK as 50 years after publication), which makes the UK copyright claim weird. If true that would postdate the URAA date ... MarkLSteadman (talk) 00:06, 9 February 2022 (UTC)
      That is volume 3 (Evelyn and Kitty the Bower). Volume 1 was published in 1933 (so it was in the PD on the URAA date). MarkLSteadman (talk) 00:19, 9 February 2022 (UTC)

For Speech to the Mississippi State Democratic Convention

Currently backed to Index:A Mississippi View of Politics, Page 1.pdf and Index:A Mississippi View of Politics, Page 2.pdf. Could be backed to a single, more usable PDF or DJVU from the Library of Congress. Note the “Page 1” scan is from page 2, and the “Page 2” scan from page 3. TE(æ)A,ea. (talk) 01:35, 7 September 2022 (UTC)

@TE(æ)A,ea. Index:New-York Daily Tribune - 1859-08-31.djvu. I used the JP2 sources so the print is nearly as clear as you can get. The high res tool has also been updated to handle LOC newspapers, so you can load the IIIF tiles directly from the LOC into the page viewer at WS if you want (see User:Inductiveload/jump_to_file#Loading high-res images if you don't know this tool). The OCR is about as good as you might imagine, but the LOC OCR was no better. The region OCR tool would be perfect for this. Inductiveload (talk/contribs) 00:02, 18 September 2022 (UTC)

c:file:Life and death of the Irish parliament.djvu

Can I please get https://archive.org/details/lifedeathofirish00malo/mode/2up uploaded as a DjVu file? It is one of those ugly butts that picks up the prefixed 0000 scan pages when done via the IAUPLOAD tool. I have deleted the old copy, and the completed {{book}} template is collapsed below. Thanks. -- billinghurst sDrewth 23:51, 30 September 2022 (UTC)

@Billinghurst: Done. Please check the results. Xover (talk) 06:09, 1 October 2022 (UTC)
{{book}} template file detail contained
== {{int:filedesc}} ==
{{Book
| Author       = {{creator:James Whiteside}}
| Editor       = {{creator:Sylvester Malone}}
| Translator   = 
| Illustrator  = 
| Title        = "Life and death of the Irish Parliament" : being the substance of two lectures delivered in the Metropolitan Hall, Dublin by the Right Hon. James Whiteside Q.C., LL.D., M.P. Reviewed and corrected by the Rev. Sylvester Malone, C.C., Kilkee
| Subtitle     = 
| Series title = 
| Volume       = 
| Edition      = 
| Publisher    = John F. Fowler
| Printer      = 
| Date         = 1864
| City         = Dublin 
| Language     = {{language|en}}
| Description  = {{en|Essentially a response to the original lectures by James Whiteside per [[:file:The life and death of the Irish parliament.djvu]]}}
| Source       = {{IA|lifedeathofirish00malo}}
| Image        = {{PAGENAME}}
| Image page   = 1
| Permission   = {{PD-scan|PD-old}}
| OCLC         = 1048349507
| Other versions = 
| Wikisource   = s:en:Index:{{PAGENAME}}
| Homecat      = 
| Wikidata     = 
}}
{{Djvu}}

[[Category:Uploaded with IA Upload]]
[[Category:1864 books]]
[[Category:DjVu files in English]]

She

Can Index:She (1887).pdf be uploaded as a DJVU? Commons seems unable to render the PDF, and IA Upload does not generate the DJVU. Languageseeker (talk) 15:19, 19 October 2022 (UTC)

I'd already uploaded a generated DJVU, Index:She_(1887)_(shehistoryofadve1887hagg).djvu, but the text layer is offset.
Page link | Scan for pp. | Text for pp.
Page:She_(1887)_(shehistoryofadve1887hagg).djvu/16 | 1 | 4
ShakespeareFan00 (talk) 18:13, 19 October 2022 (UTC)
@Languageseeker, @ShakespeareFan00: New file is at File:She (1888).djvu. Note, I used 1888 as that is the year printed on the title page. If there is something I'm missing and this was somehow really published in 1887, then request a file move at Commons before you move or create an index for it (Commons file movers can't rename a file if it is in use). Xover (talk) 19:05, 19 October 2022 (UTC)
ShakespeareFan00 Xover Thank you. Xover, you're right, this is the revised 1888 edition. Good catch. Languageseeker (talk) 02:57, 20 October 2022 (UTC)

The old wives' tale

Can someone upload a DJVU from [1]? Languageseeker (talk) 02:57, 20 October 2022 (UTC)

Is it possible for someone to do this soon so that it can be included in the December MC? Languageseeker (talk) 23:41, 19 November 2022 (UTC)
@Languageseeker: That Hathi scan is geolocked so I can't get at it. IA has some decent scans of various 1911 US editions if that helps? Internet Archive identifier: oldwivestale0000benn, Internet Archive identifier: oldwivestale00benn_0, and Internet Archive identifier: cu31924013586940. Xover (talk) 11:20, 20 November 2022 (UTC)
@Xover Thanks for looking. I want this specific edition because it is the original printing. Languageseeker (talk) 01:17, 2 December 2022 (UTC)
Not entirely sure what this was, but it was messing up the L2/L3 hierarchy. PseudoSkull (talk) 20:35, 1 January 2023 (UTC)
== {{int:filedesc}} ==
{{Book
| Author       = Michael Maier
| Editor       = 
| Translator   = {{anonymous}}
| Illustrator  = 
| Title        = Atalanta running, that is, new chymicall emblems relating to the secrets of nature
| Subtitle     = 
| Series title = 
| Volume       = 
| Edition      = 
| Publisher    = 
| Printer      = 
| Date         = 
| City         = 
| Language     = {{language|en}}
| Description  = Manuscript on paper of Michael Maier, Atalanta fugiens, translated into English from the 1618 German edition.<br/>Binding: Recent binding of marbled paper boards, polished calf back, top edge cut and gilt, other edges plain and original.<br />First two and last four leaves are of different, probably eighteenth-century, paper, probably binder's sheets used in an earlier binding of the volume.<br />In English.<br />Mellon MS 88, acquired with the Duveen collection. Gift of Paul and Mary Mellon, 1965.<br />Script: Written by a single copyist in English secretary and italic hands.<br />Watermarks: 1) Strasbourg lily with initials "WR" and countermarked "IHS" with a cross ascending from the horizontal of the central letter, very similar to Churchill 401 (dated 1625), but without initials on the countermark. 2) A large, crowned fleur-de-lys and with a countermark "VI," not identified.
| Source       = {{IA|mellon48atalanta}}<br/>https://collections.library.yale.edu/catalog/15959780
| Image        = {{PAGENAME}}
| Image page   = 
| Permission   = 
| OCLC         = 
| Other versions = 
| Wikisource   = s:en:Index:{{PAGENAME}}
| Homecat      = 
| Wikidata     = 
}}
{{Djvu}}

== {{int:license-header}} ==
{{PD-scan|PD-old-100}}{{PD-US-unpublished}}

[[Category:Uploaded with IA Upload]]
[[Category:Atalanta Fugiens]]
[[Category:English-language manuscripts]]
[[Category:Michael Maier]]
[[Category:Paul Mellon Collection]]

I'm not 100% on the bibliographic information, but the Yale University Library is apparently satisfied to categorize it under "Alchemy--Early works to 1800", so it seems safe. As far as I can tell the licenses should be something like PD-old-100 plus PD-US-unpublished. Shells-shells (talk) 04:12, 26 October 2022 (UTC)

@Shells-shells: In light of Index:Atalanta running (ia mellon48atalanta).djvu, is this request still current? Xover (talk) 07:19, 1 December 2022 (UTC)
@Xover Done with IA-Upload. Sorry I forgot to update this request; feel free to archive. Shells-shells (talk) 08:13, 1 December 2022 (UTC)

Adelaide Ristori

Can someone upload a DJVU from [2]? Languageseeker (talk) 03:29, 20 October 2022 (UTC)

@Languageseeker it is already available at Commons: File:Memoirs and artistic studies of Adelaide Ristori; (IA memoirsartistics00rist).pdf. Mpaa (talk) 21:46, 31 October 2022 (UTC)
@Mpaa That's a different book. Languageseeker (talk) 15:45, 1 November 2022 (UTC)
@Languageseeker sorry, wrong link: File:Studies and memoirs ; an autobiography (IA studiesmemoirsau00ristrich).pdf Mpaa (talk) 15:56, 1 November 2022 (UTC)

TASJ

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Could someone please download this and upload it as File:TASJ-1-1-2.pdf? It was published in 1884, and is thus PD in the US, but authors’ lives and whatnot make this a local upload. TE(æ)A,ea. (talk) 01:11, 7 December 2022 (UTC)

@TE(æ)A,ea.: Is it important that it be PDF (vs. DjVu)? And would File:Transactions of the Asiatic Society of Japan, vol. 1-2 (1874).djvu be an acceptable name? Xover (talk) 18:46, 11 December 2022 (UTC)
  • Xover: The extra 1 is for series 1 (of five, currently). The request for PDF (and naming scheme) was because I was thinking of uploading the other volumes from Google Books myself (with the same scheme), and also because this volume needs some pages reordered (if you wouldn’t mind working on that after you get the file here). TE(æ)A,ea. (talk) 21:31, 11 December 2022 (UTC)
    @TE(æ)A,ea.: Happy to work on it, but I simply don't have sane tools for working with PDFs so it would have to be DjVu in that case. As for file name, it's just a suggestion based on the principle that the file name should be descriptive. But how about File:Transactions of the Asiatic Society of Japan, series 1, vol. 1-2 (1964).djvu (to leave room for having both the 1874 original publication as well as this 1964 reprint, should that ever be relevant)? I have a DjVu ready and can upload it sometime late tomorrow if that's ok. Xover (talk) 22:57, 11 December 2022 (UTC)
    • Xover: If you’re better with DJVU, then that’s fine with me. “TASJ” is fine as a shortening. I don’t think 1964 should be used vs. 1874 because (1) the 1964 reprint is an exact reprint, without editing, and (2) there are actual volumes of TASJ first published in 1964. I just think that 1874 is clearer in this case. TE(æ)A,ea. (talk) 02:21, 12 December 2022 (UTC)
      @TE(æ)A,ea.: Uploaded to File:TASJ-1-1-2.djvu (minimal quality control, bare bones info template). I uploaded it to Commons because their cutoff for just assuming PD even absent firm author info is pub. +120 years (so anything published before 1901 or thereabouts, currently). Xover (talk) 07:29, 12 December 2022 (UTC)
      • Xover: Thank you! I marked a few DJVU pages for deletion there, if you don’t mind deleting those pages and readjusting the pagelist. TE(æ)A,ea. (talk) 19:51, 12 December 2022 (UTC)
        @TE(æ)A,ea.: Done. From some other copies it looks like there is a fold-out map or illustration at /23 with a subsequent blank sheet, so I left those in place. Xover (talk) 08:16, 16 December 2022 (UTC)
        • Xover: I just got a scan of the images: there are four fold-out plates in a row. I don’t think it’s necessary to add those pages back, though, so I thank you for your work with this index. TE(æ)A,ea. (talk) 18:14, 16 December 2022 (UTC)
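
The "pub. +120 years" cutoff mentioned in this thread is simple arithmetic. The following is a minimal sketch, assuming the rule reads as "more than 120 years have passed since publication"; the function name is hypothetical, and the actual Commons policy may have nuances this does not capture.

```python
def past_assumed_pd_cutoff(pub_year: int, current_year: int, cutoff_years: int = 120) -> bool:
    """Return True if more than cutoff_years have passed since publication.

    Assumption: a strict "more than" comparison; the real policy wording may differ.
    """
    return current_year - pub_year > cutoff_years


# The 1874 original publication, checked from 2022 (the year of this discussion).
print(past_assumed_pd_cutoff(1874, 2022))
# A work first published in 1964 would not qualify under this rule.
print(past_assumed_pd_cutoff(1964, 2022))
```

Under this reading, a work published in 1901 passes the check in 2022 and one published in 1902 does not, which matches the "before 1901 or thereabouts" estimate given above.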

The Peeler

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Could someone download “The Peeler” from The Partisan Review, volume 16, number 12, please? The story is one of few of hers in the public domain. TE(æ)A,ea. (talk) 21:32, 11 December 2022 (UTC)

@TE(æ)A,ea.: This item isn't downloadable from IA (it's a "Books to Borrow" scan). It's also not obvious that any part of this would be in the public domain. Xover (talk) 08:23, 16 December 2022 (UTC)
  • Xover: This issue of The Partisan Review was not renewed, unlike many other issues of the same periodical. In addition, this story was also not renewed. With an IA account, one can borrow the issue and download the appropriate pages. TE(æ)A,ea. (talk) 18:14, 16 December 2022 (UTC)


Fungi From Yuggoth

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Could someone download Fungi From Yuggoth from [3]? It should be PD-no-notice. Languageseeker (talk) 15:51, 20 December 2022 (UTC)


Big Sur

At Hathi: https://babel.hathitrust.org/cgi/pt?id=uc1.31822033766825 It says copyright 1962 but it just went up at Librivox and I could access it. So, let's download it? --RaboKarbakian (talk) 16:16, 29 December 2022 (UTC)

@RaboKarbakian indeed this appears not to have had its copyright renewed when it should have been (around 1962 + 28 = 1990): Index:Big Sur - Kerouac - 1963.djvu. Inductiveload (talk/contribs) 17:29, 29 December 2022 (UTC)
Inductiveload That was quick! I was coming here to discuss getting the pdf so I can remove the watermarks from it. Maybe we can still do that?--RaboKarbakian (talk) 17:50, 29 December 2022 (UTC)
@RaboKarbakian the watermarks are baked into the images. The PDF from Hathi is just a big, ordered collection of JPGs and PNGs and has watermarks in the images too. They're not causing a problem that justifies the effort of removing them page-by-page as far as I can tell. I can't download the whole book as a PDF, but I can provide the downloaded images from HT if that's what you're after? (edit now it's uploaded: in a ZIP here) Inductiveload (talk/contribs) 18:02, 29 December 2022 (UTC)
Inductiveload HA! They aren't either what you said. You reminded me of mogrify....do you have evince installed? Or more importantly xpdf? pdfimages takes Hathi pdf apart. There is a flag to retain the page numbers and that should be toggled. I have scripts that make running it easier. Upload that original Hathi pdf and let me try getting rid of the watermarks before you dismiss me....--RaboKarbakian (talk) 21:22, 29 December 2022 (UTC)
@RaboKarbakian I can't upload the PDF because like I told you, I can't download the PDF of the entire book as a non-institutional member, all I can get is page-by-page. However, you are right that the JPX in the page-wise PDF doesn't have the watermark burned in. However, I do not have tools to download those PDFs as they're not available on the HT Data API as far as I know, and I also don't have robust tools to process those PDFs, so still, all I can offer you is the images and you can feel free to edit them to remove the watermarks if you like. Inductiveload (talk/contribs) 22:00, 29 December 2022 (UTC)
Inductiveload you downloaded all 200 images individually? You should download as pdfs individually. Had I known you were manually downloading things, I would have done this myself. Forgive me. I will download the 200 PDFs and get rid of the watermarks. Shouldn't take more than a day, depending on the internet.--RaboKarbakian (talk) 23:33, 29 December 2022 (UTC)
I use the Data API, of course I don't download anything manually, that would be complete madness! Batch downloading the book takes roughly 2.5 seconds per page plus a few more minutes for conversion and upload, but it's hands-off once I put the details in. Inductiveload (talk/contribs) 23:38, 29 December 2022 (UTC)
Inductiveload or Xover https://drive.google.com/file/d/1isPrbKNqCcpNZ4wLgyBklDzZ9lcCtm53/view?usp=share_link is an xz file with the mixture of .jp2 and .pbm and https://drive.google.com/file/d/17whtK6g4Q0Tt58R2FsCFfJ389zZcUYWY/view?usp=sharing a zip archive of the watermarks that were stripped from the pdf. If you could use these files for the DJVU, that would be nice.
Inductiveload that link to the API was very interesting. I can see how you were confused about the watermarks being embedded or not.--RaboKarbakian (talk) 18:32, 30 December 2022 (UTC)
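
For a sense of scale, the figures quoted in this thread (about 200 pages at roughly 2.5 seconds per page) work out to a little over eight minutes of download time. A minimal sketch of that estimate follows; the function name is hypothetical, and the conversion and upload time ("a few more minutes") is deliberately left out because no figure was given for it.

```python
def estimate_download_minutes(pages: int, seconds_per_page: float = 2.5) -> float:
    """Rough batch download time in minutes, excluding conversion and upload."""
    return pages * seconds_per_page / 60


# 200 pages is the page count mentioned in the discussion above.
print(f"{estimate_download_minutes(200):.1f} minutes")  # prints "8.3 minutes"
```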

Vol 9 of the Works of John Locke

From [4]. This will complete the set. Name should be The Works of John Locke - 1823 - vol 09.djvu so it matches the other volumes. MarkLSteadman (talk) 15:11, 1 January 2023 (UTC)

@MarkLSteadman done. Mpaa (talk) 20:50, 1 January 2023 (UTC)

My big three for 2023

@Inductiveload:? PseudoSkull (talk) 20:37, 1 January 2023 (UTC)

@PseudoSkull c:File:Conflict (Prouty).djvu. Note: p190 is problematic. Mpaa (talk) 19:08, 2 January 2023 (UTC)
@PseudoSkull c:File:Dangerous Business (Balmer).djvu done. Some pages are problematic. I have no access to the third one. Mpaa (talk) 23:01, 2 January 2023 (UTC)
@PseudoSkull Last one. Index:Jalna.pdf Languageseeker (talk) 05:44, 3 January 2023 (UTC)

Letters from the Battlefields of Paraguay

@Inductiveload: - I've located an 1870 printing at https://books.google.co.uk/books?id=VtwFAAAAQAAJ which would enable the version we have to be matched and split. ShakespeareFan00 (talk) 11:56, 24 January 2023 (UTC)

The Art of Japan (Brinkley)

I’ve borrowed both volumes, and have scanned in the first volume so far. Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Could someone create a PDF/DJVU file from the images here, please? Thank you. TE(æ)A,ea. (talk) 03:41, 30 January 2023 (UTC)

Doing. Yann (talk) 12:55, 30 January 2023 (UTC)
Uploading now: File:Brinkley - The Art of Japan, vol. 1.pdf. DjVu tomorrow. Yann (talk) 22:53, 30 January 2023 (UTC)
@TE(æ)A,ea.: File:Brinkley - The Art of Japan, vol. 1.djvu. Yann (talk) 13:01, 31 January 2023 (UTC)

Finding scans

Instructions

Requests for locating scans for existing works at Wikisource, or works you wish to add yourself but cannot find scans for. For general text requests, see Wikisource:Requested texts.

The Graphic

I'm looking for the issues of w:The Graphic for October 1886 and January 1887 that contain the original, bloodier version of She. Languageseeker (talk) 02:59, 20 October 2022 (UTC)

The Criterion Volume 2 and 3

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Would it be possible to locate Volumes 2 and 3 of The Criterion? I'm especially trying to complete The Woman Who Rode Away that began in Volume 3. Languageseeker (talk) 18:36, 23 December 2022 (UTC)

The Laws and Acts of the Parliament of Scotland

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) We have a 1681(?) combined edition of what was effectively republished as Volumes 1 and 2 of the 1685 reprinting. However there is apparently a "third" volume https://books.google.co.uk/books?id=ixY-AAAAcAAJ which extends the work nearly up to the Act of Union. Would it be possible for this to be added to English Wikisource at some later date?

I am also noting that IA has some volumes of "The Acts of the Parliament of Scotland", which was published by the Record Commission in the early 19th century. Would it also be possible for a suitable set of volumes to be located for that work? The original is a 12-volume set covering most of the pre-Union Scottish statutes :) ShakespeareFan00 (talk) 10:04, 22 January 2023 (UTC)

Railway Construction and Operation Requirements for Passenger Lines and Recommendations for Goods Lines of the Minister of Transport

There is a 1950 edition of this that is used as a reference for historical standards in the heritage railway sector (for example, for a structure gauge diagram).

There may be later versions of this, but it is specifically the pre-1972 versions that would be of interest.

The 1950 edition (and its 1963 reprint) are understood to be in expired Crown copyright.

ShakespeareFan00 (talk) 11:08, 28 January 2023 (UTC)

Scan repair

Instructions

Request repair work on existing scans here.

When requesting page insertion, rearrangement or deletion, always include the page numbers (as marked on the pages) as well as the position of the page within the scan file. This makes it much easier for the repairing user to locate the defect in the file and fix it, as well as allowing a double-check against mistakes.

Please do not use this page to request repairs on works that you don’t really care about: the backlog at Category:Index - File to fix is a known backlog. If you want to help with those, you can add {{missing pages}} to those indexes if they do not already have it, along with details of the missing pages.

File:Wells - The First Men in the Moon, 1901.djvu

The current scan has a replaced cover, while the original cover is available at [5]. Can the repaired cover be replaced with the correct one? Languageseeker (talk) 16:40, 10 October 2022 (UTC)

@Languageseeker: This file is in use across multiple projects so we can't just overwrite it. If the issue is significant we can upload a different scan and migrate all the pages over (but that quickly snowballs into quite a bit of work, so you need to consider how pressing the issue is). Xover (talk) 18:35, 12 October 2022 (UTC)
@Xover I think that the cover should be replaced because this is spreading inaccurate information across multiple wiki projects. These projects are claiming that the replaced cover is the original cover when it is not. For an example of the correct cover, see [6] or [7]. Languageseeker (talk) 01:58, 20 October 2022 (UTC)
@Languageseeker: I don't disagree. But 1) we don't own the articles on the several Wikipedias that currently use this cover, and 2) we're clients of Commons, which has policies that prohibit overwriting files in this circumstance. So… we can upload a new file with the correct cover and migrate our dependents over to that new file. We can propose (possibly through a WP:BOLD edit) that those other Wikipedias switch to using this new file. And we can note in the file's description on Commons that the file uses a non-original cover. Xover (talk) 07:04, 20 October 2022 (UTC)

Index:Anatomy of the human body (1918).pdf

Can the proofread pages be merged from Index:Anatomy of the Human Body(Lewis-1918).djvu, and can Index:Anatomy of the Human Body(Lewis-1918).djvu then be deleted? Languageseeker (talk) 15:28, 19 October 2022 (UTC)

@Languageseeker: Because of reasons I built a new DjVu from Internet Archive identifier: anatomyofhumanbo00grayrich and patched it with the missing pages from Internet Archive identifier: 101532328.nlm.nih.gov (pp. 1091–1092, 1139–1140). I have moved all pages over from Index:Anatomy of the Human Body(Lewis-1918).djvu to the new DjVu at Index:Anatomy of the Human Body (1918).djvu, but have not yet deleted any of the old indexes. Please let me know if there's something that necessitates using specifically the scan from Index:Anatomy of the human body (1918).pdf instead.
Also, please note that I have uploaded the new DjVu locally on enWS since Warren Harmon Lewis's (1870–1964) contributions are still in copyright in the UK until 2034. The old scan files on Commons will have to be deleted for this reason. Xover (talk) 06:58, 20 October 2022 (UTC)
Oh. And absent actual evidence to the contrary, I see no reason to assume that pp. 29–32 are "missing". p. 28, which is independently numbered in roman numerals (xviii), is a bibliography that starts at A and ends with Zeitschrift für Wissenschaftliche …. It is highly unlikely that you'll find much that's sorted after Z and W. So anything missing would have to be 4 full pages of a completely different section, crammed between the contents+bibliography and the start of the first chapter. And precisely that section would then have to be missing from every single scan identified so far. So… Unless someone finds a scan that has a plausible set of pages numbered 29–32, I'm going to go ahead and assume this is a pagination error at the printers. Xover (talk) 11:04, 20 October 2022 (UTC)
From Wikipedia: "Warren Harmon Lewis (June 17, 1870 – July 3, 1964) was an American embryologist and cell biologist." Was he working in the UK when editing this work? ShakespeareFan00 (talk) 08:57, 20 October 2022 (UTC)

Index:The Statutes of the Realm Vol 9 (1708-13).pdf

Page realignment needed following new scan upload.

Existing scan position | New scan position | pp.
Page:The_Statutes_of_the_Realm_Vol_9_(1708-13).pdf/5 | Page:The_Statutes_of_the_Realm_Vol_9_(1708-13).pdf/1 | Half-title
Page:The_Statutes_of_the_Realm_Vol_9_(1708-13).pdf/7 | Page:The_Statutes_of_the_Realm_Vol_9_(1708-13).pdf/3
...
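
The two rows listed above suggest that each existing page sits four positions later than its place in the new scan (/5 to /1, /7 to /3). The following is a minimal sketch of generating the list of page moves, assuming that offset holds for the whole volume, which has not been verified and would fail wherever pages were added or dropped. The function name is hypothetical and nothing here talks to the wiki.

```python
INDEX = "The_Statutes_of_the_Realm_Vol_9_(1708-13).pdf"


def realignment_moves(old_positions, offset, index=INDEX):
    """Pair each existing Page: title with its target title under a constant offset."""
    moves = []
    for old in old_positions:
        new = old + offset
        if new < 1:
            raise ValueError(f"position {old} would move before the first page")
        moves.append((f"Page:{index}/{old}", f"Page:{index}/{new}"))
    return moves


# The two rows given in the request: /5 -> /1 and /7 -> /3, i.e. an offset of -4.
for source, target in realignment_moves([5, 7], offset=-4):
    print(source, "->", target)
```

With a negative offset, processing the moves in ascending order avoids moving a page onto a position that is still waiting to be moved itself.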

Template:Statutes_of_the_Realm

Helpfully, a newer set of scans has been uploaded, but the migration of existing work to the new scans still needs to be completed, which means re-aligning the pages.

Can someone please take a look at all the volumes in the linked template and re-align pages accordingly?

I've left a note for the uploader asking them to update the source file information at Commons. I think these are now the HT scans of the mid-1960s reprint. ShakespeareFan00 (talk) 14:44, 26 October 2022 (UTC)

Index:The History of the Church & Manor of Wigan part 2.djvu

I've found the missing page in an alternate scan: https://archive.org/details/historychurchma02bridgoog/page/n14/mode/2up Can it be added?

The relevant search on Archive.org also showed that there are two more parts (IA has them as ex-Google scans).

Would it be possible for someone on Scan Lab to have a look at the scans (https://archive.org/search.php?query=creator%3A%22Bridgeman%2C+George+T.+O.+%28George+Thomas+Orlando%29%2C+1823-1895%22) and get a complete volume set onto English Wikisource/Commons?

(Located as part of an aside to some ongoing Lint Error cleanups.)

c:Category:Book no. 6, Banguê Book Collection

Notifying all members of Scan Lab (more info · opt out): (User:Inductiveload, User:Xover, User:Mpaa) Could someone convert the images in this category into a DJVU file? I believe it's something simple, but I couldn't find a practical way to do it. Albertoleoncio (talk) 14:38, 1 December 2022 (UTC)

@Albertoleoncio In your defence: it's actually not that simple (each step is not that hard, but all together it's a bit of a pain). It's complicated by the fact that the images aren't named "lexicographically", so they don't naturally sort in the right order. You also have to download them all in the first place, which is cumbersome (I use a script, AFAIK there is no "official" solution provided by Commons). Then, they need to be split using a tool like Scan Tailor, and that is complicated by not all pages being exactly split in the same place. And then you can use tools like djvm to convert to DjVu pages and glue them all together.
In this specific case, the tight binding also makes it impossible to see all the text, as a lot of it has vanished into the binding, and the flat-bed scanning style hasn't helped (a v-cradle scanner opens the book less and thus doesn't hide so much of the text around the curve of the other pages).
However, I have split the pages as best I can: File:Banguê Book, number 6.djvu. I have also used a rather high image size to avoid over-compressing the delicate writing. I did not attempt to OCR it as it would just be junk. Hopefully it works for you!

Also, please could you review and fill in the information template on that file page, as I have done only a cursory first draft. As for the license, while the JPGs have been claimed to be CC-BY-SA, I think this work and photos of the pages of it are more properly licensed as Public Domain. Inductiveload (talk/contribs) 00:03, 5 December 2022 (UTC)

@Inductiveload: I have no words to thank you for all this work. Downloading all the images was even easy using WLD (in German, but ok) and IDM (paid, but I already had a license for other reasons), but I had such a hard time with djvm (either it's outdated or it's not very clear) that I gave up and tried it in pdf, but I also gave up because I couldn't find any reliable tool that works offline and didn't charge me for the service.
If you know any manuals or step-by-step guides on how to do this process, that would be wonderful. In addition, I have already adjusted the license tags. Again, thank you very much! Albertoleoncio (talk) 00:55, 5 December 2022 (UTC)Reply[reply]
@Albertoleoncio I'm glad you like it! There aren't any step-by-step guides that I know of, as there's not really any "one" process to do it. I have a completely horrible (and public) script at https://github.com/inductiveload/wstools/blob/master/wstools/make_document.py which is what I usually use. It's not a secret, but it's not really designed to work for anyone but me, as a completely bulletproof program would take too much time to write! There are two real "tricks" it uses: one is to convert the images to intermediate formats that can then be shifted to DjVu, and the other is taking the Tesseract OCR output as a hOCR file and then inserting it into the DjVu file in the right place. Neither is conceptually hard, but they're also not completely straightforward in practice.
You can obviously always ask here. If you have just a few works to do, you can also upload the images in a zip to the Internet Archive and they'll make a PDF with OCR for you automatically, but it'll take a few hours to generate. Inductiveload (talk/contribs) 16:21, 29 December 2022 (UTC)
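For anyone following along, here is a minimal sketch of the image-to-DjVu step only. It is not the wstools script linked above. It assumes the DjVuLibre command-line tools `c44` and `djvm` are installed and that the page images are JPEG or PNM files; the function name, the working directory and the page file naming are made up for illustration. The hOCR insertion step is not covered.

```python
from pathlib import Path


def djvu_build_commands(images, output, workdir="pages"):
    """Return the command lines that would encode each image and bundle the pages.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    work = Path(workdir)
    page_files = []
    commands = []
    for number, image in enumerate(images, start=1):
        page = work / f"page_{number:04d}.djvu"
        # c44 (DjVuLibre) encodes one JPEG or PNM image as a single-page DjVu file.
        commands.append(["c44", str(image), str(page)])
        page_files.append(str(page))
    # djvm -c bundles the single-page files, in the given order, into one document.
    commands.append(["djvm", "-c", str(output), *page_files])
    return commands


if __name__ == "__main__":
    # Print the plan for two example images (file names are placeholders).
    for command in djvu_build_commands(["scan_001.jpg", "scan_002.jpg"], "book.djvu"):
        print(" ".join(command))
```

To actually run the plan, create the working directory first and pass each command to `subprocess.run(command, check=True)`. Building the list separately makes it easy to check the page order before anything is written.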

Index:Thuvia, Maid of Mars.djvu

Is it possible to replace this DjVu from Google with this full-color version from IA? The publisher is different, but they appear to use the same plates for the text. Alternatively, the IA text can be uploaded as a separate file and the text transferred, which is probably the better way. Languageseeker (talk) 15:45, 20 December 2022 (UTC)

@Languageseeker Uploaded as a new index, Index:Thuvia, Maid of Mars - Burroughs - 1920, Grosset and Dunlap.djvu, as it's a different edition. I don't really see the point of transferring the proofread text, though, especially as the Grosset & Dunlap edition has rather fewer plates than the existing McClurg one. If you want to use the plates from the Grosset edition at the IA, they should be extracted from the upstream files at the IA anyway, never from the PDF/DJVU. Inductiveload (talk/contribs) 16:41, 29 December 2022 (UTC)
@Inductiveload I didn't notice the differing number of plates. I think I'll use the available plates from the Grosset & Dunlap edition to replace the Google ones in the McClurg scan and leave the Grosset edition for another occasion. Languageseeker (talk) 17:31, 29 December 2022 (UTC)

Index:Analysis and Assessment of Gateway Process.pdf

See the note by an anon contributor at Page talk:Analysis and Assessment of Gateway Process.pdf/28. --Jan Kameníček (talk) 00:10, 23 December 2022 (UTC)

Done: page inserted and proofread. Inductiveload (talk/contribs) 16:09, 29 December 2022 (UTC)

File:Oration Dedication.pdf

Notifying all members of Scan Lab: (User:Inductiveload, User:Xover, User:Mpaa) The first and second pages are from the binder, and can be removed. PDF pages 4–11 need to be split. The other two pages are fine. PDF or DJVU, your choice. File name, perhaps Oration Delivered on the Occasion of the Dedication of the New Hall of Cooper Lodge. TE(æ)A,ea. (talk) 21:32, 26 December 2022 (UTC)

@TE(æ)A,ea. Here it is: c:File:Oration Delivered on the Occasion of the Dedication of the New Hall of Cooper Lodge.djvu Mpaa (talk) 23:27, 26 December 2022 (UTC)

Page:The Works of John Locke - 1823 - vol 01.djvu/76

Should be replaced with page 80 from [8]. MarkLSteadman (talk) 15:14, 1 January 2023 (UTC)

@MarkLSteadman done. Mpaa (talk) 21:11, 1 January 2023 (UTC)
@Mpaa Thank you for your help! Unfortunately, it seems the text and images following the insertion are now off by one. Can you please shift the text forward one page, starting at Page:The Works of John Locke - 1823 - vol 01.djvu/77 through Page:The Works of John Locke - 1823 - vol 01.djvu/341 (e.g. the text on 77 should be on 78)? MarkLSteadman (talk) 21:34, 1 January 2023 (UTC)
@MarkLSteadman My bad, I inserted the page and forgot to delete the bad one. On its way ... Mpaa (talk) 21:47, 1 January 2023 (UTC)
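For cases where the text really does have to be shifted (here the file itself was corrected instead), the shift requested above can be planned as a list of page moves. This is only a sketch: the function name is hypothetical and it produces page numbers, not actual wiki page moves. The one point it illustrates is the ordering: when shifting forward, start from the last page so that each target has already been vacated.

```python
def shift_plan(first, last, offset=1):
    """List (source, target) page numbers, ordered so no move overwrites unmoved text."""
    if first > last:
        raise ValueError("first must not exceed last")
    pages = range(first, last + 1)
    # Shifting forward: work from the last page down. Shifting backward: from the first up.
    ordered = reversed(pages) if offset > 0 else pages
    return [(page, page + offset) for page in ordered]


moves = shift_plan(77, 341)
print(moves[0], moves[-1], len(moves))  # (341, 342) (77, 78) 265
```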


File:The Strand (Volume 73).pdf

Notifying all members of Scan Lab: (User:Inductiveload, User:Xover, User:Mpaa) This file should be converted to a DJVU and locally uploaded as The Strand Magazine (Volume 73).djvu due to potential copyright issues. I uploaded to Commons to make it accessible in the MC and easier to download than the individual pages from HT. Languageseeker (talk) 00:36, 3 January 2023 (UTC)

Index:Historic highways of America (Volume 11).djvu

Pages 58 and 59 of this index are missing, and pages 60 and 61 have been included twice in their places. Page 57 is currently at Page:Historic highways of America (Volume 11).djvu/61 and, e.g. Page 60 at Page:Historic highways of America (Volume 11).djvu/62 (incorrectly) and Page:Historic highways of America (Volume 11).djvu/64 (correctly). If someone could fix this, that would be great. Thanks, TeysaKarlov (talk) 21:48, 13 January 2023 (UTC)

@TeysaKarlov done. Mpaa (talk) 14:17, 14 January 2023 (UTC)
@Mpaa Thanks. However, do I have to do something on my end to see the new pages? For example, at Page:Historic highways of America (Volume 11).djvu/62, page 60 still shows up instead of page 58. TeysaKarlov (talk) 22:11, 14 January 2023 (UTC)
@TeysaKarlov It is a cache issue; try to purge that page or the File page. Otherwise just wait some time and it should fix itself. Mpaa (talk) 22:48, 14 January 2023 (UTC)
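Besides the purge option in the wiki interface, a purge can be requested through the MediaWiki action API. The sketch below only builds the request and does not send it. The endpoint URL, the User-Agent string and the function name are assumptions for illustration; the `purge` action itself has to be sent as a POST.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for English Wikisource; adjust for other wikis.
API_URL = "https://en.wikisource.org/w/api.php"


def build_purge_request(titles, api_url=API_URL):
    """Build (but do not send) a POST request asking the API to purge the given pages."""
    params = {
        "action": "purge",
        "titles": "|".join(titles),  # the API takes multiple titles separated by "|"
        "format": "json",
    }
    data = urllib.parse.urlencode(params).encode("utf-8")
    # Supplying a body makes urllib send the request as POST.
    return urllib.request.Request(
        api_url,
        data=data,
        headers={"User-Agent": "scan-lab-purge-sketch/0.1"},  # placeholder value
    )


request = build_purge_request([
    "Page:Historic highways of America (Volume 11).djvu/62",
    "File:Historic highways of America (Volume 11).djvu",
])
print(request.get_method(), request.full_url)
```

Sending it is a matter of passing the request to `urllib.request.urlopen`. As noted above, thumbnails may still take a while to refresh even after a purge.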

Index:The Works of H G Wells Volume 7.pdf

Page xi (which should be on Page:The Works of H G Wells Volume 7.pdf/21) and xii (Page:The Works of H G Wells Volume 7.pdf/22) of this index are missing, and pages xiii and xiv have been included twice (as per discussion page). It would also be nice if this could be fixed, although the Atlantic edition is admittedly slow going. (As for the above, purging worked for one page and didn't for the other, so I think I'll just keep waiting). Thanks, TeysaKarlov (talk) 04:11, 17 January 2023 (UTC)

@TeysaKarlov I'll try to get this fixed today. Languageseeker (talk) 01:22, 18 January 2023 (UTC)
@TeysaKarlov File fixed. Languageseeker (talk) 14:18, 19 January 2023 (UTC)
@Languageseeker Thanks for that. Two fewer pages to go. TeysaKarlov (talk) 19:24, 19 January 2023 (UTC)

Index:The House of Mirth (1905).djvu

It seems that the automatic page crop on pages 184 and 185 zoomed too far in. Would it be possible to replace these two pages from the raw scans on IA?

Done. Mpaa (talk) 23:13, 21 January 2023 (UTC)


Index:A voyage towards the South Pole, and round the world. Performed in His Majesty's ships the Resolution and Adventure, in the years 1772, 1773, 1774, and 1775 (IA b30413953 0002).pdf & Index:A voyage towards the South Pole, and round the world. Performed in His Majesty's ships the Resolution and Adventure, in the years 1772, 1773, 1774, and 1775 (IA b30413953 0001).pdf

These were uploaded from IA in good faith as part of Fae's mass transfer of IA books.

Scans are missing for pages at the ends of both volumes 1 and 2. It is not clear whether IA actually has these, and I don't have a Wellcome Collection login to check whether their copy has the relevant scan pages. Any suggestions? ShakespeareFan00 (talk) 12:47, 24 January 2023 (UTC)

See also

  • Commons:Graphic Lab at Wikimedia Commons - they can help with general image problems
  • Image extraction - guidance for extracting images from scans
  • Requested texts - general text requests. Many of these also need scans to be located.
  • Category:Index - File to fix - contains indexes that have various defects. Please do add templates like {{missing pages}} if needed to indicate what the problems are, but please do not bring the files here unless you would like it fixed to allow work in the near future.