User talk:Languageseeker



Hello, Languageseeker, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{Collaboration/MC}} to your page for current wikisource projects.

You can put a brief description of your interests on your user page and contributions to another Wikimedia project, such as Wikipedia and Commons.

Have questions? Then please ask them at either

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 05:33, 23 February 2021 (UTC)[reply]

Missing pages Index

Hi, I have a couple of problems with doing this in this way: a) those page numbers exist in full text, so I can't see that they are actually missing; b) if a scan is deficient, then it should be replaced with a complete scan. Sometimes that does mean creating a new scan out of two (or more) incomplete scans. Interpolating pages from different Indexes in transclusion is not a good way of presenting a text. Beeswaxcandle (talk) 02:35, 5 March 2021 (UTC)[reply]

I would prefer a full scan as well, but one does not exist. Do you want me to upload the entire text from Google that is also missing pages? My fear is that if I upload a new scan based on my own interpolation of two incomplete scans, then we will lose the high-quality images from IA and the link to the original IA text. Languageseeker (talk) 02:38, 5 March 2021 (UTC)[reply]
What is the evidence that those pages are missing? They have the same page numbers as the first three pages of the index, which follows directly on from the glossary. I cannot see that there are missing pages in this edition of the book. Beeswaxcandle (talk) 03:00, 5 March 2021 (UTC)[reply]
You're right. You have an eagle eye. So, there must have been two editions put out in the same year. Then can you delete Index:How_to_Have_Bird_Neigbors_-_Missing_Pages.pdf? Languageseeker (talk) 03:11, 5 March 2021 (UTC)

Eminent Women series

Hi, I see you have added several alternate editions to the page Portal:W. H. Allen & Co.. However, these don't belong on this page because they were published by a different company. They belong on a portal for that company, with a cross-reference.

The scans you've uploaded are problematic for several reasons: a) they have the Google disclaimer as the first page. Commons have deemed that page to not meet their PD requirements and it must be removed prior to uploading a file. b) the Google disclaimer page pushes the page numbering out by one. This causes problems with dealing with obverse and reverse pages in some templates. c) most of the early Google scans are awful to work with, particularly in respect of images, but also little care was taken about skewed pages and cleaning the scanner from blots and speckles.

My intention in putting Eminent Women back on the front page was not to increase the amount of work to be done, but to get some more of what is there already done. Beeswaxcandle (talk) 17:51, 5 March 2021 (UTC)

I thought the goal was to preserve the entire series. This should include the American editions and subsequent revisions of the text because that is all part of the series. If you wanted to preserve only the W. H. Allen & Co. then that is preserving only part of the series. Please don't delete these alternate editions because it substantially improves the presentation of the series. Languageseeker (talk) 18:01, 5 March 2021 (UTC)[reply]
I didn't say anything about deleting them. I'm attempting to provide guidance on scan selection and upload requirements; as well as not mixing different publishers on a single publisher portal. My last comment was a throw away explanation. I have no problem with adding the other editions, I just worry about the ever-increasing backlog of uploaded Indexes waiting to be dealt with. Beeswaxcandle (talk) 18:12, 5 March 2021 (UTC)[reply]
Thanks for the guidance! :) Since there are so many alternative editions, do you think it might make sense to move this to its own page? Also, I tried to delete the Google page, but the IA upload tool would not allow me to. I'll wait to go through the rest of the series so that I don't overwhelm everybody. Languageseeker (talk) 18:20, 5 March 2021 (UTC)
Probably best to make it a Wikiproject page. That way there's a central co-ordination point, but the individual publishers are kept distinct. Not sure why IA upload tool wouldn't let you take out the first page. It's always offered it to me. Beeswaxcandle (talk) 18:35, 5 March 2021 (UTC)[reply]
A Wikiproject page sounds like a good idea. But, I would like to keep a similar format where there is the Title of the work followed by a list of its editions. I think this makes a really cool way to see the evolution of the texts, which ones sold the most, and how they changed over time. How about we make an "Eminent Women Series" Wikiproject as the main hub for these and then we state in the Description that W. H. Allen & Co. published the British Editions and Little, Brown, and Company published the American Editions? Languageseeker (talk) 19:35, 5 March 2021 (UTC)[reply]
The layout of Wikisource:Wikiproject Eminent Women series can be structured however you wish—as long as you have {{process header}} at the top. Beeswaxcandle (talk) 19:41, 5 March 2021 (UTC)[reply]
I'll try to find some time for it this weekend. In the meantime, is there any way that you could delete Index:George_Eliot_(1904_Mathilde_Blind).djvu? I mixed up the filenames on Commons and had it renamed there. Once it got renamed, I made another Index with the correct name, Index:George Eliot (1888 New Edition Mathilde Blind).djvu. I don't want the poorly named one causing confusion. Languageseeker (talk) 19:45, 5 March 2021 (UTC)
@Beeswaxcandle: I created the Wikiproject page. Is there any way to update the front page to point to the Wikiproject page? I think that if we only present the British editions, it might dissuade some American users who may not be comfortable proofreading in British English. Languageseeker (talk) 15:38, 6 March 2021 (UTC)
Done. However, a high proportion of the edits to the books already done or in progress were made by editors that I know to reside in North America. Beeswaxcandle (talk) 17:20, 6 March 2021 (UTC)

Index:Modern Literature Volume 3 (1804).djvu

You going to look for the other 2 volumes as well? It would be nice to have complete sets of multi-volume works like this. ShakespeareFan00 (talk) 09:33, 11 March 2021 (UTC)

@ShakespeareFan00: I will in the end. Currently, I have over 1,500 items to merge and split and the IA tool is a bit broken. Languageseeker (talk) 03:21, 12 March 2021 (UTC)

See also the comment here - c:User_talk:Fæ#File:California_Digital_Library_(IA_americanbibliogr06evanrich).pdf. If there's a set you think should exist on Commons, Fæ I think has some bulk upload scripts that can work with IA search queries, which include regexp-style matches on 'grouped' identifiers :). (I've found many IA works use sequential identifiers for the volumes of a multi-volume set.) I would suggest having a chat with them, as the example they gave was also able to identify what already existed on Commons :) ShakespeareFan00 (talk) 20:53, 17 March 2021 (UTC)

@Languageseeker: -Also c:User_talk:Fæ/IA_books#query , if you are technically able you might consider asking Fae, if it's possible to have access to the sources for those scripts, so someone (not me) can develop a 'multi-volume' IA-upload tool (blue sky thinking)  :) ShakespeareFan00 (talk) 21:04, 17 March 2021 (UTC)[reply]

File:Early western travels, 1748-1846 V13.djvu

This appears to be a volume of a multi-volume series. ShakespeareFan00 (talk) 09:41, 11 March 2021 (UTC)[reply]

@ShakespeareFan00: I will in the end. Currently, I have over 1,500 items to merge and split and the IA tool is a bit broken. Languageseeker (talk) 03:21, 12 March 2021 (UTC)

Index:The Works of Samuel Johnson ... A journey to the Hebrides. The vision of Theodore, the hermit of Teneriffe. The fountains. Prayers and meditations. Sermons.v. 10-11. Parliamentary debates.pdf

Metadata on Index page seems to be missing? ShakespeareFan00 (talk) 09:57, 11 March 2021 (UTC)[reply]

@ShakespeareFan00: Thanks, I’ll add that ASAP. Languageseeker (talk) 03:21, 12 March 2021 (UTC)

Index:Japan, its history, arts, and literature (1901 V3).pdf

Metadata failed to upload with work? ShakespeareFan00 (talk) 09:07, 12 March 2021 (UTC)[reply]

Match & Split

Hi, I see you are using the Match & Split (User:Phe-bot) pretty heavily. That's cool -- I haven't seen too many people using it lately, and I think it's an under-utilized tool for generating new content here. But, I noticed you have multiple jobs queued up one after the other, which has meant a wait of some hours for my job. Here's how the queue looked a few hours ago, and as of now my Women of the West job still hasn't been started:

8 jobs in split queue.

  • [14/03/2021:14:52:08] Languageseeker en User:Languageseeker/split13
  • [14/03/2021:15:29:34] Languageseeker en User:Languageseeker/split9
  • [14/03/2021:15:58:11] Languageseeker en User:Languageseeker/split4
  • [14/03/2021:16:01:57] Languageseeker en User:Languageseeker/split3
  • [14/03/2021:16:30:21] Languageseeker en User:Languageseeker/split7
  • [14/03/2021:19:35:25] Peteforsyth en Women_of_the_West/Montana
  • [14/03/2021:19:59:19] Languageseeker en User:Languageseeker/split6
  • [14/03/2021:20:01:59] Languageseeker en User:Languageseeker/split5

Is this just a temporary project you're working on, or are you likely to continue using it so heavily for a longer period of time? If the latter, I wonder if we might come to an arrangement of not queueing up too many jobs at a time, so that neither of us slows the other's work down too much. Thoughts? -Pete (talk) 00:05, 15 March 2021 (UTC)[reply]

@Peteforsyth: Great to hear from you. Let's definitely work together. This is part of a much larger project (~1,500 texts) that I'm planning to match-and-split, so I don't want to hog the match-and-split bot. What do you think would be a good arrangement? Limiting the number of jobs per day? Running it at night? Maybe you can let me know when you're planning to use the bot? Languageseeker (talk) 00:39, 15 March 2021 (UTC)
Thanks for the reply, and for your attention to the job for Women of the West. I don't really know what the best solution is; I don't know your workflow, but it seems like the simplest short term solution might be to just avoid queueing up more than 2-3 match and split jobs at once, which means if somebody ends up behind you, at least the wait wouldn't be more than a few hours. But I'm not sure if that's workable, with a big project. It's my understanding that Phe maybe has not released the code for this bot (?) It would be nice if more than one user could run this program on their server, so we're not limited to one job at a time. But I'm not much of a code person, so I'm not sure how feasible that is. -Pete (talk) 06:35, 16 March 2021 (UTC)[reply]
@Peteforsyth: No problem! I keep a close eye on the match-and-split bot and I didn't want you to lose your work. I think that it's difficult because I'm importing something along the order of 400,000 pages. That means that I really need as much of the bot time as possible. I don't want to hog the resource, but I don't want to waste it as well. In my opinion, the match-and-split bot queue exists so that everyone can get into line and wait. I fully understand that it's no fun to wait when you only have 10 pages to split and a large queue ahead of you. However, if the queue was not there, then the bot would just be sitting idly by. I have a few moments throughout the day that I can add to the queue and I try to make sure that I add jobs so that the bot is running continuously. In between, anyone else can add to the queue. It's a short time crunch that will resolve in about a month or two. Until then, I have to extend an advance apology for any delay. Languageseeker (talk) 02:51, 17 March 2021 (UTC)[reply]

Wikisource:The Philippine Islands, 1493-1803

Why is this in the Wikisource namespace? If it is a publication series, it should either be in the Main namespace or Portal namespace. The Wikisource namespace is for documents and discussion about Wikisource itself, not for hosted publications. --EncycloPetey (talk) 04:23, 15 March 2021 (UTC)[reply]

Index pages...

Great efforts..

Can I trouble you to add the pagelists on Index pages when creating? ShakespeareFan00 (talk) 18:02, 15 March 2021 (UTC)[reply]

You might find Help:Gadget-ImportPagelist useful for doing this. ShakespeareFan00 (talk) 18:02, 15 March 2021 (UTC)[reply]

@ShakespeareFan00: What, such a gadget exists? So happy. My only question is why it is not enabled by default. It seems very useful, like importing the TOC for Biodiversity Heritage Library books. (Don't tell me that such a gadget exists as well.) Thanks for the information. Languageseeker (talk) 18:23, 15 March 2021 (UTC)
It's not fool-proof, you still need to manually identify image pages for example.. ShakespeareFan00 (talk) 22:17, 15 March 2021 (UTC)[reply]
@ShakespeareFan00: That seems true of all metadata. It's all made by people and sometimes has errors. However, like all metadata, I think that automatically adding in page numbers is a good start and makes it easier to correct mistakes. Languageseeker (talk) 00:23, 16 March 2021 (UTC)[reply]

Pericles (Yale)

Have you checked that the scan is complete and accurate? For all other volumes, I have verified (page by page) the source file before uploading. --EncycloPetey (talk) 23:00, 16 March 2021 (UTC)[reply]

@EncycloPetey: Nope, I haven't gotten a chance to check the pages. It took me a while to get this to download because for about a week or two, the Google pdf was defective. I hope it's complete. Languageseeker (talk) 02:54, 17 March 2021 (UTC)[reply]
Thanks for letting me know. I'll check the PDF myself then. --EncycloPetey (talk) 02:55, 17 March 2021 (UTC)[reply]
@EncycloPetey: Felt bad. Just checked. It's complete. Languageseeker (talk) 02:57, 17 March 2021 (UTC)[reply]
Thanks again. It will still need a re-edit though. The text layer for a complicated poetic work like this needs to be better than the standard Google OCR. I've asked someone who knows how to do this to generate a DjVu file with a better OCR text layer. Once that happens, we'll be able to move forward. Thanks for grabbing the Google file though. The last time I checked, the file could not be downloaded. --EncycloPetey (talk) 03:04, 17 March 2021 (UTC)

Notice that because the volumes are part of a series, there is a standard format and title naming convention, with links for those set up in advance. Choosing a new title and file naming format will mean all those prepared links won't work. Usually, when we have a published set belonging to a single series, we use a standard naming pattern for the whole series to avoid confusion. --EncycloPetey (talk) 03:21, 17 March 2021 (UTC)[reply]

@EncycloPetey: I noticed that the DjVu images are very low quality, so I created another PDF from the uncompressed images at HathiTrust. Do you think it might make sense to create an index from it and then move the DjVu OCR to the PDF? file:Pericles (1925) Yale.pdf Languageseeker (talk) 01:35, 24 March 2021 (UTC)
No, the entire rest of the series is in DjVu format on commons. Having one file from the series in a different format with a different naming scheme would not make sense. The low quality of the original was the key reason we had not pulled the scan in the first place. For the rest of the series, scans of much higher quality usually have turned up at IA sometime after the work enters public domain, but this volume has not appeared yet at IA. There are also persistent issues at Wikisource that make working with DjVu much easier and more reliable than working with a PDF. --EncycloPetey (talk) 20:27, 24 March 2021 (UTC)[reply]
@EncycloPetey: What about regenerating the DJVU from the PDF? The new PDF is 60mb with 40mb of images while the existing DJVU is only 2.24mb. Languageseeker (talk) 22:32, 24 March 2021 (UTC)[reply]
That is a question for @Xover: or @Inductiveload:, as I do not have the knowledge of the specifics behind how the current DjVu was created or whether the difference in file size is significant. --EncycloPetey (talk) 23:59, 24 March 2021 (UTC)[reply]
Uhm. Why would we generate a DjVu from the PDF when the original scan images are available? File size in general is a poor proxy for quality, and even more so across formats; so it is entirely possible that a ~2MB DjVu is higher quality than a ~60MB PDF. What is the actual problem we're trying to solve here? --Xover (talk) 07:13, 25 March 2021 (UTC)[reply]
Oh, this was about the completely crap scan at Google Books (309x464)? Yeah, that's completely unusable. I grabbed mdp.39015053493998 from HathiTrust (2538x3914) and uploaded it over File:Pericles (1925) Yale.djvu (checked, but only cursorily). Incidentally a good illustration of why file size is a poor proxy for quality in the general case: this scan encoded well in bitonal so the result is just 4.7MB even though it contains ~70 times as many pixels. Ping: EncycloPetey, CC: User:Inductiveload (FYI). --Xover (talk) 08:29, 25 March 2021 (UTC)[reply]
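The "~70 times as many pixels" figure in the last comment can be checked with a few lines of arithmetic. This is only a sketch: it uses the two resolutions quoted above, and the function name is made up for illustration.

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in one page image."""
    return width * height


# Resolutions as quoted in the discussion above
google_pixels = pixel_count(309, 464)      # Google Books scan
hathi_pixels = pixel_count(2538, 3914)     # HathiTrust scan

ratio = hathi_pixels / google_pixels
print(f"{ratio:.1f}x as many pixels")      # prints "69.3x as many pixels"
```

So the larger scan carries roughly 69 times the pixel data per page, which is why the relative file sizes alone say little about scan quality.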

Index:Uncle Tom's cabin, or, Life among the lowly (1852 Volume 2).djvu

This seems to be a duplicate? Can you look over these again please? ShakespeareFan00 (talk) 10:53, 23 March 2021 (UTC)[reply]

There are several distinct printings of Uncle Tom's Cabin. The first has a plain title page, the second says Ten Thousand Printing on the title page, and, if memory serves me, the third says Seventieth Thousand. I'm not sure about all the details of the textual variants, but they do appear to be distinct editions. I've treated the one with a plain title page as the original printing. Languageseeker (talk) 12:45, 23 March 2021 (UTC)
Now resolved, thanks. ShakespeareFan00 (talk) 18:19, 23 March 2021 (UTC)

Duplicate index

Hi. I happened to notice that another user added a file (that you brought here: Index:Madame Roland (1896).djvu) for a work that appears as proofread and partly validated, the first (1888) edition. I had a quick glance and it seems to be merely a reprint, am I missing something? CYGNIS INSIGNIS 15:18, 25 March 2021 (UTC)[reply]

Appendix 2 of General Washington's Dilemma by Katherine Mayo

Hi, I am wondering whether you could tell me where things stand with regard to this matter here: Unfortunately, so far this has been attributed to the wrong edition of this book. Thanks. Arbil44 (talk) 18:37, 26 March 2021 (UTC)[reply]

I hope this might help the situation?
Captain Henry Greville's letter from 'General Washington's Dilemma' by Katherine Mayo (Jonathan Cape, 1938)
Arbil44 (talk) 16:04, 27 March 2021 (UTC)[reply]
@Arbil44: If this is published in the UK, then it's under UK copyright. I added the book that you linked in the original post. Languageseeker (talk) 17:42, 27 March 2021 (UTC)[reply]
Languageseeker, I have been given the go-ahead by Nthep who is one of the editors who offer copyright guidance. See here: [[1]]. I hope they will be kind enough to reassure you here. That is why I provided the frontispiece of the Jonathan Cape edition (which has an Appendix 2), because it makes no sense to link to the Harcourt, Brace edition, which has no Appendix 2. It is useful for every other page of the book though, because it is online, so is linked. Arbil44 (talk) 18:26, 27 March 2021 (UTC)[reply]
I don't know what the issues are here but this book is definitely out of copyright in the UK. The author died in 1940 so even with the URAA renewed copyright the book became PD (in the UK) on 1 January 2011. As far as the US copyright is concerned the copyright from 1938 was not renewed so the text would be PD in the US after 25 years. Appendix 2 is unpublished in the US so is PD on the basis of it being more than 70 years since the death of the author. Nthep (talk) 19:47, 27 March 2021 (UTC)[reply]

Setting of Index page and file data

With regards to your creation of index pages like Index:The Works of H G Wells Volume 14.pdf, could you please take a little more care in setting up the fields correctly, which I assume you are reading directly out of some kind of database and setting automatically? For example, it should, at minimum, be something like:

  • Title: ''[[The Works of H. G. Wells]]''
  • Volume: [[The Works of H. G. Wells/Volume 14|Volume 14]]
  • Author: [[Author:H. G. Wells|H. G. Wells]]
  • Volumes: either a manual list on each page, or, I recommend, a template like {{The Works of H. G. Wells volumes}} (which I have created for you).
  • Key: Works of H. G. Wells, The
  • Publisher and Location fields as appropriate.

Failing to fill this in when uploading makes more work for others, especially if you're using a script (I assume you are due to the lack of input validation) and can fix it "at source" rather than expecting others to repetitively edit 14 separate pages. I am firmly of the opinion that "dumping" indexes is a good thing, as it's a background task that makes it easier for others to get started, but there is a level at which it actually starts to create maintenance work for others rather than saving set-up work.

You have also not entered the information into the Commons file page's information template. Inductiveloadtalk/contribs 14:26, 3 April 2021 (UTC)[reply]

@Inductiveload: These files were added manually and it took me over a day to do all of them. I really wish that we had a HathiTrust and Google Books uploader to save time, but sadly we do not. Part of the problem is that the Index page is not reading all the metadata on Commons. For example, the Commons files have Series title and Volume set, but the Index does not make much use of them. Much of your formatting could be done automatically by just reading the Commons data, saving time for both me and other users, e.g. Volume could be set as [[%Series title%/Volume %Volume%|Volume %Volume%]] Languageseeker (talk) 16:38, 3 April 2021 (UTC)
This work is not in a series, as far as I can tell. The title of the (28-volume) work is "The Works of H. G. Wells". If there were a series, it would be something like "Atlantic Collected Editions" (or whatever). This is why it doesn't work.
We have the Fill index gadget, and it's on by default, which does exactly what you want (and more), but it doesn't work if the Commons file has incorrect or missing metadata.
Improving this kind of thing is on my list and has been for a long time. Meanwhile, not having a tool, while frustrating, is not an excuse for making a mess for others to deal with. Furthermore, if you'd just asked "why doesn't this data import properly", someone could have told you what was wrong with it, before you assumed the tools are defective.
Further furthermore, if the Fill index gadget doesn't produce a good result, for whatever reason (say, you set the author to Lennox, Charlotte, ca. 1729-1804. and it can't figure out that should be Charlotte Lennox), that doesn't mean you should just slap "save" and let someone else deal with it. You should fix the link yourself to be [[Author:Charlotte Lennox|]]. What you can do is report the failure of the script to detect that case as a bug, either at WS:S or directly to me (since I can maintain that tool). Inductiveloadtalk/contribs 16:57, 3 April 2021 (UTC)[reply]
@Inductiveload: Funnily enough, this series is commonly referred to as the "Atlantic Edition". It's a series because it's 28 volumes released as a special collection over 3 years, limited to 1,500 copies, containing all of Wells's previously released novels with new corrections. The Atlantic Edition is generally used for critical editions of Wells's works. I didn't realize that you maintained the gadget, and I'll leave you a comment in the future. Would it be possible to have a refresh metadata button when editing the Index page, so that if the metadata gets fixed on Commons it's possible to apply these changes to Wikisource? I'm not trying to make your or anybody else's life more difficult. However, if I don't add the Index pages, then it's hard to tell that these files have ever been uploaded to Commons, because nothing will inform a person that they exist already. I know that there's always more work to be done, but I also have limits. That's what I appreciate about this site. If I spend 2 hours to get the images from HathiTrust, OCR them with Tesseract, upload them to Commons, and add basic metadata, then somebody else will do the rest. I don't have infinite time, but I'm trying to improve the collection by creating Indexes for the editions of major English works that the author contributed to. Right now, this site still lacks scan-backed copies of many of the most fundamental texts in the English canon. If we want to attract users, then these are the works that will garner the most attention. That's why I'm trying to track down and add Index pages for these works. Languageseeker (talk) 02:04, 4 April 2021 (UTC)
For the purposes of Wikisource, this is a single work with multiple volumes. The series title is not used in the index page, which is why it doesn't import. You might also notice that the book template at commons has no title, because it also expects a book to have a title. In this case, the title is The Works of H. G. Wells. There is no series, unless the Atlantic Editions has other sets of authors' collected works.
Like I said, I don't have an issue with pre-emptive creation of index pages. But dumping of any pages that are clearly sub-par is bad etiquette. There's enough maintenance to go around, and it's up to everyone to avoid adding to the backlog when they can. Clearly missing titles and broken volume links are obvious issues that you can see when you preview or save an edit. You don't have to get it perfect, but you should at least aim for "not broken".
I added a button to the sidebar yesterday to re-fill the index data. It's titled "re-fill index data".
I'm not saying I don't appreciate that you made the files, but please pay attention when a page looks broken. And ask if you want advice, because someone can tell you before you create more work for yourself and others. You didn't need to know I maintained that gadget to ask at WS:S why all your index pages are coming out broken. Inductiveloadtalk/contribs 06:58, 4 April 2021 (UTC)[reply]
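The two fixes discussed in this thread (turning a catalogue-style author string into an author link, and building the volume link from a work title and volume number) can be sketched in a few lines. This is not the Fill index gadget's actual logic; the function names and the date-stripping pattern are assumptions made purely for illustration, and only the example strings come from the comments above.

```python
import re


def catalogue_name_to_author_link(raw: str) -> str:
    """Turn 'Surname, Forename, ca. 1729-1804.' into '[[Author:Forename Surname|]]'.

    Hypothetical helper: handles only the simple 'Surname, Forename, dates' shape.
    """
    # Drop a trailing date range such as ', ca. 1729-1804.' or ', 1866-'
    name = re.sub(
        r",?\s*(?:ca\.\s*)?\d{4}\??\s*-\s*(?:\d{4}\??)?\.?\s*$", "", raw.strip()
    )
    # Swap 'Surname, Forename' into natural order; leave other names unchanged
    parts = [p.strip() for p in name.split(",", 1)]
    if len(parts) == 2:
        name = f"{parts[1]} {parts[0]}"
    return f"[[Author:{name}|]]"


def volume_link(title: str, volume: int) -> str:
    """Build the Volume field from the work title (hypothetical helper)."""
    return f"[[{title}/Volume {volume}|Volume {volume}]]"


print(catalogue_name_to_author_link("Lennox, Charlotte, ca. 1729-1804."))
print(volume_link("The Works of H. G. Wells", 14))
```

Anything the pattern does not recognise is passed through unchanged, so the result still needs a look in preview before saving.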

File:Hamlet Q2.pdf

This file needs to be remade (I have not checked the other Hamlet editions to see whether they have problems).

The PDF you uploaded of Q2 has two pages of the original displayed on each page of the scan. This breaks page numbering and many other features that the Index work normally provides. --EncycloPetey (talk) 04:58, 13 April 2021 (UTC)[reply]

Yep, that’s why I hadn’t created the Index for them yet. I tried using scan tailor and it created a mess so I posted in the Commons help section. Languageseeker (talk) 05:26, 13 April 2021 (UTC)[reply]

Speedy deletion request

Hello. May I ask what you mean by "West 6" in the speedy delete request of Index:First Folio (West 150).pdf. Can you provide a link there, please? Thanks. --Jan Kameníček (talk) 08:06, 13 April 2021 (UTC)[reply]

Index:Hamlet, First Quarto, 1603 (Huntington Shelfmark 69304)

This is not ready to be transcribed. The source file should first be fixed. --EncycloPetey (talk) 00:50, 18 April 2021 (UTC)[reply]

The typical reader coming to Wikisource will prefer a modern edited edition over a Quarto or Folio. Quartos are the realm of the specialist. What about sectioning off the page into "Edited editions", "Folio editions", and "Quarto editions"? --EncycloPetey (talk) 23:00, 21 April 2021 (UTC)[reply]

@EncycloPetey: I’d be happy with headings for the three, but I wouldn’t want to create separate pages for each. I’m also fine with version pages for each Quarto except for Q2b, which has only one extant copy. Languageseeker (talk) 00:03, 22 April 2021 (UTC)
No, not separate pages. We place all editions of a work on the same page. --EncycloPetey (talk) 00:06, 22 April 2021 (UTC)[reply]
@EncycloPetey: Take a look at Hamlet (Shakespeare) and tell me what you think. Languageseeker (talk) 00:32, 22 April 2021 (UTC)[reply]
Something like that yes, but I've moved "scholarly editions" to the top, as those will be of more general interest to the average reader. The Quartos and Folios will be of interest to the specialist. --EncycloPetey (talk) 01:20, 22 April 2021 (UTC)[reply]
@EncycloPetey: Looks good. Languageseeker (talk) 01:58, 22 April 2021 (UTC)[reply]

You are all over the place. Finish your works.

IMNSHO You are leaving spots of crap all over the place. Please focus on completing quality works—and less on trying to reorganise WS in your own vision—and then we can bring them forward and display them. What you are doing at the moment is a bloody mess leading to rubbish and confusing presentation. Focus, please. — billinghurst sDrewth 03:12, 18 April 2021 (UTC)[reply]

@Billinghurst: I thought civility was a key part of Wiki doctrine. I cannot finish creating scan links because Commons is having serious issues with uploading files. For now, I'm trying to make sure that there are transcriptions projects for the major authors in English. Languageseeker (talk) 03:23, 18 April 2021 (UTC)[reply]
Then park it somewhere. Or develop it in your user ns, that is the purpose of user ns. Don't go leaving this stuff all over the place. Main ns is not a development space, it is our main display space. We keep telling you this. Ears on, and give us a break. — billinghurst sDrewth 03:31, 18 April 2021 (UTC)[reply]
So much for not judging. o_O -Pete (talk) 03:58, 18 April 2021 (UTC)[reply]

First Folio

The First Folio of Shakespeare was published in 1623, not 1621. It is also a very challenging work to proofread because of the older font and the sheer density of text on each page. --EncycloPetey (talk) 02:51, 26 April 2021 (UTC)[reply]

@EncycloPetey: Thanks for the correction. I know that the First Folio is super challenging, but it's also important enough that it should be here. The no expiration section is designed to get major texts that are challenging proofread and validated. Languageseeker (talk) 17:20, 26 April 2021 (UTC)[reply]
It might be better to begin with Hamlet Q1, which is still lengthy and complex, but which at least has a text layer to start from and won't take nearly as long. I tried working on our copy of the First Folio facsimile (which has all the same text as the FF since it's a facsimile) and found that it was excruciatingly difficult to tackle. Since the FF has no text layer, there is no starting point for any editor. That obstacle will mean that most potential editors don't even make an attempt. --EncycloPetey (talk) 17:31, 26 April 2021 (UTC)[reply]

It might be good for 'someone' to establish in the index talk page such points as to whether long-s should be preserved or substituted, and undoubtedly other points that should be known before attempts are solicited. For instance in Page:West 192 005.jpg, "Grauer" or "Graver"? Shenme (talk) 13:27, 27 April 2021 (UTC)[reply]

@Shenme: I added a note about replacing long s with {{ls}} on the index page. Otherwise, the original orthography should be preserved. Languageseeker (talk) 13:33, 27 April 2021 (UTC)[reply]

Is it possible that the text at somewhere like is suitable for use? I haven't checked closely, because I don't know enough about First Folios to say whether the texts of different copies are similar enough. Our image for this page is here. Inductiveloadtalk/contribs 13:45, 27 April 2021 (UTC)[reply]

@Inductiveload: The First Folio is really tricky because printers corrected errors as they printed and the importance of this book makes such differences important. If someone discovers that there is an extra comma in Hamlet in a line of dialogue between him and Ophelia in West 190, I'm sure that they will turn this into an academic article arguing that this new comma means that we need to fundamentally rethink the relationship between Hamlet and Ophelia. I'm not sure which Folio UVic based their electronic edition on, so I'm wary of match-and-splitting it lest we end up with a situation like the Great Gatsby. My goal is to produce a scan-backed edition of one specific First Folio that can be used and cited. I think that typing it in is the safest bet for now. Languageseeker (talk) 15:20, 28 April 2021 (UTC)[reply]

First Folio (West 192) questions[edit]

I remember seeing _somewhere_ a note about "vv", as seen in

I vvould vvith ſuch perfection gouerne Sir:

What to do with these here, e.g. Tempest page 7?

It's a bit strange that there are both "vvould" and "would" on the same page! Shenme (talk) 02:30, 30 April 2021 (UTC)[reply]

@Shenme: Proof it as you see it: vv for vv and w for w. Typesetting was a strange business. Maybe they just ran out of w. Who knows? Languageseeker (talk) 02:37, 30 April 2021 (UTC)[reply]


So I think I have some rudimentary statistics set up for the MC. Please be patient as we get started, because it might need tweaks, and it'll run manually at first.

Also the progress bars on the tiles can only update when the bot runs. Looking at the PHP, it's likely we can add to the ProofreadPage extension but that needs some thought, effort and then a review/deploy cycle, so even if I had time to do it, it probably won't happen soon. Inductiveloadtalk/contribs 19:39, 30 April 2021 (UTC)[reply]

@Inductiveload: I’m utterly gobsmacked that you were able to pull off any kind of stats. A gigantic kudos. When should we make it live? Also, if it’s going to be in the CoTM section, then the template should be changed : text -> texts. Lastly, should we put a running count anywhere like they do in frWS? Languageseeker (talk) 19:46, 30 April 2021 (UTC)[reply]
It'll get better as we collect a few days' stats, and a running total will start making more sense. For the first month, the running total in the table will be the MC global total anyway.
Re starting, tomorrow is the 1st? Since CotM (lol, M) is pretty much moribund, I don't have an issue with changing CotM to MC (rather than just adding MC). I set up {{Collaboration/MC}} preemptively. It should automatically update based on the monthly data structure (you can set month manually until midnight, I think UTC, so about an hour from now), and it calls out the "sprint" works particularly (uses the same data source). Probably a good idea to verify at the Scriptorium first, but I'd support it.
BTW, we now have 13505 pages in the MC, so I think we have enough works, or we'll spread effort so thinly that no work will ever get done before expiry even with thousands of edits a month. Inductiveloadtalk/contribs 22:46, 30 April 2021 (UTC)[reply]
@Inductiveload: Small nit on the stats. The front page count is off by one because the category also includes the May 2021 page. Is there any way to base the count on Indexes in the category? Languageseeker (talk) 03:06, 1 May 2021 (UTC)[reply]
Small note on stats: day 1 is undercounted and day 2 overcounted because it turns out the pr_index DB table needs the indexes to be purged to be accurate, which I didn't realise until day 2. I'm working on a "perfect" system that will actually use revision timestamps, which will allow far better accuracy as well as all sorts of other bling-bling statistical frippery. But that needs its own DB to keep track in any kind of sane timeframe. Inductiveloadtalk/contribs 19:10, 3 May 2021 (UTC)[reply]
I think the stats are now up and running on an automatic basis. It's been a bit of a mission with new things to me, but think I have a fairly general system working now. I guess we'll see exactly how well it rolls over at the end of the month! Inductiveloadtalk/contribs 01:33, 7 May 2021 (UTC)[reply]
You were right. Stats really are such a vital part. I'm extremely grateful that you are part of this site. You've improved it significantly even in the few months that I've been here. Hopefully, the Monthly Challenge will work out and will become the central place to direct all new users to. It's really exciting to see the texts being worked on. Languageseeker (talk) 03:05, 7 May 2021 (UTC)[reply]

" After thirty-days, this proposal will be considered passed."[edit]

Pull your head in. — billinghurst sDrewth 10:12, 8 May 2021 (UTC)[reply]

match and split[edit]

The attempt to match and split from a text to Index:Early western travels, 1748-1846 (Vol 1 1904).djvu, et al, does not appear to have been successful. Are you intending to add footnotes and other content that does not appear in the Page ns.? CYGNIS INSIGNIS 12:05, 13 May 2021 (UTC)[reply]

Not really. This was an import from PGDP and I don't intend to work on the text further. Languageseeker (talk) 12:50, 13 May 2021 (UTC)[reply]
Would you agree the pages in those indexes should be deleted? CYGNIS INSIGNIS 17:16, 13 May 2021 (UTC)[reply]
@Cygnis insignis: The text has been proofread at another site so most of it is quite high quality. Languageseeker (talk) 20:04, 13 May 2021 (UTC)[reply]
Is this a sincere response? CYGNIS INSIGNIS 21:35, 13 May 2021 (UTC)[reply]

Images - proofread vs problematic[edit]

FYI, pages like Page:The New Negro.pdf/389 should not be "proofread" until the image is extracted (cropped, rotated, cleaned if necessary). If you're using a raw page image, it's still "problematic". Inductiveloadtalk/contribs 22:12, 13 May 2021 (UTC)[reply]

Nevermind, I see that the images have been updated in situ. Carry on. :-) Inductiveloadtalk/contribs 19:57, 14 May 2021 (UTC)[reply]
Yep, my strategy is to put them in the right place and then place them on the Help Desk on Commons for image extraction and background removal. Languageseeker (talk) 19:59, 14 May 2021 (UTC)[reply]

for tracking a series of targeted indexes[edit]

If you put text, e.g. "Monthly challenge 2021 May", onto the Index: page somewhere, you will be able to use the key parameter of Special:IndexPages. I explained its use back in 2011 ... Wikisource:Scriptorium/Archives/2011-12#Getting a progress bar but here is a short example [2]

I used to do something like <font color="white">November2009 POTM</font> so it wasn't visible when someone viewed the page, though it was able to be picked up by the search function. You would have to do a css version as the lint grumblers won't like it. — billinghurst sDrewth 11:59, 19 May 2021 (UTC)[reply]
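A minimal sketch of the CSS version mentioned above, assuming an inline style is acceptable on Index: pages and that the search still picks up white-on-white text. The marker phrase is only the example from this thread:

```wikitext
<span style="color: white;">Monthly challenge 2021 May</span>
```

The same phrase would then be supplied as the key, for instance Special:IndexPages?key=Monthly+challenge+2021+May (this URL form is an assumption, not something confirmed in this thread).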

Don't play games with me[edit]

{{new texts/item|LINK TO WORK|Author name|YYYY|nowiki=yes|display=Preferred title to display}}

There is nothing about advertising or other components outside of the title, the author, the year. So please keep within the guidance. — billinghurst sDrewth 12:04, 19 May 2021 (UTC)[reply]


Please keep that template's use to your project space; it makes an abysmal mess on author pages. I am also uncertain why you are encouraging the download of pdfs; I find that they are inferior to djvus for ease of transcription.

They exist because the IA tool is currently broken. It's not easy to create a list of 20 volumes. Languageseeker (talk) 12:31, 19 May 2021 (UTC)[reply]


Hello, please make sure to revert all the edits instead of only the most recent one. I undid the two other edits by the (now locked) user. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 09:54, 23 May 2021 (UTC)[reply]


"You asked for this work to be featured and we generally do not remove texts because that creates significant problems". your edit summary

I don't recall asking, where was that? CYGNIS INSIGNIS 06:00, 8 June 2021 (UTC)[reply]
@Cygnis insignis: See the conversation here Wikisource talk:Proofread of the Month#Propose_work_by_Chesterton. I deeply respect you as a user and I don’t want to edit war with you. However, the MC is a community project and I’m not sure why you feel so strongly that this work needs to be removed from it. You are absolutely free to work on it while it’s being featured and I would appreciate you doing so. However, it sets a very poor precedent if we allow users to remove any text they want from a community collaboration. Languageseeker (talk) 12:11, 8 June 2021 (UTC)[reply]
I see where I mentioned it at POTM, it wasn't taken up there so I erased the notion. Guess I owe you an apology for that, sorry. I would not have proposed that it be featured as one of many recently uploaded indexes for others to start, because I decided to work on that during the next few weeks. The subsequent part of your reply is based on a misunderstanding, I think, and tending toward casting aspersions, please don't do that. Your new project is of little interest to me, it is not of course compulsory. CYGNIS INSIGNIS 12:51, 8 June 2021 (UTC)[reply]
Restore your edit, I guess I gave it up with the "anyone is welcome", I'll find something else to work on. CYGNIS INSIGNIS 12:55, 8 June 2021 (UTC)[reply]
@Cygnis insignis: Didn't mean to cast any aspersions. Sorry if I sounded like that. Is there any reason that you feel like you can't work on it right now? It would be a great contribution to Wikisource. Languageseeker (talk) 13:05, 8 June 2021 (UTC)[reply]
It would be a fine collaboration for newer users, easy to proofread, short sections and funny. However, most indexes are completed by the uploader, one gets to know the things to look out for, for instance, and is interested in seeing it through. It is certain any other proofreaders will have different ideas about how it is done, again, not as interesting as proofreading. Regular contributors are generally interested in their own business, when they are expanding on our works. CYGNIS INSIGNIS 13:44, 8 June 2021 (UTC)[reply]
@Cygnis insignis: I’m sorry that you feel that way. I still think that you can contribute to that work, but I can understand if you don’t feel comfortable. Let me know if there’s anything I can help with in the future. Languageseeker (talk) 14:19, 8 June 2021 (UTC)[reply]

Volumes and page break[edit]

Hi. There is no requirement to publish compilation works as volumes, and in fact, there are plenty of good reasons not to do so. It is not an artefact that needs to remain in our page titles. Also, with the use of Template:Page break, it is preferred in the age of transclusion and automated marginal page numbers to use {{page break}}, which presents without a name in the divider. The old form was for when we didn't transclude pages and needed to force page numbers. We retain it for the old old works, and where we may join errata into a work. Thanks. — billinghurst sDrewth 02:49, 27 June 2021 (UTC)[reply]

Welcome back[edit]

I've been keeping the MC warm for you :-) Inductiveloadtalk/contribs 15:51, 7 September 2021 (UTC)[reply]

@Inductiveload: Thank you! It feels great to be back after my hiatus. I see that you've made lots of great changes in the meantime. Glad to be back! :) Languageseeker (talk) 02:02, 8 September 2021 (UTC)[reply]

New texts[edit]

Hi, please keep older entries when adding new texts (this has been fixed now). Thanks. Mpaa (talk) 22:44, 16 September 2021 (UTC)[reply]

@Mpaa: Sorry about that. Won't happen again. Languageseeker (talk) 00:28, 17 September 2021 (UTC)[reply]

Rambles in Germany and Italy in 1840, 1842, and 1843[edit]

Hello. You have recently nominated Rambles in Germany and Italy in 1840, 1842, and 1843/Preface and Rambles in Germany and Italy in 1840, 1842, and 1843/Table of Contents for speedy deletion with the rationale "duplicate". Can you specify what they are duplicates of, please? Thank you very much. --Jan Kameníček (talk) 08:45, 17 September 2021 (UTC)[reply]

@Jan.Kamenicek: The content on those pages got moved to Rambles in Germany and Italy in 1840, 1842, and 1843/Volume 1 to fix the epub export. Languageseeker (talk) 11:11, 17 September 2021 (UTC)[reply]
I see, thanks. --Jan Kameníček (talk) 11:23, 17 September 2021 (UTC)[reply]

image size[edit]

I undid it; you can change the image size in your preferences. Cygnis insignis (talk) 17:56, 17 September 2021 (UTC)[reply]

@Cygnis insignis: Thank you! I did not know that preference existed. Languageseeker (talk) 12:38, 18 September 2021 (UTC)[reply]
At Gutenberg there is a transcriber's note suggesting that the image can be enlarged by clicking it, that also works here. Cygnis insignis (talk) 12:50, 18 September 2021 (UTC)[reply]

Local uploads[edit]

cf. File:AGC.pdf. When uploading files locally on enWS due to Commons-incompatible licensing outside the US, please tag them with {{do not move to Commons}} with either suitable values set in the |expiry= and |why= parameters, or an explanation in prose placed somewhere suitable (the |permission= field of the book template is often a good place). Files without such tagging can be transferred to Commons at any time, where they will be deleted, with the user doing the transferring getting dinged for uploading a copyvio. Xover (talk) 06:03, 27 September 2021 (UTC)[reply]
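For illustration, the tag might look like the sketch below. The parameter names are the ones given in the note above; the values are placeholders only, and the format that |expiry= accepts is an assumption to check against the template's documentation:

```wikitext
{{do not move to Commons
 | expiry = 2040
 | why    = Public domain in the US, but still in copyright in the source country
}}
```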

descriptive summary for new texts[edit]

Hi. When you are adding works to Template:New texts it would be great if you would be able to add a descriptive summary, as is requested on the instructions for that page. Thanks for your help there. — billinghurst sDrewth 15:50, 30 September 2021 (UTC)[reply]

Strand uploads[edit]

As you might see on Phab, the server-side upload is underway. I'll add index pages soon. If you want to pick out any Sherlock Holmes from them, now is the time! Inductiveloadtalk/contribs 18:46, 30 September 2021 (UTC)[reply]

October challenge set-up[edit]

I know the other works still need to be added, but I think you have added too many new works for the month. I think one of the novel-like works should be dropped (Elizabeth Fry over Valperga). However, I see now that a number of these works aren’t too long (only around 100 pages or so), so maybe I’m worrying too much. Perhaps the London or Burroughs instead, or in addition? Footsteps is also a bit long. (These concerns come from a time when projects struggled to finish one 200 or 300 page work, so they may be overstated.) Thanks for your work on the monthly challenge, by the way; it’s great to see a lively collaboration. TE(æ)A,ea. (talk) 23:01, 30 September 2021 (UTC)[reply]

I'd like to second the thanks for setting up October, it's looking like a good one!
There do seem to be rather a few new works, but now we're off and running, let's just see how it goes, I think. At least some of them are already half-done. Inductiveloadtalk/contribs 06:47, 1 October 2021 (UTC)[reply]
@TE(æ)A,ea., @Inductiveload: Thank you for the kind remarks and thoughtful comments. For me, this month is a bit of an experiment. I'm trying to figure out if the texts that failed earlier did so because there were too many texts or because the texts were the wrong kind. My hypothesis is that May had too many hard texts that did not attract users. So, this month, I've picked popular works that should attract a broader audience. Also, I've tried to select quite a few shorter and partially done texts to make it easier for users.
Also, I'm trying to use the MC to attract new users to WS. Hopefully, if a user sees a text that they would like to proofread in the MC, that will incentivize them to join and contribute. As the user base grows, it should be possible to increase the number of texts in the MC. Let's see and hope for the best. Languageseeker (talk) 01:43, 2 October 2021 (UTC)[reply]

Bowdler’s Shakespeare[edit]

I have created pagelists for the volumes, excluding volume 5; most are fine, but a few need to have some duplicated pages removed, and all need the leading Google page to be removed. I have also created the navigation template here. TE(æ)A,ea. (talk) 20:36, 9 October 2021 (UTC)[reply]

@TE(æ)A,ea.: Thank you! Languageseeker (talk) 02:28, 10 October 2021 (UTC)[reply]

Secret Garden[edit]

There is a full cover also (File:Secret Garden-Kirk-0001.jpg) for the header and/or if you want it for the first page. Do you want some wikidata for it? (I like that for the images.) I made a title page for use there. --RaboKarbakian (talk) 23:16, 17 October 2021 (UTC)[reply]

@RaboKarbakian: Thank you so much for your swift and excellent work! I've noticed details in those pictures that I've never seen before. Wikidata is always a great idea in my opinion. Languageseeker (talk) 01:09, 18 October 2021 (UTC)[reply]