Wikisource talk:WikiProject Royal Society Journals

From Wikisource

Incomplete source

I think we should be allowed to complete a DjVu from more than one source when a first source with a clean scan exists and is nearly complete, and a second source can be used to fill in what is missing. For example, volume 6 on Commons lacks the front matter page, has a few other front-matter issues, and is missing two of the three index pages. Phe (talk) 09:15, 15 May 2010 (UTC)
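To make the idea concrete, here is a minimal Python sketch of filling the gaps in one scan from another. It assumes each scan has already been split into page images keyed by a shared page label; the labels, file names and the `complete_scan` helper are hypothetical, and matching pages across two different scans is the part a human would still need to check.

```python
from typing import Dict, List, Tuple


def complete_scan(
    order: List[str],
    primary: Dict[str, str],
    secondary: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Build a full page list, preferring the primary (clean) scan.

    order     -- page labels in reading order (hypothetical labels)
    primary   -- label -> image file from the clean, nearly complete scan
    secondary -- label -> image file from the scan used to fill gaps
    Returns (image files in reading order, labels missing from both scans).
    """
    files: List[str] = []
    missing: List[str] = []
    for label in order:
        if label in primary:
            files.append(primary[label])
        elif label in secondary:
            # Only pages absent from the primary scan are borrowed.
            files.append(secondary[label])
        else:
            missing.append(label)
    return files, missing


if __name__ == "__main__":
    # Mirrors the volume 6 case: front matter and two index pages missing.
    order = ["front", "i", "1", "2", "index-1", "index-2", "index-3"]
    primary = {"i": "a_i.jpg", "1": "a_1.jpg", "2": "a_2.jpg", "index-1": "a_ix1.jpg"}
    secondary = {"front": "b_front.jpg", "index-2": "b_ix2.jpg", "index-3": "b_ix3.jpg"}
    files, missing = complete_scan(order, primary, secondary)
    print(files)
    print(missing)
```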

Change in sub-page

I've modified the description to separate the preface, dedication, index and front cover. The names actually used are /Epistle Dedicatory, /Title Page, /Preface, /Index and /Table of contents. Should we name the first two /Epistle dedicatory and /Front page (or /Front cover)? Phe (talk) 06:35, 16 May 2010 (UTC)

Paper in Latin

From the second volume onward many papers are in Latin, and some volumes are mostly in Latin. One way to deal with that would be to duplicate the index on la: and mark pages in English or Latin as empty on en: or la:, but this doesn't fit very well with our per-issue sub-pages. Any ideas on how to organize the work? Phe (talk) 12:04, 15 June 2010 (UTC)

The index is duplicated as described above, and pages are proofread according to their language on either en: or la:. Pages not proofread on one wiki are transcluded from the other in the Page: namespace, so we can use the normal <pages index=""> transclusion in the main namespace to get the full text on both wikis. The drawback is duplicated text on the two wikis (but no duplicated Page: pages, so the statistics will remain correct). Another way, more difficult to handle, is to transclude in main only the text in the wiki's native language and use the previous/next article links to send the reader to the right wiki transparently. Drawbacks: first, readers will land on an interface in Latin, and someone who only wants to read will not easily figure out what is happening. Second, it doesn't fit with our per-issue pages in main, where Latin and English are mixed. I favour the first option, even if it means mixing Latin and English in main on both wikis. Phe (talk) 09:27, 24 June 2010 (UTC)
If the Royal Society's journals are similar to other pre-twentieth-century scientific journals, they probably switch between English and Latin depending on the context. For example, taxon descriptions were typically written in Latin until the 20th century. This means there may be pages with both English and Latin. Kaldari (talk) 19:39, 23 July 2011 (UTC)

DJVU Conversion and Crop

Hi there. I've read the news about the torrent here and would be willing to do some coding to attempt to automate the conversion & cropping procedure. However, I'm on a 2G wireless link in India right now and thus can't download a source. If someone could message me with a URL to a single PDF (ensuring that it's representative of the set) then I could attempt to write the code. Pratyeka (talk) 13:48, 22 July 2011 (UTC)

I would be very happy to. Give me a bit to download one of the files (they are zipped into blocks, so I'll need to get a whole block). Inductiveloadtalk/contribs 14:10, 22 July 2011 (UTC)
Here you go! Inductiveloadtalk/contribs 22:01, 25 July 2011 (UTC)
Solution: use pdfimages to extract the images, which should come out in order and without the watermarks, then use ImageMagick's convert tool to rebuild a new PDF, i.e.: pdfimages -j original.pdf page (with -j, JPEG-encoded images are written as page-000.jpg, page-001.jpg, etc.), then convert page-*.jpg new.pdf (makes new.pdf without watermarks). Try that and let me know if it works on the whole collection. If not, I will try to help with the problem areas. Pratyeka (talk) 06:29, 26 July 2011 (UTC)
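A caveat on the glob step: the shell sorts `page-*.jpg` as text, so with the three-digit numbering pdfimages uses, a volume with more than 1,000 images puts `page-1000.jpg` before `page-101.jpg` and the pages come out of order. A minimal Python sketch that sorts the extracted files by their trailing number before building the `convert` argument list; it assumes names of the form `<root>-<number>.jpg`, and `ordered_pages` and `convert_command` are hypothetical helper names.

```python
import glob
import os
import re
from typing import Iterable, List

# Matches the trailing image number, e.g. "page-1000.jpg" -> "1000".
_NUMBER = re.compile(r"-(\d+)\.[A-Za-z0-9]+$")


def page_number(name: str) -> int:
    """Return the numeric suffix of an extracted image file name."""
    match = _NUMBER.search(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(match.group(1))


def ordered_pages(names: Iterable[str]) -> List[str]:
    """Sort extracted images by page number rather than as plain text."""
    return sorted(names, key=page_number)


def convert_command(directory: str, output: str = "new.pdf") -> List[str]:
    """Build an ImageMagick `convert` argument list with pages in numeric order."""
    names = glob.glob(os.path.join(directory, "page-*.jpg"))
    return ["convert", *ordered_pages(names), output]
```

The resulting list can be passed to `subprocess.run(..., check=True)` in place of the shell glob.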
It's a minor note but, instead of ImageMagick, it might be simpler to build the DjVu at that stage instead of rebuilding a PDF and then converting it to DjVu later. - AdamBMorgan (talk) 11:44, 26 July 2011 (UTC)
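Building the DjVu directly could look like the following sketch, which plans the DjVuLibre calls: `c44` encodes each JPEG page as a single-page DjVu, and `djvm -c` bundles those pages, in order, into one file. The helper names are hypothetical, OCR and text layers are not covered, and running `build_djvu` assumes DjVuLibre is installed and on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence, Tuple


def djvu_commands(
    images: Sequence[str], output: str
) -> Tuple[List[List[str]], List[str]]:
    """Plan the DjVuLibre calls that turn page images into one bundled DjVu.

    c44 encodes each colour/greyscale JPEG (or PNM) page as a single-page
    DjVu; djvm -c then bundles the single pages, in the given order.
    """
    pages = [str(Path(img).with_suffix(".djvu")) for img in images]
    encode = [["c44", img, page] for img, page in zip(images, pages)]
    bundle = ["djvm", "-c", output, *pages]
    return encode, bundle


def build_djvu(images: Sequence[str], output: str) -> None:
    """Run the planned commands, stopping at the first failure."""
    encode, bundle = djvu_commands(images, output)
    for cmd in encode:
        subprocess.run(cmd, check=True)
    subprocess.run(bundle, check=True)
```

For pages that are pure black-and-white text, DjVuLibre's `cjb2` (which takes bitonal input such as PBM) usually gives much smaller files than `c44`; the sketch does not make that choice.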
Thank you, Pratyeka. I will have a script ready to do that in a few hours. I can do the conversion and upload myself, as DjVu conversion, OCR and upload are not trivial to automate for someone who is not already set up, since a lot of the software needs to be rebuilt from source. Inductiveloadtalk/contribs 14:02, 26 July 2011 (UTC)
Additionally, the logic to recombine volumes of journals is not so easy to do, so it may be some time before I have uploadable results. Inductiveloadtalk/contribs 14:33, 26 July 2011 (UTC)
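Part of what can make recombination fiddly is that an article may begin on the page where the previous one ends, so per-article PDFs can share pages, and some pages may belong to no article at all. A minimal sketch, assuming each article's metadata yields a volume number, first and last page and a PDF name (the `Article` fields and helper names are hypothetical), that orders articles per volume and reports shared pages and gaps to resolve by hand:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Article:
    # Hypothetical fields; the real metadata may name these differently.
    volume: int
    first_page: int
    last_page: int
    pdf: str


def group_by_volume(articles: List[Article]) -> Dict[int, List[str]]:
    """Collect article PDFs per volume, ordered by starting page."""
    volumes: Dict[int, List[Article]] = defaultdict(list)
    for art in articles:
        volumes[art.volume].append(art)
    return {
        vol: [a.pdf for a in sorted(arts, key=lambda a: a.first_page)]
        for vol, arts in sorted(volumes.items())
    }


def check_continuity(arts: List[Article]) -> List[str]:
    """Report gaps and shared pages between consecutive articles of one volume."""
    notes: List[str] = []
    ordered = sorted(arts, key=lambda a: a.first_page)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.first_page > prev.last_page + 1:
            notes.append(f"gap: pages {prev.last_page + 1}-{cur.first_page - 1}")
        elif cur.first_page <= prev.last_page:
            notes.append(
                f"shared page(s) {cur.first_page}-{prev.last_page}: {prev.pdf} / {cur.pdf}"
            )
    return notes
```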


Cropping the pages to remove the watermarks seems like a drastic measure. It changes the aspect ratios of the pages (which are important for proper printing and collating). Surely there must be a better way to remove the watermarks. Also, are we even sure that cropping actually removes the watermarks in a PDF, i.e. would they still exist outside the margins? I think we should leave the watermarks for now until we have a less destructive way to remove them. Perhaps they all share some object ID in the PDF that could be used to remove them programmatically. Kaldari (talk) 19:45, 23 July 2011 (UTC)
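The shared-object idea can be tested heuristically: if the watermark is embedded as the same image or form object on every page, hashing each page's embedded objects and keeping those that recur on nearly every page should single it out. A minimal Python sketch; it assumes the per-page object bytes have already been pulled out with some PDF library (not shown), and `watermark_candidates` and its `min_share` threshold are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List, Set


def watermark_candidates(pages: List[List[bytes]], min_share: float = 0.9) -> Set[str]:
    """Return SHA-1 digests of objects that recur on at least min_share of pages.

    pages     -- for each page, the raw bytes of its embedded image/form objects
    min_share -- fraction of pages an object must appear on; 0.9 tolerates
                 a few pages where the mark is missing
    """
    counts: Counter = Counter()
    for objects in pages:
        # Count each distinct object once per page.
        counts.update({hashlib.sha1(obj).hexdigest() for obj in objects})
    needed = min_share * len(pages)
    return {digest for digest, n in counts.items() if n >= needed}
```

Page scans differ from page to page, so only a repeated overlay should pass the threshold; the digests found could then guide which objects to strip.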

Cropping won't typically work to remove watermarks in PDFs with layers. Otherwise you are right about uneven aspect ratios throughout the PDF when cropping. If you can find a way to remove the watermark without altering the base PDF, you sure would save a heck of a lot of pointless "scrapping and assembling" around here. — George Orwell III (talk) 20:46, 23 July 2011 (UTC)
It's easy to remove the watermarks with Adobe Acrobat Professional. In Tools -> Advanced Editing -> TouchUp Object tool. Ivanov224 (talk) 15:07, 24 July 2011 (UTC)

I can remove all the watermarks in 14 days (maybe less), but how would I then hand over the watermark-free PDFs? Ivanov224 (talk) 20:37, 24 July 2011 (UTC)

Work order

Scientifically, the later issues are of much more value than the early issues (especially for biology). I would encourage people to work from 1923 backwards rather than from 1665 onwards, but of course I suppose this is up to the preference of the uploader :) Kaldari (talk) 19:49, 23 July 2011 (UTC)

Watermark removal?

I can remove all the watermarks in 14 days, maybe less. I work in prepress and I'm not familiar with the wiki interface, so I need assistance. Does anyone want me to remove the watermarks? If someone wants, we could share the work. I've also combined the metadata TXT files into a single searchable PDF, if that helps anyone... Ivanov224 (talk) 21:12, 24 July 2011 (UTC)

Images, files and the like are mostly handled by Wikimedia Commons. If you repost this in their main discussion area I'm sure you'll find someone with the requisite knowledge to collaborate with. Prosody (talk) 21:30, 25 July 2011 (UTC)

Watermark copyright

The document content may be argued to be public domain; however, the watermark with the copyrighted logo is not. These should be removed before uploading to Commons rather than afterwards; otherwise there is what appears to be a perfectly valid reason to speedy delete these uploads from Commons. -- (talk) 15:52, 25 July 2011 (UTC)

Probably nobody reads the discussion. I agree with you and... still nobody seems to read this. I wish to help the project but if nobody responds soon, I will have to go back to my job duties and wish you 'Good luck'. :) Ivanov224 (talk) 19:46, 25 July 2011 (UTC)
I would be happy to raise a mass deletion request on Commons if nobody is bothered here. -- (talk) 19:57, 25 July 2011 (UTC)
Things go slower around here than on Commons. Any discussion about deletion of files should go there, as we only host the text, not the files, at Wikisource. What we really need is a simple script that can strip the watermark from the PDFs. Cropping might leave the watermark intact, just off the edge of the page, and would mess up the aspect ratios. Does anyone know how to modify PDFs programmatically? I have had a look, but I can't see a non-GUI PDF manipulator out there with that capability. Inductiveloadtalk/contribs 22:10, 25 July 2011 (UTC)
I'm not suggesting cropping the pages, but removing the watermarks. I don't know of a way to modify PDFs programmatically, at least not in that way. I'm offering to remove the watermarks myself: a watermarked archive gets uploaded somewhere -> I download it -> remove the watermarking -> upload it again. That's it. :) Ivanov224 (talk) 02:33, 26 July 2011 (UTC)
WM-UK received an official complaint about these images yesterday. As they have been uploaded with the copyrighted logo on every page, the potential PD rationale for the content itself is moot. -- (talk) 07:02, 26 July 2011 (UTC)
I've just posted on Wikimedia Commons' Village pump. Hope someone reads it. Ivanov224 (talk) 09:16, 26 July 2011 (UTC)
I have raised the same point at Commons:Commons:Bots/Work_requests#WikiProject_Royal_Society_Journals_Copyright_Problem with regard to the copyrighted logo. -- (talk) 22:16, 26 July 2011 (UTC)
Sorry to appear a little slow here, but which files are we discussing? Are they the PDFs in Commons:Category:Philosophical Transactions of the Royal Society (1665-1886) and its subcategories? I have only spot-checked a few files in Wikisource's own Category:Philosophical Transactions of the Royal Society (1665-1886), but I have not seen any watermarks. I haven't found anything in other categories yet. Do we have any files on Wikisource that contain a watermark? - AdamBMorgan (talk) 11:54, 26 July 2011 (UTC)
The first link: Commons:Category:Philosophical Transactions of the Royal Society (1665-1886). I just checked volume 1, and when you scroll down you'll see a file like this: Philosophical Transactions - Volume 1 p0-Epistle Dedicatory.pdf. Volumes 2-5 have the watermarks intact, and probably all the others do too. Ivanov224 (talk) 18:08, 26 July 2011 (UTC)
All volumes below 38 have full, compiled DjVu files, so the PDFs (which would have been uploaded years ago, unrelated to the current "leak") can be deleted if they are not up to Commons' standards without affecting Wikisource. Inductiveloadtalk/contribs 18:16, 26 July 2011 (UTC)
So you don't actually need me? Probably only for volumes above 38, right? Ivanov224 (talk) 18:25, 26 July 2011 (UTC)
This WikiProject has already made some progress in obtaining many volumes of the PhilTrans (see Wikisource:WikiProject Royal Society Journals/Uploading progress), but there are other RS journals out there, and we are still deficient in many areas. I would say that you could still be very useful to us. I have just found out that there is already a (semi-)automated effort to get these PDFs up at the Internet Archive, so the focus might be on working out where to find volumes of works that we don't already have and that aren't already on the IA or elsewhere. This will save duplication of effort on your part. Inductiveloadtalk/contribs 20:15, 26 July 2011 (UTC)

Getting back to watermark removal: I've been thinking about how to achieve this from a different angle, based on existing feature requests, fixes and bugs for DjVuLibre. Rather than removing the watermark from the PDF prior to conversion to DjVu, I was thinking it might be possible either to filter that specific color out during conversion, or to make that color the default background instead of white so that its presence is neutralized rather than removed (making a third export/save with an all-white background and all-black text possible).

I get pretty close to matching it with #bdbdc0, but it's still a hair or two off from an exact match to what Google Books, etc. use for watermarks by default.

Anybody out there have an idea on how to get that Hex or binary for the color in question? TIA -- George Orwell III (talk) 20:44, 30 July 2011 (UTC)
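The color-filtering idea can be sketched in Python: parse the hex code, then map every pixel within a small per-channel tolerance of it to white, so that a color "a hair or two off" still matches. To find the exact value, one option is to sample pixels from a region containing only watermark and take the most frequent non-white color. Pixels are plain RGB tuples here, the tolerance of 8 is an arbitrary assumption, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]


def from_hex(code: str) -> RGB:
    """Convert '#bdbdc0' to (189, 189, 192)."""
    code = code.lstrip("#")
    return (int(code[0:2], 16), int(code[2:4], 16), int(code[4:6], 16))


def to_hex(rgb: RGB) -> str:
    """Convert (189, 189, 192) to '#bdbdc0'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def dominant_color(pixels: Iterable[RGB], ignore: RGB = (255, 255, 255)) -> str:
    """Most frequent color in a pixel sample, skipping the white background."""
    counts = Counter(p for p in pixels if p != ignore)
    return to_hex(counts.most_common(1)[0][0])


def whiten(pixels: List[RGB], target: RGB, tolerance: int = 8) -> List[RGB]:
    """Replace pixels within `tolerance` of `target` on every channel with white."""
    white = (255, 255, 255)
    return [
        white if all(abs(c - t) <= tolerance for c, t in zip(px, target)) else px
        for px in pixels
    ]
```

Two limits are worth keeping in mind: pixels where text overlaps the watermark are blends and will not match, and any grey in the scan itself that falls within the tolerance would be whitened too.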

Internet Archive and articles

The Internet Archive now contains all 18,500 recently released papers from the Phil. Trans. --Nemo 16:42, 9 August 2011 (UTC)

What about non-RS journals?

Is there no Wikisource:WikiProject Journals on Wikisource (yet)? --Piotrus (talk) 03:54, 8 September 2011 (UTC)

We do have Wikisource:WikiProject Academic Papers, which is not currently very active. You are welcome to join, contribute to or extend that project. Category:Periodicals and Category:Journal articles are also areas where these works are collected. RS journals are a special case because of the large corpus of available public-domain text.
I am available for batch uploads and general bot work should you find a large collection of files that neither Commons nor Wikisource already has. Inductiveloadtalk/contribs 15:46, 8 September 2011 (UTC)


I was looking at the HathiTrust scan of volume 2, and it looks like a different edition: starting at page 489 there is an additional text (apparently a separately published letter), and the page numbers are adjusted to compensate. There may be more differences; I'm not sure.

Wikimedian-in-Residence at the Royal Society

The Royal Society, the UK's science academy, is recruiting a Wikimedian-in-Residence to help them work more closely with Wikipedia. One of the main aims is to improve access to information about scientists from underrepresented groups.

The position is part-time (one day per week) for a fixed term of 6 months. See here for more information and details of how to apply. For additional information please contact francis.bacon [AT] Solomon7968 (talk) 15:20, 23 September 2013 (UTC)