Wikisource:Scriptorium/Archives/2010-06

Warning: Please do not post any new comments on this page. This is a discussion archive first created in June 2010, although the comments contained were likely posted before and after this date. See current discussion or the archives index.

Announcements

Wikimedia Commons Picture of the Year contest open!

Dear Wikisource users,

Wikimedia Commons is happy to announce that the 2009 Picture of the Year competition has now opened. Any user registered at a Wikimedia wiki since 2009 or before with more than 200 edits before 16 January 2010 (UTC) is welcome to vote.

Over 890 images that have been rated Featured Pictures by the international Wikimedia Commons community in the past year are fighting to impress the highest number of voters. From professional animal and plant shots, through breathtaking panoramas and skylines, restorations of historically relevant images, images portraying the world's best architecture, maps, emblems and diagrams created with the most modern technology, to impressive human portraits, Commons features pictures for all flavours.

Check your eligibility now and if you're allowed to vote, you may use one of your accounts for the voting. The vote page is located at: Commons:Picture of the Year/2009/Voting.

Two rounds of voting will be held: In the first round, you can vote for as many images as you like. In the final round, when only 20 images are left, you must decide for one image to become the Picture of the Year.

Wikimedia Commons is looking forward to your decision in determining the ultimate featured picture of 2009.

Thanks, Wikimedia Commons Picture of the Year committee http://commons.wikimedia.org/wiki/Commons:Picture_of_the_Year/2009 --The Evil IP address (talk) 17:23, 14 May 2010 (UTC)


Proposals

Consideration to introducing Rollback privileges

The following discussion is closed: insufficient support to progress with the idea
Purpose. I am looking to propose that enWikisource looks to implement the rollback facility; though before taking this to the community, I was wishing to see what level of support there is among administrators as we would be undertaking the allocation of responsibility.

Background. I think that it is now opportune for enWS to consider the introduction of the rollbacker. While we have not set the highest hurdle for adminship (with good results), with our higher traffic, we now have people who participate in recent change checking, yet are not necessarily involved in the wider aspects of the site. I believe that it would therefore be beneficial to have rollbacker. It would also allow the allocation of a corresponding bit to m:global rollbackers as we see fit; while not necessary, it would have elements of usefulness, and still allow us to maintain control. (This latter part would be part of a separate proposal)

Comment. Wikisource has had the AUTOPATROLLED capability in place for an extended period of time, and from a subjective point of view it has been working admirably well. That is we set a basic principle of what we were looking for in persons given the autopatrolled bit, and we allowed administrators to allocate that bit based on their assessment of the knowledge, experience and attitude of the candidate, and whether they knew that they were a candidate for the bit or not.

Action. To implement this it therefore requires the support and involvement of administrators to allocate the bit, hence I am here seeking your opinions on this suggestion, and if considered suitable, your in-principle support to take it to community.

Note that this topic was first broached in this forum in late 2009, and recently at Wikisource:Administrators' noticeboard to gauge the opinion of admins who would administer the permission. The response from administrators has been supportive, with the neutral and negative comments relating to the need or that it may cause a stratification (eliteness) of people with permissions or otherwise.

Issues raised

  • That a person granted the permission may turn out to be an ill-advised allocation
  • The time period between being granted autopatrolled permission, and the granting of rollbacker permission
  • That we should continue to have a liberal approach to appointing administrators, and not clutter our wiki with another hierarchical level.

My suggestion to address these issues is that our general approach to granting rollbacker permission would be that the person should be a member of the community in good standing, and that the granting administrator has clearly sighted and identified that the person is able to manage patrolling of Special:RecentChanges as part of their decision-making process.

I too would not be looking to change our position on nominating and appointing administrators, though at the same time I see and acknowledge that there are some editors who only wish to look to manage patrol and don't wish to extend their capacity further. — billinghurst sDrewth 11:25, 16 March 2010 (UTC)

If by rollback is meant that a user can delete their own creation, then it's recommended that one should look at accounting and bookkeeping systems' rules regarding this issue, which has been addressed long before Wikipedia. The reason is because it's also a multiuser effort, and apart from protecting the system's integrity, the work of others must be protected when deleting. If Wikisource archives deleted pages, then that's an adequate "audit trail". In more restrictive systems, this translates into deletions of creations in the same session. Here on Wikisource, this translates to pages where there were no edits by others. Any restriction applied should be rational, such as only in one's own namespace, and consider implementing it on a trial basis for a period of some months?— Ineuw (talk) 13:09, 16 March 2010 (UTC)
"Rollback" means you get a button that reverts the latest changes to a page by a vandal. It's just a faster way to address vandalism (compared to navigating to the correct page diff, clicking to edit the old one, and saving). The button appears on the recent changes screen, on your watchlist, and on page diffs. Nothing gets deleted; all old page versions are still maintained. For more info, see Help:Reverting#Rollback. —Spangineerwp (háblame) 13:24, 16 March 2010 (UTC)
Thanks. Got it.— Ineuw (talk) 13:28, 16 March 2010 (UTC)
  • Support - The level of vandalism seems to have increased recently and it is not a trend that is likely to suddenly go down. We have some users like User:Jan1nad who are very devoted to the project and to vandal/recent change patrol, but are not ready for admin (yet). This tool provides a logical step in the development of volunteers' skills. User:Spangineer had a couple of good suggestions that common sense indicates are best practices. JeepdaySock (talk) 15:44, 16 March 2010 (UTC)
  • Still neutral with concerns: I'm not opposed to giving more users more capabilities, but Jeepday's example is already making me worried. I've looked over Jan1nad's contributions and I'm not sure why he's "not ready for admin (yet)"? It's not exactly fair to make one user the example for something like this, but when the rubber hits the road, I'm not sure I'm going to like this. I don't want to end up with WP's admin model, and it seems like Sherurcij‎'s concern is pretty valid. Ultimately though it's not that big of a deal so long as it's not abused, and I don't think that's a major risk. —Spangineerwp (háblame) 23:03, 16 March 2010 (UTC)
    What you are discussing is a cultural aspect, rather than the use of a tool. All the extra permissions should be solely as considered as tools to do a job (and a janitorial job), not as an elevation of status. I don't want to focus on other places, I would like to see us develop our culture, and set in place our best practice principles, while we do learn from the movements of other places.
    I share Sherurcij's concern, and I don't want to pick up bad habits from elsewhere, but the denial of a useful tool is not the way forward. Our implementation around Autopatrolled and our general approach has been successful. Rollback similarly is another RecentChanges tool; it is solely a management tool, and is neither a stepping stone, nor a gateway. Its allocation demonstrates an administrator's confidence in that person's competence and interest in the particular area of patrolling. I would not see it be given to people as a prize for time on site, more as the allocation of the tool to better undertake the task that they are currently doing. With regard to the culture of encouraging people to be Administrators, I think that the existing history would demonstrate that we/I do do that as well, and I wouldn't be looking to change my attitude nor practice in that regard. — billinghurst sDrewth 00:04, 17 March 2010 (UTC)
    I agree. The tool itself isn't the issue, and ultimately culture is in the hands of people, regardless of the tools available. —Spangineerwp (háblame) 02:57, 17 March 2010 (UTC)
  • Can anybody identify a single contributor here who (a) is active in reverting vandalism; (b) can be trusted with rollback; and (c) is not ready, suitable or willing to undertake full adminship? Hesperian 03:14, 17 March 2010 (UTC)
Depends on your definition of "active". While I do not actively patrol, I nuke it whenever I see it. I use Twinkle for that, which has rollback functionality, and can be revoked if misused. But you'll never catch me adminning, I'm kind of like a reverse Hulk here: The more powerful I gets, the madder I gets. ;) Paradoctor (talk) 07:30, 17 March 2010 (UTC)
  • I would suggest User:Jan1nad as ready for rollback, a very active patroller who needs a bit more experience before being asked about becoming an admin. I would also suggest my alter ego User:JeepdaySock (though I did not think about it until recently). I do a lot of patrolling with it, when I am not comfortable (security concerns) logging in to my admin account. Rollback run amuck is pretty easy to fix, admin run amuck could be painful. Jeepday (talk) 00:13, 20 March 2010 (UTC)
  • Support - This is a good proposal by Billinghurst (talkcontribs). Will help in vandal work, while at the same time it should be relatively easy to obtain and remove, if necessary. I also agree with the above comment by Jeepday (talkcontribs), that this will be a good practice to have, to help introduce some user development in a positive and constructive practice. -- Cirt (talk) 04:12, 21 March 2010 (UTC)
  • Support Why not? The rollback privilege has proven beneficial in other projects and is working great at the English Wikipedia and Wikimedia Commons among other projects that have enabled rollback. With additional users to revert vandalism, it can reduce the trivial workload of admins. Snake311 (talk) 09:52, 7 April 2010 (UTC)
  • I don't think it is good for wikipedia and I don't see a need for it here. Cygnis insignis (talk) 08:37, 14 April 2010 (UTC)
  • I don't see a need. If I read this right, en:ws currently gets ~10k edits per month which is about 13 per hour. I occasionally have a glance at Special:RecentChanges and I think have only caught vandalism once while doing that. I'm far more likely to pick something up from my watchlist but haven't seen anything there yet. If I did catch something I'd revert it regardless of having a rollback link on my screen or not. I doubt that any of the 41 admins who already have the rollback button are run off their feet reverting vandalism now. My concern is that a mid-level status might encourage gnomish non-contributors to come here just for the vandalism-patrol. And that might flow into a subtle change in attitudes about granting admin rights. This project has a nice collegiate feel which could be diminished by introducing a middle-class of editors. Better to readily grant admin rights to anyone who is trustworthy and wants them. (I posted this against my better judgement. I have no interest in adminship myself and generally avoid these discussions if I can.) Moondyne (talk) 09:50, 14 April 2010 (UTC)
  • While this is a smaller project than WP and Commons I still think it could be of use here, so I support. Maximillion Pegasus (talk) 21:01, 11 May 2010 (UTC)
  • Comment - Wouldn't it be easier to simply give the 'rollback' right to the 'autopatrolled' user group? Users in this group are already trusted and usually have some experience editing here at wikisource. Maximillion Pegasus (talk) 19:32, 20 May 2010 (UTC)
    That wouldn't be my preference. I would prefer to be liberal with the application of autopatrolled as that is a demonstration that they are doing the right thing. Patrolling is a different beast, and a separate assignment. — billinghurst sDrewth 11:42, 21 May 2010 (UTC)
  • Weak support I've done some recent change patrolling and reverted a few bits of vandalism and (I assume) good faith mistakes. Rollback would have made that easier but is not essential.--Longfellow (talk) 09:54, 31 May 2010 (UTC)
being bold to close, it lacks sufficient support IMNSHO to progress further at this point in time. — billinghurst sDrewth 11:51, 31 May 2010 (UTC)

Other discussions

Questions

How many pages on your watchlist?

Kind of a silly question, yes, but I remember someone mentioning back when I was a new wikisourcer that it's a good idea to make your settings so that all pages you edit are automatically added to your watchlist—just so that we've got plenty of eyes looking out for vandalism. Hopefully that reminder is useful to someone, but in addition, I'm curious to know how many pages others have on their watchlists. The Federalist just put me over 5k, which I suspect is quite low compared to some of our more prolific users. Anyone? —Spangineerwp (háblame) 03:45, 17 February 2010 (UTC)

11,000. Hesperian 03:50, 17 February 2010 (UTC)
9,000. Cygnis insignis (talk) 04:01, 17 February 2010 (UTC)
23. Really. Paradoctor (talk) 08:30, 17 February 2010 (UTC)
1200. Moondyne (talk) 08:32, 17 February 2010 (UTC)

I rarely add them these days, unless I have offered someone assistance. I find it easier to call up RC with 2k edit buffer then just do a daily scan and scream; oh, and welcome and patrol. :-) — billinghurst sDrewth 10:58, 17 February 2010 (UTC)

Just over 18,000. I've been receiving a lot of change notification emails from Wikisource while I've been preoccupied elsewhere. ;-) John Vandenberg (chat) 17:28, 6 April 2010 (UTC)
2,878 a lot of my edits get watched or if I find something interesting. --Mattwj2002 (talk) 00:18, 17 May 2010 (UTC)

Editing uploaded images

Trying to make a long story short, I configured the IrfanView image editor (Windows) to batch remove the yellow backgrounds from the images of the PSM project, with decent results. Unfortunately, the discovery of the process occurred after uploading over 1000 un-retouched images. Now, I am attempting to edit them by using the extension setup for external image editing from here and having no luck. I posted help requests on every wiki where image files are being edited/managed, but received no reply (2 weeks). I narrowed down the issue to three questions which would determine whether I continue to pursue offline editing.

  1. Is it worth the effort time wise to go this route to fix the uploaded images?
  2. Do externally edited images have to go through the same process of upload to replace the old?
  3. Is there someone who is knowledgeable with extensions & scripts, and interested in resolving this challenge? I am willing to put in the time to get the answers.

Rather than clutter up this forum with code and details, I placed additional info on the problem on my talk page. — Ineuw (talk) 18:30, 20 April 2010 (UTC)
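For readers wondering what "removing the yellow background" amounts to, here is a minimal sketch of one common approach: rescaling each colour channel so that a sampled paper tone maps to white. This is not necessarily what IrfanView does in its batch mode; the function names, the `PAPER` sample value and the tiny test image are all hypothetical, and pixels are plain RGB tuples so that no imaging library is assumed.

```python
def whiten_pixel(pixel, paper):
    """Rescale one RGB pixel so that the sampled paper colour maps to white.

    Assumes every channel of `paper` is non-zero.
    """
    return tuple(min(255, round(c * 255 / p)) for c, p in zip(pixel, paper))


def whiten_image(rows, paper):
    """Apply whiten_pixel to every pixel of an image given as rows of RGB tuples."""
    return [[whiten_pixel(px, paper) for px in row] for row in rows]


# Hypothetical sample: a yellowed paper tone and a tiny 1x3 "image".
PAPER = (240, 225, 170)
image = [[(240, 225, 170), (0, 0, 0), (120, 112, 85)]]
cleaned = whiten_image(image, PAPER)
```

Paper-coloured pixels become pure white and black ink stays black; mid-tones are stretched proportionally, which is why heavily yellowed scans can lose some faint detail with this kind of adjustment.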

  • I think the image improvement looks great. I have no idea about the technical solutions, but removing the yellowing is a worthwhile task. JeepdaySock (talk) 10:48, 21 April 2010 (UTC)
  • First off, this looks great—I spend ages whiting out the backgrounds of illustrations. One note about the source of your illustrations, though: remember that the Internet Archive has much higher quality scans than these DJVUs. If you look at the IA JPG, compared to the DJVU version, you can see the huge loss of detail caused by the DJVU compression which is optimised for text, not images. Can I recommend that you take your initial image from the IA directly, rather than from the compressed DJVU?
  • However, I think it is certainly possible to:
  1. Automatically download a file (the name is directly related to the image path, and simple scripting tools can grab images from a web address)
  2. Run a script on the file (you have that done already)
  3. Reupload (pywikipedia has an upload script that can overwrite files).
  • I'll look into it for you, but I'm quite busy at the moment, so I can't do it directly. Inductiveloadtalk/contribs 14:49, 25 April 2010 (UTC)
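The three steps listed above (download, run the clean-up script, reupload) could be wired together roughly as follows. This is only a sketch under stated assumptions: `reprocess_images`, the stand-in callables and the file name `PSM_V01_D010.jpg` are hypothetical, and the real download and upload steps (for example pywikipedia's upload script) are deliberately left as injected functions rather than guessed at.

```python
from typing import Callable, Iterable, List


def reprocess_images(
    names: Iterable[str],
    download: Callable[[str], bytes],
    clean: Callable[[bytes], bytes],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Run download -> clean -> reupload for each file name; return the names done."""
    done = []
    for name in names:
        original = download(name)  # step 1: fetch the current file
        cleaned = clean(original)  # step 2: run the clean-up script
        upload(name, cleaned)      # step 3: overwrite the old upload
        done.append(name)
    return done


# Stand-in callables so the sketch runs offline; real ones would wrap an HTTP
# fetch, the image editor, and an upload tool.
store = {"PSM_V01_D010.jpg": b"yellowed"}
uploaded = {}
result = reprocess_images(
    store,
    download=lambda n: store[n],
    clean=lambda data: data.replace(b"yellowed", b"white"),
    upload=lambda n, data: uploaded.__setitem__(n, data),
)
```

Keeping the three steps as separate callables means each can be tested or swapped independently, which matters when the upload step overwrites files that others may already be using.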

Hi Inductiveload. Thanks for the offer of help. The image edit is absolutely not rush, as I also have my plate full.

Regarding the image differences between copying the Djvu image when in edit mode (and uploaded to the Commons), vs. the same image copied from the .jpg file from IA, can someone please explain the quality difference because I see none. In my (limited) understanding, since the Commons image is much larger, should it not be also the better & clearer resolution? I am confused. - Ineuw (talk) 18:56, 29 April 2010 (UTC)

It's more helpful if they're the same size, but even at thumbnail size, the first one looks blocky on his left cheek (right of viewer) and the gradients are pretty rough. I don't know where the second came from, since the IA jpegs are the same resolution as the DJVU files. If the image from the DJVU is larger, it's not scanning data, it's the output of some scaling algorithm.--Prosfilaes (talk) 23:27, 29 April 2010 (UTC)
Ineuw, you are not zooming into the IA JPG enough! The full size JPG is much larger (in this case thousands of pixels wide). You can see every detail of the print at that scale, and it is not blurred and blocky like the DJVU. You have to zoom in to 100% (the scale is shown in the top right), then you can right click and save the image JPG to disk. If you do not zoom in, you will be served a lower res JPG. This is the full 1900x2000 JPG I got and cleaned up: File:James Clerk Maxwell test only 2.jpg
The original scans are actually stored as JP2 files, these are cropped, rotated and shown online as high res JPGs (like the one I just uploaded). The DJVUs are derived from the same source, but much more heavily compressed (much more than the JPG, which is also a bit compressed) and are only a few MB, about the same as only a few JPGs. It is not surprising that the degradation is dire, but for text it doesn't usually matter. For pictures, it's terribly destructive.
Prosfilaes: resolution is not equivalent to quality (but high quality needs high resolution). Inductiveloadtalk/contribs 00:09, 30 April 2010 (UTC)
Also, and this is a Commons thing, can you add images to relevant categories when you upload them? For example, this one should be in commons:Category:James Clerk Maxwell (I added it already, but if there are 1000 or so more images not sorted by subject but only by source, that is bad for finding images at Commons). In fact, because you added them to the PSM category, they will now always be missed by category checking bots and editors looking for missing categories, so they will forever remain uncategorised unless someone sees them. Inductiveloadtalk/contribs 00:21, 30 April 2010 (UTC)

Image quality and categories

Thanks for everyone's above input.

  1. After a couple of attempts, I managed to discern the quality differences between an enlarged IA .JP2 image and the same .Djvu image and yes! the online quality is definitely superior. This requires a major rethink of my method of processing. I was also looking for a .JP2 version of the volumes, so that I can extract the images locally, but they are not offered for download. So, processing the images from the online copy is problematic for the time being, and I will suspend this part of the proofreading for awhile. Also considered converting them to another format, like .PNG or .SVG, but don't yet know enough if there is an advantage to this. Finally, only some images are worth replacing/upgrading, and will be content if I am able to change their "s***ty" background colour.
  2. In the beginning, I did categorize the images by subject and there is no shortage of categories. Then, I noticed that many categories are the file names, which confused me. I searched the Commons for images from the same time period and discovered that almost all PSM images exist elsewhere on the Commons, properly categorized and of a far superior quality. After all, PSM is about 15%-20% original work and the rest are collated articles and pictures which appeared elsewhere in professional publications of the day, like Nature, etc. When I get around to improving the uploaded images, categories will also be added. - Ineuw (talk) 22:24, 2 May 2010 (UTC)
  • The JP2's can be downloaded as a zip file. Go to the main details page for a volume (e.g. http://www.archive.org/details/popularsciencemo02newy) and click on the "HTTP" link. In that list of files is one ending "...jp2.zip": this is the one you want (i.e. http://ia360629.us.archive.org/3/items/popularsciencemo02newy/popularsciencemo02newy_jp2.zip). "...orig_jp2.tar" is similar, but much larger for no benefit, because the pictures are not trimmed or rotated. No sweat about categories, so long as you don't plan on dumping thousands of uncategorized images on Commons in the long run! As for SVG images, you can't just convert from a raster like that - all you will do is lose detail horribly. You'd have to redraw the picture as a vector by hand, and this will take ages, if it is even possible. You will also not really be using the original artwork, which is kind of the point! Cheers Inductiveloadtalk/contribs 06:49, 3 May 2010 (UTC)
  • Also, a personal preference, and you can ignore me if you like, but I prefer images without the captions, as these can be put in using Wiki markup, and having a reduplicated caption looks silly to me. If you will be editing and reuploading, now would be a good time to edit them out. Inductiveloadtalk/contribs 06:58, 3 May 2010 (UTC)
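Once a "..._jp2.zip" file like the one mentioned above has been downloaded, the page images can be listed locally with a few lines of script. The sketch below assumes only the file-name pattern given in the post; the helper names and the member names inside the stand-in zip are hypothetical, and the real archive layout may differ.

```python
import io
import zipfile


def jp2_zip_name(identifier: str) -> str:
    """File name of the trimmed JP2 archive, following the '..._jp2.zip' pattern."""
    return f"{identifier}_jp2.zip"


def list_jp2_pages(zip_bytes: bytes) -> list:
    """Return the sorted names of the .jp2 members inside a downloaded zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(n for n in archive.namelist() if n.lower().endswith(".jp2"))


# Build a tiny in-memory zip as a stand-in for the real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("popularsciencemo02newy_jp2/page_0002.jp2", b"")
    archive.writestr("popularsciencemo02newy_jp2/page_0001.jp2", b"")
    archive.writestr("popularsciencemo02newy_jp2/readme.txt", b"")
pages = list_jp2_pages(buffer.getvalue())
```

Decoding the JP2 images themselves needs an image tool with JPEG 2000 support, which is outside the scope of this sketch.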

Reference indicator variation in PSM

Starting with Volume 15, article references in PSM use asterisks * and crosses †‡ instead of numbers. Is there a template or an HTML tag that handle such a convention?— Ineuw (talk) 19:03, 22 April 2010 (UTC)

Don't bother with them, they are not needed. The scheme is not workable anyway, when multiple pages are transcluded there would probably be several asterisks * and crosses †‡ on the main-space page. See Help:Editing_Wikisource#Footnotes. Cygnis insignis (talk) 14:15, 30 April 2010 (UTC)

Thanks. I came to the same conclusion but had to check. - Ineuw (talk) 19:11, 30 April 2010 (UTC)

Stories from the Arabian nights

Today, all the images from Stories from the Arabian nights (1907) by Laurence Housman were deleted from Commons, as they weren't out of copyright in their home nation. Some of them were by French illustrator Edmund Dulac, died in 1953. I have a copy on my hard drive, but I'm running late, and can't upload them now; go ahead and nudge me later if I forget about it.--Prosfilaes (talk) 19:54, 23 April 2010 (UTC)

Interesting as I don't see them listed here. Alternatively if you can produce a list of them, then we can undelete them and import them with their histories, then delete them again. Also it would be useful if you could point out the deletions as it would be expected that CommonsDelinker should be used at Commons, and that would seem not to be the situation on this occasion. — billinghurst sDrewth 12:45, 24 April 2010 (UTC)
The files are:
File:Arabian_nights_aladdin_Bedr-el-budur.png File:Arabian_nights_wicked_brothers_advanced.png File:Stories_img.png File:Arabian_nights_aladin_aladin_in_the_cave.png File:Arabian_nights_wicked_brothers_pirouze.png File:Arabian_nights_ali_baba_blindfold.png File:Stories_from_the_Arabian_nights_front.png
So there's no need for me to reupload, you can just copy them directly from Commons?--Prosfilaes (talk) 02:24, 25 April 2010 (UTC)
Done. If there is ever a future need, one can use Special:Import prior to their deletion. — billinghurst sDrewth 06:47, 25 April 2010 (UTC)
I just checked the pictures, and none of them are there. All the data and history from Commons exists, but the pictures themselves are missing. See here, for example.—Zhaladshar (Talk) 14:30, 30 April 2010 (UTC)
(Expletiveration) It obviously only imports the casing and not the image itself. Had me fooled and what a nuisance. Prosfilaes it looks as though you will need to add your images to the casings. If that is now no longer possible, then I will have to go back and grab them individually. — billinghurst sDrewth 15:03, 30 April 2010 (UTC)
Sorry; I deleted them when you said you had them.--Prosfilaes (talk) 16:50, 30 April 2010 (UTC)

Done for real — billinghurst sDrewth 05:56, 1 May 2010 (UTC)

Google copyright claims on public domain works

I was going over Help:side by side image view for proofreading, and part of the file describes removing Google copyright notices from .djvu files. I believe that this actually is legal to do, because it is my understanding that public domain works cannot be covered by copyright even when they are obtained from a faithful scan ("non-transformative" with no creative input). The practice has even been dubbed "copyfraud". Even so, one assumes that Google spends quite a bit of money on the bandwidth needed to serve up a full page of copyright notice, "requesting" that all users make only non-commercial use of the books, and there might be some reason for it. There may be some political factors to consider also - Google obviously has done a great service at some expense to digitize all these books - perhaps in a few years, once libraries have been given a fair chance to throw out the originals, someone in Congress will suggest that they be rewarded for this hard work with the right to control the commercial use of these books or derivative works thereof in perpetuity.

But I'm not a lawyer and I wouldn't know how to find the best sources to describe the current applicable law. I'd suggest it may be well worth the trouble if others here on Wikisource can do so. Mike Serfas (talk) 18:46, 24 April 2010 (UTC)

There is no new copyright gained by scanning an image. Google requests that users only make non-commercial usage because that would be most beneficial to Google, but the Google opening page starts out by acknowledging that these books are public domain. University libraries are not, as a general rule, going to start throwing these books out; they were bitten badly by the promise of newspaper microfilms, can see the mediocre quality of Google scans and have already set up buildings to store infrequently used books. Perhaps someone in Congress will be suitably incentivized into trying to pass a bill giving Google special rights, but I see no reason to stress over all the possible bad things that can happen in the future.--Prosfilaes (talk) 02:34, 25 April 2010 (UTC)
And they may possibly claim copyright over the front page, which is the only difference between the output and the input. Hence we prefer to have it removed, and so does Commons, however, it is not a file killer. At this end, we kill the OCR'd text, and mark it without text. — billinghurst sDrewth 06:23, 25 April 2010 (UTC)
I since encountered Commons:When to use the PD-Art tag and Commons:Reuse of PD-Art photographs. It would be nice if those pages explicitly stated a position on the plain text, but it's a beginning.
I would still worry that the Google trademarks on every page (lower right) might be used against the project in some way, especially considering the comment about "low quality" above - i.e. we might be distributing files with Google's trademark that are taken to represent the quality of their work... and providing an opening for later trouble.
I'd further worry that some trademark-infringement law might be passed that criminalizes the act of removing the trademark from the document, perhaps without much public discussion or awareness of the issue. For comparison, consider the H.R. 2652 (1997) by Howard Coble, part of an international push that actually did burden Europeans with "database rights". (See news overview, bill text) Part of this (I think it was in the later attempt to enact the legislation as a treaty, as it isn't obvious in the bill text) was to criminalize the removal of attribution information from databases. I think it would be safest for Wikimedia to design a program to wipe out these trademarks from the .djvu files preemptively, rather than waiting until news reports of such a bill appear a few weeks before passage. Mike Serfas (talk) 12:58, 25 April 2010 (UTC)
I wouldn't fret it. If there needs to be some action in the future, then so be it, to my understanding what we are doing is legal. Leave possible future actions for the future IMNSHO, and I doubt that any such legislation could be punitively applied retrospectively. — billinghurst sDrewth 13:23, 25 April 2010 (UTC)
I saw an identical problem at Gallica, but I ignored it (obviously I mentioned the source of the scans when I posted them to Commons). Nevertheless, sometimes I wonder if it would be a good idea to change the proofread interface, and to optionally remove the image of the page on the right.... it would allow one to comfortably edit Page: content while browsing the original source, or a "personal use only" download of it, in a separate browser/Adobe window (a kind of proofread work without any file upload to Commons in cases of controversies about copyright). --Alex brollo (talk) 09:15, 3 May 2010 (UTC)

PD materials on JSTOR etc.

Related but tangential to the discussion on Google Books above. There's a large amount of public domain texts on JSTOR and other such portals, covered under their respective terms of service, which ban uses Wikisource requires. What's the significance of these? Do they bar use of the materials on Wikisource? Prosody (talk) 04:51, 26 April 2010 (UTC)

Wikimedia can use them; they fall right under PD-Art. Their terms of service have no impact on a third party. However, they may have an impact on you and the institution that provides you access, and note that people have got seriously annoyed over similar actions before. While it annoys the hell out of me that JSTOR has effectively engaged in a conspiracy to remove these works from library shelves and make them only available from JSTOR--most libraries that pay for JSTOR have moved the journals they have physical copies of that they get from JSTOR into cold-storage, unavailable to any but librarians--I think it might be in your best interest and that of the institution that's paying the JSTOR bill to avoid large-scale uploads, and if you do upload anything, make sure all the JSTOR logos and watermarks are off of it.--Prosfilaes (talk) 14:32, 26 April 2010 (UTC)
It comes down to terms of service, not copyright law. Whether the terms are legally applicable is another argument and it would be interesting to see whether they ever get to court. Also note that they try to bind up public domain and copyright articles under the one binding statement, almost like the Star Wars, part one, scene with the trader ... Republican credits and Jedi mind tricks. — billinghurst sDrewth 17:14, 26 April 2010 (UTC)
The problem is, if they can figure out which library you're working from, it is unlikely to get as far as the law. JSTOR can cite the breach of contract and cut off their access to a service that costs tens of thousands of dollars a year. They're more likely to call up the library and have your access cut off, and if you're getting this access because you're affiliated with a university, the university may well consider that a violation of their policy for appropriate use of computing services and respond appropriately.--Prosfilaes (talk) 19:19, 26 April 2010 (UTC)

I would like to see more JSTOR PD content here and on Commons. Please keep in mind that in Europe a directive forbids contracts regarding database-protected works that do not allow non-essential uses (see e.g. § 87e of the German UrhG). What Prosfilaes writes is simply paranoia --Histo (talk) 16:42, 8 May 2010 (UTC)

Well, some of us aren't in Europe. I don't think it's paranoid to think that violating the TOS of an organization that your school has a fifty or hundred thousand dollar a year contract with might not have good results.--Prosfilaes (talk) 02:13, 10 May 2010 (UTC)

Vector vs. ProofreadPage Index pages

Did anyone else notice that editing ProofreadPage Index pages with the Vector skin doesn't work at all? --Amir E. Aharoni (talk) 17:42, 27 April 2010 (UTC)

Not I. That said, I am in a denial dinosaur phase and have yet to migrate to vector. — billinghurst sDrewth 01:37, 28 April 2010 (UTC)
Can you paste a page link here? - Ineuw (talk) 21:53, 27 April 2010 (UTC)
Working fine for me and I have switched to vector for the moment. Edit summary box is very small though. — billinghurst sDrewth 03:54, 28 April 2010 (UTC)
My favorite example is, of course, Index:Gesenius' Hebrew Grammar (1910 Kautzsch-Cowley edition).djvu, but it is broken everywhere in the Index namespace.
I think that I see the situation in which it breaks: The enhanced editing toolbar must be enabled under Preferences->Editing. Without the enhanced toolbar it works. --Amir E. Aharoni (talk) 06:10, 28 April 2010 (UTC)
I can replicate that. I even find that if you are in monobook and check that option, it forces vector onto you. I would suggest reporting this to oldwikisource:ProofreadPage. billinghurst sDrewth 14:32, 28 April 2010 (UTC)
Reported a bug: mediazilla:23340
Somewhat curiously it is not broken in Windows Internet Explorer. GRRRR. --Amir E. Aharoni (talk) 17:32, 28 April 2010 (UTC)
Done - Roan fixed this. --Amir E. Aharoni (talk) 15:17, 3 May 2010 (UTC)

InductiveBot bot request

Hi! I'd like to request bot rights for my bot, User:InductiveBot, which has already been used at commons. It is a pywikipedia based bot, and has been used at Commons for batch uploads and some simple textual work, such as replacements and category creation (all successfully).

I would like to use this bot here for things like replacement of obsolete templates and trivial tasks, such as adding links between volumes, that will otherwise be a chore to do by hand. The bot is run on a specific set of pages which are selected beforehand, and it is started manually and watched over while running. For most tasks, each edit is confirmed by hand, and if not, it starts out manually vetted before I allow it to continue without.

An example of the work I would like to do is to remove the "large" templates from each of the DNB Index: pages, and replace with "x-larger" (reasons for this can be found on WS:PD). This bot will not be used for wholesale site-wide changes of this sort, only for jobs where I have checked that the replacement is valid on each page.

Thanks Inductiveloadtalk/contribs 01:58, 30 April 2010 (UTC)

As per Wikisource:Bots, there is a request to run the bot for a small number of iterations, and once that is received, then our 'crats watch it in action, and when they are comfortable they will allocate a flag. — billinghurst sDrewth 14:58, 30 April 2010 (UTC)
  • probably worth running a trial over a slightly larger set of edits with some differences, to demonstrate that it is not solely a discrete typeset fix. FWIW I am happy with what was done. — billinghurst sDrewth 11:33, 1 May 2010 (UTC)
  • I ran a successful test on the first 50 or so PSM pages, removing {{gap}} templates, replacing {{gr}} and catching no-indent paragraphs. It's all based on replace.py and some regexes, so I can't really demonstrate any more powers - it works with these regexes, it'll work with anything! I will set up an Index: creation task soon for a volume work too. Inductiveloadtalk/contribs 09:37, 3 May 2010 (UTC)
Edits look good to my eye, and using an established wiki tool, I support a bot flag. — billinghurst sDrewth 12:51, 3 May 2010 (UTC)
  • Bot flag granted--BirgitteSB 04:42, 8 May 2010 (UTC)
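For readers following the thread later, the kind of regex-based template replacement described in the request (swapping "large" for "x-larger") might look roughly like the sketch below. This is only an illustration in plain Python, not the bot's actual code: the pattern and the function name replace_large are hypothetical, and replace.py itself is not called here.

```python
import re

# Hypothetical pattern: the opening of a {{large|...}} call.
# Only the first letter is treated as case-insensitive, and a trailing
# "|" is required so that {{larger|...}} is not touched.
LARGE_TEMPLATE = re.compile(r"\{\{\s*[Ll]arge\s*\|")


def replace_large(wikitext: str) -> str:
    """Return wikitext with {{large|...}} calls renamed to {{x-larger|...}}."""
    return LARGE_TEMPLATE.sub("{{x-larger|", wikitext)


if __name__ == "__main__":
    print(replace_large("{{large|Index}} and {{larger|untouched}}"))
```

A pattern like this would still need checking by hand on each page, as the request describes, since it does not handle calls without parameters or nested templates.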

Bring out your dead

I was going to do a mock of the "Bring out your dead" scene from Monty Python and The Holy Grail, however, I have decided to spare you all from my wicked twisted sense of humour.

We know that there are those users who have their valued contributions, and are just too shy to nominate them to WS:FTC, however, don't sell yourselves and your works short. If your treasure is validated and is deserving of a wider audience, then please bring it forward for discussion. Naturally we will offer to help shine it or to work upon the little dents with a bit of polish, though all for a good cause. So please consider the works that you have completed for being a featured text. Thanks. — billinghurst sDrewth 13:01, 1 May 2010 (UTC)

PediaPress and validated texts

I was considering using the "Create a book" option for a few items on Wikisource when I came across a problem. I'm not sure if this has been raised before but "Create a book" and PediaPress do not seem to play well with Wikisource's validated pages. If this has come up before, how do I solve my problems? If this has not come up before, is there a way for this to be fixed in the future? My example is Wikisource:Books/Bull-dog Drummond, which was recently a Proof-read of the Month subject. When I first tried to make the book, the preview pages at PediaPress turned out to be just the code from each chapter, eg. <pages index="BulldogDrummondSapper.djvu" from=28 to=49/>, rather than the actual text. After reading through the help files, I thought it might work if I put in each page manually. I have only done this for the prologue, which is the version currently under Wikisource:Books, but it still won't work. Each page comes through (including the headers and "invisible" notes about page quality) as a separate page, with a separate heading, and not flowing into one another. As it stands, I can't really see a way around this without copying and pasting the text into a new page, which lacks both elegance and efficiency. (Also, Category:Books is a bit of a mess - the book creator puts saved books in here automatically but it is also used for Wikisource texts.) - AdamBMorgan (talk) 22:32, 1 May 2010 (UTC)

I cannot help with the rendering problem, though have found that for the default category we need to create the file Mediawiki:Coll-bookscategory. For our starting default, I will create Category:PediaPress books, which falls under Category:Works. The other details for display are addressed in the release notes. — billinghurst sDrewth 11:22, 2 May 2010 (UTC)
It may be worth asking for assistance at m:Book tool/Feedback. billinghurst sDrewth 11:47, 2 May 2010 (UTC)
You could use {{subst: to create the chapters or whatever. This new page may need to be refreshed, to show changes in the page: namespace, this could be done using the edit history. Cygnis insignis (talk) 14:28, 2 May 2010 (UTC)
My solution to the problem: see User:Alex brollo/Books/Equitation. I built a plain transclusion of Page: pages into a group of unused Talk pages of Equitation; the book tool doesn't work with tl|Page, but works well with ordinary transclusions, and with plain section transclusion (with #lst: syntax) too. I suppose that it would be pretty simple for a bot to build similar plain transclusion pages, then to build the PDF book code from them. --Alex brollo (talk) 14:03, 3 May 2010 (UTC)
I've followed both suggestions and created a new page (just for the prologue so far: Bull-dog Drummond/Prologue/PediaPressVersion). It doesn't need substitution; as Alex states, normal transclusion works. It's still a little awkward, though. I created a new bug report on meta for this as the previous one didn't seem to be exactly the same problem. At the moment, of the four books currently on the bookshelf, only one works (The Time Machine) because it isn't based on proofread scans. - AdamBMorgan (talk) 13:07, 4 May 2010 (UTC)
You're right! My beloved Equitation didn't run either in the version posted on the bookshelf. I updated it with the version from my personal bookshelf, so now there's one working PDF book based on proofread scans, and coming from live transclusion (not from substitution!). Is there some pythonist listening? My time is gone, I'm fighting hard with the APIs and the Vector interface, but I guess that a python programmer could write a script for automatic book code building based on plain transclusion in a very short time! In the meantime: I had to move all images of Equitation into jpg files from a djvu container, since the book tool - when I tested it - could not read [[File:NameOfFile.djvu|page=numberOfPage]] syntax. A bug ticket has been opened; I don't know if the trouble has been fixed. --Alex brollo (talk) 23:19, 4 May 2010 (UTC)

Special:IndexPages

You have added interwiki-links to this page! How did You do that? -- 06:06, 2 May 2010 (UTC)

there's a system message for this, see MediaWiki:Proofreadpage specialpage text ThomasV (talk) 06:56, 2 May 2010 (UTC)
Thank You! -- Lavallen (talk) 10:27, 2 May 2010 (UTC)

New bot introduction

Hi all, I just registered a bot account: User:Alex_brolloBot. I'm the operator of a very busy bot on it.source: it:User:Alebot, usually devoted to "unusual and hard tasks" since it mainly executes original python scripts written by me.

I presume that the bot's work here would be mainly some occasional edits/uploads for a couple of friends. I just got a request from a friend I met on pt.source (see my talk page), so I presume that edits will be occasional (mainly uploads of good external OCRs into nsPage). Well, I have read your policies, and I await your comments. Thanks! --Alex brollo (talk) 10:44, 2 May 2010 (UTC)

Today I put the bot to work. It's uploading the external OCR of Index:An introduction to linear drawing.djvu from page 26. The first group of uploads has a small mistake: I forgot the param minorEdit=False for the .put method of the Page instance. --Alex brollo (talk) 22:40, 7 May 2010 (UTC)
Hi! I am a little hesitant to endorse a bot that we don't know anything about. What language is it written in? Is it built on a known framework such as pywikipedia? Does it operate automatically or manually and is it supervised when operating? What kind of edits can it do (uploading OCR only, or other things)? What is the "normal" edit rate? These are the kind of things that we like to know before a bot flag is considered. Thanks! − Inductiveloadtalk/contribs 17:08, 8 May 2010 (UTC)
Thanks for your attention. Really, the only two aims of the bot here are:
1. to do some OCR upload work for a friend;
2. to gain some experience in driving a bot in an "unknown landscape" and to see a little deeper what's happening here in en.source (and I found VERY interesting news and tricks to steal!)
There's no real need to flag it, since edits will be very infrequent. It's driven by me personally under strict supervision. It uses pywikipedia scripts for basics, and my own python scripts (usually launched from the interactive Idle interface) "to do real things". It's not flagged, so you'll see its edits in the Recent Changes list. I'll use the User:Alex brolloBot page to list and comment briefly on its contributions. It's very similar to it:User:Alebot, one of the busiest bots on it.source (tens of thousands of contributions in a year!). Take a look at Alebot's contributions if you like. --Alex brollo (talk) 11:55, 14 May 2010 (UTC)
Ok, that's exactly what I wanted to know. I have no objections in that case. If you could just make a short note on the bot's user page about the basics of the bot (i.e. supervised, pywikipedia-based, infrequent ad-hoc tasks, etc) that will allow others to quickly see what it is later on when this thread is archived.
Support unflagged use of proven bot by experienced user (review in future if flag is needed)
Inductiveloadtalk/contribs 15:10, 14 May 2010 (UTC)
Support. Basically, I concur with Inductiveload. Note that, technically, your bot violated the edit throttle guideline (putthrottle over 60 seconds for unflagged bots). I'm not sure why this guideline makes a difference between flagged and unflagged bots. Maybe it's so new bots can be blocked before they can do too much damage in case they go rabid. Anyway, I think you should either get a bot flag despite your stated intent to run the bot infrequently, or, if you do not yet feel entirely confident with the "unknown landscape" yet, let it run slower. I trust your experience if you trust yourself, and then there is simply no need to have your bot's edits on the recent changes list. Please also check out WS:BOTR and Wikisource:Bots/scripts. Especially regarding the latter page, it would be great if you published your own python scripts.--GrafZahl (talk) 17:35, 14 May 2010 (UTC)
Oops... I see, I didn't consider the throttle point. Yes, I'll publish my scripts, but I guess they will turn out not to be useful... they are written just for personal, interactive use, badly documented, and revised daily... in a horrible personal "python slang"; no more than a collection of ad hoc routines. Often they are called after some interactive, command-line work, and their meaning is a complete mystery to a user who doesn't know that. So I think that their best location is a subpage of the bot itself: I'll post them at User:Alex brolloBot/Python scripts, "as they are". --Alex brollo (talk) 21:31, 14 May 2010 (UTC)

Request for gif --> djvu conversion

I have about 35 individual .gif images, one for each page of a pamphlet published in the 1850s. My skills when it comes to djvu conversions are limited to pdf --> djvu conversion, so if anyone has a more robust tool and the ability/time to convert a collection of gifs into a single djvu file and send it to me, please let me know and I'll email you the files. Thanks. —Spangineerwp (háblame) 20:54, 4 May 2010 (UTC)

Check my user page for a link to a script that will do that. You will need ImageMagick and DjvuLibre installed. You may need to alter "IMGDIR", "DJVUDIR" and "IMKDIR" depending on your installation locations. It is rough and ready, but it works. Inductiveloadtalk/contribs 23:59, 4 May 2010 (UTC)
Spangineer, are they online somewhere to be grabbed? — billinghurst sDrewth 02:19, 5 May 2010 (UTC)
Thanks guys—Alex is taking care of it. I'm not a fan of DjvuLibre, or, more accurately, it's not a fan of me. GUIs get along with me much better. As for the files, they are available online [1] but they were kind of a pain to download so I figured this way would be easier. —Spangineerwp (háblame) 02:36, 5 May 2010 (UTC)

Virgil: The Georgics, poem, verse and transclusions

Has somebody found a solution to avoid a gap at every page's end and new page's beginning when they are transcluded? For instance between page 16 and page 17 here? Thanks if you can help! --Zyephyrus (talk) 19:25, 5 May 2010 (UTC)

I made two changes (to the transcluded page 16 and 17, that I will reverse after a bit, until you see it) that supposedly fix the issue. Click on the page you linked to see it. According to the documentation of the poem extension, using a "compact" option should remove that blank line. And it does, but what it does then is add a blank line between the first two lines and another between the last two. So, it doesn't really help. As far as I know, there's no other way. Maybe what we are seeing is a bug and should be reported so that we get appropriate behavior.—Zhaladshar (Talk) 22:35, 5 May 2010 (UTC)
<poem> is simply problematic with that extra bit of space that it generates, though in itself it isn't generally the issue until you have subsequent tags on continuing pages, something that may be peculiar to WS. With lots of formatting, you can <noinclude> DIV and SPAN tags to allow formatting to flow through separate transclusions. If you try to <noinclude> the end </POEM> tag and the next corresponding start <POEM> tag on the subsequent page, it fails to work properly. I see that the ideal solution is to get POEM and NOINCLUDE to play together better and then the spacing issue becomes irrelevant. My feel is that would be the ideal place for a bug report. — billinghurst sDrewth 04:20, 6 May 2010 (UTC)
After a hard fight ;-) the trouble has been solved on it.source by adding this code to common.css, stolen from fr.source:
/* poem class: use with <poem> for verse texts */

.poem {
        margin-bottom: 0em;
        margin-top: 0em;
        line-height: 1.6em;
        margin-left: 2.5em;
        text-indent: 0em;
}
.poem p {
        margin-top: 0em !important;
        margin-bottom: 0em !important;
        text-indent: 0em !important;
}
Try it in your own common.css, then import it into the general common.css if it solves the issue.
Take a look to it:Georgiche/Libro_primo to see the result into it.source.
There are a few more troubles:
  1. how to insert a left spacing into the first verse of a continuing poem; see it:Pagina:Canti_(Sole).pdf/280;
  2. how to insert a blank row when one is needed at the beginning of a page; see it:Pagina:Canti (Sole).pdf/115
  3. how to build, and where to insert, a template like it:Template:R that adds a number to the right of the verse, using a float=right style; you'll see a conflict between float=right and the <br /> tag added by poem; but you are lucky since you don't use a similar template here (fr.source and it.source are struggling with the issue, while de.source uses a different trick). We are working on it. --Alex brollo (talk) 07:48, 6 May 2010 (UTC)
FWIW, the only formatting that we have relating to .poem at the moment is .poem p { text-indent: 0; } At the same time, I would still prefer to be able to wrap an end tag for poem inside noinclude tag, as if one applies CLASS or STYLE attributes to poem tags one still wishes for those attributes to carry over across the work, and it makes the formatting issue of common.css irrelevant.
Personal comment, we do a horrid job of describing our CLASS attributes and available formatting from the common.css file. — billinghurst sDrewth 08:24, 6 May 2010 (UTC)
 ;-) Just try... the problem is, that poem is a class itself. How can you apply a class to a class? Here the resulting html code from Georgiche:
<div class="poem">
<p><span><font size="5">C</font>iò che più pingui e floride le messi<br>
Renda, e in quale stagion romper la terra,<br>
E a l’olmo giovi maritar la vite;<br>
So you have to assign your style to both .poem and .poem p (as fr.source friends did) if you want to get any result.--Alex brollo (talk) 10:16, 6 May 2010 (UTC)
Thank you, all of you! and thank you ThomasV who has fixed it! --Zyephyrus (talk) 16:40, 8 May 2010 (UTC)
That's great to hear. For a problem that's been around for so long, I'm glad it was a quick fix.—Zhaladshar (Talk) 19:07, 8 May 2010 (UTC)

Totalmente Drama no Interior

Another - Totalmente Drama na Animação

Ideas? spam or legit but wrong wiki? George Orwell III (talk) 22:11, 7 May 2010 (UTC)

My rudimentary knowledge of Portuguese tells me that it's an article about a television program. Belongs on pt.wikipedia.org, if anywhere. And that's a big if. Looking into it now... —Spangineerwp (háblame) 22:34, 7 May 2010 (UTC)
Looks like bunk. Perhaps a bastardization of pt:Total Drama Action, but the details (like people) don't match, and besides, pt:Total Drama Action is much better. Speedy G5. —Spangineerwp (háblame) 22:47, 7 May 2010 (UTC)
Thanks for the translation & quick reply. Same story with the other page above? George Orwell III (talk) 23:08, 7 May 2010 (UTC)
Yes; just deleted it. —Spangineerwp (háblame) 01:17, 8 May 2010 (UTC)

Greek spelling

  1. Can anyone please check the spelling of the Greek word at 1911 Encyclopædia Britannica/Hieratic?
  2. If there's a better place to ask for such things, please tell me where.

Thanks. --Amir E. Aharoni (talk) 07:10, 8 May 2010 (UTC)

Looks OK to me - iota with rough breathing, epsilon, rho, alpha (it could have a macron, but that's not necessary), tau, iota, kappa, omicron with acute accent, final sigma. TomS TDotO (talk) 10:05, 8 May 2010 (UTC)

No, Wikisource isn't educational

http://archiv.twoday.net/stories/6328277/ --Histo (talk) 16:43, 8 May 2010 (UTC)

Subsequently recanted at Troll-L as someone calls it. — billinghurst sDrewth
It is a story, isn't it. One of the biggest pieces of fiction, not based on reality. It is amazing the sort of non-substantiated verbiage people throw around in fighting for a cause. Note that it wasn't this author who started the statement, it was m:User:vvv. Anyway, whatever, don't join the bunfight. <shrug> — billinghurst sDrewth 04:29, 9 May 2010 (UTC)
Navel-gazing nonsense. Apparently something isn't educational unless it is pure didactic explication. One cannot possibly learn anything important from reading The Grapes of Wrath, To Kill a Mockingbird, or Animal Farm; and you'll learn much more from reading a second-rate Macbeth study guide than by reading the play itself. Bah. Hesperian 06:03, 9 May 2010 (UTC)
The good stuff is certainly educational. No doubt you can find some stuff that isn't but it's not as if we had lots of pornography as Commons is alleged to have.--Longfellow (talk) 09:30, 9 May 2010 (UTC)
What could we possibly learn from the Encyclopædia Britannica, Complete Encyclopaedia of Music, Debates in the Several State Conventions on the Adoption of the Federal Constitution, or indeed any of the many works of fiction and non-fiction written over the centuries? Civilization only started when the English Wikipedia was founded, everyone knows that. :) —Pathoschild 18:47:32, 09 May 2010 (UTC)
... And Gesenius' Hebrew Grammar, which is actually used by students and professors. --Amir E. Aharoni (talk) 18:54, 9 May 2010 (UTC)
Actually, I think that even before the purge on Commons, we had a lot more pornography than Commons; Wikisource:Erotica is a decent start on the available written pornography.--Prosfilaes (talk) 23:41, 9 May 2010 (UTC)

Grammar 1911

I finished the first proofreading and formatting of 1911 Encyclopædia Britannica/Grammar. More proofreading is certainly welcome, especially of the Greek words, since I don't actually know Greek.

A couple of particular questions:

  1. There's a Greek quote from the Iliad (search for "Il."). The printed encyclopaedia refers to "Il. i. 89", but this is most likely a mistake, as I found the quoted passage in Ἰλιάς Ῥαψωδία Ι - that is, the book of the Iliad marked by the Greek letter Ι, not the Roman numeral i (I). I simply wrote "ix." (9) instead of "i." and added a source comment. Is there a more structured way to correct such mistakes?
  2. Is it appropriate to add this article to Category:Grammar?

Thanks in advance. --Amir E. Aharoni (talk) 18:04, 8 May 2010 (UTC)

Google Docs OCR

Just a beta but:

http://googlecodesamples.com/docs/php/ocr.php

--❨Ṩtruthious ℬandersnatch❩ 14:46, 9 May 2010 (UTC)

DjVu text layer

Hello folks, we would need some technical help about DjVu.

As part of a cooperation project with the Bibliothèque Nationale de France, they gave us book scans in PNG and OCR in XML ALTO (a highly detailed XML dialect : presentation (in French) ; XSD).

We are currently working on generating DjVu files with the images, and since we already have an OCR we would like to add a text layer. But we struggle to find detailed information about this layer.

Our question is : how should the text be formatted ? We can put plain text, but we could try to add some wiki mark-up (style), or we could keep some of the detailed layout information defined in the ALTO...

It is more a DjVu question than a WS question. While we generate the DjVu, we also generate the wikitext to initialise later the WS pages (so WS would not need this layer). And after all, other folks might want to retrieve the DjVu files from Commons for whatever purpose.

You may see one of our working page about this issue : fr:Wikisource:Dialogue BnF/Couche texte DjVu

Do you have input regarding those questions ?

Jean-Frédéric (talk) 19:22, 9 May 2010 (UTC)


The DjVu text layer doesn't just assign text to a page. Every word of text is associated with a bounding box that gives its position on the page. Without this location information, you cannot import OCR text into a DjVu file. Hesperian 23:39, 9 May 2010 (UTC)
Thanks for the answer. As stated above, the OCR supplied by the BNF is in XML ALTO format, so we do have highly detailed information about position for every single word. Here is a sample :
<String ID="PAG_1_ST000010" STYLEREFS="TXT_4" HPOS="1330" VPOS="876" HEIGHT="37" WIDTH="201" WC="0.99" CONTENT="SUPÉRIEUR"/>
As part of our processing, we could easily translate this into whatever syntax the DjVu needs. Our problem is that we do not know what it is.
As you seem to be quite knowledgeable on this, could you please give us some pointers on the DjVu text layer format ? I scoured the web, to no avail...
Thanks, Jean-Frédéric (talk) 09:43, 10 May 2010 (UTC)
I am presuming that you have seen some of the white papers at [[2]] — billinghurst sDrewth 10:14, 10 May 2010 (UTC)
I have, and after checking several of them, I had assumed they were only about the implementation specifications and technical advantages of DjVu. Seems I was mistaken, so I will carefully read every one of them.
(Seems I pass for the idiot who does not even search before asking. Well, I did, but it also appears this is not totally untrue... Sorry for the inconvenience.)
Jean-Frédéric (talk) 10:48, 10 May 2010 (UTC)

One of the djvulibre binaries extracts the text layer in a text-editable format. The format is pretty intuitive. You should have no problem converting your XML ALTO data into that format. Once you have done so, the same binary can be used to import it into the DjVu. Hesperian 11:22, 10 May 2010 (UTC)
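As a rough illustration of the conversion discussed here, the sketch below turns one ALTO String element, like the sample quoted above, into a word entry in the parenthesised hidden-text syntax that djvused reads and writes. Several points are assumptions rather than facts from this thread: that ALTO positions are already in pixels with the origin at the top-left, that the DjVu text layer measures from the bottom-left, and that the page height (2000 in the example) is known. The function name is hypothetical, and a complete text layer would also need the enclosing page (and optionally line) entries.

```python
import xml.etree.ElementTree as ET

# The ALTO sample quoted earlier in this thread.
SAMPLE = ('<String ID="PAG_1_ST000010" STYLEREFS="TXT_4" HPOS="1330" '
          'VPOS="876" HEIGHT="37" WIDTH="201" WC="0.99" CONTENT="SUPÉRIEUR"/>')


def alto_string_to_djvu_word(string_xml: str, page_height: int) -> str:
    """Convert one ALTO <String> element into a djvused-style word entry."""
    el = ET.fromstring(string_xml)
    x = int(float(el.get("HPOS")))
    y = int(float(el.get("VPOS")))
    w = int(float(el.get("WIDTH")))
    h = int(float(el.get("HEIGHT")))
    # Assumption: flip the vertical axis (top-left origin to bottom-left).
    xmin, xmax = x, x + w
    ymax = page_height - y
    ymin = ymax - h
    # Escape backslashes and double quotes inside the quoted string.
    text = el.get("CONTENT").replace("\\", "\\\\").replace('"', '\\"')
    return f'(word {xmin} {ymin} {xmax} {ymax} "{text}")'


if __name__ == "__main__":
    # 2000 is a made-up page height used only for this example.
    print(alto_string_to_djvu_word(SAMPLE, page_height=2000))
```

If the ALTO file uses a measurement unit other than pixels, the coordinates would need scaling before this step.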

I have successfully created Djvu files from images + OCR text, where I don't know the coordinates. This is no big loss for Wikisource, since the text layer extraction just returns the text, and doesn't care about the coordinates anyway. This is how I proceed:
  1. For each JPEG image, create a djvu file using the program c44 (or use cjb2 for TIFF images).
  2. To merge all djvu page files into one big djvu file, run: djvm -c book.djvu page1.djvu page2.djvu page3.djvu ...
  3. Prepare a script file from all the text files, using the syntax shown below. Be careful to use escape characters properly for the included text.
  4. To add the text layer, run: djvused -f scriptfile book.djvu
select 1;
set-txt
(page 0 0 1 1 "text for page 1")
.
select 2;
set-txt
(page 0 0 1 1 "text for page 2")
.
save
--LA2 (talk) 09:56, 17 May 2010 (UTC)
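Step 3 above (preparing the script file) could be automated along the following lines. This is a minimal sketch that reproduces the script layout shown in the example; the function names are hypothetical, and the escaping rules (backslash, double quote, octal escapes for control characters, other characters passed through on the assumption that the file is saved as UTF-8) are an assumption that should be checked against the djvused documentation.

```python
def escape_djvused(text: str) -> str:
    """Escape text for use inside a double-quoted djvused string."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Control characters (including newlines) as 3-digit octal escapes.
            out.append("\\%03o" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def build_script(page_texts) -> str:
    """Build a djvused script that sets one coordinate-less text block per page."""
    lines = []
    for number, text in enumerate(page_texts, start=1):
        lines.append(f"select {number};")
        lines.append("set-txt")
        lines.append(f'(page 0 0 1 1 "{escape_djvused(text)}")')
        lines.append(".")
    lines.append("save")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the script, then run: djvused -f scriptfile book.djvu
    with open("scriptfile", "w", encoding="utf-8") as handle:
        handle.write(build_script(["text for page 1", "text for page 2"]))
```

The list passed to build_script must be in page order, since the page number in each select line comes from the position in the list.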

DropInitials and alt text

I want to call everyone's attention to a problem involved in using images to substitute for letters (or words). When copied, the text will not be accurate, but will rather include the filename of the image. Fortunately, there is an easy fix for this - the alt property of the image should be set to the text being replaced. See my edit at [3] for a case in point. --Eliyak T·C 02:32, 10 May 2010 (UTC)

This is an important point, and I have attempted to do this, but I'm not sure how to make it work. If I hover over the initial in the example above (after disabling popups), which uses alt= foo, I get nothing. I had got the idea somewhere, probably a foolish misinterpretation, that the function was enabled with |alt-text= foo; this shows like anything else that is not a parameter. Cygnis insignis (talk) 15:12, 10 May 2010 (UTC)
You can get title-text (i.e. hover-text) by using an unnamed parameter, such as |foo. However, it is the alt-text (|alt=foo) that controls what text is copied when the image is selected with text. --Eliyak T·C 17:52, 10 May 2010 (UTC)

mw:Extension:LilyPond and Bugzilla

Tim Starling has recently set out at bugzilla:189#c105 what he sees as the shortcomings of LilyPond that he believes need to be addressed. As we seem to be the wiki that has the most to benefit, it would be fantastic if one of our developers might be able to look at the Bugzilla report and see if we can achieve anything. Thanks — billinghurst sDrewth 13:02, 11 May 2010 (UTC)

Meanwhile, what about something like nl:Sjabloon:Muziek? It looks good (not enough, I suppose). -Aleator (talk) 23:23, 12 May 2010 (UTC)
It looks like there's still discussion about which format it should accept, and it's still not clear whether there's any way the extension can usefully accept LilyPond. But it seems like ABC is the way forward, and that the extension could accept multiple input formats if they can be implemented (i.e. implementing ABC now doesn't affect whether LilyPond, etc, could be added later). Of course, the extension would call third-party software to do the conversion; it looks like they've basically decided which program that should be (LilyPond and its various conversion programs) as well as evaluated and ruled out another program. So at this point the best way forward may be an extension that can call LilyPond to display ABC music notation. -Steve Sanbeg (talk) 22:51, 15 May 2010 (UTC)

Copyright status of book by UK government commission

Does anybody here know what the copyright status of The grouse in health and in disease (vol 1) is? It was published in 1911 (so it is PD in the US), as the report of an Agriculture Ministry commission. I thought it would be Crown Copyright, and hence public domain as over 50 years old; but commons:Template:PD-UKGov only mentions things like photos and engravings. If the copyright was held by the members of the commission, who are mentioned by name as authors for some chapters, then I think it is in the public domain, but I'll need to confirm all the commission members died in 1939 or earlier, and that this would hence be out of copyright. —innotata 21:26, 12 May 2010 (UTC)

It is Crown Copyright, and I would have said that it came out of copyright after 50 years. As it happens I am just up to those pages in one of my new works. See s. 39 here, continued here and here. With relation to Commons tag, it just sounds as though it was written for images (their focus) rather than for the Act, and may need some tweaking. — billinghurst sDrewth 23:02, 12 May 2010 (UTC)
Thanks! I don't think I'll get around to adding this book for a while, but good to know that at least one bird species monograph is public domain. —innotata 23:07, 12 May 2010 (UTC)
I've just uploaded the book and created Index:GrouseinHealthVol1.djvu and Index:GrouseinHealthVol2.djvu. This book is massive, and I'd like to see some others finish before I start working on it. —innotata 15:42, 13 May 2010 (UTC)

"Androcles and the Lion" in the Shavian alphabet

Would it be possible to put the Shavian alphabet transliteration of Androcles and the Lion on Wikisource? The play itself is out of copyright. The transliteration was published in 1962. Page 4 of my edition says, "Any non-copyright work may be transliterated into the Shaw Alphabet without permission. In the case of a work still in copyright, however, permission must, of course, be obtained from the owner of the copyright." Androcles is out of copyright, so under its own terms the transliteration must presumably also be.

And if we discover we can put the text on Wikisource, should we? Marnanel (talk) 00:03, 13 May 2010 (UTC)

I've got a copy, and have considered scanning it; I think it's safe copyright-wise.--Prosfilaes (talk) 10:22, 13 May 2010 (UTC)

Template:ROOTPAGENAME

Some time ago, I posted Template:ROOTPAGENAME here. Like other "magic words" such as PAGENAME, it can be used without parameters; it returns the name of the main page in ns0, or the name of the Index page for ordinary pages in nsPage: that come from multi-page files (djvu, pdf).

My question is this: is there another similar template or trick here to obtain the same result? We make very heavy use of this template on it.source as an automation tool, and I'd like to use it for some tests here too. --Alex brollo (talk) 08:32, 13 May 2010 (UTC)
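
For readers unfamiliar with the template, the behaviour described above can be sketched in Python. This is only an illustration of the described result, not the template's actual wikicode; the function name root_page_name is hypothetical.

```python
def root_page_name(title: str) -> str:
    """Return the root page for a title, as described in the post above.

    Assumption: a title in the Page: namespace has the form
    "Page:<file name>/<page number>"; any other title is treated as ns0.
    """
    namespace, sep, rest = title.partition(":")
    if sep and namespace == "Page":
        # Page:File.djvu/8 -> Index:File.djvu
        return "Index:" + rest.split("/", 1)[0]
    # Main namespace: keep everything before the first slash
    return title.split("/", 1)[0]


print(root_page_name("Personal Recollections of Joan of Arc/Book III/Chapter 23"))
print(root_page_name("Page:Houston, Where Seventeen Railroads Meet the Sea.djvu/8"))
```

A colon inside an ordinary title is left alone, because only the literal Page prefix is treated as a namespace.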

Need a more permanent record of New Texts

We are now doing quite well at finishing works and distinct parts of works that we are bringing to fruition. Works added to {{new texts}} can come and go in a few days, and while that lends activity to the main page, it seems that works can be a flash in the pan. I am wondering whether people can identify a low-energy means of collecting our completed works into a summarising page. My initial thought was simply a page that collects works completed per month, eg. Wikisource:New texts/2010/May. This could be linked to from the new texts. To me, nothing readily springs to mind that is close to maintenance free, so I wondered what ideas others had for undertaking this. — billinghurst sDrewth 15:48, 14 May 2010 (UTC)

My suggestion (with a stolen thought from Billinghurst): have a form (like the upload form) where you enter the name of the work, the author, translator, date, etc (alternatively, just use an edit box, and manually set it up). This script then can add the text to {{new texts}}, and also add an entry to an archive page. This approach can also be used for CotW, PotM, FT, etc. This also saves users burying around in {{new texts}}, or {{featured text}}, or wherever.
This will need a little bit of programming work, but once up, should be a generally trouble free way of maintaining the lists and recording a history. Inductiveloadtalk/contribs 16:38, 14 May 2010 (UTC)
My experience here is very limited, so never mind if my suggestion is foolish.... I run a bot on it.source whose job is to scan Recent Changes, read new contributions and "do things" (refining texts, adding templates, monitoring changes to key data to update lists, and so on). I keep adding items to the list of such "things" in the main script, and I expect it will turn into a really long list; the only limit seems to be imagination. I think I'll also add the kind of work you're talking about; it seems not so hard and very interesting. Thanks for the suggestion! --Alex brollo (talk) 21:11, 14 May 2010 (UTC)
Actually Alex, that sounds almost like a plan. We could get a bot to watch certain pages and/or to collect certain data from Recent Changes, and/or if necessary set a bit on a page/header type, and then to collate edits for each month and build a history of what was. So we can present CONTENT NEWS FOR MAY 2010. — billinghurst sDrewth 08:49, 16 May 2010 (UTC)
Well, I added new scripts for some tests here to my collection, the same ones I'm working on for the "scanning Recent Changes" project, and I published the whole collection here. My scripts are extremely raw, I guess; nevertheless, some of you might find something interesting in them. I published them after a kind request from a user (otherwise I would not have published them at all...) and you will find them "just as they are" here: User:Alex brolloBot/Python scripts.
This is also an opportunity to ask the en.source community a question. I'd like to write a utility to build a decent PDF book from proofread works. To do this, I need to write some wikicode, using a page for every section/chapter of the ns0 version. In your opinion, can I use an ns0 subpage? Or is it better to create such pages in another namespace (I could use the talk page of the work, or a subpage of my user page)? The problem with using ns0 is that such pages are sometimes picked up by the Random page routine. --Alex brollo (talk) 17:45, 16 May 2010 (UTC)
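
The monthly collation proposed earlier in this thread could be sketched roughly as follows. It is a minimal illustration, assuming a bot already has a list of (title, completion date) pairs; how those pairs are gathered from Recent Changes is not shown, and the function names are hypothetical. Only the page pattern Wikisource:New texts/2010/May comes from the discussion.

```python
from collections import defaultdict
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def archive_page(completed_on: date) -> str:
    """Name of the monthly archive page, e.g. Wikisource:New texts/2010/May."""
    return f"Wikisource:New texts/{completed_on.year}/{MONTHS[completed_on.month - 1]}"


def collate(entries):
    """Group (title, completion date) pairs into wikitext list items per archive page."""
    pages = defaultdict(list)
    # Sort by date so each monthly list is in chronological order
    for title, completed_on in sorted(entries, key=lambda entry: entry[1]):
        pages[archive_page(completed_on)].append(f"* [[{title}]]")
    return dict(pages)


# Placeholder titles and dates, for illustration only
for page, lines in collate([("Work B", date(2010, 5, 20)),
                            ("Work A", date(2010, 5, 3)),
                            ("Work C", date(2010, 6, 1))]).items():
    print(page)
    print("\n".join(lines))
```

The same grouping could serve CotW, PotM or FT archives by changing the page prefix.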

New or old edition?

I'm looking to upload W. G. Beers' (d. 1900) book Lacrosse: The National Game of Canada but I was wondering which edition is preferred. The 1st edition published 1869 or the new edition published 1879? Thanks --Yarnalgo (talk) 19:41, 16 May 2010 (UTC)

Why not both? Jafeluv (talk) 19:48, 16 May 2010 (UTC)
Both is fine, but you'll still want to prioritize which you proofread first. For that, you're probably the person in the best position to decide. For a work of non-fiction, I'd generally prefer the newer version, because it probably clarifies some things from the original work. But if there were actual changes to the sport in those 10 years, then there's value in capturing the "original" work on the subject. —Spangineerwp (háblame) 20:51, 16 May 2010 (UTC)
As Spangineer said, there is no right or wrong answer. My general aim is the definitive edition, and while that is usually the first, sometimes it is not. Also, see the discussion #Choosing a text to upload (above) which is also part of the process.

To the specifics, what changed between the first and second editions? Which is better quality? Which do you think has the better value to the reader? Are all the pages there? — billinghurst sDrewth 23:42, 16 May 2010 (UTC)

Forgeries, dubious works and the like

A Manifesto from the Provisional Government of Macedonia - 1881 has again raised the issue of works of dubious origins, forgeries and possibly their translations. Our handling of such works may need to be tweaked. We need a more robust means to state

  1. That the original work may or may not exist as claimed, however, where it is clearly a published document, has undergone some level of peer review (ie. fooled lots of people to make it to the mainstream) it becomes a legitimate document to host.
  2. Clear guidance on the selection and application of copyright tags, suggestion is use of {{PD-Disavowed}}
  3. Should have a clear (hopefully short) statement within the notes of the provenance of the document, and if the document is worthy (most likely it will be to make its way through peer review) of an encyclopaedic article that it is linked to that.
  4. We also have a tag {{fidelity}} which is not sufficiently accurate to cover these situations, so maybe we need {{dubious}}

Such a direction is clearly for community consultation. — billinghurst sDrewth 01:55, 17 May 2010 (UTC)

Would a template Template:Controversial be the same or is it different? --Zyephyrus (talk) 11:06, 17 May 2010 (UTC)
It wouldn't be my preferred word, as something that is insulting can be controversial, though not dubious. I had thought of questionable provenance disputed, forgery, disputed authorship though thought language-wise less valuable. — billinghurst sDrewth 12:36, 17 May 2010 (UTC)
I think we are talking about "contentious material". It clearly is a good idea to flag up anything of that nature, and to allow for different "flavours", both by having a template with editable text, and by allowing for indirection so that explanations can be placed in notes, or on the Talk page. Perhaps several passes would be needed to allow for the full range, all the qualifications that might be required. Charles Matthews (talk) 13:05, 17 May 2010 (UTC)
That word suits. So part of it could function like {{edition}}, leading to where the discussion can take place and sides of the stories can be explored. — billinghurst sDrewth 13:28, 17 May 2010 (UTC)

Questionable single character

I apologize if the answer to this is easily found elsewhere, but I haven't been able to find it.

What is the best way to mark a single character as questionable or unsure? That is, I'm working from a digitization of an old document, and occasionally run into a word with one or two characters that I can't make out either from the copy or by contextual clues. I can take a best guess, but is there some way to mark just those characters to indicate to readers or reviewers that it was a guess? Also, when you have a definite misspelling, do you indicate that the error comes from the source (e.g. use "sic") or just leave it there for readers to wonder? Thanks, Cmadler (talk) 20:03, 17 May 2010 (UTC)

On the second point, {{sic}} explains (visible only in wikitext). Charles Matthews (talk) 21:06, 17 May 2010 (UTC)
The it.source approach to this problem is to use {{Pt}} (Pt is short for Page-Text). It simply shows one of two versions of a word or a sentence, passed as parameters 1 and 2. It was originally used just like your {{Hws}}, with a slight difference: it doesn't add a hyphen to the hyphenated word start, so the hyphen has to be included in parameter 1. This simplifies the code a little, since the template simply shows in nsPage what is passed as parameter 1 and in ns0 what is passed as parameter 2. Recently we added a title= to the template, so that you can see what the template will show in ns0 simply by pointing at the hyphenated word in nsPage with your mouse, without entering edit mode. Then we realized that there are many other uses for such a generalized template: some pretty simple, some pretty complex (as in the case of long references split across more than one page). One such use is typo management; our idea is to use Pt, passing the wrong word as parameter 1 and its fixed version as parameter 2, so that both are recorded, and the "wrong one" is shown in nsPage, while the "right one" is shown in ns0. The title trick lets you verify that the right version has been recorded, as I said. We don't use a {{Hwe}}; we simply enclose the word end in noinclude tags. I'd really like to use our beloved Pt here too... but IMHO it's too similar to your Hws to be added. --Alex brollo (talk) 22:59, 17 May 2010 (UTC)
There is {{SIC}} which leaves a visible note of bad speeling. I use it only for a visual clue so that people don't correct it. If it is a quote of an earlier time, people generally can accept alternate spellings without question. Also, it is a case of judicious use. If it is a poor text, you wouldn't add it all the time, and that might be more a case for something in the notes= field. With regard to illegible, I simply put (illegible). — billinghurst sDrewth 01:56, 18 May 2010 (UTC)
Thanks for pointing out {{sic}}, that's just what I needed (only one clear misspelling in this particular text). In regard to the other issue, my point is that part of the word is illegible. For example, I have in one place the name of a horse "S??p", which I think is either "Snip" or "Ship", but based on what I can see, could be either. Elsewhere I have another horse named "Li tle?on" which I think is "Littleton", but I can't be sure, and a person named "Mr. J. Jo?nstone" which I think is "Johnstone" but could also be "Jonnstone". I don't want to just mark it as illegible, because in each case I can make out a substantial part of the name, but I think I need to somehow clarify that parts of the name are not fully legible. Cmadler (talk) 12:04, 18 May 2010 (UTC)
If it helps, the text I'm working on is The New York Times/Sketch of Ten Broeck, source is online here. Thanks, Cmadler (talk) 12:05, 18 May 2010 (UTC)
If it is a letter that is illegible, you could do something like Johnstone<ref>{{user annotation|maybe Jo<u>n</u>nstone}}</ref> and ensure that you have a REFERENCES section on the page. If other footnotes exist, then we would look to split up the refs to being those in the work, and those that are user annotations. In which case we would add group=user or similar to the ref tag.
Thanks, that's just what I was looking for! Cmadler (talk) 13:35, 18 May 2010 (UTC)

Bot creation of a pdf book from proofread works

Just a plan so far; I'm going to test a script to build a PDF book from proofread works, by bot. I'll work in a personal sandbox (a subpage of User:Alex brolloBot); I presume that I'll need to create subpages of that subpage too... if it all goes wrong, there will be some deletion work for a willing admin.

As soon as the work begins, I'll post a link here to the pages and scripts I'll use. The model will be Wikisource:Books/Equitation, which I built as a human user some time ago.

Can any of you suggest a worthwhile proofread work to use for my attempt? The best would be a moderately complex book, built using the latest version of the proofread extension (i.e. using the pages index tag), at SAL 50% or 75% (just to see ongoing edits in the resulting pdf book). --Alex brollo (talk) 09:55, 18 May 2010 (UTC)

I would suggest either WS:FTC or WS:PotM as a source of completed texts that may be valuable to have as available books. — billinghurst sDrewth 12:30, 18 May 2010 (UTC)
Thanks sDrewth. I'll use A Beacon to the Society of Friends, a "hard case" since it contains pretty complex formatting and is not fully validated, so some edits can be expected. Its "mirror PDF source code" will be posted at User:Alex brollo/A Beacon to the Society of Friends. --Alex brollo (talk) 14:02, 18 May 2010 (UTC)
I was wrong. #lst is not rendered... Equitation simply doesn't need it: it only uses full-page transclusion (which works). A (bad) compromise could be to subst: the sections, which works, but I found a second issue with multi-column texts... so I'm abandoning my test and accepting its failure. :-( --Alex brollo (talk) 23:54, 18 May 2010 (UTC)

Side by side image view (proofreading)

I'm struggling to understand the explanation at Help:Side by side image view for proofreading, and it appears that no one is answering questions on the talk page there, so I thought I'd ask here. That Help page refers to "Page", "Index", and "Main" namespaces. All the examples are multi-page works. How does this apply for a single-page work? I'm specifically working on The New York Times/Sketch of Ten Broeck, and I've created and uploaded the DjVu file at Commons:File:Ten Broeck.djvu. How do I make that work? Thanks, Cmadler (talk) 14:47, 18 May 2010 (UTC)

The first thing you must do is create an index page. The index must have the same name as the file (plus image extension), so you want to create Index:Ten Broeck.djvu. Fill out all the necessary fields and save it.
Then you'll get a list of pages (as red links) in the Page: namespace to click on and proofread. Since this djvu has only one page, you'll only get one red link.
When you want to use the proofread text on the actual page in the main namespace (i.e. on [[The New York Times/Sketch of Ten Broeck]]) go to that page and use the following notation:
<pages index="Ten Broeck.djvu" from="BEGINNING PAGE NUM" to="ENDING PAGE NUM" />
Likely, you'll want to use "1" for both the from and to parameters.
If you have any other questions (or I'm not being clear), don't hesitate to ask.—Zhaladshar (Talk) 15:18, 18 May 2010 (UTC)
Can you take a look and see how I did? Thanks, Cmadler (talk) 15:49, 18 May 2010 (UTC)
It looks good to me. It also looks like you've proofread the work, so you should edit the Page: and mark the yellow radio button, to indicate you've gone over it pretty well. Also update the progress on the Index page itself so that it reflects that it's been proofed and now needs to be validated.—Zhaladshar (Talk) 16:01, 18 May 2010 (UTC)
I think I got those done. Thanks for the help! Cmadler (talk) 16:27, 18 May 2010 (UTC)
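
For the single-page case discussed in this thread, the filled-in tag can be generated with a small helper. This is just a sketch for illustration; the function name pages_tag is hypothetical, and the only thing taken from the discussion is the tag syntax and the advice to use 1 for both from and to.

```python
def pages_tag(index_file: str, first: int, last: int) -> str:
    """Build the <pages /> transclusion tag for a range of pages in an Index file."""
    if first < 1 or last < first:
        raise ValueError("page range must satisfy 1 <= first <= last")
    return f'<pages index="{index_file}" from="{first}" to="{last}" />'


# A one-page DjVu uses the same page number for both ends of the range
print(pages_tag("Ten Broeck.djvu", 1, 1))
```

The resulting line is what would be placed on the main namespace page to pull in the proofread text.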

Missing documents project?

Is there a Wikisource project that lists "missing" documents, i.e. highly notable public domain works which we should have but do not? BD2412 T 20:28, 18 May 2010 (UTC)

You could try Wikisource:Requested texts. Suicidalhamster (talk) 20:53, 18 May 2010 (UTC)
Not exactly what I was thinking of. I have been putting together a list here of public domain documents missing from Wikisource which are listed in various compilations of most important books/historic documents. Perhaps I should move this to a subpage of Requested texts? BD2412 T 00:23, 19 May 2010 (UTC)
I like that list, and your idea of making it a subpage of Requested texts. Would it be possible to add your sources at the top? Is it an exhaustive list of works that we don't have (from those compilations, I mean)? —Spangineerwp (háblame) 00:46, 19 May 2010 (UTC)
I'd actually rather not, because some of what went into these was taken from works which are themselves under copyright. The list itself is a compilation of what is in the public domain from each of those works, each work being a published list of "most important books/documents", but I'm sure it covers everything we're missing that is out of copyright from all of the ones I looked at. BD2412 T
Have you checked this against the list at Wikisource:WikiProject CrankyLibrarian? — billinghurst sDrewth 01:32, 19 May 2010 (UTC)
No - I was not aware of that project. I suspect that some of the works on my list already exist in Wikisource, or on lists like that one, but require a redirect to an alternate title usage. BD2412 T 01:38, 19 May 2010 (UTC)
I'm going to go ahead and move this list to a subpage of Requested texts, then. Cheers! BD2412 T 02:27, 19 May 2010 (UTC)
It is now at Wikisource:Requested texts/important books and documents. BD2412 T 02:34, 19 May 2010 (UTC)


This is a good idea. I have long been amazed that we don't have Boswell's Life of Johnson here, a text widely regarded as the greatest ever biographical work. I note it still isn't on the list. I wonder how many more such blind spots we have.... Hesperian 04:34, 19 May 2010 (UTC)

If we're looking for works that are of most use to Wikipedia, we should probably focus on biographic dictionaries.
Lists of people who received a certain medal or achieved some particular honour are useful in Wikipedia for establishing that a person was notable.
Then again, if the work is already available somewhere else, the added value of copying it to Wikisource can be put in question. For example, the English Wikipedia has 19,000 references (not bad) to various issues of The London Gazette, the official journal of the British government. These references are using the w:Template:London Gazette which links to the website where these have been digitized as images and searchable OCR text. I started to proofread two issues from 1836 (19345 and 19346), but it would take a long time before Wikipedia started to link to Wikisource rather than the official website for the same publication. --LA2 (talk) 07:47, 20 May 2010 (UTC)

OCR quality

Are there any standards for measuring OCR quality? Given that 100% accuracy is the fully proofread text, we should be able to measure how much the raw OCR deviated from that. I can imagine either counting the number of characters that differ, or the amount of time (per kilobyte) that was required for manual proofreading. This morning I was able to proofread one text of 18.2 kilobytes in 52 minutes, or 6 characters per second. I estimate this text to be of medium OCR quality. I've seen texts with much better OCR, and also some with much worse. --LA2 (talk) 07:32, 20 May 2010 (UTC)

That’s a good question; the BnF gives us OCR with a percentage, but without explaining what it means exactly (w:Optical character recognition is not explicit either).
On Wikisource we use an old Tesseract version; upgrading to the latest version might be a good idea too. Tesseract can also be trained to improve its accuracy.
Cdlt, VIGNERON (talk) 17:18, 21 May 2010 (UTC)
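
The two measures suggested at the top of this thread can be sketched in Python. This is one possible reading, not an established Wikisource standard: the character count uses Levenshtein edit distance (a choice made here for illustration), and the time-based figure assumes 1 kilobyte = 1000 bytes. All function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def character_error_rate(raw_ocr: str, proofread: str) -> float:
    """Edit distance between raw OCR and proofread text, per proofread character."""
    if not proofread:
        raise ValueError("proofread text must not be empty")
    return edit_distance(raw_ocr, proofread) / len(proofread)


def proofreading_rate(size_bytes: int, minutes: float) -> float:
    """Characters proofread per second, assuming one byte per character."""
    return size_bytes / (minutes * 60)


print(character_error_rate("Li tle?on", "Littleton"))
print(proofreading_rate(18200, 52))
```

With 18,200 bytes in 52 minutes (3,120 seconds) the rate works out to about 5.8 characters per second, consistent with the rounded figure of 6 quoted above.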

Search engine by book

What would you think about putting the template {{engine}} on the sources main pages? JackPotte (talk) 19:43, 20 May 2010 (UTC)

If it works I would like it. Have you tested it? JeepdaySock (talk) 10:47, 21 May 2010 (UTC)
I'd love to see some examples of outputs of texts. — billinghurst sDrewth 11:37, 21 May 2010 (UTC)
So let's test it here:
JackPotte (talk) 16:14, 21 May 2010 (UTC)
First thing I noticed is that it does not find Jeep on this page, but it does find Jeepday. When I looked for Cauchon in Personal Recollections of Joan of Arc, it only found Cauchon in Personal Recollections of Joan of Arc/Book III/Chapter 23, while Cauchon is in several chapters including Personal Recollections of Joan of Arc/Book III/Chapter 24. I don't think {{engine}} is ready for prime time yet. JeepdaySock (talk) 16:39, 21 May 2010 (UTC)
Ah, the issue is TRANSCLUSION, and where the search engine sees the text. I don't think that the search engine necessarily processes a transclusion before it indexes.

Compare Babington → prefix:List of Carthusians and Babington → prefix:Page:List of Carthusians — billinghurst sDrewth 16:53, 21 May 2010 (UTC)

JackPotte proposed the same template on the French-speaking Wikisource. It sounds like a good idea. Could you fix it and improve it here so we can use it on WSfr too? I had another idea: why use a template? Why not integrate the search into the right navigation bar? Cdlt, VIGNERON (talk) 17:11, 21 May 2010 (UTC)

Flipping illustrations

Hi! I am working on posting images for Houston: Where Seventeen Railroads Meet the Sea - Some of the images are already on Commons, but they are set to be "right-side-up" for use on Wikipedia. In the book some images of tall buildings are originally on their sides. Is it possible to use an image command to "flip" a right side image to mirror the way it was printed in the book, or do I have to upload "right side up" and "on side" versions to the Commons separately? WhisperToMe (talk) 14:51, 24 May 2010 (UTC)

We generally would align images the right way up in our works. In our rendering of works, we follow the principle of as the author wanted, not how the printer produced. So what exists in the book is the best guidance, not law for a facsimile. — billinghurst sDrewth 15:46, 24 May 2010 (UTC)
Okay, so I will have the images right side up. WhisperToMe (talk) 17:04, 24 May 2010 (UTC)
These PNG files are converted from djvu. Go to the source and get jp2 files instead, using the tiff files would be ideal; the images are the main content of this work. See also Help:Adding images Cygnis insignis (talk) 16:02, 24 May 2010 (UTC)
Okay, so I have all of the raw JP2 files downloaded. Should I upload them to the Commons as they are before I do anything else with those images? I plan to put them in a separate sub-category WhisperToMe (talk) 17:04, 24 May 2010 (UTC)
A slight rotation on some would be perfect, you can judge it using the horizon. A closer crop maybe? Nice little book! Cygnis insignis (talk) 17:28, 24 May 2010 (UTC)
So far I started crops of the high quality JP2 scans. At some point I will have all of the images uploaded... WhisperToMe (talk) 22:24, 24 May 2010 (UTC)

Searching across Wiki databases

Is there a way to execute a single search across Latin character wiki databases, regardless of the language, without visiting each separately? I am searching for authors of articles in PSM and came to the conclusion that Google is biased in favour of their offerings. - Ineuw (talk) 17:07, 24 May 2010 (UTC)

Good question, we should have such a thing, though it is possible with external search engines. Restrict the search to wikisource.org, without the 'en.' prefix in the url, eg., a google search for interwiki links to Bürger was typed in as: Bürger site:wikisource.org. Links to author pages in other languages are usually present if they exist. I also have search engines for these, de: en: fr: es: and so on, added to firefox search box. Cygnis insignis (talk) 18:10, 24 May 2010 (UTC)

Great suggestion. Thanks. - Ineuw (talk) 18:53, 24 May 2010 (UTC)
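
The external-search approach suggested above can be scripted. The sketch below only builds the query URL; it assumes Google's usual /search?q= address form, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def cross_wiki_search_url(term: str, domain: str = "wikisource.org") -> str:
    """Build a Google search URL restricted to a domain with the site: operator."""
    # Leaving out the language prefix (en., de., fr., ...) covers every language subdomain
    return "https://www.google.com/search?" + urlencode({"q": f"{term} site:{domain}"})


print(cross_wiki_search_url("Bürger"))
```

urlencode takes care of the non-ASCII character and the colon, so the term can be passed exactly as it would be typed into the search box.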

A Drink Problem

This featured text doesn't have a license. —innotata 18:38, 24 May 2010 (UTC)

then it should have {{no licence}} appended. — billinghurst sDrewth 00:00, 25 May 2010 (UTC)
It is locked so only an admin can append {{no licence}} JeepdaySock (talk) 15:26, 25 May 2010 (UTC)
I added the template. —Spangineerwp (háblame) 16:08, 25 May 2010 (UTC)
I can see why it was protected, does it need to remain so? — billinghurst sDrewth 00:05, 26 May 2010 (UTC)
On Wikisource once a work is "completed" the pages are usually protected, as there is no need to edit them WhisperToMe (talk) 06:51, 26 May 2010 (UTC)
That is not the current practice, which is the indication of my statement. Current practice has been to only protect works/pages of high visibility, vandalism, or impact. — billinghurst sDrewth 07:19, 26 May 2010 (UTC)

This probably isn't the right place to start a discussion on our protection policy, but as I tend to err on the side of "protecting validated texts helps our credibility," I only lowered the protection of this to block new/unregistered users. If you feel that's still too restrictive go ahead and unprotect. —Spangineerwp (háblame) 13:08, 26 May 2010 (UTC)

It would be a useful discussion at some point, as I have always just followed our practice in my couple of years in the arena. Noting that our vandalism levels to this date have been on the low scale. — billinghurst sDrewth 13:31, 26 May 2010 (UTC)

Which PD to use?

In researching the contributors in PSM, I came across two who published in 1882 but departed in the 1940's. Which copyright/PD note should I use? Studied the Wikipedia copyright instructions but it was not helpful. - Ineuw (talk) 13:38, 25 May 2010 (UTC)

{{Pd/1923|1940}}.--Prosfilaes (talk) 23:46, 25 May 2010 (UTC)

Thanks. - Ineuw (talk) 01:24, 26 May 2010 (UTC)

Question about proofreading

There is a work that is now complete that I want proofread (Houston: Where Seventeen Railroads Meet the Sea) - Do I immediately submit it to Wikisource:Proofreading or do I submit it to another proofreader first? The original work is hosted at http://ia331435.us.archive.org/3/items/houstonwhereseve00farb/houstonwhereseve00farb.pdf - so the proofreader needs to simply compare this to the original work WhisperToMe (talk) 06:56, 26 May 2010 (UTC)

Current practice is to host DjVu files at Commons, and then to have the text into our Page: ns as per side by side proofreading as that enables easy proofreading, and ongoing comparison. Best example of that is any of our Wikisource:Proofread of the Month. We have departed from the expectation that for casual or collective proofreading that someone will hunt up a work to proofread, or even hunt up the right edition. — billinghurst sDrewth 07:23, 26 May 2010 (UTC)
Okay - There is a DJVU file available at the Commons: http://commons.wikimedia.org/w/index.php?title=File:Houstonwhereseve00farb.djvu&page=1 WhisperToMe (talk) 16:35, 26 May 2010 (UTC)
Do you mind if I give the file a more descriptive name (such as simply "Houston, Where Seventeen Railroads Meet the Sea")? —Spangineerwp (háblame) 16:53, 26 May 2010 (UTC)
Sure :) - You can rename the file to whatever you want WhisperToMe (talk) 17:02, 26 May 2010 (UTC)
Great—the file is now at commons:File:Houston, Where Seventeen Railroads Meet the Sea.djvu and locally at Index:Houston, Where Seventeen Railroads Meet the Sea.djvu. —Spangineerwp (háblame) 17:25, 26 May 2010 (UTC)
So should I enter it as a candidate for proofread of the month? WhisperToMe (talk) 17:54, 26 May 2010 (UTC)
I've added it to Wikisource:Proofread of the Month/little works. Feel free to go through it yourself and add the images and text to each page. Eventually it should appear as a POTM, at which point others will validate it for you. —Spangineerwp (háblame) 14:08, 27 May 2010 (UTC)
Thank you! I am in the process of adding the images and text to each page WhisperToMe (talk) 22:49, 27 May 2010 (UTC)
All of the images and text to be proofread have been added! WhisperToMe (talk) 23:25, 27 May 2010 (UTC)

Missing "text-align:left;"


As can be seen in the following image, it is missing something like "text-align:left;" for the first span of the element "prp_header" added by this script. Could anybody fix it? Helder (talk) 12:16, 27 May 2010 (UTC)

Ping ThomasV at oldwikisource:Wikisource talk:ProofreadPage. We tend not to prod his components, especially as they may get the same change across systems. — billinghurst sDrewth 13:01, 27 May 2010 (UTC)

Anchor

There is some kind of anchor in pages like: The London Gazette 19346#65. How have you added this? And how do you avoid conflicts with the template: {{chapter|65}} ? -- Lavallen (talk) 06:05, 28 May 2010 (UTC)

If you mean the page number - #65 - the link at left - [65] - is the 'anchor'. Works using the Chapter template do not use transclusion of the namespace "Page: ", but I see where a conflict might arise in the use of # in a link. Cygnis insignis (talk) 06:14, 28 May 2010 (UTC)
You must have added some kind of code to [65], because there is no such thing by default in MediaWiki. We have nothing like this at svWS. I have now added some code to the Page-template, but it does not help in pages with <pages index=.... -- Lavallen (talk) 06:33, 28 May 2010 (UTC)
As above, see if you can get an answer from oldwikisource:Wikisource:ProofreadPage. Cygnis insignis (talk) 06:47, 28 May 2010 (UTC)
It is in MediaWiki:Proofreadpage pagenum template — billinghurst sDrewth 08:15, 28 May 2010 (UTC)

Adding the "T" in the introductory paragraph

At Page:Houston, Where Seventeen Railroads Meet the Sea.djvu/8 there is a decorative "T" in the "the" - How do I add that in? Obviously I have to cut out the "T" from the source image, but how do I place it in the text to give a similar feeling in the raw Wikisource text? Also I want to represent it as "the" so someone doing a text search will see a "the," but will visually see <image t>he on the page itself. WhisperToMe (talk) 01:47, 29 May 2010 (UTC)

Wrap the image inside {{dropinitial}}. To the remainder, it is starting to get a bit harder, though you can just try [[File:text text|parameters|frameless|T]] as it will be frameless then the T just becomes a text component, though less than perfect representation. — billinghurst sDrewth 03:19, 29 May 2010 (UTC)
Use |alt=T. This will fix it for text searches and for unsighted readers. Hesperian 05:23, 29 May 2010 (UTC)
Thank you! I think I got it now WhisperToMe (talk) 05:56, 29 May 2010 (UTC)

Ear candy

Here's a (useless) page that plays a sound every time a page is proofread: http://toolserver.org/~thomasv/rcsound.html. It uses websockets, so it only works in Google Chrome for the moment; Firefox 3.7 is expected to support websockets when it's released. ThomasV (talk) 08:38, 30 May 2010 (UTC)

Indented text wrap in table cells

When time permits, can anyone LOOK HERE and suggest how to indent the text line in a table cell where the text wraps? - Ineuw (talk) 02:25, 31 May 2010 (UTC)

Like this. Hesperian 02:51, 31 May 2010 (UTC)

What else can I say but thanks! :-) - Ineuw (talk) 03:22, 31 May 2010 (UTC)

For some of the table formatting there are some classes available, eg. class=valignb; details are at Wikisource:Style guide/Tables, and don't omit Help:Table — billinghurst sDrewth 06:15, 31 May 2010 (UTC)

What else can I say but thanks again! :-) - Ineuw (talk) 13:48, 31 May 2010 (UTC)

a question

I don't know whether I'm leaving this message in the proper place. I'm just wondering why some chapters of one of Elizabeth Gaskell's novels (Wives and Daughters) are missing, namely everything after chapter 60. I'm really looking forward to reading the rest of it.

Chapter 60 (LX. ROGER HAMLEY'S CONFESSION) is the final chapter in the source of the transcript, [4]; what else appears to be missing? Cygnis insignis (talk) 18:58, 31 May 2010 (UTC)
Wives and Daughters? Relative to http://books.google.com/books?id=8Zy1auuRt1oC our copy is missing the concluding remarks by the editor (page 589 ff.), but those start "And here the story is broken off, and it can never be finished. What promised to be the crowning work of a life is a memorial of death." That's all she wrote.--Prosfilaes (talk) 21:36, 31 May 2010 (UTC)
This is a text where we should be bringing over the images and applying match and split. Anyone know which text we want? http://www.archive.org/search.php?query=Wives%20and%20Daughters%20AND%20mediatype%3Atexts — billinghurst sDrewth 01:06, 1 June 2010 (UTC)