User talk:The Land


Welcome

Hello, The Land, and welcome to Wikisource! Thank you for joining the project. I hope you like the place and decide to stay. Here are a few good links for newcomers:


You may be interested in participating in

Add the code {{active projects}}, {{PotM}} or {{CotW}} to your page for current Wikisource projects.

On your user page you can put a brief description of your interests and of your contributions to other Wikimedia projects, such as Wikipedia and Commons.

I hope you enjoy contributing to Wikisource, the library that is free for everyone to use! In discussions, please "sign" your comments using four tildes (~~~~); this will automatically produce your IP address (or username if you're logged in) and the date. If you need help, ask me on my talk page, or ask your question here (click edit) and place {{helpme}} before your question.

Again, welcome! — billinghurst sDrewth 11:48, 27 February 2011 (UTC)

thank you!

So how do you upload a DjVu with the OCR text already embedded? Am sure there's an explanation somewhere, but I haven't found it... The Land (talk) 12:27, 15 March 2011 (UTC)

  1. You first identify the full path name of the .djvu file on Archive.org by clicking on the All Files: HTTP link
  2. Copy that full URL and then open the online service Any2DjVu.
  3. Click the button at the bottom to open the next page, tick "Grab Document from Publicly Available URL" & highlight "DjVu Document for verification or OCR"
  4. Click the button at the bottom to open the next page, paste the copied full URL in the field for URLs, select "single column" for books & similar (select "detect columns" if the original has more than one column per page) & tick "I have read blah blah"
  5. Click the button at the bottom to start OCR detection/verification. Wait anywhere from 3 to 20 minutes, depending on file size, for the process to complete. You should see progress reports scrolling by
  6. Download finished file to your computer (rename it back to the original's file name if so desired).
  7. Then upload that file to Commons.
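
For step 1, the full path can usually also be put together by hand. A minimal Python sketch, assuming the common archive.org layout of download/<identifier>/<filename>; the identifier and file name below are hypothetical placeholders, not taken from the actual item, so check them against the All Files: HTTP listing before pasting the result into Any2DjVu:

```python
from urllib.parse import quote

# Assumed base of archive.org's direct-download URLs.
ARCHIVE_DOWNLOAD_BASE = "https://archive.org/download"


def djvu_download_url(identifier: str, filename: str) -> str:
    """Build the direct URL of one file inside an archive.org item."""
    # quote() percent-encodes spaces and other unsafe characters.
    return f"{ARCHIVE_DOWNLOAD_BASE}/{quote(identifier)}/{quote(filename)}"


if __name__ == "__main__":
    # Hypothetical identifier and file name, for illustration only.
    print(djvu_download_url("royalnavyhistory01clow",
                            "royalnavyhistory01clow.djvu"))
```

The printed URL is what goes into the URL field in step 4.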

You're done! It seems you uploaded the original DjVu straight from Archive.org, and those files don't always have an OCR layer in place (sometimes even when the info says there is one, so beware). — George Orwell III (talk) 14:31, 15 March 2011 (UTC)
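
One way to check locally whether a downloaded file really carries a text layer before uploading it: a rough sketch, assuming the DjVuLibre command-line tool djvutxt is installed and on the PATH. The helper names and the 20-character threshold are arbitrary choices for illustration, not anything prescribed by Wikisource or Commons.

```python
import shutil
import subprocess


def has_usable_text(extracted: str, min_chars: int = 20) -> bool:
    """True if the extracted text has at least min_chars non-blank characters."""
    return sum(1 for ch in extracted if not ch.isspace()) >= min_chars


def djvu_has_text_layer(path: str) -> bool:
    """Dump the hidden text of a DjVu file with djvutxt and test whether any exists."""
    if shutil.which("djvutxt") is None:
        raise RuntimeError("djvutxt (DjVuLibre) was not found on PATH")
    # With only an input file given, djvutxt writes the text layer to stdout.
    result = subprocess.run(
        ["djvutxt", path],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return has_usable_text(result.stdout)
```

Calling djvu_has_text_layer on the file downloaded in step 6 should return True if the OCR run worked; a False result suggests the file needs another pass through Any2DjVu.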

Ah, great. Thanks! Also what's the convention here about page headers...? I think I'm getting the hang of other things. The Land (talk) 14:35, 15 March 2011 (UTC)
The "header" for articles in the main-namespace takes Template:Header if that is what you're looking for — George Orwell III (talk) 14:40, 15 March 2011 (UTC)
No, think I've got my head around that (though not actually done it yet)... I mean the section titles and other gubbins that appears on the top of the DjVu pages - does that need to be included in the plaintext pages? The Land (talk) 14:52, 15 March 2011 (UTC)
I tweaked Page:Royalnavyhistory01clow.djvu/32 to illustrate what is basically expected for a page's header/footer in the Page: namespace. If you do not see a header or footer automatically, hopefully there is at least a [+] button in edit-mode to toggle them on & off. To make 'em show all the time - you'll need to find that option in your user preference settings. — George Orwell III (talk) 15:15, 15 March 2011 (UTC)

An alternative, where the DjVu file already exists at archive.org: one can use http://toolserver.org/~magnus/url2commons.php (use the DjVu URL which you retrieve when you follow the All Files: HTTP link).

So where exactly would the missing OCR layer have been added following your method? — George Orwell III (talk) 03:30, 16 March 2011 (UTC)

Note that I prefer to utilise Commons:Template:Book rather than the default Information template. Also, I find that it is a lot easier to rename the DjVu file at this stage, usually to the filename that I wish for the final output, my reasoning being that the File: → Index: → Main: namespaces align nicely, and you can then use some of the parser code to make things easier when it comes to transcluding. <sign>
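
To illustrate why a single file name makes the namespaces line up, here is a small Python sketch that derives the related titles from one name. The function is hypothetical, and the Main: title in particular is only an assumption (the file name minus its extension); in practice the work's title is whatever the contributor chooses.

```python
from typing import Dict, Optional


def wikisource_titles(filename: str, page: Optional[int] = None) -> Dict[str, str]:
    """Derive the File:, Index: and (optionally) Page: titles from one file name."""
    # Assumption: the main-namespace title is the file name without its extension.
    stem = filename.rsplit(".", 1)[0]
    titles = {
        "file": f"File:{filename}",
        "index": f"Index:{filename}",
        "main": stem,
    }
    if page is not None:
        # Page: titles append the page number of the scan after a slash.
        titles["page"] = f"Page:{filename}/{page}"
    return titles


if __name__ == "__main__":
    for name, title in wikisource_titles("Royalnavyhistory01clow.djvu", 32).items():
        print(name, title)
```

With the file name from this thread, the "page" entry comes out as Page:Royalnavyhistory01clow.djvu/32, matching the page tweaked above.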