Index talk:Library of Congress Classification Outline.djvu

Formatting reference note

Just to explain how I am laying out each page, and why:-

The heading of each item should look like this:

{{x-larger|'''{{anchor+|Subclass TN}}'''}}

The anchor lets the contents list of each class link internally (as a custom TOC/Header combination matching the source material).

The content, using a table, often looks like this:

{| width="100%"
|width="20%"| ### || colspan="2" | FIRST ITEM
|-
| ### || width="10%"|  || SECOND ITEM
|-
| ### ||  || THIRD ITEM
|-

...

|}

The table, with predefined column widths, allows the indentations seen in the source representing subsections of the classification. Defined column widths should ensure that tables on different pages all line up when seen in the mainspace. Each new subsection/indentation needs a new column; the first example of which should have a defined width of 10%. Colspan should be used to allow the other items to take up a full row without unnecessary line-wrapping. Altogether, this should correctly display the classification outline in the mainspace. - AdamBMorgan (talk) 12:12, 21 January 2011 (UTC)
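
For anyone who would rather generate this markup than type it, here is a minimal Python sketch of the layout rules described above. It is only an illustration: the function name `outline_table` and the `(call_number, level, text)` row format are assumptions made for this example, not part of any Wikisource tool, and the input is assumed to be non-empty.

```python
def outline_table(rows):
    """Build wikitable markup from rows of (call_number, level, text).

    level 0 means no indentation; each deeper level adds one spacer column.
    The row format and this helper are hypothetical, for illustration only.
    """
    max_level = max(level for _, level, _ in rows)
    seen_levels = set()  # spacer columns that already carry a width
    lines = ['{| width="100%"']
    for i, (call_number, level, text) in enumerate(rows):
        # The call-number column gets its 20% width on the first row only.
        first = 'width="20%"| ' if i == 0 else ""
        cells = [f"{first}{call_number}"]
        # One empty spacer cell per indentation level; the first use of each
        # spacer column defines its 10% width.
        for depth in range(1, level + 1):
            if depth not in seen_levels:
                seen_levels.add(depth)
                cells.append('width="10%"| ')
            else:
                cells.append("")
        # Less-indented items span the unused spacer columns.
        span = max_level - level + 1
        cells.append(f'colspan="{span}" | {text}' if span > 1 else text)
        lines.append("| " + " || ".join(cells))
        lines.append("|-")
    lines[-1] = "|}"  # the final row separator becomes the table close
    return "\n".join(lines)


rows = [("###", 0, "FIRST ITEM"), ("###", 1, "SECOND ITEM"), ("###", 1, "THIRD ITEM")]
print(outline_table(rows))
```

With the three rows shown, the generated rows follow the same pattern as the hand-written example: a 20% first column, a 10% spacer on its first use, and a colspan on the unindented item.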

I think that a revision might make the pages more useful. The column that is currently at 10% should be fixed at 20px. The reason behind this is that the current layout looks very strange on wide monitors. Here's what I'm talking about. Any objections? --Hardwigg (talk) 19:45, 16 July 2011 (UTC)
Sorry, I should have replied to this a while ago. I will change it but only when all of the pages are proofread and formatted. Getting everything to line up is a problem at the moment (and the major reason why this is taking a while to proofread). However, once everything is done, it will be relatively easy to change the spacing of the entire index. - AdamBMorgan (talk) 12:51, 13 September 2011 (UTC)

Note on OCR text, raw text and style

As the pdf I uploaded put all the OCRed text in the first page rather than in each page to match the image, I have started copying and pasting the raw text into pages myself. Hopefully, this will make the total proofreading easier. This text is completely unformatted. The final style that I'm aiming for can be seen in subclass Z, where the use of tables should display the indented text faithfully when they are transcluded (as is already the case for subclass Z). - AdamBMorgan (talk) 17:45, 10 January 2011 (UTC)

Curious - would a .djvu of this .pdf help to make any difference as far as generating a proper OCR? — George Orwell III (talk) 22:10, 27 February 2011 (UTC)
Was too curious to see if I could make one at that online AnyDJVU server thing and to my surprise - it did. Its index is based on the PDF Index:Library of Congress Classification Outline.djvu and lines up the same. Let me know if it's worth using or not so I can delete it if it isn't or give you a hand to quickly port over the existing pages. — George Orwell III (talk) 23:17, 27 February 2011 (UTC)
That might be easier to adapt (lining up the call numbers and the subjects in the table is the most annoying part of the current index) and we should probably use DjVu instead of Pdf. Oddly, I did try AnyDJVU before I uploaded the Pdf and I didn't get a useful file out of it; I don't know why it worked this time.
Something new & different is going on at AnyDJVU - there are more options, that much is sure. This went through awfully quick too.
Anyway, yes, it's better to use the DjVu. Is this going to involve manually moving individual pages or something similar? (I can't think of an easy way to port them over.) - AdamBMorgan (talk) 00:38, 28 February 2011 (UTC)
Sorry, you must still be under the impression I know what I am doing most of the time (not the case). My gut tells me not to do anything until the .DJVU is moved to Commons first or the images will constantly appear then disappear for days afterwards. Besides that, I know of no other way other than applying...
{{subst:Page:Library of Congress Classification Outline.pdf/1}}
in each corresponding .DJVU's page, manually. Maybe we should reach out to someone just in case there is some other way? — George Orwell III (talk) 00:53, 28 February 2011 (UTC)
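
As a rough illustration of the manual approach just described, the Python sketch below lists which `{{subst:...}}` call would go on which DjVu page. The helper name `subst_edits` is made up for this example, and it only builds the text; actually saving the pages would still be done by hand or by a bot.

```python
OLD_INDEX = "Library of Congress Classification Outline.pdf"
NEW_INDEX = "Library of Congress Classification Outline.djvu"


def subst_edits(page_numbers):
    """Map each new DjVu page title to the wikitext to save there.

    Hypothetical helper: it only prepares the text and does not edit the wiki.
    """
    return {
        f"Page:{NEW_INDEX}/{n}": "{{subst:Page:%s/%d}}" % (OLD_INDEX, n)
        for n in page_numbers
    }


for title, wikitext in subst_edits([1, 2, 3]).items():
    print(title, "<-", wikitext)
```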
[GOIII asked for some comment so, I hope this helps (from a quick read)] A few ways that we can handle it
  • Create the text article in the main ns with the OCR'd text (forget about the pdf), and then when the djvu file is in place just do a match and split. This will align the text and undertake the transclusion, and we just need to proofread and validate
The text articles have existed for some time now, though I'm not sure if they were complete or not. You should really inspect a page or two - these are basically the bullet-lists of the LCC catalog system that frames the Portal system here on en.WS now
  • We get someone to bot the moves in the Page ns. I believe that JackPotte (talk · contribs) has designed his bot to do moves, so that would mean sticking a request at Wikisource:Bot requests. (Noting that moving is not a run-of-the-mill process, for the obvious reason.) A one-to-one move from Page:filename.pdf/1 to Page:filename.djvu/1 should be fairly trivial. If the pages are transcluded, the main ns will need fixing (see below)
At first glance this seems the way to go. Both old and new are page to page matched only the file extension changes from .pdf to .djvu
  • We move them all manually; it is not the most complex of moves, just time-consuming.
Would need the index: page created first to make it easy for ourselves, and we would be looking to move and preferably to suppress the redirect. Note that if we have transcluded pages, we would need to update those promptly, and that is an easy bot task (we can even do it ahead of the moves, depending on when we want them broken). — billinghurst sDrewth 01:47, 28 February 2011 (UTC)
Index exists Index:Library of Congress Classification Outline.djvu but the file is uploaded to WS not Commons (my bad I was taking a chance the upload was pointless but it wasn't). Can you do that voodoo that you-do and move that to Commons before I hit the Jack Potte for his lucky charms? 01:59, 28 February 2011 (UTC)
Moved to Commons, deleted locally. — billinghurst sDrewth 03:05, 28 February 2011 (UTC)
The bot option looks the best to me. I'll finish setting up R (Medicine) as I'm almost done there, then it should be easy to work out which pages need to be moved and which do not (I've already finished S-Z and the individual TOCs - first pages - of the other classes). - AdamBMorgan (talk) 14:32, 28 February 2011 (UTC)
I agree - asking Jack Potte to run a bot to move the pages sounds like the way to go first, but I'm not so clear on how many of those pages are done and how many are just copy & paste of pg.1 PDF hidden text or the equivalent of what would amount to OCR text in the new one anyway. If only a dozen or two pages are finished-finished then let's run the Thomas bot and move those formatted pages manually rather than tying up Jack. I was wondering why so many pages are still "Red". — George Orwell III (talk) 22:20, 28 February 2011 (UTC)
Formatted pages are: 1-5, 12, 35, 46, 84, 96, 109, 131, 145, 179, 193, 197, 206, 237, 250-320. That is, the five proofread pages at the start, the first page of each class and everything from page 250 onwards. (There may be an odd page not noted here but, if so, it'll be easier just to re-format than to find it.) I'll start moving them with the normal move tool soon. - AdamBMorgan (talk) 01:17, 1 March 2011 (UTC)
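
For reference, the page list above can be expanded into explicit one-to-one move pairs with a short Python sketch like the one below. The function names `expand_pages` and `move_pairs` are assumptions for this illustration; the script only prints titles and does not perform any moves.

```python
def expand_pages(spec):
    """Expand a list such as "1-5, 12, 250-320" into individual page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    return pages


def move_pairs(spec, base="Library of Congress Classification Outline"):
    """Pair each old PDF page title with its DjVu counterpart (same page number)."""
    return [
        (f"Page:{base}.pdf/{n}", f"Page:{base}.djvu/{n}")
        for n in expand_pages(spec)
    ]


FORMATTED = "1-5, 12, 35, 46, 84, 96, 109, 131, 145, 179, 193, 197, 206, 237, 250-320"

for old_title, new_title in move_pairs(FORMATTED):
    print(old_title, "->", new_title)
```

Because old and new pages match number for number, only the file extension in the title changes, which is what makes a manual or bot-assisted move straightforward here.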
I think everything has been moved now, including this talk page. - AdamBMorgan (talk) 13:11, 1 March 2011 (UTC)

Should I delete the PDF LoC Classification Outline?

Thanks for the DjVu version of the LOC Classification Outline. I think everything has been moved across to it from the PDF, just leaving unformatted text behind. Should I nominate the PDF index, and all of its remaining pages, for deletion? Can I speedy delete them myself as duplicates? I'm not really clear on deletion etiquette. - AdamBMorgan (talk) 13:20, 1 March 2011 (UTC)

Well, we sort of azzed that up a bit. As I understand it, the Thomas Bot should have run first, turning all the pages "red" and, in the process, fixing the originally embedded text as the first edit in the history and setting the text in place upon viewing. Then, those pages already formatted in the PDF should have been deleted in the DJVU, so that when we moved the PDF page to the DJVU page the histories could have been merged, retaining the original embedded text revision along with the 3 or 4 edits made to that specific page over in the PDF index.
No matter now, what's done is done, so I guess there is no issue -- you being the original contributor all this time primarily anyway -- of stepping on anybody else's edits either. I would not bother nominating it, and would delete the pages myself, with the index last once all the pages are gone, to ensure there are no orphans.
Using F8 - djvu Updated or G4 - Redundant as our reason? You start from the front and I'll start from the end & work backwards until we conflict; sound OK? — EDIT calling it a night. — George Orwell III (talk)
...and now it's done. Thanks again. - AdamBMorgan (talk) 18:26, 2 March 2011 (UTC)
Never a problem. — George Orwell III (talk) 01:16, 3 March 2011 (UTC)