Talk:The Ballad of Reading Gaol
| Information about this edition | |
|---|---|
| Edition: | The Ballad of Reading Gaol (1904); Thomas B Mosher; Portland Maine |
| Source: | Index:The Ballad of Reading Gaol (1904).djvu |
| Contributor(s): | Initially transcribed by "Faith Knowles" for Project Gutenberg and uploaded to Wikisource by 81.154.165.75. In February 2008, it was overhauled by Mjodonnell using pagescans. |
| Proofreaders: | John Vandenberg |
citation request
Can someone add a citation to the edition of "The Ballad of Reading Gaol" entered here? I have a rather carefully compiled edition somewhat different from this one. But I don't want to enter changes if this is the correct text of a different edition.
--Mjodonnell 04:07, 11 February 2008 (UTC)
Hi, I would bet my boots that it comes from Project Gutenberg which is quite a strange etext, even for Gutenberg's early standards. It contains two versions, which I have placed side by side, and manipulated so that the lines of each correlate (i.e. I have added blank lines on the left hand side): User:Jayvdb/The Ballad of Reading Gaol (gutenberg).
I can't see any textual differences, so the differences are the white space, and the fact that the first version neglects to note where section "III" begins.
So, our current version is absolute shite, based on a poor digital copy with no provenance data associated with it at all. This is not our best work.
In order to do a good job of fixing this, I have set up a proper transcription project at Index:The Ballad of Reading Gaol (1904).djvu, completing all the front and rear matter. There are 32 pages left to be done. Does that DJVU file match your edition? Do you want to take over from here and create the rest of the pages? John Vandenberg (chat) 13:04, 11 February 2008 (UTC)
- I have found two differences between those two editions in the gutenberg text.
| First edition | Second edition |
|---|---|
| A reguiem that might have brought | A requiem that might have brought |
| In which their convict lies. | In which the convict lies. |
- A 1909 edition with a few images. John Vandenberg (chat) 13:16, 11 February 2008 (UTC)
Have you done a comparison between the old text and this new text? If so, could you point out a few of the key differences? John Vandenberg (chat) 05:35, 20 February 2008 (UTC)
I have compared the page text and the DJVU text [1] and found the following differences:
| Gutenberg etext 301 (first of two transcriptions) | 1904 page | 1904 |
|---|---|---|
| Went shuffling through the gloom | | Went shuffling through the gloom: |
| With a hangman close at hand? | | With a hangman close at hand. |
| corpse | p.32 | corse |
| And that man's face was grey, | | And that man's face was gray, |
| The memory of dreadful things | p.40 | The Memory of dreadful things |
| And terror crept behind. | p.40 | And Terror crept behind. |
| Christ brings his will to light, | | Christ brings His will to light, |
| In which their convict lies. | | in which their convict lies. |
I haven't yet compared this with the gutenberg text. John Vandenberg (chat) 15:37, 20 February 2008 (UTC)
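For anyone wanting to repeat this sort of comparison on another pair of transcriptions, here is a minimal Python sketch using the standard `difflib` module. It is only an illustration, not the method actually used above: the function name `differing_lines` is made up for this example, and the sample lines are taken from the table above plus one unchanged line.

```python
import difflib

def differing_lines(old_lines, new_lines):
    """Return (old, new) pairs for lines that differ between two transcriptions.

    Only runs of replaced lines with the same length on both sides are paired up;
    insertions, deletions and uneven replacements are ignored in this sketch.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(old_lines[i1:i2], new_lines[j1:j2]))
    return pairs

# Sample lines: the first is identical in both, the rest come from the table above.
gutenberg = [
    "He did not wear his scarlet coat,",
    "Went shuffling through the gloom",
    "And that man's face was grey,",
    "Christ brings his will to light,",
]
page_1904 = [
    "He did not wear his scarlet coat,",
    "Went shuffling through the gloom:",
    "And that man's face was gray,",
    "Christ brings His will to light,",
]

for old, new in differing_lines(gutenberg, page_1904):
    print(f"{old!r} -> {new!r}")
```

A line-based comparison like this flags punctuation and capitalisation changes equally, so the output still needs a human eye to decide which differences matter.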
Having found a spelling mistake "reguiem" in the first of the two editions in the Project Gutenberg etext, which also appears in the initial upload to Wikisource, we can now say with 100% assurance that our etext came from that source. It is quite possible that the above differences actually appeared in print in a different edition that was used in the transcription for the Project Gutenberg etext; the only definitive answer to that will be if transcriber "Faith Knowles" knows which edition was used in 1995. John Vandenberg (chat) 16:00, 20 February 2008 (UTC)
audio
Gutenberg also has an audio version (Project Gutenberg), but I am not sure we want to add that until we have worked out which editions of this work were published or exist as MS. John Vandenberg (chat) 13:09, 11 February 2008 (UTC)
I'll do the transcription
My version is just from The Penguin English Library, which I generally trust. It looks like you dug up the first edition (it matches the "London, 1898" mentioned in my Penguin edition). The Penguin doesn't discuss how it edited the text, so I think the older publication is preferable.
I'd like to do this project. I transcribed the first two pages to get the idea. I haven't seen this transcription system before, and both WikiSource texts that I've looked at so far (this one, and Dies Irae) have been rather sloppy, so I didn't know how precise you were trying to be. I'm all for precise (does "anal retentive" need a hyphen?).
I didn't find a section in the Help pages on transcription projects. If you can point me to one, great.
I proofread your head matter, and corrected a couple of small things. I marked one of your blank pages "problematic," since I thought it might be a faded title page. But, having seen the whole front matter, I suspect it's a bleed across from a stack of pages before binding.
I also marked "problematic" the page of publication data, to question your use of "poem" markup on something that isn't at all a poem. (I corrected a typo in the Roman numeral date as well.)
I didn't mark anything "proofread" if I'd changed it, since that seems like bad security. Maybe it's OK for me to proofread it on a different day.
Questions that already arise in my first two pages:
- I put in the page numbers at the foot, but marked them with "noinclude". Right?
- After some puzzlement, I entered the first word of the poem as "He", with capital "H" and lower case "e". It's given with a huge illuminated block "H", and what looks like a capital "E", but I take it to be a lower case "e" in small cap style. Not sure whether to mark up such format info. It's not really part of the poem text.
- I made the line breaks and indentation that is part of the poem, but not the line breaks due to running over the right margin. They are not part of the poetry.
- I used a somewhat arbitrary 3-space indentation.
- I put in a textual note of the "flower icon" at the bottom of the second page of the poem (page number 4). I'm pretty sure that this is a subsectional divider in the poem. In my Penguin edition, there is an asterisk. I have another artsy printing, which has a woodblock picture here. No idea how to mark this properly.
- I put opening and closing "poem" brackets on each page. But, structurally, it's all one poem.
- I left the part number, and the flower separator, outside the "poem" brackets, but on second thought, they are really part of the poem.
Anal-retentively (it must have a hyphen in the adverbial form) yours,
--Mjodonnell 04:46, 12 February 2008 (UTC)
the flower
Virginia is using the character "❧" for the flower, with HTML &#10087;.
I think we should create an image of the flower that appears in the printed edition. John Vandenberg (chat) 05:21, 12 February 2008 (UTC)
flower vs. section mark
Hmmm. As a user of the poem, I would like a markup that indicates that it's a section break. Even Virginia's character is weak from that point of view, but maybe better than an image.
I'm a bit puzzled about the relation between the scanned edition, the transcription, and the final WikiSource product. The use of "noinclude" markup suggests some automatic processing into a version of the poem itself, as opposed to this particular printing. And the scanned images are available to those who want to see what it really looks like.
--Mike 05:34, 12 February 2008 (UTC)
- To see it in action, take a look at Index:Rusk note of 1951 and Rusk note of 1951 (click edit on this page). John Vandenberg (chat) 05:53, 12 February 2008 (UTC)
- The Wikisource software has the ability to display an image of the actual flower on the "Page:", but then display something else on the final rendition of the work.
- The syntax to do that looks like:
<noinclude>[[Image:Flower in The Ballad of Reading Gaol (1904).png]]</noinclude><includeonly>❧</includeonly>
- John Vandenberg (chat) 23:31, 12 February 2008 (UTC)
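To make the two renditions concrete, here is a rough Python model of how `<noinclude>` and `<includeonly>` select content. It is a simplified sketch, not the MediaWiki implementation: the function names `page_view` and `transcluded_view` are invented for this example, and nesting or other parser subtleties are ignored.

```python
import re

def page_view(wikitext):
    """Text shown when the page itself is viewed: includeonly content is dropped."""
    text = re.sub(r"<includeonly>.*?</includeonly>", "", wikitext, flags=re.DOTALL)
    # The noinclude tags themselves vanish, but their content stays.
    return re.sub(r"</?noinclude>", "", text)

def transcluded_view(wikitext):
    """Text shown when the page is transcluded elsewhere: noinclude content is dropped."""
    text = re.sub(r"<noinclude>.*?</noinclude>", "", wikitext, flags=re.DOTALL)
    # The includeonly tags themselves vanish, but their content stays.
    return re.sub(r"</?includeonly>", "", text)

source = (
    "<noinclude>[[Image:Flower in The Ballad of Reading Gaol (1904).png]]</noinclude>"
    "<includeonly>❧</includeonly>"
)

print(page_view(source))         # the image link, as seen on the "Page:"
print(transcluded_view(source))  # the character, as seen in the final rendition
```

The model also shows why the pair caters to exactly two forms of a fragment: every piece of text is either in both views, or in one of the two.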
I think I'm almost grokking noinclude and includeonly.
I think I can take a good copy of the flower image out of one of the page images with the GIMP. If you notice one that looks particularly clean, you can point me to it. Also, any advice on the format (I'm thinking png, with a transparent background), pixel size, ...
Still not convinced that Virginia's weird character belongs in the includeonly version, rather than a more abstract sectional markup.
--Mike 03:05, 13 February 2008 (UTC)
Finished transcribing the text
I just finished transcribing the text. I'll break for tonight, then proofread your final pages tomorrow. I'll check over mine, but I think I shouldn't mark my own typing as proofread---seems to call for an independent eye.
I'll also work on pulling the flower image out. Not sure how much more I should do myself. I'm willing to try to finish it all the way, but not sure I've figured out all the issues in "transcluding."
- I didn't mark the larger small-cap fonts at the beginnings of sections. The beginning of the first section is even bigger than the beginnings of II-VI (and illuminated).
- I also didn't mark the line breaks that were not poetically meaningful.
- I noticed that the centered section numbers and flower stubs formatted fine inside the poem markup, and that seems to be the logical way, since the divisions are part of the poetry.
- Not sure whether the intermediate poem markers should be surrounded by noinclude. Not even sure whether this sort of bracketing a bracket is syntactically good. But it does seem that any abstraction of the poem text should get a single unbroken poem, not a sequence of short poems.
- I noticed that the italic formatting stopped at the end of a line. So, I marked each italic line individually. But that's structurally wrong---there should be one unbroken italic sequence across lines.
- Since we're grabbing the flower picture, how about the other designs? The vinish one at the very end is arguably the end marker of the poem. The others are more purely part of the book rather than the poem. But it should be easy to grab them with the GIMP (fingers crossed).
--Mike 03:15, 13 February 2008 (UTC)
Proofread final matter
I proofread the final matter, which was almost all empty pages. I made a correction on the one nonempty page, so I didn't mark it "proofread."
--Mike O'D 01:33, 14 February 2008 (UTC)
Uploaded flower icon
I GIMPed the first instance of the little flower icon used to separate subsections of the poem, from the 800-pixel version of the page image (which is the highest resolution I could find). I made the background transparent, and created a png:
This is easy to fiddle with, if there is a better source, or some suggestion for improvement.
--Mike O'D 01:43, 14 February 2008 (UTC)
- Nice!
- To respond to an earlier query ... in a perfect world, where everyone has limitless time to spend on personal projects, we would want every single part of this edition to be broken down into re-usable components, for two reasons:
- to allow high quality reproductions of PD works to be distributed cheaply in print, and
- mining the public domain enlarges the pool of resources usable for new works to be created
- There are some more decals worth grabbing on p19. The decorative "H" should be uploaded onto the Commons project, and categorised into commons:Category:H, where all the other decorative H are placed (yes, the commons project is a bit anal about categorising media).
- OK, back in the real world: these decals are not going to end up in front of many eyeballs. Only people who are actually interested in the production of this edition are going to care. And now for the rub - the people that will appreciate those decals are probably also the same people that would be motivated enough to extract them from the per-page view, so it could be left as a task for the next person who is motivated (for whatever reason) to help improve this particular work. Near enough is often good enough - each page is a collaborative effort. But if you do want to go all the way, the work could end up being "featured" - see "Featured text candidates".
- John Vandenberg (chat) 02:33, 14 February 2008 (UTC)
Here's a draft of the block H
(I already see from the preview how I could improve this a bit, but I'll wait for now.)
I did a similar version of the block H from the beginning of the poem. This one was harder, and not so obvious to what extent it should be a selected image vs. a simulation of the type block. I did an intuitive job of cutting out a lot of the obvious ink bleed, and presenting it roughly as the shape of the type block (with the ink color) on a transparent background.
I joined Wikimedia Commons, and started into the uploading process. But, I wasn't sure how to describe the permissions. I have a blanket license of all my own work as CC attribution-sharealike, but I'm willing to go even more liberal if you recommend it as making things more useful. I've always assumed that people will fudge the attribution part when things get too complicated to mention all sorts of tiny contributions, and I certainly won't sue anybody. Don't understand the exact provenance and license of the starting image, though. I assume that the original book has passed into public domain (but don't really know the UK rules, nor the attachment of copyright on typographers' decorations). Then, there's the photo by Microsoft, which I suppose is licensed fairly freely or else you wouldn't have brought it onboard.
Anyway, for now I'll just post my pictures here on the discussion, in the spirit that they are drafts put up for critique. If you find any of them worth deploying better (particularly, posting in the WM Commons), go right ahead. I'll try to follow the process well enough to start doing it myself.
I find the GIMPing rather fun, so as long as the results look potentially useful, I'll do more. I can also adjust quite a bit to hints regarding the most useful form to provide. If the Commons posting is annotated well enough, then people can always trace back the original image, so I've assumed that I should add some value, and the best value to add seems to be a reasonable reverse engineering of the shape as carved or cast (in the sense of being able to redeploy the picture without the smear, not in the sense of studying the type block as an object, of course).
Yee hi!
--Mike O'D 04:08, 14 February 2008 (UTC)
but, the use seems odd
This sort of GIMPing is fun, and I can imagine the image being useful in some contexts. But, it seems odd to use this image in a version of the text that doesn't use the fonts of the original. The particular flower image seems to be essentially a special character added to a particular font. For the pure abstract text, it ought to be a structural markup of the subsectioning. For a particular display, it ought to be some character that's visually consistent with the rest of the font.
--Mike O'D 02:03, 14 February 2008 (UTC)
Decorative header bar
This one was pretty easy, due to the high contrast and small amount of ink bleed. I could clean up a bit of bleed, but it has a nice antiquey look as is.
Printer's mark: red anchor
I proofread the poem text
I just finished proofreading the poem text, and correcting a few typos. I found 0 errors on most pages, 1 on a few, never more than 1. I'm pretty confident in the poem text now.
I did not proofread the markup, and in fact it isn't final.
I did not mark any pages as proofread, since I think the proofreader should be a different person from the transcriber.
--Mike O'D 05:50, 14 February 2008 (UTC)
Experimenting with robust markup on first poetry page
I spent the evening experimenting with the first page of the poem, trying to find a robust markup that would correspond verifiably to the page image, and be usable to produce natural transclusions.
I improved some minor points---used the "uc" function to preserve normal capitalization in the all-cap title, used ":" for indenting.
The main point was to turn the page code into a template with plenty of parameters that could be fine tuned, but that had sensible defaults.
I found the "<poem>...</poem>" incompatible with sensible uses of templates. There appears to be no decent way (there's probably a very complicated workaround) to introduce the poem markers so that they are interpreted at the right level of processing.
I tried using "BeginPoem" and "EndPoem" templates to delay the parser's view of the poem markup, but then the beginning markup had no visible effect (and disappeared).
I am pondering whether to abandon "<poem>...</poem>", since it appears to be a rather crudely implemented layout markup, unsuitable for semantic markup. But, it is unfortunate not to have a well disseminated standard for poetry.
CLEANUP NEEDED: I created templates "PageOfPoem", "BeginPoem", and "EndPoem" to test their effects. They should probably be deleted, so that those nice names aren't used up. But I didn't see how to delete.
IDEA: I'll try just blanking the contents.
--Mike O'D 05:11, 20 February 2008 (UTC)
- The problem is that templates don't work within <poem>...</poem>? John Vandenberg (chat) 05:30, 20 February 2008 (UTC)
Parsing of <poem>...</poem> is deeply peculiar
It appears that the opening <poem> bracket takes effect immediately during the parsing of the text. If it appears in the default for a parameter, it causes the parameter definition (without the "<poem>" itself) to appear in the final presentation. Even the part of the parameter definition textually to the left of the "<poem>" mark becomes part of the poem layout. Presumably, it is on some sort of parsing stack, which gets output literally.
Trying the converse approach, and introducing the <poem>...</poem> brackets within a template definition (that's what I tried in the "PageOfPoem" template), a parameter expression within the brackets does not get expanded.
I've taught compiler construction, and implemented a programming language, and this is the sort of thing that happens when a language definition piles up gradually, without a clear concept of its structure. The same thing happens with Mathematica, BTW.
So, I've tried the first page with a hand-coded poetry markup. The main structural problem is that default values seem to apply to only one instance of a parameter. This invites confusion when changing things later.
--Mike O'D 06:11, 20 February 2008 (UTC)
- The parser and the lexicon are not 100% defined, in part because the language is expanded by extensions to the core MediaWiki software, and also because the language has always been "whatever the code said it was". There was an attempt at writing a proper grammar [2], but it isn't the same as is in effect in the software - as far as I know that codebase is now stale.
- Mixing extension tags (i.e. <poem>..</poem>) and templates is not a good idea. I still don't see what problem it is that you are attempting to solve, but feel free to keep experimenting. p.s. just now, I have proof-read a few pages. John Vandenberg (chat) 06:27, 20 February 2008 (UTC)
Marking a poem
Not sure what "extension" tag means. Is <poem> an extension of regular HTML? Presumably it's important to put other sorts of HTML markup in templates.
Here's the example that I should have given before:
}}}
A mathematician named Klein
Thought the Mobius band was divine,
{{{EndPoem|
Source code:
{{{BeginPoem|<poem>}}}
A mathematician named Klein
Thought the Mobius band was divine,
{{{EndPoem|</poem>}}}
I remember the "{{{BeginPoem|" showing up before, but maybe I hallucinated, or typed something else wrong, or maybe it depends on context.
This is the style in which I cooked up structural markup for an online journal published in LaTeX source. It was nasty, but doable, in LaTeX---seems undoable in this style if I use <poem>. It works with <center>.
What am I trying to accomplish? Partly, I'm feeling that out as I go. But, I think that this book should turn into a template (or set of templates) that is/are easy to compare against the scanned page images, and from which I can extract the pure poem text. "Pure poem text" is not completely well defined, but it certainly indicates the breaks between poetic lines and verses, but not between pages.
Regarding the "<poem>...</poem>" markup, in order to use it on the individual pages, it needs to bracket the poem fragment on each page. But, in a presentation of the pure poem text, there should be only one bracketing of the entire poem. Hence my desire to include vs. not include the brackets depending on parameter values.
OK, I was sure that <noinclude> would be no better, but:
A mathematician named Klein
Thought the Mobius band was divine,
Source code:
<noinclude><poem></noinclude>
A mathematician named Klein
Thought the Mobius band was divine,
<noinclude></poem></noinclude>
But I'm not sold on <noinclude>, <includeonly>. They seem to cater to precisely two forms of a text fragment, while I easily imagine many more than two.
Cheerio,
--Mike O'D 07:12, 20 February 2008 (UTC)
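The intent behind the `{{{BeginPoem|<poem>}}}` example can be modelled outside the wiki. The Python sketch below shows only the behaviour being aimed for (a parameter with a default that a caller may override or blank out), not what the MediaWiki parser actually did with `<poem>`. The function name `expand_params` and the regular expression are assumptions made for this illustration, and nested braces are not handled.

```python
import re

# Matches {{{Name}}} or {{{Name|default}}} with no nested braces.
PARAM = re.compile(r"\{\{\{([^{}|]+)(?:\|([^{}]*))?\}\}\}")

def expand_params(body, args):
    """Replace each {{{Name|default}}} with args[Name], else its default."""
    def substitute(match):
        name, default = match.group(1).strip(), match.group(2)
        if name in args:
            return args[name]
        if default is not None:
            return default
        # Undefined parameter without a default: leave the text as written.
        return match.group(0)
    return PARAM.sub(substitute, body)

page = (
    "{{{BeginPoem|<poem>}}}\n"
    "A mathematician named Klein\n"
    "Thought the Mobius band was divine,\n"
    "{{{EndPoem|</poem>}}}"
)

# Page view: defaults apply, so the fragment is bracketed by <poem>...</poem>.
print(expand_params(page, {}))
# Whole-poem view: the caller blanks the brackets and supplies one outer pair itself.
print(expand_params(page, {"BeginPoem": "", "EndPoem": ""}))
```

Unlike the two-way `<noinclude>`/`<includeonly>` split, a parameter can take any number of values, which is the flexibility asked for above.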
- <poem> is not HTML. It is a special tag that is provided by a MediaWiki extension, called "Poem" which can be found on Special:Version#Installed_extensions. John Vandenberg (chat) 07:25, 20 February 2008 (UTC)
- Ah. I think I see the "issue" now. <poem>..</poem> is not intended to be a semantic designation of where a poem begins and ends; it is simply intended to designate areas of text that should be processed before they are emitted, so that leading spaces are replaced with &nbsp;, which is an HTML entity that forces a space (by default HTML collapses consecutive whitespace). John Vandenberg (chat) 07:31, 20 February 2008 (UTC)
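As a rough picture of that processing, here is a small Python sketch. It is a simplified model rather than the Poem extension's code: the function name `poem_like` is invented, and the assumption that each source line break becomes an explicit `<br />` is a simplification added for this example.

```python
def poem_like(text):
    """Model of layout-only poem processing.

    Each leading space becomes &nbsp; so that HTML does not collapse the
    indentation, and source line breaks are kept as explicit <br /> breaks
    (an assumption of this sketch).
    """
    processed = []
    for line in text.split("\n"):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        processed.append("&nbsp;" * indent + stripped)
    return "<br />\n".join(processed)

verse = (
    "He did not wear his scarlet coat,\n"
    "   For blood and wine are red,"
)
print(poem_like(verse))
```

Nothing in this processing records where the poem starts or ends, which is why the tag works as layout markup but not as a semantic bracket.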
Test page template
This is a brief test of what I think a page template should be good for. It's as much for my own recorded understanding as for discussion, but ideas welcome of course.
I had to copy the page to a new "TestPageTemplate", because I couldn't figure out its proper name.
First I'll produce something closer to the look of the photographed page image:
The Ballad of Reading Gaol
I
He did not wear his scarlet coat,
- For blood and wine are red,
And blood and wine were on his hands
- When they found him with the dead,
The poor dead woman whom he loved,
- And murdered in her bed.
He walked amongst the Trial Men
- In a suit of shabby gray;
A cricket cap was on his head,
- And his step seemed light and gay;
But I never saw a man who looked
- So wistfully at the day.
I never saw a man who looked
- With such a wistful eye
{{User:Mjodonnell/TestPageTemplate|PrintedDecorativeHeader=[[Image:Ballad_of_Reading_Gaol-Mosher_edition-Header_bar.png]]}}
Other adjustments can be fiddled by changing other parameters.
Now the verse fragment alone:
The Ballad of Reading Gaol
I
He did not wear his scarlet coat,
- For blood and wine are red,
And blood and wine were on his hands
- When they found him with the dead,
The poor dead woman whom he loved,
- And murdered in her bed.
He walked amongst the Trial Men
- In a suit of shabby gray;
A cricket cap was on his head,
- And his step seemed light and gay;
But I never saw a man who looked
- So wistfully at the day.
I never saw a man who looked
- With such a wistful eye
3
{{User:Mjodonnell/TestPageTemplate|BeginPrintedPageNumber=<noinclude>|EndPrintedPageNumber=</noinclude>}}
That was a bust, and I should have guessed that the <noinclude> wouldn't take effect in that way.
On the other hand, changing the presentation of the lines (as one of my printed editions does):
The Ballad of Reading Gaol
I
He did not wear his scarlet coat,
For blood and wine are red,
And blood and wine were on his hands
When they found him with the dead,
The poor dead woman whom he loved,
And murdered in her bed.
He walked amongst the Trial Men
In a suit of shabby gray;
A cricket cap was on his head,
And his step seemed light and gay;
But I never saw a man who looked
So wistfully at the day.
I never saw a man who looked
With such a wistful eye
{{User:Mjodonnell/TestPageTemplate|PoemBeginLine2=}}
--Mike O'D 07:51, 20 February 2008 (UTC)
Remaining problems with TBoRG
I'm trying to learn the realistic possibilities for a project here, and scale accordingly. Two points that I think are important, and a third that snuck in because I thought it was part of number 2, but I was wrong:
- As a user of the poem text, I really need to know about the subsections. (I have direct personal experience on this one, as I've been studying the poem and noticing the poetic function of the sections.) You (User:Jayvdb) seem to have <noinclude>ed the section marks, given as flower pictures, except for one place, where I assume you just overlooked it. As a minimum, I think that transcluded versions need some indication of the segmenting. I'll provide the picture as <noinclude>, and an asterisk (used in my Penguin edition) as <includeonly>, unless somebody proposes something better soon.
- The transcluding page has section numbers on the transcluding page, and transcludes the individual poem pages for each section. That works for this particular edition, but we can't count on all section breaks coming between pages. I think I'll re-engineer those to be transcluded from the pages.
- I think that the <poem> extension is just not robust enough, and should be replaced by <br> for line breaks and : for indentation. I will wait a while before trying this, since it involves substantial text on each page. But it will be fairly easy to do with my editor. I think that I saw this point raised in a discussion somewhere, when I was trying to learn the ropes. But, I can't put my finger on it any more.
It also occurs to me that I don't know what to do about the "proofread" mark if I change something on a page.
--Mike O'D 23:48, 20 February 2008 (UTC)
Quickly replying out of order,
- 1. yes, section markers of some sort need to be emitted, however as the previous "gutenberg" version of the page did not have them, I didn't bother trying to work out "the" solution for this - i.e. I left this creative choice to you.
- 3. We are moving away from using <br>/: for a few reasons, the most important being that <Poem> allows the contributors to use simpler "markup", resulting in fewer hurdles for non-technically minded contributors. The issue that we are having with <poem> is easily resolved using the makeshift solution described at WS:S#Poems that flow across multiple pages, but maybe there is a better solution.
- 2. We have another extension called Labeled Section Transclusion (LST) which allows for more complex transclusion, however this complexity should be avoided unless it is a necessity. And, the LST extension does not interact well with the <Poem> extension, so the developers are going to have to fix that.
John Vandenberg (chat) 00:42, 21 February 2008 (UTC)
The recent changes you have made, with "includeonly" section headers on the pages, are unnecessary, i.e. [3] [4]. They make the "wikitext" more complex than is necessary. John Vandenberg (chat) 03:02, 21 February 2008 (UTC)
thought I was carrying out number 2 above
I thought that I was carrying out number 2 above, and I probably confounded your reply to number 1 with my number 2. That is, the section numbers should be emitted by the individual pages, since there is no necessity for each page to be in a single section. Perhaps they should be given in one form for both the page version (<noinclude>) and the transcluded version (<nowkiki>


