Index talk:Dictionary of the Swatow dialect.djvu


Takashihakase Method Guide

  • Phase 1: Dictionary entry heading clean-up
    Fix page heading
    Fix dictionary heading entry
    Fix section break
    Split page in half
  • Phase 2: English formatting and clean-up
    Check each English sentence:
      Remove unnecessary spaces
      Remove end-of-line hyphenation from split words
      Correct punctuation errors
      Correct spelling errors
  • Phase 3: Teochew formatting and clean-up
    Correct each Teochew sentence with proper diacritics
  • Phase 4: Proofreading
  • Indicate completion of each phase in the Discussion tab of each page

Tips and toolbar

  • There is a specific template to format every entry: Template:swatow entry. It provides a wikilink to Wiktionary, and Chinese characters are automatically written from right to left even if they are input from left to right in wikicode (examples in the documentation).
  • I've created a toolbar to add diacritics and subscript N. Here is the result: User:Assassas77/common.js. Feel free to copy and paste it into your own common.js.

Assassas77 (talk) 04:24, 10 January 2017 (UTC)

Objectives (to avoid overlapping tasks)

Transclusion features

Even if it's a bit early to talk about transclusion, I was thinking about some features we could add to improve this dictionary:

  • In the transclusion, we can use a bot to add a conversion from this old transcription system to the modern Peng'im system. We should first prepare a conversion table covering all possible syllable combinations.
  • We can add sinograms as well for each example.

Assassas77 (talk) 17:15, 16 January 2017 (UTC)

Definitely.
I think the conversion to Peng'im could be done with on-wiki code. Suzukaze-c (talk) 01:21, 17 January 2017 (UTC)
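
The table-driven conversion discussed above could look roughly like the following Python sketch. It is only an illustration: the function name `convert_phrase` is hypothetical, and the table entries are dummy placeholders, not real correspondences between the dictionary's romanization and Peng'im. A real table would have to be compiled syllable by syllable, as suggested in the thread.

```python
import re

# Placeholder table (assumption): dummy entries for illustration only.
# The real table would map every syllable of the old romanization to Peng'im.
PLACEHOLDER_TABLE = {
    "foo": "FOO",
    "bar": "BAR",
}


def convert_phrase(phrase, table):
    """Convert every syllable found in `table`.

    Unknown syllables are left untouched and collected in a list,
    so that the table can be completed later.
    """
    unknown = []

    def replace(match):
        syllable = match.group(0)
        converted = table.get(syllable.lower())
        if converted is None:
            unknown.append(syllable)
            return syllable
        return converted

    # A syllable is taken to be any run of characters that is not
    # whitespace, a hyphen, or common punctuation.
    converted = re.sub(r"[^\s\-;,.:!?]+", replace, phrase)
    return converted, unknown


if __name__ == "__main__":
    print(convert_phrase("foo-bar baz;", PLACEHOLDER_TABLE))
```

Reporting the unknown syllables alongside the converted text is a design choice that would let a bot flag pages whose syllables are missing from the table instead of silently skipping them.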

Two-column layout

Should we really be using template:multicol? It won't look good when the Page:s are transcluded to the Main namespace, and Help:Beginner's_guide_to_proofreading#Proofreading seems to say to ignore columns ("For example, columns of text are not necessary and do not work well on Wikisource"). Suzukaze-c (talk) 00:21, 21 February 2017 (UTC)

True, I've tried it too to check the result and well.... :/ Assassas77 (talk) 10:46, 21 February 2017 (UTC)
I found this which seems to have an interesting answer. Suzukaze-c (talk) 04:47, 23 February 2017 (UTC)
Yeah, I think we should follow that. Justinrleung (talk) 20:40, 23 February 2017 (UTC)

Section header template

@Assassas77, @Suzukaze-c: do you think we should make a template for the sections so we can write {{swatow begin|cwt}} instead of <section begin="cwt" />{{center|— cwt —}}? Justinrleung (talk) 22:11, 11 October 2017 (UTC)

That would be nice and convenient, but they need <section end="cwn" /> too. How would those be handled? Suzukaze-c (talk) 22:24, 11 October 2017 (UTC)
@Suzukaze-c: {{swatow end|cwt}}? It's kinda clumsy, though. Justinrleung (talk) 22:28, 11 October 2017 (UTC)
I agree: the original flat <section end=...> <section begin=...> is not really convenient. I haven't thought much about a way to improve it though. Maybe a single template acting as a section separator, such as {{section sep|cwn|cwt}}? The drawback would be the first and the last sections. Assassas77 (talk) 11:20, 12 October 2017 (UTC)
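
To make the separator idea concrete, here is a small Python sketch of the markup that a hypothetical {{section sep|previous|next}} template might expand to. The function name and the use of an empty argument for the first and last sections are assumptions made for illustration; no such template is confirmed to exist.

```python
def section_sep(previous, following):
    """Return the markup a hypothetical {{section sep}} could expand to.

    An empty `previous` models the first section of the book (nothing to close);
    an empty `following` models the last one (nothing to open).
    """
    parts = []
    if previous:
        parts.append('<section end="%s" />' % previous)
    if following:
        parts.append(
            '<section begin="%s" />{{center|— %s —}}' % (following, following)
        )
    return "".join(parts)


if __name__ == "__main__":
    print(section_sep("cwn", "cwt"))
```

Allowing either argument to be empty is one possible way around the first/last drawback mentioned above.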

Radical pages

I'm starting the radical pages because we sometimes need them to check whether it is written 30 or 80, etc. I'll transclude them and add links and anchors. Assassas77 (talk) 18:12, 17 January 2017 (UTC)

Color progression on the index page

Hi, on my computer, the color progression is no longer displayed on the index page. Am I the only one? Assassas77 (talk) 16:17, 20 March 2017 (UTC)

No. Suzukaze-c (talk) 17:53, 20 March 2017 (UTC)
Well, I've tried removing the last edit and putting it back... the colors have been back since the first edit?! I don't know what happened, but it's fine for me now. Assassas77 (talk) 16:22, 28 March 2017 (UTC)
Looks like a null edit can be used to get the colors back. Assassas77 (talk) 13:22, 1 September 2017 (UTC)

By the way, thanks for the company on this work. :) @Suzukaze-c: Shouldn't you set the yellow (proofread) page status on the pages you have worked on and finished? Assassas77 (talk) 20:03, 9 September 2017 (UTC)

I am actually not checking them at all yet; once I type in text, I don't look over it again... Also, I'm not sure it's right to say "proofread" when I am incapable of telling if there is a mistake in the romanization. Suzukaze-c (talk)
I think it is only really considered proofread once a second person proofreads it again, which turns the flag green. So feel free to set the yellow flag. Assassas77 (talk) 09:05, 10 September 2017 (UTC)
I understood you the first time; I am not confident enough to mark the page as yellow... Suzukaze-c (talk) 23:17, 10 September 2017 (UTC)
Ok, no problem :) Assassas77 (talk) 07:26, 11 September 2017 (UTC)

Validation of pages

Ways to validate pages:

  • check the characters in the variants dictionary: [1]
  • check the page number in Williams' Dictionary
  • check the radical in the table at the beginning of the dictionary
  • check the number of strokes (on Wiktionary, for example)
  • cross-check by using the dictionary itself and finding the right sinogram for each example phrase and sentence.

Assassas77 (talk) 14:40, 1 April 2018 (UTC)

Separating different uses of a syllable

It may be advantageous, @Suzukaze-c:, if the long space that separates different uses of a syllable (as in "kîaⁿ hàu kâi kíaⁿ; a filial son.  put hàu kâi kíaⁿ; an unfilial son.", for example, where there is a {{gap}} inserted) within an entry is reproduced in the proofread pages, as there doesn't seem to be any other formatting difference separating the Swatow topolect and the English. Mahir256 (talk) 05:05, 20 April 2018 (UTC)

Hmm, and maybe it could be converted to a line break when the page is transcluded into Main, too. That would be nice. I worry about the legibility of the markup though. Also, the {{gap}} would have to change size, and not be a fixed size of 2em, for aesthetics. Having a 2em space start a line wouldn't look nice. @Assassas77:? Suzukaze-c (talk) 05:21, 20 April 2018 (UTC)
Using a line break between each example/use would be nice in main. About the {{gap}} in Page: mode, hmmm... (thinking), there is indeed the problem of having a line starting with {{gap}}. Maybe we could start with a placeholder until someone comes up with an elegant solution: a simple template (specific to this dictionary, see Category:A pronouncing and defining dictionary of the Swatow dialect templates) that introduces a line break in main but nothing in page mode:
{{#ifeq:{{NAMESPACENUMBER}}|0|<br/>| }} or {{#ifeq:{{NAMESPACENUMBER}}|0|<br/>|{{gap}}}}
I created {{Swatow gap}}. Feel free to modify, I'm just improvising :D I've started to use it on page 153; you can see the display in main in hong and ho̤. Assassas77 (talk) 10:05, 20 April 2018 (UTC)
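
The namespace switch in the #ifeq expressions above can be modelled in a few lines of Python. This is only a sketch of the logic, not the template itself; the function name and the default filler are assumptions, and the value 104 in the usage line merely stands in for any non-main namespace number.

```python
MAIN_NAMESPACE = 0  # {{NAMESPACENUMBER}} is 0 in the main namespace


def swatow_gap(namespace_number, page_mode_filler="{{gap}}"):
    """Model of the #ifeq switch: a line break in main, a filler elsewhere.

    `page_mode_filler` may be "{{gap}}" or a plain space, matching the
    two variants proposed in the discussion.
    """
    if namespace_number == MAIN_NAMESPACE:
        return "<br/>"
    return page_mode_filler


if __name__ == "__main__":
    print(swatow_gap(0))     # main namespace
    print(swatow_gap(104))   # any non-zero namespace behaves the same
```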

Tool to clear OCR

@Suzukaze-c: I've started a tool on WMFlabs to make the OCR proofreading faster: [2] Assassas77 (talk) 16:24, 23 June 2020 (UTC)

Very cool.
I'm satisfied with my own process though. Suzukaze-c (talk) 21:01, 23 June 2020 (UTC)

@Ngwuidak, @BlobcatsAreCool: To be honest, I am not sure if this is proper. Help:Annotating does not seem to approve of this. It's not really a part of the original book. Suzukaze-c (talk) 05:40, 17 August 2020 (UTC)

@Ngwuidak: Could I request that you follow the character usage (kíaⁿ 子, chĭang 蹌, etc.) and hyphenation ("bô̤ lí, à kàu ŭ lí") of the original book? Suzukaze-c (talk) 02:24, 18 August 2020 (UTC)
@Suzukaze-c: I will fix it. (unsigned comment by Ngwuidak (talk))

Idea: create an "annotated" version in accordance with the Help page above. The output of {{SHC|abc|字}} would be abc (字), but only on the "annotated" version. On the "normal" version it would be abc.

And maybe every syllable could be a wikilink. That might be useful. Suzukaze-c (talk) 03:02, 18 August 2020 (UTC)

Title of the book

@Assassas77: "A pronouncing and defining dictionary of the Swatow dialect, arranged according to syllables and tones" is very long. This makes URLs absolutely hideous.

It seems that "Dictionary of the Swatow dialect" is the short title: the emphasis on the title page is on the phrase "Dictionary of the Swatow dialect", and the text on the header of every page is "Dictionary of the Swatow dialect". Could we arrange to have both the PDF and the transcluded pages ([[A Pronouncing and Defining Dictionary of the Swatow Dialect, Arranged According to Syllables and Tones/a|---/a]]) renamed? Suzukaze-c (talk) 07:09, 17 August 2020 (UTC)

Indeed, I think the title can be made shorter to solve this issue. ;) Assassas77 (talk) 09:38, 17 August 2020 (UTC)

Wikitext formatting

@Assassas77, @Borneanape, @DefrostedKevin, @Jiaozi17, @Takashihakase, @Truchas13108: (this appears to be everyone who has edited within the last month — did I miss anyone?)

What is your opinion on changing the underlying formatting to what is seen at User:Suzukaze-c/sandbox#Index:Dictionary of the Swatow dialect.djvu?

===chin===

; {{Swatow entry|相似|chin-chĭeⁿ|837}}
: Like, similar, resembling; as if.
; seⁿ lâi chin-chĭeⁿ;
: they are much alike.
; cò̤ lâi m̄ chin-chĭeⁿ;
: they are very unlike.

; {{Swatow entry|親|chin|991|147|9}}
: One's own; akin; kith.
; bó̤ chin;
: own mother.
; úa chin hĭⁿ thiaⁿ-kìⁿ;
: I heard it with my own ears.

Without modification it would actually look like the following:


chin

相似 chin-chĭeⁿ 837
Like, similar, resembling; as if.
seⁿ lâi chin-chĭeⁿ;
they are much alike.
cò̤ lâi m̄ chin-chĭeⁿ;
they are very unlike.
親 chin 991 147 9
One's own; akin; kith.
bó̤ chin;
own mother.
úa chin hĭⁿ thiaⁿ-kìⁿ;
I heard it with my own ears.

and we can use CSS (Index:Dictionary of the Swatow dialect.djvu/styles.css) to change its appearance, as it appears at User:Suzukaze-c/sandbox#Index:Dictionary of the Swatow dialect.djvu.


<templatestyles src="Template:Sandbox/styles.css" />

chin

User:Suzukaze-c/T:Swatow entry
Like, similar, resembling; as if.
seⁿ lâi chin-chĭeⁿ;
they are much alike.
cò̤ lâi m̄ chin-chĭeⁿ;
they are very unlike.
User:Suzukaze-c/T:Swatow entry
One's own; akin; kith.
bó̤ chin;
own mother.
úa chin hĭⁿ thiaⁿ-kìⁿ;
I heard it with my own ears.

I believe it has multiple benefits over the current system:

  1. Clear indication in the HTML/wikitext code itself of what is Teochew and what is English;
  2. === is more meaningful for indicating HTML headers than {{center|— chim —}} (Semantic HTML);
  3. Ability for an end user to modify appearance further with CSS (one could make Teochew bold, or English red, etc.)
  4. Clearer wikitext for editors (as opposed to everything being on one very long line, or paragraphs being interrupted with {{Swatow gap}}, or placing text on new lines in ways that are actually unreliable (10342193), or forgetting about <br>).

One problem is that distinguishing ; and : by eye is troublesome, but the other benefits may outweigh this.

And as for conversion, I will use a WS:bot to change Page:s. Suzukaze-c (talk) 09:16, 28 December 2020 (UTC)
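
Benefit 1 above (the markup itself says what is Teochew and what is English) can be illustrated with a short Python sketch that reads the proposed `;` / `:` layout back into pairs. The function name `parse_pairs` is hypothetical, and the sketch assumes one term line followed by one definition line, as in the example wikitext above.

```python
def parse_pairs(wikitext):
    """Pair each ';' line (Teochew) with the ':' line (English) after it."""
    pairs = []
    term = None
    for raw in wikitext.splitlines():
        line = raw.strip()
        if line.startswith(";"):
            # Remember the Teochew line until its translation arrives.
            term = line[1:].strip()
        elif line.startswith(":") and term is not None:
            pairs.append((term, line[1:].strip()))
            term = None
    return pairs


if __name__ == "__main__":
    sample = "\n".join([
        "; seⁿ lâi chin-chĭeⁿ;",
        ": they are much alike.",
        "; bó̤ chin;",
        ": own mother.",
    ])
    for teochew, english in parse_pairs(sample):
        print(teochew, "=>", english)
```

A bot or a later conversion step could work from such pairs without having to guess where the romanization ends and the English begins.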

(Trying again because I suspect that the ping failed: @Assassas77, @Borneanape, @DefrostedKevin, @Jiaozi17, @Takashihakase, @Truchas13108: Suzukaze-c (talk) 01:42, 30 December 2020 (UTC))
(Message received @Suzukaze-c:. We are deferring to Assassas77 and Takashihakase)
Why not, as long as we can follow the original layout with CSS. Can the layout CSS be set for all users though? Assassas77 (talk) 13:42, 31 December 2020 (UTC)
@Assassas77: mw:TemplateStyles (I forgot to mention this) gives it to all users. Suzukaze-c (talk) 04:47, 1 January 2021 (UTC)
@Suzukaze-c: (1) Do you have a suggestion for how to deal with pages that begin in the middle of an entry? For example, on Page:Dictionary of the Swatow dialect.djvu/444, the page begins with a new peng'im phrase from a Chinese character (汉字) entry that began on the previous page. (2) Also note that there's a line break after the first peng'im/translation set, while the rest of the sets are paragraphed together as expected. (3) The syllable reference pages, such as https://en.wikisource.org/wiki/Dictionary_of_the_Swatow_dialect/pau show the pre-CSS display, which is actually desirable as we do not want all peng'im/translation pairs to be wrapped into a paragraph together. However, it does display the trailing semicolons at each peng'im. What are your thoughts about that? Thank you for your suggestions!
I guess you use the following code for the headword:
; {{swatow entry|包|pau|663|20|8}}
: To wrap up, to envelope; to contain, to hold; to be included in; to be patient; to undertake; to manage an affair; to assume; to engage; to warrant; to insure; a bundle, a bale, a wrapper; plated, as with gold.
; pau cò̤ cêk pau;
: do them up in one bundle.
and the parser can't differentiate this from a page that starts with this code:
; bô pât mih kâi;
: have nothing beside.
; ŭ pât īeⁿ;
: have other kinds.
(example from Page:Dictionary of the Swatow dialect.djvu/444)
I think the last CSS code that you use to style the headword is the cause of the linebreak at the start of the page :
dd:first-of-type {
	display: block;
}
Assassas77 (talk) 20:19, 5 January 2021 (UTC)
Hello, @Suzukaze-c: regarding Page:Dictionary of the Swatow dialect.djvu/532, I have a similar question as well. When a page starts a new section, and the section end and section begin tags are included at the beginning of that page, there are some unnecessary line breaks right at the beginning of the page, right before the section header. Would the recommendation be to put the section end and section begin tags at the end of the previous page then?
There is also a problem if I put a multicolumn break on the page when the column breaks up a peng'im phrase (and possibly a definition) from the previous column. I have opted to try putting it like the following:
; tang hṳ́ hûe cū <noinclude>{{multicol-break}}</noinclude>khi-thâu sue;
which still causes a problem on the scanned page referenced above, but makes it look okay in the dictionary proper at https://en.wikisource.org/wiki/Dictionary_of_the_Swatow_dialect/sue. DefrostedKevin (talk) 03:20, 7 January 2021 (UTC)
@Assassas77, @Borneanape, @DefrostedKevin, @Jiaozi17, @Takashihakase, @Truchas13108: Haha oops I didn't realize that there were replies and forgot about this for half a month :')
I absolutely did not think of that ambiguity in the CSS :')
Unwanted line break due to ambiguity regarding the contents of a ;/: pair:
  1. Ignore it, since the important part is the transclusion in the 'dictionary proper'
  2. Use ~something~ to indicate that the ; that contains a {{Swatow entry}} is special
    This loses the brevity that was intended for the code, and is apparently also unfeasible.
  3. Use an asterisk (technically for bulleted lists) for {{Swatow entry}} instead, eliminating the ambiguity
    This preserves brevity.
    It may not work well if there are bulleted lists elsewhere in the book.
As for the multicolumn template ruining things (which I also did not think of):
  1. Ignore it (for the same reason)
  2. Completely remove the manual column code and also rely on CSS to create two columns (not visually faithful to the original book in terms of column break location)
Suzukaze-c (talk) 08:06, 22 January 2021 (UTC)
Unwanted line breaks at the beginning of pages (the wikitext rendering engine is such a joy):
  1. Ignore it (for the same reason)
  2. Directly use the equivalent HTML tag <h3></h3> (three equals signs = level 3 = h3)
Suzukaze-c (talk) 08:17, 22 January 2021 (UTC)
Possible solution for Unwanted line break 1 using different symbols:
Page:Dictionary of the Swatow dialect.djvu/532
* {{Swatow entry|萱|suang|231|140|9}}
* A species of day-lily; it is said that if a woman carries it she will bear a son.
; chun suang pĕng mŏng;
: both parents are in good health.
; suang-cháu, cîah lío, ŏi kói nâng kâi iu-bun;
: the day-lily when eaten will cause one to forget his sorrows.
Suzukaze-c (talk) 08:47, 22 January 2021 (UTC)
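
The following Python sketch shows why the asterisk variant removes the ambiguity: with `*` reserved for the headword and its definition, a page that starts in the middle of an entry (with `;`) can no longer be mistaken for a new headword. The function name and the labels are assumptions for illustration; the rule that the first `*` line is the headword and the second its definition is read off the example above.

```python
def classify_lines(wikitext):
    """Label each line as headword, definition, teochew, english or other."""
    labelled = []
    expect_definition = False
    for raw in wikitext.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker, text = line[0], line[1:].strip()
        if marker == "*":
            # First '*' line of an entry is the headword, the next its definition.
            kind = "definition" if expect_definition else "headword"
            expect_definition = not expect_definition
        else:
            kind = {";": "teochew", ":": "english"}.get(marker, "other")
            expect_definition = False
        labelled.append((kind, text))
    return labelled


if __name__ == "__main__":
    continued_page = "; bô pât mih kâi;\n: have nothing beside."
    print([kind for kind, _ in classify_lines(continued_page)])
```

With the current `;` / `:` markup for headwords, the same function would have no way to tell the two cases apart, which is the ambiguity behind the unwanted line break.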

@Assassas77, @Borneanape, @DefrostedKevin, @Jiaozi17, @Takashihakase, @Truchas13108, @Suzukaze-c: Hello Suzukaze-c, most of us are part of a Discord group <https://discord.gg/5sNj9Vs4J6>, where we discuss the Wikisource project. You are welcome to join us! (unsigned comment by Takashihakase (talk))

Note on colons

If there is a colon in the romanization or English, you have to force the wiki engine to not recognize it as the "formatting" colon:

Without escaping, the wikitext

; úa thí-thiap lṳ́, lṳ́ īa tîeh thí-thiap úa: jī-ke tîeh sie thí-thiap cìaⁿ hó̤;
: I consider your wishes, and you must consider mine also: all will be well if both are accommodating.

is rendered with the romanization split at the colon:

úa thí-thiap lṳ́, lṳ́ īa tîeh thí-thiap úa
jī-ke tîeh sie thí-thiap cìaⁿ hó̤;
I consider your wishes, and you must consider mine also: all will be well if both are accommodating.

With the colon escaped (for example as the HTML entity &#58;), the wikitext

; úa thí-thiap lṳ́, lṳ́ īa tîeh thí-thiap úa&#58; jī-ke tîeh sie thí-thiap cìaⁿ hó̤;
: I consider your wishes, and you must consider mine also: all will be well if both are accommodating.

is rendered as intended:

úa thí-thiap lṳ́, lṳ́ īa tîeh thí-thiap úa: jī-ke tîeh sie thí-thiap cìaⁿ hó̤;
I consider your wishes, and you must consider mine also: all will be well if both are accommodating.

Suzukaze-c (talk) 10:08, 24 January 2021 (UTC)
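
A bot could apply this escaping automatically. The Python sketch below replaces colons inside `;` lines with the HTML entity &#58;, which is one way to keep the wiki engine from treating them as the formatting colon; the choice of that entity and the function name are assumptions, not something stated in this thread. The sketch also assumes that the `;` lines contain plain text only, since it would alter colons inside links or templates as well.

```python
def escape_term_colons(wikitext):
    """Escape colons in ';' (term) lines so they are not parsed as the
    start of a definition. ':' (definition) lines are left untouched."""
    lines = []
    for line in wikitext.splitlines():
        if line.startswith(";"):
            # Keep the leading ';' marker, escape every colon after it.
            line = ";" + line[1:].replace(":", "&#58;")
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    print(escape_term_colons("; a: b;\n: c: d."))
```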