From Wikisource

This is a test page for checking the placement of the Hebrew diacritics Meteg (מֶתֶג meteg) and Patach (פַּתַח pataḥ). The Meteg is a small vertical line, usually marking a long vowel or stress; the Patach is a short horizontal line placed below a base consonant to mark the vowel [a]. Both are placed under the first letter from the right. Their logical order is supposed to reflect their visual placement: if the Meteg comes first, it should be placed to the right of the Patach, and vice versa.

It's highly recommended to install the Free Taamey Frank CLM font for proper viewing of this page.

Meteg first: הַֽבְרָכָה

Patach first: הַֽבְרָכָה

In the following line, the Meteg (מֶתֶג) is inserted before the Patach using the template {{מתג}} (named without diacritics), so it is not "normalized" by the MediaWiki software and is therefore displayed correctly:

Meteg first, with template: הַֽבְרָכָה

However, the Patach that follows the template inclusion (to its left) combines into a single grapheme cluster with the last closing brace of that inclusion (to its right), so in text editors it is not easy to select the template alone without also selecting the following Patach. Ideally, editors should not treat a brace encoded immediately before Hebrew diacritics as part of the same grapheme cluster. The current behavior nevertheless conforms to the standard Unicode character model, which defines how to break default grapheme clusters and is used by most text editors (including input forms in web browsers): combining characters are implicitly attached to a preceding base character, and without one they form "defective" sequences. To work around this input problem, a template could likewise be used for each combining character that may follow the CGJ (here, the Patach).
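The clustering behavior can be approximated in a few lines of Python. This is a simplified sketch, not the full algorithm: it assumes that every character of general category Mn or Me extends the preceding cluster (roughly rule GB9 of Unicode Text Segmentation, UAX #29) and ignores all other rules. The function name `approx_grapheme_clusters` is hypothetical.

```python
import unicodedata

def approx_grapheme_clusters(text: str) -> list[str]:
    """Group text into clusters, attaching nonspacing/enclosing marks
    (categories Mn, Me) to whatever character precedes them.

    Simplified approximation of UAX #29 rule GB9 (no break before Extend);
    Hangul, emoji and regional-indicator rules are ignored.
    """
    clusters: list[str] = []
    for ch in text:
        is_extend = unicodedata.category(ch) in ("Mn", "Me")
        if is_extend and clusters:
            clusters[-1] += ch  # the mark joins the previous character, even "}"
        else:
            clusters.append(ch)  # new cluster (defective if it starts with a mark)
    return clusters

# "}}" closing a template call, then Patach (U+05B7) and Bet (U+05D1)
print(approx_grapheme_clusters("}}\u05b7\u05d1"))
```

The second cluster is the closing brace together with the Patach, which is why an editor selects both at once.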

Note that MediaWiki parsing is not affected by this problem: it separates the closing brace (used in its template transclusion syntax) from any character after it, even when the latter is a combining character, so it will break in the middle of a default grapheme cluster. In MediaWiki, it is not recommended to enter any text field as a defective combining sequence; all text elements should start with a base character, not with a combining character. Syntax tricks (including HTML character entities) for entering isolated combining characters may help.
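The rule that a text field should start with a base character can be checked mechanically. The sketch below uses two hypothetical helpers, `starts_defective` and `escape_leading_marks`; it assumes that "combining character" means general category Mn or Me, and shows how leading marks could be written as HTML numeric character references instead of raw characters.

```python
import html
import unicodedata

def is_mark(ch: str) -> bool:
    """Nonspacing or enclosing combining mark (general category Mn or Me)."""
    return unicodedata.category(ch) in ("Mn", "Me")

def starts_defective(field: str) -> bool:
    """True if the field begins with a combining mark (a defective sequence)."""
    return bool(field) and is_mark(field[0])

def escape_leading_marks(field: str) -> str:
    """Rewrite leading combining marks as numeric character references,
    so the raw text starts with the ASCII base character '&'."""
    i = 0
    while i < len(field) and is_mark(field[i]):
        i += 1
    escaped = "".join(f"&#x{ord(ch):04X};" for ch in field[:i])
    return escaped + field[i:]

field = "\u05b7\u05d1"  # Patach, then Bet
print(starts_defective(field))                              # True
print(escape_leading_marks(field))                          # &#x05B7;ב
print(html.unescape(escape_leading_marks(field)) == field)  # True
```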

Note: normalization cannot be reliably avoided. The logical ordering is not respected simply because the Meteg has historically been assigned a higher canonical combining class (22) than the Patach (17). For Unicode compatibility, the standard way to preserve the ordering so that the Meteg occurs before (to the right of) the Patach is to insert a Combining Grapheme Joiner (CGJ, U+034F) between them. This character is ignorable in collation, which operates on normalized text; normalization itself remains strongly recommended.
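Python's `unicodedata` module can show both effects: the combining classes that drive the reordering, and the CGJ (class 0) acting as a barrier. A minimal sketch; the constant names are illustrative.

```python
import unicodedata

HE, METEG, PATACH, CGJ = "\u05d4", "\u05bd", "\u05b7", "\u034f"

# Canonical combining classes: Patach 17, Meteg 22, CGJ 0 (a starter)
for name, ch in [("Patach", PATACH), ("Meteg", METEG), ("CGJ", CGJ)]:
    print(name, f"U+{ord(ch):04X}", unicodedata.combining(ch))

typed = HE + METEG + PATACH            # logical order: Meteg first
protected = HE + METEG + CGJ + PATACH  # same, with a CGJ between the marks

# Canonical reordering sorts adjacent marks by class, so Patach moves first...
print(unicodedata.normalize("NFC", typed) == HE + PATACH + METEG)  # True
# ...but it never moves a mark across a class-0 character such as the CGJ.
print(unicodedata.normalize("NFC", protected) == protected)       # True
```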
Fonts cannot assume that normalization will not occur, because it can happen as part of various transformation or preparation processes before the OpenType/TrueType layout tables are consulted.
It is perfectly normal (not a bug) that the sequences <Meteg, Patach> and <Patach, Meteg> are canonically equivalent (exactly as in the first two examples above), even though their normalized order is not the logical one for the most common case, which would better have been encoded and normalized as <Meteg, Patach>. However, the relative values of Unicode combining classes cannot be changed: they were stabilized long ago. The incorrect initial relative order of these values has been discussed for years; the main problem is that these combining classes HAD to be distinct, and when the concept of combining classes was created, nobody expected that marks with distinct classes would still interact visually in a meaningful way.
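Canonical equivalence is easy to test: two strings are canonically equivalent exactly when their NFD forms are identical. A minimal sketch, with a hypothetical helper name:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Canonical equivalence holds iff the NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

he, meteg, patach, cgj = "\u05d4", "\u05bd", "\u05b7", "\u034f"
print(canonically_equivalent(he + meteg + patach, he + patach + meteg))        # True
print(canonically_equivalent(he + meteg + cgj + patach, he + patach + meteg))  # False
```

The second comparison is False because the CGJ makes the two spellings genuinely different strings, which is what lets the intended order survive normalization.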
So the best approach is to use Hebrew fonts that support and map the CGJ, so that they work with the current standard whether or not normalization has occurred. (The Taamey Frank CLM font recommended above does not support this, so it is not a long-term solution; other compliant fonts exist.)
The remaining problem is convincing Wikimedians to enter these CGJs where needed. This requires input support: keyboard drivers for existing physical keyboards, a visual character map, or a MediaWiki plugin.
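A keyboard driver or plugin could insert the CGJ automatically. The sketch below is a hypothetical helper, `protect_mark_order`, not an existing MediaWiki feature. It assumes text in the order the user typed it, without precomposed letters, and inserts a CGJ wherever canonical reordering would swap two adjacent marks. Applied blindly, it would also separate a Dagesh from a following vowel; a real input method would probably restrict it to chosen pairs such as Meteg followed by a vowel.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0

def protect_mark_order(text: str) -> str:
    """Insert a CGJ wherever canonical reordering would move a combining
    mark in front of the mark typed just before it.

    Assumes logical (typed) order and no precomposed letters.
    """
    out: list[str] = []
    prev_ccc = 0  # class of the previous character (0 = base letter or CGJ)
    for ch in text:
        ccc = unicodedata.combining(ch)
        if ccc and prev_ccc > ccc:
            out.append(CGJ)  # the class-0 CGJ splits the run, so no swap happens
        out.append(ch)
        prev_ccc = ccc
    return "".join(out)

typed = "\u05d4\u05bd\u05b7"  # He, Meteg, Patach in logical order
protected = protect_mark_order(typed)
print(protected == "\u05d4\u05bd\u034f\u05b7")               # True
print(unicodedata.normalize("NFC", protected) == protected)  # True
```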

In the following line, the Meteg is inserted before the Patach using the template {{מתג}}, followed by the template {{CGJ}}:

Meteg first, with template: הֽ͏ַבְרָכָה

With the CGJ, the normalization "issue" is no longer an issue. In the previous example, the CGJ is entered using a template.
This solution is not very helpful and is very awkward to edit because of the Bidirectional Algorithm, since the Hebrew letters are mixed with the Latin letters of the CGJ template name; the CGJ should probably be entered directly, without needing a substitution template.
A better solution would be to name the template with Hebrew letters (just create a redirect for this alternate name) to avoid the ugly reordering caused by the normative Unicode Bidirectional Algorithm (UBA), including the mismatched mirrored braces around the template call, because the UBA originally lacked the then-proposed paired-bracket extension for matching pairs of parenthesis-like characters. This would make the input much easier to read and to edit.
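The reordering comes from the bidirectional classes of the characters in the template call. A minimal sketch using `unicodedata.bidirectional` and `unicodedata.mirrored`; the helper `strong_directions` is hypothetical, and the Hebrew name tested is that of the existing {{מתג}} template, standing in for a Hebrew-named CGJ redirect.

```python
import unicodedata

def strong_directions(s: str) -> set[str]:
    """Collect the strong bidi classes (L, R, AL) occurring in s."""
    return {unicodedata.bidirectional(ch) for ch in s} & {"L", "R", "AL"}

# Brace: neutral (ON) and mirrored; Latin C: L; Hebrew Mem: R; CGJ: NSM
for ch in "{C\u05de\u034f":
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.mirrored(ch))

print(strong_directions("{{CGJ}}"))                  # {'L'}: a Latin run inside Hebrew text
print(strong_directions("{{\u05de\u05ea\u05d2}}"))   # {'R'}: a Hebrew-only template name
```

A template name with only R characters keeps the whole call in one direction, so only the neutral braces are left for the algorithm to resolve.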


This demonstrates that the canonical reordering of the Dagesh during normalization may affect the display of a letter that has a Dagesh, a vowel and an accent:

Nun + Dagesh + Kamatz + Meteg: אָֽנָּֽה

(broken on XP and GNU/Linux, good on Windows 7)

Nun + Dagesh template {{דגש}} + Kamatz + Meteg: אָֽנָּֽה

(good on XP, GNU/Linux and Windows 7)
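The reordering behind the first example can be checked with Python's `unicodedata`. A minimal sketch (constant names are illustrative) showing the combining classes of Nun + Dagesh + Kamatz (Qamats, U+05B8) + Meteg as typed, and the order NFC produces; whether a renderer then positions the marks correctly is a font and shaping question the code cannot show.

```python
import unicodedata

NUN, DAGESH, QAMATS, METEG = "\u05e0", "\u05bc", "\u05b8", "\u05bd"

typed = NUN + DAGESH + QAMATS + METEG
print([unicodedata.combining(c) for c in typed])  # [0, 21, 18, 22]

normalized = unicodedata.normalize("NFC", typed)
# Qamats (18) is sorted in front of Dagesh (21); no precomposed Nun with
# Dagesh is produced, because U+FB40 is excluded from composition.
print(normalized == NUN + QAMATS + DAGESH + METEG)  # True
```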


This is a similar problem with Shin and Sin dots (which are normally encoded after the base Shin letter).

Shin + Shin dot + Kamatz + Ethnahta: מָשָׁ֑ל

(broken on XP and GNU/Linux, good on Windows 7)

Shin with Shin dot template {{שין ימנית}} + Kamatz + Ethnahta: מָשָׁ֑ל

(good on XP, GNU/Linux and Windows 7)
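The same check for the Shin example, as a minimal sketch with illustrative constant names. NFD is used here because it applies canonical reordering without recomposition, which is the step that moves the Kamatz (Qamats) in front of the Shin dot.

```python
import unicodedata

SHIN, SHIN_DOT, QAMATS, ETNAHTA = "\u05e9", "\u05c1", "\u05b8", "\u0591"

typed = SHIN + SHIN_DOT + QAMATS + ETNAHTA
print([unicodedata.combining(c) for c in typed])  # [0, 24, 18, 220]

# Stable sort by class: Qamats (18), Shin dot (24), Etnahta (220)
decomposed = unicodedata.normalize("NFD", typed)
print(decomposed == SHIN + QAMATS + SHIN_DOT + ETNAHTA)  # True
```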

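Note that the precombined letters U+FB2A (Shin with Shin dot) and U+FB2B (Shin with Sin dot) would avoid the problem, and most Hebrew fonts and Hebrew keyboards support their input and rendering. However, normalization to NFC does not precombine the base Shin letter with the Shin or Sin dot: these presentation forms are composition exclusions, so NFC decomposes them back into the base Shin followed by the dot.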