Length-based Cantonese Romanisation

February 7, 2015 at 06:32 AM

@imron Pīnyīnput for Mac?

February 7, 2015 at 06:44 AM

Funny you should ask. I'm actually planning on launching a Kickstarter for this shortly.

February 7, 2015 at 06:51 AM

Hahahahaha

That link you shared with the comics, well, I pressed the random button and got this:

http://xkcd.com/628/

February 7, 2015 at 06:56 AM

BTW I can share your Kickstarter campaign if you share my 100% phonetic, Cyrillic-based, Cantonese input method (using Pīnyīnput).

February 7, 2015 at 08:41 AM

I'm confused... How is this any different from existing romanizations? Yale and Jyutping also account for vowel length do they not? It also seems a bit redundant to have a "uu" nucleus but have no relation to the "u" nucleus... Also there doesn't seem to be a natural extension of "o" to "oo" or "oa" or "oe", so why use the letter o for any of the following anyway? I'm with Hofmann: wouldn't it just make more sense to have a marking that shows long/short rather than creating an entirely new branch of combinations?

I think the tone marks are interesting, but frankly any romanization system, as long as it is systematic, should do. It's certainly easy enough to read your romanization but that's because it's so similar to what already exists.

February 7, 2015 at 01:09 PM

We have the letter њ in Macedonian, so њ for /ŋ/

But surely њ is /ɲ/ and not /ŋ/.

Or maybe you mean using њ as a substitute for ŋ, and I misunderstood?

February 7, 2015 at 04:06 PM

I'm confused... How is this any different from existing romanizations? Yale and Jyutping also account for vowel length do they not?

Not orthographically, no.

It also seems a bit redundant to have a "uu" nucleus but have no relation to the "u" nucleus...

~~The letter 'u' cannot exist in the nuclear position unless it is doubled. A single 'u' exists only as a coda, which parallels the single 'i' and 'y' in the coda position.~~

The nuclei 'u' and 'uu' are both rounded vowels, and that's where the similarities end. Initially, there was no nucleus 'u' (as in Y. R. Chao's analysis), but I revived it because [ɵ] is equidistant from both [e~ɪ] and [o~ʊ], and thus sufficiently distant in quality to isolate. Since the nuclei 'u' and 'uu' are not compatible with the coda 'u', there is no conflict.

Wouldn't it just make more sense to have a marking that shows long/short rather than creating an entirely new branch of combinations?

The tone marks are the only markings that I want on the nuclei, both for the sake of legibility and for ease of typing (especially considering atonal IMEs). As I mentioned earlier, the IME maps the letter Eng to the 'r' key, and that is as complex as the typing process should be.
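A minimal sketch of the typing model described above, in Python. Only the 'r'-to-Eng mapping comes from the post; the function names are hypothetical, and treating an "atonal" lookup as "strip every combining mark" is an assumption about how such an IME might work, not a description of the actual one.

```python
import unicodedata

# From the post: the IME maps the letter Eng to the 'r' key.
# The uppercase pairing is an assumption added for completeness.
ENG_KEY = str.maketrans({"r": "ŋ", "R": "Ŋ"})


def keys_to_text(keys: str) -> str:
    """Turn raw keystrokes into romanised text by swapping 'r' for Eng."""
    return keys.translate(ENG_KEY)


def strip_tone_marks(text: str) -> str:
    """Remove combining diacritics so a toned syllable matches its atonal key.

    Assumption: tone marks are the only diacritics on the nuclei, so dropping
    every combining character leaves the bare spelling.
    """
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)
```

For example, `keys_to_text("raa")` gives `ŋaa`, and `strip_tone_marks("ŋā")` gives `ŋa`, since Eng is a base letter rather than a letter plus a mark.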

Also there doesn't seem to be a natural extension of "o" to "oo" or "oa" or "oe", so why use the letter o for any of the following anyway?

It's actually the reverse; the long vowels ~~'oo', 'oa', and 'oe'~~ 'oo' and 'oa' (brood/broad) are condensed into the short vowel 'o'. This follows the same pattern as the other long-short vowel relationships: 'ee' and 'ae' (breed/aerial) are condensed into 'e', and 'aa' (bazaar) is condensed into 'a', whilst 'uu' (vacuum) has no short vowel counterpart ~~(the single 'u' is exclusively a coda)~~. The short vowel counterpart of 'oe' (Schoenberg) is 'u', which doesn't conflict with the coda 'u' because it only occurs in the nuclear position. The basic rule is: a long vowel's digraph [usually] contains its short-vowel counterpart. Nuclear digraphs should be considered fixed, unbreakable units in their own right.

There are two main schools of interpretation when it comes to the length of Cantonese vowels:

The first school (Yale and Jyutping) defines the following long-short pairs, each comprising two allophones, with the exception of 'aa'/'a' (and 'oe'/'eo' in Jyutping*):

[aː]/[ɐ], [ɛː~e], [iː~ɪ], [ɔː~o], [uː~ʊ], [œː~ɵ]*, and the isolated long vowel [yː].

The second school (Y. R. Chao) defines a set of three short-vowel phonemes ('a', 'e', and 'o') mirroring a pyramid of six phonetic realisations, wherein each layer represents a group of allophones belonging to a given short-vowel phoneme.

<a> ---[ɐ]---

<e> --[e~ɪ]--

<o> [o~ʊ~ɵ]

I primarily subscribe to the second school, while also incorporating [most of] the 1:1 long-short relationships of the first school. Unlike the second school, however, I've separated [ɵ] (as 'u') from [o~ʊ] because they are adequately distant on the IPA vowel chart. ~~I've thus arrived at a pyramid of long vowels whose three layers each have a corresponding short vowel.~~

~~<a> ----<aa>----~~

~~<e> --<ee/ae>--~~

~~<o> <oo/oa/oe>~~

~~(each short vowel is embedded in its corresponding long vowel)~~

As you can see, I've thus far constructed every nuclear vowel using only the letters 'a', 'e', and 'o', with only the isolated long vowel [yː] remaining. This allows me to use the letters 'i', 'u', and 'y' as codas, since they will remain visually distinct within vowel clusters (e.g. 'oai', 'aeu', etc.). As the letter 'u' is not otherwise used in the nuclear position, I may comfortably assign 'uu' to [yː], which can neither accept the coda 'u' nor produce a corresponding short vowel (it would otherwise be conflated with the single 'u').

The codas 'i', 'u', and '(u)i' are shorthand (and theoretically shortened) forms of the phonetically equivalent vowels 'ee', 'oo', and 'uu' respectively (i.e. all of the digraphs with repeated letters except for 'aa'). Similarly, the coda (and initial) 'ŋ' is the shorthand form of 'ng', which is used as an autonomous nasal syllable (and nucleus in 'hng'). ~~When the initial 'y' and coda 'y' occur within the same syllable, there is always a nucleus between them, and thus no confusion (as with the nasal and stop codas).~~ Although the coda 'i' has two possible realisations, 'ee' and 'uu', the latter form only occurs after the nucleus 'u', which cannot be followed by the former form.

As the letter 'i' can never exist in the nuclear position, this precludes some major headaches with tone diacritics; 'a', 'e', 'o', and 'u' are all beautifully compact and symmetrical. ~~It's an additional perk that all of the codas are equivalent to their (broad) IPA values as well.~~ The coda 'u' may not follow the nucleus 'u', nor may it follow the nucleus 'uu', and thus no conflicts arise. All short-vowel diphthongs and compounds are fixed in this system; each of the short vowels, except for 'a', has exactly three possible codas, allowing for a rather symmetrical table of short-vowel finals.
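The idea that nuclear digraphs are unbreakable units can be sketched as a longest-match split of a final into nucleus and coda. This is only an illustration in Python: the nucleus inventory is read off the spellings used in this post, the function names are hypothetical, and the only phonotactic rule checked is the one stated explicitly above (no coda 'u' after the nuclei 'u' or 'uu').

```python
# Nuclei as spelt in the post; digraphs are tried first so they are never split.
LONG_NUCLEI = ("aa", "oa", "oe", "ae", "ee", "oo", "uu")
SHORT_NUCLEI = ("a", "e", "o", "u")


def split_final(final: str) -> tuple[str, str]:
    """Split a toneless final into (nucleus, coda) using longest match."""
    for nucleus in LONG_NUCLEI + SHORT_NUCLEI:
        if final.startswith(nucleus):
            return nucleus, final[len(nucleus):]
    raise ValueError(f"no known nucleus at the start of {final!r}")


def coda_u_allowed(nucleus: str) -> bool:
    """The coda 'u' may not follow the nucleus 'u' or the nucleus 'uu'."""
    return nucleus not in ("u", "uu")
```

With the clusters quoted above, `split_final("oai")` yields `("oa", "i")` and `split_final("aeu")` yields `("ae", "u")`, so the coda letter is always whatever remains after the fixed nucleus.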

The vowel-shortening pattern is thus as follows:

(short in quality and length)

aa [aː] >>> a- [ɐ]

oa [ɔː] >>> o- [o~ʊ]

oe [œː] >>> u- [ɵ]

ae [ɛː] >>> e- [e~ɪ]

(short in length only)

ee [iː] >>> -i

oo [uː] >>> -u

uu [yː] >>> (u)i [y]
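The pattern above can also be written out as data, which makes the "a long vowel's digraph [usually] contains its short-vowel counterpart" rule easy to check. This is a rough Python sketch; the dictionary and function names are hypothetical, and the entries are copied from the table rather than from any reference implementation.

```python
# Long nucleus -> short nucleus (short in quality and length).
SHORT_NUCLEUS = {"aa": "a", "oa": "o", "oe": "u", "ae": "e"}

# Long nucleus -> coda shorthand (short in length only).
# '(u)i' in the table is stored as the coda letter 'i', which takes this
# value only after the nucleus 'u'.
CODA_SHORTHAND = {"ee": "i", "oo": "u", "uu": "i"}


def digraphs_without_embedded_short() -> list[str]:
    """List the long digraphs whose spelling does not contain the short vowel."""
    return [long_v for long_v, short_v in SHORT_NUCLEUS.items()
            if short_v not in long_v]
```

Calling `digraphs_without_embedded_short()` returns `["oe"]`: 'oe' shortens to 'u', which is the one exception that the word "usually" allows for.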

For these reasons, I feel that my system is phonemically robust and systematic whilst maintaining intuitive lengths and English analogues. Although I've emphasised the logic behind the nuclei and codas, the initials also contribute to length representation (thanks to the letter Eng), as each initial phoneme is represented by a single letter (the 'w' in the initials 'gw' and 'kw' is considered a medial). The tone diacritics take it a step further by appropriately blending with the nuclei, rather than sitting at the end of a syllable (as with tone numbers).

February 7, 2015 at 07:49 PM

Why not use these diacritics?

1. ā

2. á

3. a

4. a̖

5. a̗

6. a̱

They may be hard to type, but so are yours.
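For anyone who wants to generate these six forms rather than type them, here is a small Python sketch. It assumes the marks in the list are the Unicode combining characters named in the comments, and it only handles a single-letter nucleus, because the thread does not say which letter of a digraph carries the mark. The names `TONE_MARKS` and `mark_tone` are hypothetical.

```python
import unicodedata

# Assumed code points for the six tone marks listed above.
TONE_MARKS = {
    1: "\u0304",  # combining macron (above)
    2: "\u0301",  # combining acute accent (above)
    3: "",        # tone 3 is unmarked
    4: "\u0316",  # combining grave accent below
    5: "\u0317",  # combining acute accent below
    6: "\u0331",  # combining macron below
}


def mark_tone(letter: str, tone: int) -> str:
    """Attach the tone mark for `tone` (1-6) to a single nucleus letter."""
    if len(letter) != 1:
        raise ValueError("this sketch only handles a single-letter nucleus")
    # NFC composes the letter and mark into one code point where one exists.
    return unicodedata.normalize("NFC", letter + TONE_MARKS[tone])
```

`[mark_tone("a", t) for t in range(1, 7)]` reproduces the list above. Only tones 1 and 2 have precomposed characters for 'a'; the three marks below the letter remain as a base letter plus a combining character, which is part of why they are hard to type.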

February 7, 2015 at 08:06 PM

Why not use these diacritics?

1. ā

2. á

3. a

4. a̖

5. a̗

6. a̱

Aren't those precisely the ones I use?

I guess you might have been thinking of my initial posting, which I had updated yesterday. Sorry for the confusion!

February 7, 2015 at 09:45 PM

Yes, I didn't look at that again.

February 10, 2015 at 08:14 AM

But surely њ is /ɲ/ and not /ŋ/.

Or maybe you mean using њ as a substitute for ŋ, and I misunderstood?

Right. In Macedonian, њ is used for /ɲ/.

However, if we are talking about a 拼音字母 (phonetic alphabet) for Cantonese, it will be convenient to use њ for /ŋ/.

February 10, 2015 at 08:42 AM

For these reasons, I feel that my system is phonemically robust and systematic whilst maintaining intuitive lengths and English analogues. Although I've emphasised the logic behind the nuclei and codas, the initials also contribute to length representation (thanks to the letter Eng), as each initial phoneme is represented by a single letter (the 'w' in the initials 'gw' and 'kw' is considered a medial). The tone diacritics take it a step further by appropriately blending with the nuclei, rather than sitting at the end of a syllable (as with tone numbers).

I like how Evans chose to represent the medial w:

-w- (a dot after the syllable) ᐤ

http://en.wikipedia.org/wiki/Canadian_Aboriginal_syllabics

If I were to create a writing system for Cantonese, it would be an abugida, not an alphabet.

Your romanization system does not look that different from existing systems, as 陳德聰 pointed out. Your system has fixed certain issues, but beyond using it to transcribe Cantonese names, I don't think people will actually use it.

February 10, 2015 at 09:00 AM

Also,

I would use ii instead of ee for the long nucleus [iː]

(ee messes with the phonetic nature of the writing system)

then use j for the coda i

then use a different letter for the initial j (maybe Cyrillic ѕ, or Jyutping z - when it comes to this choice of symbols, I prefer Jyutping over your idea)

February 10, 2015 at 08:35 PM

I would use ii instead of ee for the long nucleus [iː] (ee messes with the phonetic nature of the writing system) then use j for the coda i

If this were an IPA-based system, I would agree. However, the aim of this system is to be biased in favour of English (otherwise, Broad IPA is ideal). From an English orthographical standpoint, 'ee' is preferable.

then use a different letter for the initial j (maybe Cyrillic ѕ, or Jyutping z - when it comes to this choice of symbols, I prefer Jyutping over your idea)

The letter 'z' almost exclusively implies the sound [z] in English, which doesn't exist in Cantonese. Also, Cyrillic letters fall outside the scope of my Latin-based Romanisation.

I chose the letters 'j' and 'c' primarily for phonemic (not purely phonetic) reasons. In English, there are the following pairs of sounds: 's/z', 'sh/zh', 'ts/dz', and 'tsh/dzh'. In none of these four pairs do the letters 'j' and 'c' occur, although 'j' and 'ch' are used orthographically to represent the fourth pair. In Cantonese, the pair 'j/c' represents sounds in between the ts/dz and tsh/dzh pairs of English (if you ignore the voicing distinction), and as I demand a 1:1 letter-to-phoneme ratio for initials, 'j' and 'c' are the best candidates to represent the aforementioned Cantonese allophones. I can easily explain it to English-speaking students thus: 'j' represents a sound varying between what is spelt in English as 'j' and 'dz', whilst 'c' represents a sound varying between what is spelt in English as 'ch' and 'ts' (similar to the Italian 'c' before 'i' and 'e'). As with Hanyu Pinyin, 'z' is fine for native orthographies (it's all relative), but for an English-based system, it is misleading.

February 10, 2015 at 10:00 PM

I can easily explain it to English-speaking students

I am sorry, but I don't agree. An IPA-like system will be easier for students to learn, no matter their background. If you tell a student to write i for /i/ and then ii for /iː/, it will be easy for the student to learn it. Then ei for /ei/ will also make sense. Can you convince me that I am wrong?

I am not saying English orthography is bad. I love English and I love being able to communicate with this many people. However, there is no need to stick to orthography that does not work that well and is not easy to learn.

In Macedonian, we use с for /s/. There is no confusion between 'ch' and 'ts' and it has been used that way for centuries. http://en.wikipedia.org/wiki/Sigma

Plus, don't worry, it's not like English-speaking people will forget how to spell English words properly after they learn an IPA-like romanization system for Cantonese. I am switching between English, Macedonian and Chinese every day. I don't think my spelling has been affected.

February 10, 2015 at 10:46 PM

Frankly speaking, I think that English has the worst orthography in the history of human language. Phonemes and letters do not match well, etymological features are often obscured, it's inconsistent, and sometimes arbitrary too.

That being said, my Cantonese Romanisation system was crafted from this bastard orthography to suit the intuition of those literate in English. It has an admittedly limited and specialised scope. For example, in the cases of 'ee' and 'oo', 'seen' and 'soon' would elicit more accurate pronunciations than 'sin' and 'sun' from unprepared native English speakers.

For a serious student from any linguistic background, I would recommend either Broad IPA or Cantonese Phonetic Symbols (the latter doesn't conflict with orthographic intuition).

February 10, 2015 at 11:01 PM

That being said, my Cantonese Romanisation system was crafted from this bastard orthography to suit the intuition of those literate in English. It has an admittedly limited and specialised scope. For example, in the cases of 'ee' and 'oo', 'seen' and 'soon' would elicit more accurate pronunciations than 'sin' and 'sun' from unprepared native English speakers.

Well, why don't you change that and make unprepared native English speakers prepared? It's not impossible.

a is /a/, e is /e/, i is /i/.

Why should only serious students of linguistics be allowed to know about the IPA? Have you read Saussure (although he did not write the book himself)? He wanted to reform French spelling. Just like Lu Xun with Chinese. Both of them failed.

I told my professor of 文字學 (the study of Chinese characters): "I taught myself how to read and write in one day, how long did it take you to learn how to read and write Chinese?"

My professor said: "I am still learning."

February 10, 2015 at 11:10 PM

If the student in question is willing to be prepared, then they might as well learn Broad IPA and/or CPS, which was my point.

February 10, 2015 at 11:19 PM

You can start by introducing ii for [iː] to your student and see where it goes from there.

February 11, 2015 at 12:55 AM

Why should only serious students of linguistics be allowed to know about the IPA?

It's not about being allowed; it's about being bothered. Many language students won't see the point.


Posters, in thread order: Angelina, imron, Angelina, Angelina, 陳德聰, renzhe, ParkeNYU, Hofmann, ParkeNYU, Hofmann, Angelina, Angelina, Angelina, ParkeNYU, Angelina, ParkeNYU, Angelina, ParkeNYU, Angelina, imron