Chinese-forums.com

Indexing of Chinese characters


Guest Yau

In the long and exhausting debate in the thread "Characters are objectively harder, even for Chinese" (I couldn't get through all of it), demoser raised an interesting problem with Chinese: indexing. It's a problem that has always bothered me, but I believe demoser's argument is overstated for the sake of debate.

Yes, I never look up a Chinese name on a map unless it's an emergency.

I'm also frustrated by having to count strokes to find a word in a dictionary.

It's even worse when I have to pick out a Chinese name from a long list. I do that with my pinyin name all the time.

However, as Chinese input methods become popular, the way they break a character into parts could serve as a way of indexing too.

With the Cangjie input method, a character is usually divided into 1-5 parts, depending on its complexity. Each part is represented by one of around two dozen basic shapes (e.g. 金、木、水、火、土), and these shapes can serve as index keys.
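A minimal sketch of how Cangjie decompositions could act as sort keys, assuming the usual Cangjie assignment of basic shapes to the keys A to Y (日=A, 月=B, ..., 田=W, 難=X, 卜=Y). The sample decompositions and the names `SAMPLE`, `index_key`, `to_keys` and `build_index` are illustrative only and should be checked against a real Cangjie table. Because the shapes are assigned to the keys in this order, sorting by shape rank gives the same result as sorting the typed key codes alphabetically.

```python
# Cangjie basic shapes in keyboard order: 日=A, 月=B, ..., 田=W, 難=X, 卜=Y.
CANGJIE_SHAPES = "日月金木水火土竹戈十大中一弓人心手口尸廿山女田難卜"
RANK = {shape: i for i, shape in enumerate(CANGJIE_SHAPES)}

# Hypothetical sample: character -> its Cangjie decomposition (1-5 shapes).
SAMPLE = {
    "明": "日月",
    "林": "木木",
    "森": "木木木",
    "好": "女弓木",
    "你": "人弓火",
    "我": "竹手戈",
    "中": "中",
}


def index_key(shapes: str) -> tuple:
    """Turn a decomposition into a tuple of shape ranks; tuples compare like words."""
    return tuple(RANK[s] for s in shapes)


def to_keys(shapes: str) -> str:
    """Spell the decomposition as the Latin letters typed on the keyboard."""
    return "".join(chr(ord("A") + RANK[s]) for s in shapes)


def build_index(entries: dict) -> list:
    """Return the characters ordered by their Cangjie decomposition."""
    return sorted(entries, key=lambda ch: index_key(entries[ch]))


if __name__ == "__main__":
    for ch in build_index(SAMPLE):
        print(ch, SAMPLE[ch], to_keys(SAMPLE[ch]))
```

A shorter decomposition that is a prefix of a longer one (林 before 森) sorts first, just as "DD" sorts before "DDD".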

Compared with pinyin indexing, this method is entirely intuitive, and you can ignore the phonetics behind a character altogether. That's why I have always insisted that the best Chinese indexing should be detached from sound, because sound doesn't appear on a character (forget about the impractical phonetic components).

Sure, there are some obstacles.

You have to learn the order of all the codes, but I can't see any difficulty in doing that.

You also have to learn how to break a character down. It took me around 30 minutes to understand the rules when I was 14, and I think that's already good enough for indexing as long as you remember the order of the codes (mastering the typing is another story).

None of these problems is difficult to solve. In fact, there are even simpler ways to break down a character: the Jiufang and Zhongheng input methods divide Chinese characters into 9 broad classes (the former is wonderfully easy; I learnt it in 3 minutes when a salesman showed it to me).

Some may object that Cangjie doesn't divide characters logically enough. That isn't a problem: we can use another well-known, more logical scheme, such as Zhongheng.

However, does my suggestion solve the problem of indexing characters? No.

THE PROBLEM with indexing is the lack of standardization. There are dozens of input methods in the Chinese-speaking world: Pinyin for Putonghua speakers, Zhuyin for Taiwanese, and there are also MaoXiaMi and WuBi. None of these are interchangeable. Even worse, they are all popular, and you can't expect any of them to drop out of the market. In other words, there will be no standard for indexing, and no publisher will be brave enough to do indexing with any method.

My point is that it isn't Chinese characters themselves that hinder indexing; it's Chinese people.

This may not convince speakers of alphabetic languages that it's an easy way to make Chinese "indexable". They may find the A, B, C system much easier to remember, but that's not my case. When I was a student, the A...Z order seemed to me simply a random sequence of 26 unrelated sounds, and I had to sing the alphabet song (ABCDEFG HIJKLMNOP) to get the order right.

nnt
no publisher will be brave enough to do indexing with any method.

I think this can be solved by extensive use of computers.

Multi-indexing is very common in databases, and it "just" takes some effort to build a table mapping characters to their entries in each IME.

Multiple IMEs already coexist in Chinese/Japanese PC environments.

Another way of indexing is by Unicode code point (which guarantees unique index keys).
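A minimal sketch of the multi-indexing idea, assuming a small hand-made table: each character is stored once, and one inverted index is built per scheme (pinyin, Cangjie code, Unicode code point), so a search can start from whichever key the user happens to know. The table contents and the names `TABLE`, `build_indexes` and `lookup` are illustrative assumptions, not data taken from any real IME.

```python
from collections import defaultdict

# Hypothetical sample table: character -> its key under each input scheme.
TABLE = {
    "木": {"pinyin": "mu4", "cangjie": "D"},
    "目": {"pinyin": "mu4", "cangjie": "BU"},
    "明": {"pinyin": "ming2", "cangjie": "AB"},
    "林": {"pinyin": "lin2", "cangjie": "DD"},
    "中": {"pinyin": "zhong1", "cangjie": "L"},
}


def build_indexes(table):
    """Build one inverted index per scheme: scheme -> key -> [characters]."""
    indexes = defaultdict(lambda: defaultdict(list))
    for ch, keys in table.items():
        for scheme, key in keys.items():
            indexes[scheme][key].append(ch)
        # The Unicode code point is unique per character, so it is derived, not stored.
        indexes["unicode"][f"U+{ord(ch):04X}"].append(ch)
    return indexes


INDEXES = build_indexes(TABLE)


def lookup(scheme, key):
    """Return every character filed under `key` in the chosen scheme."""
    return list(INDEXES[scheme].get(key, []))


if __name__ == "__main__":
    print(lookup("pinyin", "mu4"))  # homophones share one pinyin key
    print(lookup("cangjie", "BU"))  # the shape key tells them apart
```

Homophones such as 木 and 目 collide under pinyin but not under the shape code, which is exactly why keeping several indexes side by side helps.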

An all-in-one mobile computer/phone/game station is the solution...

Quest

You don't "斩脚趾避沙虫" (literally "cut off your toes to avoid sandworms", i.e. a remedy worse than the problem); there are better ways to correct or patch the inefficiencies than changing the whole writing system.

ala

It is a bad idea to use a non-phonetic indexing system.

It is difficult to recall the shape of the character without writing it, especially the nature of each stroke. It will be slow and exhausting. In the end, one just memorizes Cangjie combinations, which is the last thing anyone needs. There is a reason why Pinyin and Zhuyin are the most popular forms of input in the Mainland and Taiwan respectively.

It is a good idea to have some form of phonetic indexing. What if you don't know how to write the first character of a word, but you know its pronunciation? This happens to me very often in the sciences, by the way.

Why not just index words (ci) by pinyin, with stroke count as a tie-breaker? If the pinyin of two words is identical, stroke count decides. Pinyin is arranged in English alphabetical order, and within the same spelling, tones are ordered 5 (neutral), 1, 2, 3, 4. Problem solved.
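The proposal above can be sketched as a sort key. This is one possible reading of it, assuming numbered pinyin with 5 for the neutral tone: compare the toneless spelling of the whole word first, then the tones in the order 5, 1, 2, 3, 4, then the total stroke count. The sample words, their stroke counts (simplified forms, supplied by hand) and the names `WORDS`, `sort_key` and `pinyin_index` are illustrative assumptions.

```python
# Tone order proposed in the post: neutral (5) first, then 1, 2, 3, 4.
TONE_RANK = {5: 0, 1: 1, 2: 2, 3: 3, 4: 4}

# Hypothetical sample entries: (word, numbered-pinyin syllables, total stroke count).
WORDS = [
    ("是", ["shi4"], 9),
    ("马上", ["ma3", "shang4"], 6),
    ("骂", ["ma4"], 9),
    ("市", ["shi4"], 5),
    ("妈", ["ma1"], 6),
    ("吗", ["ma5"], 6),
    ("事", ["shi4"], 8),
    ("麻", ["ma2"], 11),
    ("马", ["ma3"], 3),
]


def sort_key(entry):
    """Letters first, then tones (5 1 2 3 4), then stroke count as tie-breaker."""
    _word, syllables, strokes = entry
    letters = "".join(s[:-1] for s in syllables)
    tones = tuple(TONE_RANK[int(s[-1])] for s in syllables)
    return (letters, tones, strokes)


def pinyin_index(words):
    """Return the words in index order."""
    return [word for word, _syllables, _strokes in sorted(words, key=sort_key)]


if __name__ == "__main__":
    print(" ".join(pinyin_index(WORDS)))
```

Homophones with identical spelling and tone (市, 事, 是 as shi4) are separated only by stroke count, which is the tie-break the post describes.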

nnt

Indexing is not only for displaying results: it's also for searching.

The Sursong.ttf font contains some 40,000 rare characters that you may not know how to pronounce, but do know how to split into parts.

Multiple indexing is the solution, for displaying and searching.
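Searching by parts, for characters you cannot pronounce, can be sketched as an inverted index from component to characters, where a query with several components is a set intersection. The decompositions below are simplified, hand-written assumptions (not taken from Sursong.ttf or any real database), and `PARTS`, `build_component_index` and `search` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical decompositions: character -> set of visible components.
PARTS = {
    "明": {"日", "月"},
    "林": {"木"},
    "休": {"人", "木"},
    "好": {"女", "子"},
    "字": {"宀", "子"},
    "晴": {"日", "青"},
}


def build_component_index(parts):
    """Inverted index: component -> set of characters that contain it."""
    index = defaultdict(set)
    for ch, components in parts.items():
        for comp in components:
            index[comp].add(ch)
    return index


INDEX = build_component_index(PARTS)


def search(*components):
    """Characters containing ALL the given components, in code-point order."""
    if not components:
        return []
    hits = set.intersection(*(INDEX.get(c, set()) for c in components))
    return sorted(hits)


if __name__ == "__main__":
    print(search("木"))        # every character with 木 in it
    print(search("人", "木"))  # narrowed down to characters with both parts
```

The same table could then be displayed in pinyin, stroke or code-point order, which is the "displaying" half of the argument.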

Guest Yau

I think nnt points to a core characteristic of Chinese words: their sound isn't stored in the character itself. A long-term indexing method has to rely on the traits that appear on what we are looking for, and that is a character, not a sound.

Ala's suggestion may be good for today's Mandarin speakers, but it can't avoid becoming obsolete when pronunciation changes. Adaptability just doesn't exist.

What may also concern ala is the ordering. But that raises another question for me: Is the A..Z order really easier to remember? Why can one remember the 26 unrelated sounds perfectly, but fail to remember the order of "character radicals"? That doesn't convince me. Don't forget that Chinese has a long tradition of composing poems to link up things that seem unrelated. (飛雪連天射白鹿, 笑書神俠倚碧鴛, in which each character is the first character of the title of one of Jin Yong's 14 novels, is an example.)
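The couplet works like the alphabet song: once memorized, it defines an order. A small sketch, assuming the 14 titles below (the novels whose first characters make up the couplet, ending with 鴛 from 鴛鴦刀), sorts them by the position of their first character in the couplet, the same way an alphabetical index uses letter positions. `COUPLET`, `TITLES` and `couplet_order` are illustrative names.

```python
# The mnemonic couplet: each character opens one of Jin Yong's novel titles.
COUPLET = "飛雪連天射白鹿笑書神俠倚碧鴛"
POSITION = {ch: i for i, ch in enumerate(COUPLET)}

TITLES = [
    "鹿鼎記", "笑傲江湖", "天龍八部", "射鵰英雄傳", "神鵰俠侶",
    "飛狐外傳", "雪山飛狐", "連城訣", "白馬嘯西風", "書劍恩仇錄",
    "俠客行", "倚天屠龍記", "碧血劍", "鴛鴦刀",
]


def couplet_order(titles):
    """Sort titles by where their first character appears in the couplet."""
    return sorted(titles, key=lambda t: POSITION[t[0]])


if __name__ == "__main__":
    print("、".join(couplet_order(TITLES)))
```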

nnt

Just have a look here:

http://www.nomfoundation.org/nomdb/lookup.php

and you'll know what multiple indexing is...

As many Chinese characters are included "as is" in Nôm, you can try searching by any method shown.

Sorting in alphabetical order (by Mandarin, Cantonese or Vietnamese reading) is no problem once you have all the indexes.
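A minimal sketch of "ordering once you have the indexes": the same characters, sorted by whichever reading the reader uses. The readings (numbered Pinyin for Mandarin, Jyutping for Cantonese) are hand-entered assumptions, and `READINGS` and `ordered_by` are hypothetical names. Vietnamese is left out because a plain code-point sort misplaces letters such as đ; it would need a locale-aware collation instead.

```python
# Hypothetical readings: character -> numbered Pinyin (Mandarin) and Jyutping (Cantonese).
READINGS = {
    "一": {"mandarin": "yi1", "cantonese": "jat1"},
    "人": {"mandarin": "ren2", "cantonese": "jan4"},
    "大": {"mandarin": "da4", "cantonese": "daai6"},
    "山": {"mandarin": "shan1", "cantonese": "saan1"},
}


def ordered_by(reading, table=READINGS):
    """List characters alphabetically by the romanization of one reading."""
    return sorted(table, key=lambda ch: table[ch][reading])


if __name__ == "__main__":
    print(ordered_by("mandarin"))
    print(ordered_by("cantonese"))
```

The two orders differ (一 and 山 swap places), which is why a multilingual book would print one index per reading rather than a single list.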

hparade

Cangjie is really not so logical IMO; the way it divides characters into parts is just odd.

ala
and you'll know what multiple indexing is...

Multiple indexing is expensive, time-consuming, and not feasible for a 50-page nonfiction book on paper. In such cases I would much rather have pinyin indexing than no indexing.

Ala's suggestion may be good for today's Mandarin speakers, but it can't avoid becoming obsolete when pronunciation changes. Adaptability just doesn't exist.

Pronunciation isn't changing that fast in standard Mandarin, and when small changes do occur, nothing is disrupted, since books in demand will presumably get revised editions and reprints. And old books? You can just learn the pinyin system of the time (pronunciation changes tend to be systematic and follow obvious patterns). Again, that's better than a purely non-phonetic indexing system and much better than nothing.

Is the A..Z order really easier to remember? Why can one remember the 26 unrelated sounds perfectly, but fail to remember the order of "character radicals"? That doesn't convince me.

They are completely different. One is phonetic; the other is purely graphical. Phonetic means I can just say the word to myself and place it as necessary; that's how young American children learn to use a dictionary. With a graphic order, I actually have to picture the character in my mind, with its strokes and radicals, which have nothing to do with the sound of the word. Language is fundamentally phonetic. In phonetic systems, you have 2 reinforcing methods of sorting: phonetic and graphic. In a graphic system of indexing, you have only one: graphic. You tell me which is more efficient for indexing (and retrieving) human language and words.

skylee
Cangjie is really not so logical IMO; the way it divides characters into parts is just odd.

That's right. :clap

pazu
It is difficult to recall the shape of the character without writing it.

You have mixed up two different concepts.

Pinyin is popular because it's easy and doesn't require much extra learning, but that doesn't mean a character's shape is difficult to recall without writing it. Cangjie is easy, but it still took a 14-year-old boy (what about a girl?) 30 minutes to learn, while Pinyin takes 30 seconds (30 seconds to tell you that "v" => nü?).

A structure-based input method (IME) is more effective than a phonetic one because it is detached from sound, which is an important characteristic of Chinese characters.

Using Cangjie, it's easy to type Beijinghua, Guangdonghua or Shanghaihua (as long as you know the characters and they are supported). Once I tried typing a simple Cantonese phrase using a Mandarin (pinyin) input method, 只係打一陣就頂唔順 (zhi1 xi4 da3 yi1 zhen4 jiu4 ding3 wu2 shun4, "after typing for just a moment I couldn't stand it")...

hparade: yes, Cangjie is odd, and sometimes it's not very logical (though the exceptions are very few), but most importantly, IT WORKS. I can type more than 100 Chinese characters a minute quite easily.

My Vietnamese friends (Chinese teachers) were stunned because they had never heard of a structure-based IME!

Okay, here's another problem... One day I went to a Chinese school in Vietnam to help them install some Chinese software. I couldn't find the headmaster (my uncle, who had asked me to do this for him), so I asked another teacher to switch on the computer for me. She didn't really understand me and thought I wanted to LEARN how to type Chinese! So she spent about 4 or 5 minutes opening quite a few programs to get ready to teach me Chinese input, typed "nihaoma" and said, "Oh see, this is easy, isn't it?" I stopped her and showed her the Cangjie IME. She was stunned, then everybody was stunned... The problem was, they asked me to teach them this IME, which could be quite difficult. Oh wait, if it were as easy as Mr Yau claims, why would I be so reluctant to teach them?

nnt

Still, there are characters you know how to pronounce, but not exactly how to write.

There are characters you see, but you don't know how to pronounce.

You may target an international audience with a paper book and need two or three indexes of the same list of characters (not of every word in the book, of course), usually in alphabetical orders (hiragana, Pinyin, etc.).

Even paper books are now designed and composed using computers. Computers and software are more and more powerful and versatile. Indexing is just a small part in publishing...

Guest Yau
In phonetic systems, you have 2 reinforcing methods of sorting: phonetic and graphic. In a graphic system of indexing, you have only one: graphic. You tell me which is more efficient for indexing (and retrieving) human language and words.

I think you didn't consider nnt's point that the sound of a Chinese character can be unclear. How do you figure out the sound of a character? We can't do it from what we see. In fact, Chinese characters have never worked well phonetically, and their detachment from sound is what makes it possible for all dialects to share the same set of characters.

Guest Yau
hparade wrote:

Cangjie is really not so logical IMO; the way it divides characters into parts is just odd.

That's right.

Yes, sure it is. But criticism of Cangjie is usually aimed at the way it classifies characters, not at its methodology.

Cangjie was a milestone as a graphical method of breaking down Chinese characters, and it quickly became popular. Though better and more logical methods followed, most couldn't go beyond the boundaries set by Cangjie. The rest are either phonetic or pen-based, and in practice their efficiency falls far behind their graphical counterparts.

This also counters ala's suggestion. Phonetic methods are popular (I have no idea whether more or less popular than graphical ones), but not because they are more efficient or more intuitive; it's simply because they are so easy to learn.

The question we have to address about graphical decomposition methods is this: is a Chinese IME really difficult to learn?

ala

I don't think I have mixed up any concepts. I wasn't talking about IMEs or Chinese input methods. It is ridiculous to say Chinese characters are detached from sound. And a phonetic system is more intuitive because language is first and foremost PHONETIC.

I said it was more efficient to index and retrieve from a phonetic order than from a graphic order, because in a phonetic system you can index/retrieve based on the shape/parts of a word AND its sound, whereas in a graphic system you order based only on the shape/parts of a word/character. Unfortunately, Chinese characters lose some of the shape/parts advantage when ordered by pinyin (in the sense that you are ordering the pinyin of the characters rather than the characters themselves). Nevertheless, you can't say that a purely graphic system of indexing is just like alphabetical order, because the two use completely different ordering mechanisms.

My suggestion is that dictionaries and reference materials carry two indices (phonetic and graphic), and that non-fiction books and textbooks carry a pinyin index. Paper books will foreseeably still be in use for the next 20 years.

geek_frappa
Yes, sure it is. But criticism of Cangjie is usually aimed at the way it classifies characters, not at its methodology.

Cangjie is not well organized.

This is an excellent point you have discussed here. Thank you for this post.

Chinese IMEs were created by an older generation of programmers.

The next generation of Chinese IMEs will be better and will use the extensibility of Cangjie as a starting point...

hparade

Characters are partly phonetic, although usually not obviously so, and some sound components no longer resemble modern pronunciations, but to say characters are detached from sound is really not correct...

pazu

When I said Hanzi are detached from sound, I didn't mean they contain no phonetic parts (well... just refer to "that thread" and check it out). The point is that a character stores its sound extrinsically rather than intrinsically: the sound is attached by convention rather than being deducible,

and this is the basic reason why the simple "一" (one) can be read yat, yi, nhat or ichi, and also why a Chinese person can communicate with a Japanese person in Hanzi even though they can't understand each other's spoken languages.

And the claim that a phonetic classification gives two reinforcing methods is nonsense. A phonetic one gives you "two reinforcements" only because the Hanzi is written next to the pinyin, not because the graphic image of a Hanzi can be recalled from its pinyin, and you'd be contradicting yourself if you said so. In fact, a structure-based classification can give you 10 reinforcements (if you like): you can easily make an index like "structure-code" + Hanzi + pinyin + ... + ... + ...!

ala
A phonetic one gives you "two reinforcements" only because the Hanzi is written next to the pinyin, not because the graphic image of a Hanzi can be recalled from its pinyin, and you'd be contradicting yourself if you said so. In fact, a structure-based classification can give you 10 reinforcements (if you like): you can easily make an index like "structure-code" + Hanzi + pinyin + ... + ... + ...!

What? That's not what I meant at all. I wasn't even talking about Chinese characters, just phonetic ordering in general. The two reinforcements (graphic + phonetic) are inherent in all alphabetic scripts. The graphic effect would obviously be reduced when applied to pinyin ordering of Chinese characters, since we are ordering each character's pinyin rather than the character itself.

Nevertheless, I still believe that Chinese characters are just an extended, complex syllabary, and are intrinsically phonetic as well. Because of this, if a non-dictionary publication can only have one index, please let it be based on pinyin.

pazu

Oh yes, then in your opinion the perfect "index" for Chinese would probably be to abolish Hanzi. :wink: hor hor.
