
Hanzi Visual Similarities - minimizing the pain of learning hanzi


I spent a few days playing with a table of 6700 Chinese characters, observing their visual and phonetic similarities in a desperate search for something that would make my own study of characters easier (after one year of study I know 1200 of them).

For each pinyin reading, the sinoling.com character table lists a sequence of hanzi, for instance: BI bī 逼 bí 鼻荸 bǐ 比笔彼鄙匕俾妣吡秕舭 bì 必毕币秘避闭壁臂弊辟碧拂毙蔽庇璧敝泌陛弼篦婢愎痹铋裨濞髀庳毖滗蓖埤芘嬖荜贲畀萆薜筚箅哔襞跸狴

I took each pinyin reading (ignoring tones, for simplicity) and regrouped its characters so that their visual similarities became apparent. After my rearrangement, the BI characters from the example above look like this:

[Image: the BI characters regrouped by visual similarity]

I judged the degree of similarity myself, and it may have no linguistic or etymological justification whatsoever.

Having done this with all 6700 characters in the table, I ended up with a list of generic (or rather quasi-generic) characters, each followed by a tail sequence of "derived" characters. When I found no visual similarity for a character, I called it an orphan and put it aside. I then analyzed the orphans and, where possible, attached them non-phonetically to existing sequences, eventually reducing their number to a little over two hundred. In the end, I had sieved 1432 sequences and 217 orphans from those 6700 characters. Finally, I discarded some of them to optimize for learning the first 4000.
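The regrouping described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure I actually followed: the input is a small hand-typed subset of the BI row from the sinoling.com table, and the visual judgment is stood in for by a hypothetical hand-entered map (`COMPONENT_OF`) from each character to the component it seems to share with others. Tones are stripped by Unicode decomposition, keeping the diaeresis so that ü survives. Characters with no partner in their reading become orphans.

```python
import unicodedata
from collections import defaultdict

# Combining tone marks used in pinyin: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Hand-typed subset of the sinoling.com BI row (tone-marked reading -> characters).
TABLE = {"bī": "逼", "bí": "鼻", "bǐ": "比妣秕俾", "bì": "辟避壁臂璧庇陛弊蔽婢必"}

# Hypothetical stand-in for visual judgment: character -> shared component.
COMPONENT_OF = {
    "比": "比", "妣": "比", "秕": "比", "庇": "比", "陛": "比",
    "辟": "辟", "避": "辟", "壁": "辟", "臂": "辟", "璧": "辟",
    "弊": "敝", "蔽": "敝", "俾": "卑", "婢": "卑",
}


def strip_tones(syllable: str) -> str:
    """Remove tone marks but keep the diaeresis (U+0308), so 'lǜ' becomes 'lü'."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def group_by_reading(table):
    """Merge the per-tone character strings into one list per toneless reading."""
    groups = defaultdict(list)
    for syllable, chars in table.items():
        groups[strip_tones(syllable)].extend(chars)
    return dict(groups)


def group_by_component(chars, component_of):
    """Split one reading's characters into sequences; unmapped or lone characters are orphans."""
    buckets = defaultdict(list)
    orphans = []
    for ch in chars:
        key = component_of.get(ch)
        if key is None:
            orphans.append(ch)
        else:
            buckets[key].append(ch)
    sequences = {}
    for key, members in buckets.items():
        if len(members) > 1:
            sequences[key] = members
        else:
            orphans.extend(members)
    return sequences, orphans


if __name__ == "__main__":
    for reading, chars in group_by_reading(TABLE).items():
        sequences, orphans = group_by_component(chars, COMPONENT_OF)
        for generic, members in sequences.items():
            print(reading, generic, "".join(members))
        print(reading, "orphans:", "".join(orphans))
```

With this toy data, all four tones merge into one "bi" group, which splits into sequences built on 比, 卑, 辟 and 敝, leaving 逼, 鼻 and 必 as orphans.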

I found that to read or systematically learn the first 4000 characters, I need to familiarize myself with only 848 generic characters and 217 orphans. As a bonus, learning this core group of 1065 characters will also prepare me to recognize another 4500 characters, for a total of 5500+!
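The counting step can be sketched as follows, assuming the sequences and orphans are already available as Python data. Everything here is hypothetical: the tiny data set, the function names, and especially the frequency ranks, which are invented for illustration (a real run would take ranks from a frequency list). The selection rule is also an assumption: keep a generic whenever it or any of its derived characters falls within the top N, keep the orphans within the top N, then count everything those choices cover.

```python
# Hypothetical sequences (generic -> derived characters) and orphans.
SEQUENCES = {
    "辟": ["避", "壁", "臂", "璧"],
    "比": ["妣", "秕", "庇", "陛"],
    "敝": ["弊", "蔽"],
}
ORPHANS = ["逼", "鼻", "必"]

# Invented frequency ranks (1 = most frequent); unranked characters count as rare.
RANK = {"必": 100, "比": 200, "避": 900, "壁": 1500, "臂": 1800, "鼻": 2000,
        "逼": 2200, "弊": 2500, "蔽": 3000, "辟": 3500, "庇": 3900,
        "璧": 4300, "陛": 4600, "妣": 5500, "秕": 6000}


def core_set(sequences, orphans, rank, top_n):
    """Return the generics and orphans needed to cover every character ranked <= top_n."""
    def in_scope(ch):
        return rank.get(ch, float("inf")) <= top_n

    generics = [g for g, derived in sequences.items()
                if in_scope(g) or any(in_scope(ch) for ch in derived)]
    needed_orphans = [ch for ch in orphans if in_scope(ch)]
    return generics, needed_orphans


def coverage(sequences, generics, orphans):
    """Every character reachable from the core set: generics, their derived characters, orphans."""
    covered = set(orphans)
    for g in generics:
        covered.add(g)
        covered.update(sequences[g])
    return covered


core_generics, core_orphans = core_set(SEQUENCES, ORPHANS, RANK, top_n=2000)
print(len(core_generics) + len(core_orphans), "core characters ->",
      len(coverage(SEQUENCES, core_generics, core_orphans)), "recognizable")
# prints: 4 core characters -> 12 recognizable
```

In this toy run, covering the top 2000 takes the generics 辟 and 比 plus the orphans 鼻 and 必, and those four characters open up twelve.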

I've put my master database of all rearranged characters, together with the two basic lists (generic characters and orphans), into an XLS file. It can be downloaded from http://otaflegr.com/chinese/hanzi-similarities.zip (101 kB)

For the hanzi translations in my lists I used the files at http://lingua.mtsu.edu/chinese-computing/statistics/


I think what you've done is mainly to identify 'sound loan' characters whose shared pronunciation still holds in modern Mandarin.

If someone out there knows how Chinese pronunciation has changed since the characters were invented, you might well be able to find more relationships that would help people remember characters.

I suggest this because I've heard that the sounds of a language change according to general 'rules'. For example, 辟 and 劈, pronounced pi (辟 can also be read bì), are clearly related to 避, 壁 and 臂, all pronounced bi, presumably because they were all pronounced the same (or more similarly) when these characters were invented.

Nice post, otaflegr. You may also be interested in the Heisig method (sorry, no link; computer problems). It only teaches you to write each character when prompted with its meaning, but after that, recognizing the characters and adding pronunciations and additional meanings becomes much easier. Learning all 4000 in a year would be possible (roughly 400 hours).

4 years later...

@Otaflegr: If you check resources such as zhongwen.com or smarthanzi.net, you'll be able to use the phonetic breakdowns and components they provide to reunite many of the "orphans" with their parents. ;)
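A minimal sketch of that reunion step, assuming you have typed the component breakdowns into a Python dict yourself. No download format or API is assumed for either site, and the sequences, orphans and decompositions below are illustrative. Each orphan is attached to the first existing sequence whose generic appears among its components. In the toy data, 逼 joins the sequence built on 畐 even though 福 is read fu, which is the same kind of non-phonetic attachment described in the original post.

```python
# Hypothetical existing sequences: generic component -> characters built on it.
SEQUENCES = {"畐": ["福", "幅", "副"], "畀": ["痹", "箅"]}
ORPHANS = ["逼", "鼻", "必"]

# Hypothetical hand-entered decompositions, of the kind shown on component sites.
COMPONENTS = {"逼": ["辶", "畐"], "鼻": ["自", "畀"]}


def reunite(orphans, sequences, components):
    """Attach each orphan to the first sequence whose generic is one of its components."""
    merged = {g: list(chars) for g, chars in sequences.items()}  # leave input untouched
    still_orphans = []
    for ch in orphans:
        parent = next((c for c in components.get(ch, []) if c in merged), None)
        if parent is None:
            still_orphans.append(ch)
        else:
            merged[parent].append(ch)
    return merged, still_orphans


merged, still_orphans = reunite(ORPHANS, SEQUENCES, COMPONENTS)
for generic, chars in merged.items():
    print(generic, "".join(chars))
print("orphans:", "".join(still_orphans))
```

Here 逼 and 鼻 find parents, while 必 stays an orphan because no decomposition was entered for it.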

Phono-semantic (PS) compound characters, often called radical-phonetic characters, form by far the majority of Chinese characters: over 90% [1]. Not all of them are phonetically accurate, though. Only 58% (3688 characters) have a phonetic component whose initial and final both match the character's reading. The second-largest group, 13% (819 characters), has a matching final but only a similar initial. Overall, the survey rates PS characters on six levels:

*NOTE: Tones are not taken into account.

类别 - 字数 - % - 累计%

0 声韵全同 - 3688 - 58 - 58

1 韵同声近 - 819 - 13 - 71

2 韵同声异 - 782 - 12 - 83

3 声同韵异 - 376 - 6 - 89

4 声或韵近 - 485 - 7 - 96

5 声韵全异 - 250 - 4 - 100

Chinese source: http://chinese.exponode.com/2_1j_1.htm

- - - - - -

Rough Translations

- - - - - -

Level - Characters - % - Cumulative %

0 Same initial and final - 3688 - 58 - 58

1 Same final, close initial - 819 - 13 - 71

2 Same final, different initial - 782 - 12 - 83

3 Same initial, different final - 376 - 6 - 89

4 Close initial or close final - 485 - 7 - 96

5 Different initial and final - 250 - 4 - 100
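To make the six levels concrete, here is a minimal Python sketch that rates a compound's reading against its phonetic's reading, ignoring tones as the survey does. The survey does not say what counts as a "close" initial or final, so `CLOSE_INITIALS` and `CLOSE_FINALS` are hypothetical (aspirated/unaspirated pairs and -n/-ng pairs), and y and w are treated as initials for simplicity. Level 3 is checked before level 4, so a same-initial pair always lands in level 3.

```python
# Pinyin initials; two-letter ones come first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

# Hypothetical definitions of "close"; the survey does not spell them out.
CLOSE_INITIALS = {frozenset(p) for p in [("b", "p"), ("d", "t"), ("g", "k"),
                                         ("j", "q"), ("z", "c"), ("zh", "ch")]}
CLOSE_FINALS = {frozenset(p) for p in [("an", "ang"), ("en", "eng"), ("in", "ing")]}


def split(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero initial, e.g. "an"


def level(compound, phonetic):
    """Rate how well a phonetic's reading matches the compound's, from 0 (same) to 5."""
    ci, cf = split(compound)
    pi, pf = split(phonetic)
    initials_close = frozenset((ci, pi)) in CLOSE_INITIALS
    finals_close = frozenset((cf, pf)) in CLOSE_FINALS
    if ci == pi and cf == pf:
        return 0  # 声韵全同
    if cf == pf:
        return 1 if initials_close else 2  # 韵同声近 / 韵同声异
    if ci == pi:
        return 3  # 声同韵异
    if initials_close or finals_close:
        return 4  # 声或韵近
    return 5  # 声韵全异


print(level("bi", "bi"), level("jiang", "gong"))  # 避/辟, 江/工
# prints: 0 5
```

Under these definitions 避 (bi) with its phonetic 辟 (bi) is level 0, while 江 (jiang) with its phonetic 工 (gong) falls all the way to level 5.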

[Image: example of a Level 0 character]

Pictophonetic characters often become obsolete as the Chinese language changes, and afterwards their components usually no longer give a reliable semantic or phonetic clue. Some simplified characters had their phonetic element replaced by another with a similar pronunciation and fewer strokes; in 鐘 → 钟, for example, the phonetic 童 (tóng) was replaced by 中 (zhōng), which matches exactly. On balance, more characters became phonetically accurate after simplification.

