Character to words ratio

Seeing how many characters seem to be reused across different, and not necessarily related, words, how many distinct characters do you think would be needed to write, say, the 4,000 most common words used in daily life? And going further, what about 10,000 words?


If you're willing to take the HSK vocab lists as an acceptable take on 'most common words', you can come up with this fairly easily:

Level 1: 1033 words, 798 characters

Level 2: 2018 words, 808 characters

Level 3: 2202 words, 598 characters

Level 4: 3569 words, 670 characters

So, rounding off to friendly numbers:

Your first 1000 words need 800 characters

Your first 3000 words need 1500 characters

Your first 5000 words need 2100 characters

Your first 8800 words need 2800 characters

The main character learning 'push' would therefore be at the start, assuming of course you are learning to write. At Level 1 you have to learn 800 characters and you get only 1000 words in return. But by Level 4 you get over three times as many words for fewer additional characters. Tragically, though, you still have to actually learn the words; you can't just learn the characters and wait for the words to pop into your head.
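
For anyone who wants to check the rounding, here is a minimal Python sketch that adds up the per-level figures quoted above. It assumes each level's character count is the number of new characters introduced at that level, so the counts can simply be summed. The names `LEVELS` and `cumulative_totals` are made up for this illustration.

```python
# Per-level HSK figures quoted in the post: (level, words, characters).
# Assumption: each character count is the number of NEW characters at that level.
LEVELS = [
    (1, 1033, 798),
    (2, 2018, 808),
    (3, 2202, 598),
    (4, 3569, 670),
]


def cumulative_totals(levels):
    """Return (level, total_words, total_chars, words_per_char) for each level."""
    rows = []
    total_words = 0
    total_chars = 0
    for level, words, chars in levels:
        total_words += words
        total_chars += chars
        rows.append((level, total_words, total_chars, total_words / total_chars))
    return rows


if __name__ == "__main__":
    for level, words, chars, ratio in cumulative_totals(LEVELS):
        print(f"Up to level {level}: {words} words, {chars} characters, "
              f"{ratio:.2f} words per character")
```

Under that assumption the exact running totals are 1033, 3051, 5253 and 8822 words against 798, 1606, 2204 and 2874 characters, so the friendly numbers are, if anything, slightly low on the character side.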

Yep me too.. thanks Roddy,

this is quite useful info.. I know what you mean about the words though... why is it that when two characters are combined they can come to mean something completely different to what you thought... sigh...

"why is it that when two characters are combined they can come to mean something completely different to what you thought... sigh..."

There're 2 possible answers to this:

1. To keep the number of characters down, out of pity for us learners.

2. To frustrate foreigners who try to master the language.


This is a related tool - you can plug in the characters you already know / are learning / dream vaguely of one day being somewhat familiar with, and it will output the words that those characters will allow you to write. It can be kind of encouraging to see sometimes how very simple characters you might learn early on in a writing course - say 中,立,天,文 - can combine to produce less common bits of vocab like 中立 and 天文.
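
The idea behind a tool like that can be sketched in a few lines of Python. This is only an illustration, not how the linked tool actually works: `writable_words` and the tiny `VOCAB` list are hypothetical, and a real tool would load a full dictionary.

```python
def writable_words(known_chars, vocabulary):
    """Return the words in vocabulary made up only of characters in known_chars."""
    known = set(known_chars)
    return [word for word in vocabulary if set(word) <= known]


# Toy word list for illustration only; a real tool would use a full dictionary.
VOCAB = ["中立", "天文", "中文", "文化", "天气"]

if __name__ == "__main__":
    # 文化 and 天气 are left out because 化 and 气 are not among the known characters.
    print(writable_words("中立天文", VOCAB))  # ['中立', '天文', '中文']
```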

I'd imagine there's something out there that works in reverse - plug in the vocab you already know and get a list of the characters you'll need to learn how to write - but I don't know specifically where.
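
In the meantime, the reverse direction is simple enough to sketch yourself. The Python below is a rough, hypothetical example (the function name `characters_needed` is invented here): it takes the words you know, plus optionally the characters you can already write, and lists the remaining characters in the order they first appear.

```python
def characters_needed(known_words, known_chars=""):
    """List distinct characters in known_words that are not in known_chars.

    Characters are returned in the order they first appear in the word list.
    """
    seen = set(known_chars)
    needed = []
    for word in known_words:
        for ch in word:
            if ch not in seen:
                seen.add(ch)
                needed.append(ch)
    return needed


if __name__ == "__main__":
    words = ["中立", "天文", "中文", "笔记本"]
    print(characters_needed(words))          # ['中', '立', '天', '文', '笔', '记', '本']
    print(characters_needed(words, "中文"))  # ['立', '天', '笔', '记', '本']
```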

Probably also worth noting that as the words are restricted to those on the HSK lists, you'd also find that those characters give you 'extra' words. 笔记本, for example, isn't on the lists, but you'd be able to write it with characters you'd learn at first level.

"why is it that when two characters are combined they can come to mean something completely different to what you thought"

I cannot really answer your question here, but I think you can discover more by studying 语素 (morphemes).

Studying 语素 is like a "proper" intermediate Chinese grammar, for once you have mastered most of the basic grammar.

A good knowledge of 语素 will enable you to tell which characters can be combined and used as a pair, 3-character-word, etc. You will also gain a deeper understanding of the uniqueness of Chinese characters.


The following excerpt from the Clavis Sinica FAQ (http://www.clavisinica.com/fs-info.html) also contains relevant information that allows us to continue Roddy's list (approximate word-to-character ratio of 1-1 for the first 1000 words, 2-1 for the first 3000 words, 2.5-1 for the first 5000 words, 3-1 for the first 9000 words) by deriving an approximate word-to-character ratio of 6-1 for the first 25000 words.



How large is the program's dictionary?

The dictionary contains over 25,000 separate entries, including approximately 4,000 characters and over 21,000 multi-character compound words, phrases, and idioms, or chengyu. All of the entries are fully searchable in both English and Chinese.


I've heard that written Chinese has tens of thousands of characters. How can a dictionary of 4,000 characters be of much use?

It is true that the great Kang Xi dictionary of 1716 listed nearly 50,000 characters, but this number included many variant and obsolete forms. The number of characters to be found in modern Chinese texts is probably much closer to 10,000, and of these, more than half are used only rarely. The 4,000 characters included in the Clavis Sinica dictionary account for approximately 98% of the characters to be found in a typical modern newspaper, and 100% of the characters found in any of the most commonly used college-level Chinese textbooks.


On what basis were these 4,000 characters selected?

The Clavis Sinica dictionary is based on the first level of the Guo Biao Chinese character set, which is the accepted standard in the PRC. The 3,754 characters in this set represent the most commonly used characters in the modern written language. Clavis Sinica supplements these with an additional 250 of the more frequently seen characters from the second level of the Guo Biao character set.

  • 1 year later...

Roddy and/or anyone else on this forum,

Can you guys tell me where I can obtain the HSK word lists up to level 4? I have spent three years loosely learning Mandarin, but I have studied Cantonese since I was a little kid. I'd say I know roughly 800-1000 characters (not sure how many words) and I want to intensify my learning. I'm definitely a lot better at reading and recognizing a character than writing it. If there is a list, then I can focus more on what I ought to know and just follow up on it.

Back to the topic, I've never understood why some learners place such a high importance on character frequency lists. Do they plan to learn the first 4,000 or however many characters in isolation and hope that will cover them? Surely they would be better off learning vocabulary in context?
