Chinese-Forums

Character Usage Statistics


KiraKira


I've heard so many different claims on various blogs and forum searches regarding the number of hanzi and usage percentages that I'm not sure which is correct.

For example, claims like:

"100 characters cover about 50% of the used language, and 500 characters cover about 90%," or some variation of that statistic.

Does anyone have a source with a solid argument or analysis to back up the claim? I know there is some correlation (and it would be nice if it were close to the one quoted), but I just want to be sure and see which claim everyone thinks is closest to correct.
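One way to sanity-check a claim like this is to measure it on a text you trust. Below is a minimal Python sketch, not a reference method: the names `is_hanzi` and `coverage_of_top_n` are made up for illustration, and it assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as hanzi. The number you get depends entirely on the corpus you feed it.

```python
from collections import Counter


def is_hanzi(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block is counted.
    # Rare characters in the extension blocks are ignored by this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def coverage_of_top_n(text: str, n: int) -> float:
    """Fraction of all hanzi occurrences in `text` that are accounted for
    by the n most frequent hanzi of that same text."""
    counts = Counter(ch for ch in text if is_hanzi(ch))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total


# Tiny made-up sample; a real check needs a large corpus.
sample = "我的书是你的书，他的书也是我的。"
for n in (1, 2, 4):
    print(n, round(coverage_of_top_n(sample, n), 2))
```

Running this over a large corpus with n = 100 and n = 500 would show how close the quoted figures are for that particular kind of text.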


Just remember though that this is for characters only, so for non-native speakers who know 3000+ characters, there will still be many words that are unfamiliar. Most usage statistics like this are really only valid for native speakers and not for language learners.


Just remember though that this is for characters only, so for non-native speakers who know 3000+ characters, there will still be many words that are unfamiliar.

This is very true; however, to know a (multi-character) word you need to know the individual characters first. So knowing all the characters is certainly a good start.

The next step is to know all the words... But that's not the end of it. Even if you know all the words, you will still often have difficulty figuring out the meaning.

It is somewhat similar to the claim that just 26 letters cover 100% of English texts.

I think it makes perfect sense to learn characters according to frequency. They are unavoidable anyway, and it's very motivating since you can very quickly recognize lots of passages.

So frequency analysis is a very useful toy for spare-time fun, no more, no less. It's especially useful for analyzing online text that you plan to read, to figure out if it's suited to your level.
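A rough way to do that kind of level check is sketched below in Python. Everything here is an assumption for illustration: `known_coverage` and `unknown_characters` are hypothetical helper names, the set of known characters is something you would keep yourself, and only the basic CJK block (U+4E00 to U+9FFF) is treated as hanzi. It measures characters only, so it says nothing about unfamiliar words built from familiar characters.

```python
def is_hanzi(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block is counted.
    return "\u4e00" <= ch <= "\u9fff"


def known_coverage(text: str, known: set) -> float:
    """Fraction of hanzi occurrences in `text` that are in `known`."""
    hanzi = [ch for ch in text if is_hanzi(ch)]
    if not hanzi:
        return 0.0
    return sum(ch in known for ch in hanzi) / len(hanzi)


def unknown_characters(text: str, known: set) -> list:
    """Distinct hanzi in `text` that are not in `known`,
    in order of first appearance."""
    result = []
    for ch in text:
        if is_hanzi(ch) and ch not in known and ch not in result:
            result.append(ch)
    return result


# Made-up example: a short text and a small set of known characters.
text = "我是学生，你是老师。"
known = set("我你是学生")
print(known_coverage(text, known))
print(unknown_characters(text, known))
```

The list of unknown characters is probably the more useful output, since it tells you what to look up before you start reading.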

With 500 characters you can "see" 90% of the text, but the other 90% of the meaning is in the remaining 10%...


I am in a conversation class now, and what I noticed is that many common words used in speech (not so much in the written language) may not have the same frequency. In other words, all frequency ratings are based on newspapers and formal texts, not on what you hear most often in the street (I am talking about standard Mandarin vocabulary). Just my 2 cents, thought it was worth mentioning.

Lists of characters by frequency are only useful for some reviewing, not for actual study, anyway. Just keep reading texts. Individual frequency lists may differ largely.

I haven't seen a detailed analysis of Chinese frequency lists, but I've seen descriptions of how Japanese ones were made: which newspapers were used, over which period, etc.

If you do a simple search by character in Google, it gives you a number of hits.


Individual frequency lists may differ largely.

That is very true. But #1 is mostly 的. In the top ten are usually 一, 了, 们, 在, 是...

I don't really use it for learning, more for Chinese-related fun. But I also use it to check, before I read something longish, how many different characters it contains.
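That kind of pre-reading check can be sketched in a few lines of Python. This is only an illustration: `hanzi_profile` is a made-up name, and it assumes that only the basic CJK block (U+4E00 to U+9FFF) counts as hanzi. It returns the number of different characters and the most frequent ones with their counts.

```python
from collections import Counter


def hanzi_profile(text: str, top: int = 10):
    """Return (number of distinct hanzi, list of the `top` most frequent
    hanzi with their counts) for `text`."""
    # Assumption: only the basic CJK Unified Ideographs block is counted.
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return len(counts), counts.most_common(top)


# Made-up sample sentence; paste in the text you plan to read instead.
sample = "他们在家，我们在学校。"
distinct, most_common = hanzi_profile(sample, top=2)
print(distinct)
print(most_common)
```

On a long text, the top of the list will usually be dominated by characters like 的, which is a quick sign that the counting works.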

