Chinese-forums.com
Learn Chinese in China


Total Number of Chinese characters


chijyh

Re: a thread I found here: http://www.chinese-forums.com/index.php?/topic/80-how-many-chinese-characters-are-there

Originally Posted by sherman

"China's Ministry of Education requires senior middle school students to be able to recognize 2,900 Chinese characters. In Taiwan it might be about 3,100, and in Hong Kong 2,600."

I think you guys may be looking at the lower elementary school data (see link below). As a native Chinese speaker, I'm sure we learn a lot more; even an average middle school student should know at least around 5000. (Note: the total number of Chinese characters in a common book is already 8000-13000 characters!)

I'm not sure about the figures of 2600 for Hong Kong and ~2900 for the mainland; I do NOT think these figures are substantiated. If you like, you can always check the Education Department (government) websites of Hong Kong, the PRC, etc.

But as you may know, the number of words you know is not enough on its own; you also need sentence structures and how they apply in written/spoken contexts. Well, have fun learning Chinese! :)

http://people.netscape.com/ftang/chineselearning/howmanychinese.html

# According to "Chinese Information Process (3)", published in May 1987

* The total number of Chinese characters used in a common book is around 8,000-13,000

* The total number of Chinese characters used in elementary and middle school textbooks is 5,404

* The number of commonly used characters in newspapers is 3,000

* The cumulative frequency of the 4,000 most commonly used characters is more than 99.6%

* The cumulative frequency of the 6,000 most commonly used characters is 99.88%
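The "cumulative frequency" figures above are coverage numbers: rank characters by how often they occur, then ask what share of all character occurrences the top N account for. A minimal Python sketch, assuming you supply your own corpus (the toy `sample` string here is made up and will not reproduce the 1987 figures):

```python
from collections import Counter


def coverage_of_top_n(text: str, n: int) -> float:
    """Share of CJK character occurrences covered by the n most frequent characters."""
    # Keep only characters in the basic CJK Unified Ideographs block (U+4E00-U+9FFF);
    # punctuation, digits and Latin letters are ignored.
    chars = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    if not chars:
        return 0.0
    counts = Counter(chars)
    # most_common(n) returns the n highest-count (character, count) pairs.
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(chars)


sample = "我们学习中文。我们喜欢中国菜。中文很有意思。"
for n in (1, 3, 5):
    print(n, round(coverage_of_top_n(sample, n), 3))
```

Swapping `sample` for a large book or newspaper corpus is what it would take to compare against the published figures.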


zhwj
Note: the total number of Chinese characters in a common book is already 8000-13000 characters!

I'm fairly certain this is a mistranslation - the number of characters in a common one-volume dictionary is around 8000-13,000 (the 《应用汉语词典》 I have in front of me lists 10,000 in the preface, and it has more than the 《新华字典》).

gato
I think you guys may be looking at the lower elementary school data (see link below). As a native Chinese speaker, I'm sure we learn a lot more; even an average middle school student should know at least around 5000. http://people.netscape.com/ftang/chineselearning/howmanychinese.html

From your link: http://people.netscape.com/ftang/chineselearning/howmanychinese.html

It says that students in Taiwan are expected to know about 4,800 characters by the time they finish Grade 12. That number could be somewhat lower in the mainland because some simplified characters can represent multiple traditional characters.

In my experience, once you've learned about 2200 characters, you can comfortably read most newspaper articles.

From the same page:

Chinese characters used in Internet:

According to "A composite approach to language/encoding detection", a research paper by Shanjian Li and Katsuhiko Momoi from Netscape Communications (see http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html),

The most common 2048 characters make up 97.583% of the texts examined.

The 2000 additional characters you learn after the first 3000 mostly help you read classical Chinese. Most of them are very rarely used in modern Chinese texts.

wushijiao
In my experience, once you've learned about 2200 characters, you can comfortably read most newspaper articles.

Gato, if I'm not mistaken, aren't you a native speaker of Putonghua (and Shanghaihua)? :D

I think for non-native speakers, character recognition isn't a good measure of literacy, even though it is for native Chinese speakers. This is because Chinese people are already fluent in the spoken language by the time they formally learn to read (age 5+). For example, let's say I were a Chinese farmer who had never learned to read characters but was a fluent speaker: a smart person who, for economic reasons, never had a chance to go to school. If I learned 2,048 characters, I would be basically literate. I could read, say, 小康 (xiaokang) and know that it means "well-off" (one of Deng Xiaoping's political ideas), or 主观 (zhuguan), which means "subjective", or 抽空 (choukong), "to have free time / manage to find time".

Yet a foreigner who has studied Chinese for a year or so can probably recognize those characters, but wouldn't necessarily know the exact meanings of the words because s/he isn't already fluent in the spoken language. Of course, one can always make an educated guess from context and be right some of the time, yet I think most non-native learners still need to learn those character combinations systematically.

I'm only saying this because at one time I was under the impression that if I could just recognize about 2,500-3,000 characters, I'd be fluent. Once I got to that level (roughly), I could guesstimate what was going on, but I later had to learn character combinations systematically, sometimes even combinations of fairly common characters.

gato
Gato, if I'm not mistaken, aren't you a native speaker of Putonghua (and Shanghaihua)? :D

I'm only saying this because at one time I was under the impression that if I could just recognize about 2,500-3,000 characters, I'd be fluent. Once I got to that level (roughly), I could guesstimate what was going on, but I later had to learn character combinations systematically, sometimes even combinations of fairly common characters.

You're right. I am a native speaker of Mandarin and Shanghainese. It is different for non-native learners who don't have the same spoken vocabulary. Knowing only the characters is like only knowing the Latin/Greek roots for English words. You can kinda guess what the words mean, but you won't know how the words are used or their colloquial connotations. It happens to me sometimes when I encounter a classical Chinese phrase (such as one of those set phrases/idioms inherited from classical Chinese): I would know all the characters but not the meaning of the phrase.

Harpoon

wushijiao, what do you mean exactly? Can you give an example?

wushijiao
wushijiao, what do you mean exactly? Can you give an example?

Well, for example, if you are an illiterate Chinese person who is learning to read, you might go through this process. You learn the character 法 (fa3). Then, maybe a week or two later, you learn the character 国 (guo2). You could read them together as 法国 and know that it is not "Law-Kingdom" but rather "France".

In the same way, when I was learning to read in English, I could read D-O-G, slowly sound it out to myself (Duh, Duh, Doh, Doh, DohDohga, oh! DOG!), and know it. That's because at age 5 I already knew the word "dog" in spoken English. However, a native Spanish speaker can certainly recognize the letters D, O and G, but she may not know what they mean when put together in English.

In any case, I read somewhere a long time ago that when the PRC was set up and ran literacy drives for adults, they put the benchmark at 2,000 characters. That's because, as gato's stats show, at that point you could certainly be functional in society or the workplace. Similarly, in China they often say a high school graduate should know X characters or a college graduate should know X characters. That's great for them, but learning Chinese as a foreign language is another ballgame.

wushijiao
Knowing only the characters is like only knowing the Latin/Greek roots for English words.

Interesting. I sometimes think of characters as prefixes or suffixes with rough meanings that one can guess. Looked at this way, Chinese is a bit like 2,000 or so prefixes and suffixes that, put together in twos, threes or fours, form words.
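A toy way to see the "building blocks" idea is greedy forward maximum matching, a classic baseline for splitting unspaced Chinese text into words: at each position, take the longest word from a word list that matches, falling back to a single character. The six-word `lexicon` below is hypothetical; real segmenters use large dictionaries and statistical models.

```python
def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: prefer the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical mini-lexicon for illustration only.
lexicon = {"法国", "法律", "国家", "主观", "抽空", "小康"}
print(forward_max_match("我去法国", lexicon))  # → ['我', '去', '法国']
```

The greedy rule is simple but can mis-split when a longer dictionary word swallows the start of the next word, which is one reason learners still have to learn words rather than just characters.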

woodcutter

Wushijiao is oh so correct in this thread.

Foreigners who know 2000 characters will have no chance at all of reading a newspaper - that's not so many, and such people are unlikely to know a lot of words. And it all depends on knowing words, in the end.

Why is this daft kind of statistic constantly recycled?

And as I have mentioned elsewhere, newspapers are very difficult, and contain all sorts of strange names which will be mysterious to a relative beginner.

gato
Foreigners who know 2000 characters will have no chance at all of reading a newspaper - that's not so many, and such people are unlikely to know a lot of words. And it all depends on knowing words, in the end.

That statistic is important. The point to take away is that one should concentrate on learning words instead of characters once reaching the 2000 level. Unless you're interested in reading classical Chinese, you don't need many more than 2000 characters. You're only behind in terms of word vocabulary and not characters. Concentrating on words at that point will help more than learning more characters on their own.

And as I have mentioned elsewhere, newspapers are very difficult, and contain all sorts of strange names which will be mysterious to a relative beginner.

The transliterated non-Chinese names that abound now are difficult for everyone, Chinese and non-Chinese. I think you need to tackle them head-on. Get a list of famous non-Chinese names (places and people) transliterated into Chinese and study that list on its own. Once you recognize the special characters used solely for transliterating non-Chinese names, you'll at least be able to tell that it's a name you're looking at, and not a fancy Chinese word you haven't learned yet.
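This advice can be turned into a rough reading aid: flag runs of three or more characters that all come from a set of characters common in transliterations. `TRANSLIT_CHARS` below is a small hypothetical sample chosen for illustration, not an official list, and some of these characters also turn up in ordinary words, so expect false positives.

```python
import re

# Hypothetical sample of characters that often appear in transliterated foreign
# names; NOT an official list, and several also occur in ordinary words.
TRANSLIT_CHARS = "阿巴布达德尔弗格哈基克拉莱里利林卢鲁马米姆纳尼诺帕普萨什斯塔特瓦威维西亚伊因约"


def find_name_runs(text: str, min_len: int = 3) -> list[str]:
    """Return runs of at least min_len consecutive characters drawn from TRANSLIT_CHARS."""
    # Build a character class such as [阿巴...]{3,}; re.escape guards against
    # any regex metacharacters if the set is edited later.
    pattern = "[" + re.escape(TRANSLIT_CHARS) + "]{" + str(min_len) + ",}"
    return re.findall(pattern, text)


print(find_name_runs("他去了阿尔巴尼亚。"))  # → ['阿尔巴尼亚']
```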

roddy

I agree that there's far too much emphasis on 'how many characters do I need' rather than 'how many words' - Chinese is made up of words, not characters.

As for the issue of names and unknown words - I think coping strategies for those are much more valuable than continuing to learn the 字 and 词 - a short course 'Recognizing names of people and organisations in Chinese reading' would probably be more useful than extra vocab which, at higher levels, provides ever diminishing returns.

Roddy

in_lab

Another reason you always hear about knowing 3000 characters is that it's an attainable goal. How many words can a native speaker understand using those 3000 characters? I think that figure would scare most learners (including me). But on the positive side, learning those words should be no harder than learning vocabulary in Japanese, Arabic, etc., and probably easier.
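in_lab's question can be framed as a simple count: given the characters you know and a word list, how many words are made up entirely of known characters? The `known` set and `word_list` below are hypothetical toy data. With a real word list the same filter would give a rough answer, though, as the 法国 example earlier in the thread shows, knowing every character in a word doesn't guarantee knowing the word.

```python
def words_within_known(words: list[str], known_chars: set[str]) -> list[str]:
    """Return the words whose every character is in known_chars."""
    return [w for w in words if all(ch in known_chars for ch in w)]


# Hypothetical toy data: a handful of "known" characters and a tiny word list.
known = set("法国家主观小康")
word_list = ["法国", "国家", "主观", "小康", "抽空", "法律"]
print(words_within_known(word_list, known))  # → ['法国', '国家', '主观', '小康']
```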

Roddy, what ideas do you have for a whole course in recognizing names of things?

roddy

Ah well, if you're going to ask for concrete suggestions I'm in trouble :shock:

Off the top of my head -

Surnames

Characters commonly used in given names

Characters used in transliteration

Practice in skimming: when you come across the start of a name (particularly a long one, i.e. an organisation or a transliteration), skim past until you spot a 'non-name' character that tells you the name has stopped. You can then skip it, hope that context will make it clear, or if necessary go back and figure the name out, now aware of its boundaries.

It might be possible to identify words that commonly follow a name: you wouldn't be likely to see a noun follow a personal name, but job titles, verbs . . . (grasping now . . . :D )

Organisations - 部门, 处, 办公室, 公司, can all mark the end of an organisation name - when you think you've hit an organisation name, look forward for something like that, 'chunk' it off and ignore / interpret it as needed.
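The organisation idea can be sketched as a forward scan for the nearest closing marker after the point where a name seems to start. The suffix list is taken from the examples above; the function name `chunk_org_name` and the "earliest end wins" rule are assumptions for illustration, not a tested method.

```python
from typing import Optional, Tuple

# Suffixes that often close an organisation name (taken from the examples above).
# Note that 处 in particular also occurs inside ordinary words (e.g. 到处).
ORG_SUFFIXES = ["办公室", "部门", "公司", "处"]


def chunk_org_name(text: str, start: int = 0) -> Optional[Tuple[str, int]]:
    """Return (name, end) for the shortest span text[start:end] ending in a known suffix."""
    best_end = None
    for suffix in ORG_SUFFIXES:
        pos = text.find(suffix, start)
        if pos != -1:
            end = pos + len(suffix)
            if best_end is None or end < best_end:
                best_end = end
    if best_end is None:
        return None  # no marker found: the name boundary is still unknown
    return text[start:best_end], best_end


print(chunk_org_name("他在华为公司工作", start=2))  # → ('华为公司', 6)
```

A reader would still have to judge where the name starts; the sketch only automates the "look forward for the marker" step.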

Any other ideas?

gato

I've found just what you guys need for your course on names (maybe you want to make the post a sticky or something).

(1) Chinese personal names:

A list of common Chinese surnames

http://www.greatchinese.com/surname/surname.htm

Traditional guidelines for Chinese male and female names.

http://www.baixun.com/qmjq.htm

Note that under Communism, many mainlanders chose names like 建国 (build the country) or 红军 (Red Army) instead of more traditional possibilities.

(2) Non-Chinese names

Famous People:

http://zh.wikipedia.org/wiki/%E8%AF%91%E5%90%8D%E8%A1%A8/A

Common English Names

Male:

http://szbbs1.soufun.com/post/1946_4705975_4705975.htm

Female:

http://szbbs.soufun.com/2810065142~2/4705989_4705989.htm

(3) Region names: http://zh.wikipedia.org/wiki/%E4%B8%96%E7%95%8C%E6%94%BF%E5%8C%BA

Some more guidelines for transliterating non-Chinese names into Chinese:

http://zh.wikipedia.org/wiki/%E4%B8%AD%E6%96%87%E5%AE%98%E6%96%B9%E8%AF%91%E5%90%8D

in_lab

Useful links. That one about choosing the names looks especially good.

What's up with simplified/traditional characters in Wikipedia? I see them mixed together on the same page. I thought there were two Chinese versions of Wikipedia. Did they get merged?

smalldog

I've finally written a rough bit of code to test how many characters I know, after getting fed up of being asked that question and being unable to find a working test online. You can take the test (for the time being) at www.cugbwaiyu.com/hanzi.

It keeps a running estimate of how many characters you know at the top right. There is no 'end' to the test... just keep going until the number settles down to a range of about 100, which should be after 20-30 questions.

I don't know how accurate my model is, so some feedback from people who already know how many characters they know would be useful. The test tells me I know about 2100, which sounds about right. I tried it on a Chinese uni student who got 5500.
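smalldog doesn't describe his model, so this is only one plausible approach, not his code: split a frequency-ranked character list into bands, test a few random characters from each band, and estimate the total as the sum over bands of (fraction recognised × band size). The band boundaries and answers in `results` are invented for illustration.

```python
def estimate_known(bands: list[tuple[int, int, int]]) -> float:
    """Estimate how many characters are known from per-band test results.

    Each band is (band_size, tested, known): the number of characters in that
    frequency band, how many were sampled, and how many the user recognised.
    """
    total = 0.0
    for band_size, tested, known in bands:
        if tested == 0:
            continue  # no evidence for this band, so it contributes nothing
        total += band_size * known / tested
    return total


# Hypothetical results for rank bands 1-1000, 1001-2000, 2001-4000, 4001-8000.
results = [(1000, 5, 5), (1000, 5, 4), (2000, 5, 2), (4000, 5, 0)]
print(round(estimate_known(results)))  # → 2600
```

Sampling more characters per band narrows the uncertainty, which would fit an estimate that settles down after a few dozen questions, but that is a guess about how the real test behaves.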

gato
I've finally written a rough bit of code to test how many characters I know, after getting fed up of being asked that question and being unable to find a working test online. You can take the test (for the time being) at www.cugbwaiyu.com/hanzi.

That's a great little program. Thanks.

TCcookie

Awesome, Smalldog. Thanks! I especially like it because I apparently know a lot more characters than I thought I did ;) (I guessed about 600-700 but got closer to 1,000), although it sounds about right; I usually give what I think is a conservative estimate. How do you calculate that number? How does your model work?

roddy

This reminded me of a post made by a member who unfortunately doesn't have the time to participate any more, in a topic on the same subject over a year ago.

But there are characters that I comprehend from the context, but can't write and can't pronounce. Can write and can't pronounce. Can pronounce but can't write. Could make a guess at pronunciation. Sort of comprehend, can pronounce and write easily. Get the meaning of what seems to be a word, without having a clue about one of the characters. And whatever other permutations are left.

I'd get that put on a T-shirt, if it'd fit.

I found doing that test that I was thinking things like 'oh, that's the second character of that word that means . . . ' and 'that's pronounced ting1. Can't remember what it means.' and 'damnit, I'd know that one in a sentence.' I decided to be remarkably lenient with myself as it's late and I want to go to bed in a good mood, and came out at around 2500. :twisted:

Roddy

skylee

I feel the same as roddy, but it was always over 5000. How does it work?
