Chinese-Forums

How many WORDS are in the Chinese language (NOT characters)?


ma3zi1


My colleague has been trying to convince me that English is accepted worldwide as the language with the most words. He has pointed out to me that Chinese has only 100,000 or so characters.

However, characters and words are not equivalent; some characters are never even used outside the context of their words (for instance the 邋 from 邋遢). My colleague tells me that the Oxford English Dictionary has 600,000 words or so. How many entries are in the equivalent Chinese dictionary (e.g. 重編國語辭典 or 漢典)?

I know that the structure of the language is such that morphemes and words tend to flow together in a way that does not necessarily define individual words (especially since spaces are almost never used). Furthermore, people are always generating new constructs or creating acronyms.

Insofar as we can agree on the definition of a Chinese "word", what is a good estimate for the total number of words in modern Chinese?

NOT characters :D


There is no way to "know" how many words a language has. There are only estimates.

As far as I know, English is indeed one of the languages widely considered to have the most extensive vocabulary (numbering in the millions). This is largely due to the variety of influences, as it uses a combination of Romance, Germanic, Latin and Greek vocabulary, and to its huge geographical spread.

I also believe that Chinese is up there in terms of vocabulary. This is due to the fact that the (written) language has such a long history, and many ancient words are still accepted as perfectly normal today, despite totally different pronunciation. In either case, the 100,000 number is pants. My (modern) Chinese-English dictionary has more than 140,000 words, and there are far more extensive Chinese-Japanese and especially Chinese-only dictionaries.

I don't think that you'll get a precise number. These things also depend on the definition: is a word from the Tang dynasty still counted, even if it is not widely known today?

Also keep in mind that these languages are spoken, and the only vocabulary that matters today is that which is readily understood by native speakers. The estimates for English-speaking university students tend to be around 40,000-60,000 words, and there is an estimate for Chinese students of around 46,000 (both cited by DeFrancis). Very few people out there can comfortably use more than 100,000 words.


These things also depend on the definition: is a word from the Tang dynasty still considered, even if it is not widely known today?

Why do you people always refer to the Tang dynasty when you know perfectly well that the Chinese language goes way back to the Xia dynasty? Qin Shi Huang reformed and unified the Chinese script, and his adviser Li Si created the Xiaozhuan, or Small Seal script, by simplifying the Dazhuan, or Large Seal script. The modern script is based on the one used since the Han dynasty.

The Tang dynasty is only one point in time at which Middle Chinese was taking shape, and the process might not have started there. I believe it started in the San guo (Three Kingdoms) era, when three rulers were fighting over territory and therefore needed armies made up of people of different ethnicities or of mixed heritage.

Or maybe it started with Lu Buwei, who gathered many scholars to compile the Lu zhi or Lu shi chunqiu before the Qin dynasty?

Well, to answer your question "How many WORDS are in the Chinese language (NOT characters)?": it's hard to say. Modern Chinese = Ancient Chinese + Old Chinese + foreign loanwords + invented words + created characters, etc., all arising out of necessity. People may also understand differently what counts as a word, phrase, sentence, etc., because some Chinese words, especially single-character ones, have multiple pronunciations, each attached to a different meaning, and a word can shift from a noun to a verb (and so on) very quickly, depending on the context.

朋友 = one word

男 = one word

女 = one word

男朋友 = one word? [if 男 is one word and 朋友 is one word, shouldn't this be considered two words?]

女朋友 = one word? [if 女 is one word and 朋友 is one word, shouldn't this be considered two words?]

男女 = one word or is it two? [男女 = 男+女]

男女朋友 = one word or is it two or is it three? [男女朋友 = 男朋友 + 女朋友? or 男女朋友 = 男 + 女 + 朋友?]

Just how many words have I just listed above?

(for instance the 邋 from 邋遢)

Actually, that's only one word: 邋遢, "Lata", from Manchu, meaning something like "dirty" or "dusty". Chinese only approximated the pronunciation. I'm sure "La" or "ta" on its own would not mean the same as "Lata" and would not be abbreviated that way in Manchu. 邋遢 is used in Cantonese to mean "dirty", and other dialects may use it too, but a Mandarin speaker may or may not know its meaning. Perhaps 邋 and 遢 were created to reflect its meaning, or at least its pronunciation, derived from Manchu. In conclusion, 邋 and 遢 are NOT two separate words; each is only half of a whole.

Edited by trien27

Why do you people always refer to the Tang dynasty

Why not?

If 男 is one word and 朋友 is one word, shouldn't this be considered two words?

No, for the same reason that "spaceman" is one word, not two.

But it's true in general that there are disagreements about what exactly is considered a word. Some people even count inflected forms as separate words for the purpose of estimating vocabulary.


The most accurate estimate would come from the largest Chinese word dictionary (词典, a word dictionary, not 字典, a character dictionary). If the largest word dictionary contains only ### words, then any figure above that would be only an estimate. Could someone please find it? I couldn't after a quick search.

现代汉语词典 (Xiandai Hanyu Cidian, the Contemporary Chinese Dictionary) is considered a medium-sized word dictionary; it contains only 56,000 entries.

It's a well-known fact that English contains the largest number of words, as all cultural, historical, and scientific phenomena from other languages are either translated or transliterated into English. For example, "valenki" is a Russian word but is part of the English vocabulary.

--

EDIT:

Hanyu Da Cidian - 汉语大词典 (editor - Luo Zhufeng) - approximately 370,000 entries


