Chinese-Forums

Excel word list


anticks


A while ago I saw someone post a word list (or a link to one) in an Excel spreadsheet. It was the 3000 most common words.

Does anybody know where I can find this?

Search for a program called Pocket Scholar. It's similar to SuperMemo (I think) but free, and it uses TXT or CSV (Excel) files. It also comes with a converter.

Thanks in advance to anyone who can help with that spreadsheet.

Thanks

A


I'm not sure about the 3000 most common words, but there is a vocabulary list for HSK, with definitions, here on this site. It is likely a good approximation of the most common words, and you can download the lists as CSV files and load those into Excel.

Levels 1+2 will come to around 3000, and they are all very common and important words.


I just made a word list out of the Lancaster Corpus of Mandarin Chinese.

One obstacle to making word lists is that the source texts need to be split into separate words. This can be automated to about 95% accuracy, but getting it fully correct requires human intervention. The Lancaster Corpus has done this for about 1 million words' worth of texts.

However, another trap is that after the first few hundred words, you meet the long tail, where dozens of words have the same frequency. Grab one extra article on the Olympics, for example, and "奥运会" shoots up hundreds or thousands of places. So the choice of source material becomes important. In my Lancaster Corpus list, you will see that 苏联 (Soviet Union) is ranked #923. One category of their texts is news and press releases, most of which date from 1990 to 1992. But with that caveat, I hope you find it useful.
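To illustrate the long-tail point, here is a minimal Python sketch that counts words in text that has already been segmented (one space between words) and groups the ranking by frequency, so the ties are visible. The sample sentence and the function names `rank_words` and `tie_groups` are made up for illustration; this is not how the Lancaster Corpus list was actually produced.

```python
from collections import Counter
from itertools import groupby


def rank_words(segmented_text):
    """Count words in space-segmented text and rank them, most frequent first."""
    counts = Counter(segmented_text.split())
    # Sort by descending count, then by the word itself so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def tie_groups(ranked):
    """Group a ranked list by frequency: [(count, [words with that count]), ...]."""
    return [
        (count, [word for word, _ in group])
        for count, group in groupby(ranked, key=lambda item: item[1])
    ]


# Hypothetical, already-segmented sample text.
sample = "我 是 学生 我 喜欢 学习 中文 我 是 老师 奥运会"
ranked = rank_words(sample)
groups = tie_groups(ranked)

# Adding one more "article" that mentions the Olympics twice changes the ranking.
ranked_more = rank_words(sample + " 奥运会 奥运会")
```

In this toy sample, six of the eight distinct words share a count of 1, so their relative order is arbitrary. Two extra mentions of 奥运会 are enough to move it from that tied tail to a tie for first place, which is the same effect the extra Olympics article has on a real list.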


EDIT: Sorry, my bad - I didn't see that you've got them separated into 4 books (or whatever they're called, I forget). Good stuff.

I'm not sure if that's supposed to be the whole list or what? There's a total of just over 2000 rows (2018), which doesn't look much like 6800 words to me :)

Edited by ipsi()

To make things clear for everyone, there are 4 worksheets in total, with the characters and words mixed together. They are sorted alphabetically by pinyin. I have a few hundred more to go on #2 before I advance to level 3.

Worksheet 1 has 1033 characters & words.

Worksheet 2 has 2018 characters & words.

Worksheet 3 has 2202 characters & words.

Worksheet 4 has 3571 characters & words.


I'd be careful about how you think about characters and words here - those lists aren't characters + words, they're words. Some words may be single-character, but thinking of them as characters could confuse the issue. This might explain a little.

Or perhaps I'm being pedantic. (sh)


I think that single-character words can also be classified as characters even if they are words by themselves. (Am I right?) Anyway, learning words is much more important than learning characters by themselves. After all, anyone who just studies the 3000 most common characters will eventually realize that they still cannot communicate very effectively without words.

Edited by ABCinChina

  • 1 month later...

Okay, I had a little play with the spreadsheet: I wanted a list of all the characters you should recognise at each level. Some of the characters are "words" in their own right, in that they stand alone very easily. Others only exist (for HSK, at least) in a bound form with another character.

Stats:

A: 805

B: 798

C: 590

D: 669.

This means, for example, that to recognise all the words and characters for level C, you need to learn 590 characters that haven't already been seen in levels A and B.

Three caveats:

1) I might have made mistakes (especially for A, which I believe should total 800).

2) There are probably characters in, say, lists A, B or C which only occur in "bound form" as part of a two-character "word", but which then appear in list D as a standalone character. I haven't taken this into account.

3) It may or may not be useful to learn certain characters on their own.

Fuller stats (which may be even more wrong of course) :

Level  Unique characters  Total entries  Of which single  Of which multiple  Bound only
A      805                1033           453              580                352
B      798                2018           559              1459               239
C      590                2002           441              1561               149
D      669                3571           457              3114               212

So it appears that for A, for example, there are 453 characters which stand alone, but a further 352 which only appear as part of a multi-character word.
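For anyone who wants to reproduce this kind of count, here is a minimal Python sketch. It assumes each level is simply a list of words, and it guesses at the method: a character counts as "new" at the first level where it appears in any word, and as "bound only" if it has no single-character entry at that level. The function name `level_stats` and the tiny word lists are hypothetical, and the real spreadsheet may have been processed differently.

```python
def level_stats(levels):
    """Compute per-level character statistics.

    levels: list of (level_name, list_of_words) in study order.
    Returns a dict mapping level_name to a dict of counts.
    """
    seen = set()  # characters already met in earlier levels
    stats = {}
    for name, words in levels:
        chars = {ch for word in words for ch in word}
        new_chars = chars - seen
        # New characters that also exist as a single-character entry at this level.
        standalone = {word for word in words if len(word) == 1} & new_chars
        stats[name] = {
            "total_entries": len(words),
            "single": sum(1 for word in words if len(word) == 1),
            "multiple": sum(1 for word in words if len(word) > 1),
            "new_characters": len(new_chars),
            "bound_only": len(new_chars - standalone),
        }
        seen |= chars
    return stats


# Hypothetical miniature lists, not the real HSK data.
levels = [
    ("A", ["我", "学", "学生", "朋友"]),
    ("B", ["生", "友好", "朋友们"]),
]
stats = level_stats(levels)
```

In this toy data 生 is first met at level A inside 学生 and only shows up as a standalone entry at level B. The sketch counts it once, as bound-only at A, which is the same simplification described in caveat 2.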

Don't know how useful that is ... I just want to be able to work out which characters & words to learn to "complete" the different HSK levels.


  • 2 weeks later...

Hehe, you could be right but it's always fun to check one's progress, see how much further to go, right?

Anyway, just a warning about the file that is linked to earlier, called HSK List.rar. It has quite a few mistakes in it: a few wrong tones, but also some completely wrong translations of words. The most recent one I found was 报酬 translated as "revenge, avenge", but according to Wenlin 报酬 means "reward, remuneration, pay". It is in fact the similar-sounding 报仇 which means "revenge, avenge".

I still think the list is really useful, but I'd treat it with caution. I double-check every definition.
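Since the 报酬 / 报仇 mix-up looks like a near-homophone confusion, one way to narrow down what to double-check is to group entries whose pinyin matches once tones are ignored. The Python sketch below assumes the list has been loaded as (word, pinyin, definition) tuples with tone-marked pinyin. That layout, the pinyin shown, and the function names are assumptions for illustration, not a description of the actual file.

```python
import unicodedata
from collections import defaultdict


def strip_tones(pinyin):
    """Remove tone marks, spaces and case from tone-marked pinyin.

    Note: this also turns "ü" into "u", so it over-groups slightly (lü ~ lu).
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return plain.replace(" ", "").lower()


def homophone_groups(entries):
    """Return {toneless pinyin: [entries]} for syllable strings shared by 2+ words."""
    groups = defaultdict(list)
    for word, pinyin, definition in entries:
        groups[strip_tones(pinyin)].append((word, pinyin, definition))
    return {key: items for key, items in groups.items() if len(items) > 1}


# Illustrative entries; the first definition is the mistaken one described above.
entries = [
    ("报酬", "bàochou", "revenge, avenge"),
    ("报仇", "bàochóu", "revenge, avenge"),
    ("学生", "xuésheng", "student"),
]
suspects = homophone_groups(entries)
```

This only flags candidates for a manual check against a dictionary. It cannot tell which definition is wrong, and it will not catch wrong tones or errors that are unrelated to similar-sounding words.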

I'd also like to know where it came from ... is it "official" in any way?



That's exactly the same list that realmayo is talking about (it is available here on this site and in a number of flashcard programs).

As far as I can tell, that's the single most reliable corpus for learners of Chinese out there. It's probably a good idea to learn all of it (pretty much all of it is important) and then get the spoken vocabulary from spoken materials (TV shows, movies, radio, podcasts, etc.).

