
How frequently used are the words in the new HSK vocabulary?


elliott50

I've been studying the new HSK vocabulary on Skritter (finished levels 1-5 and just started 6). However, as I've been studying, I have noticed just how uncommon some of the words in the list are, so I thought I would do some analysis...

I used the Leeds Internet Corpus (http://corpus.leeds.ac.uk/list.html) to get a word ranking by usage frequency for the first 50,000 words. I then compared each word in the six HSK lists against this ranking to find not only the most and least frequent words in each list but, more importantly, the average ranking of the words in each list.
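As a rough sketch of the comparison described above: assuming the corpus list is available as a sequence of words ordered from most to least frequent, a plain dictionary gives exact-match rank lookups. The function names and the toy data are made up for illustration; they are not real Leeds ranks or the original spreadsheet.

```python
from statistics import mean


def build_rank_table(corpus_words):
    """Map each word to its 1-based rank; corpus_words is ordered most frequent first."""
    return {word: rank for rank, word in enumerate(corpus_words, start=1)}


def summarize_list(hsk_words, rank_table):
    """Return rank statistics for words found in the corpus list, plus the words missing from it."""
    found = {w: rank_table[w] for w in hsk_words if w in rank_table}
    missing = [w for w in hsk_words if w not in rank_table]
    if not found:
        return {"mean_rank": None, "most_frequent": None,
                "least_frequent": None, "missing": missing}
    return {
        "mean_rank": mean(found.values()),
        "most_frequent": min(found, key=found.get),   # lowest rank number
        "least_frequent": max(found, key=found.get),  # highest rank number
        "missing": missing,
    }


# Toy data, invented for illustration only.
corpus = ["的", "是", "我", "电池", "飞禽"]
hsk_sample = ["我", "电池", "飞禽走兽"]
summary = summarize_list(hsk_sample, build_rank_table(corpus))
print(summary)
```

Reporting the missing words separately, rather than silently giving them some rank, makes it easier to spot list or segmentation problems before trusting the average.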

EDIT: However @c_redman spotted an error in the results, so I have removed my analysis so as not to mislead any other members of the forum. Sorry everyone. :oops:

(The original error-prone spreadsheet results PDF remains downloadable below)

HSK_vs_leeds_internet_corpus.pdf

Agreed :conf . I think it's the corpus frequencies that are skewed.

If the HSK is anything like the English Academic Word List or similar lists, it's not just about raw frequency (although that's a major factor) but the "dispersion" of the words; i.e., how broadly applicable they are to a wide variety of texts, not just concentrated in a few areas.
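To make the "dispersion" idea concrete, here is a minimal sketch using the simplest possible measure: the fraction of text categories in which a word appears at all. The category names and token lists are invented, and published word lists may well use more refined dispersion statistics.

```python
def raw_count(word, texts_by_category):
    """Total number of occurrences of word across all categories."""
    return sum(tokens.count(word) for tokens in texts_by_category.values())


def dispersion_range(word, texts_by_category):
    """Fraction of categories whose token list contains the word at least once."""
    if not texts_by_category:
        return 0.0
    hits = sum(1 for tokens in texts_by_category.values() if word in tokens)
    return hits / len(texts_by_category)


# Invented, pre-segmented toy texts grouped by category.
categories = {
    "news": ["经济", "发展", "政府"],
    "fiction": ["他", "说", "发展"],
    "sports": ["足球", "比赛", "足球"],
}

for word in ("发展", "足球"):
    print(word, raw_count(word, categories), dispersion_range(word, categories))
```

In this toy data both words occur twice, but 发展 is spread over two categories while 足球 is concentrated in one, which is the distinction raw frequency alone cannot show.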

Edit: In the Leeds internet corpus, word #48790 is 踢腿, not 踢足球; #42501 is 电子计算机 not 电子邮件; #38697 is 葑 not 著名; and #49928 is 飞禽 not 飞禽走兽. None of the target words are in the Leeds word list, which is kind of suspicious. If you do a lookup from their query interface, you get 94 results for 飞禽走兽, so something seems messed up in the frequency list. Also, 踢足球 only returns results as "踢 足球", so the word segmentation is also suspect.

My errors seem to stem from using the LOOKUP function inside a huge Apple "Numbers" spreadsheet which recalculates very slowly. However I can't see where the problem lies, which probably means it lies in my inadequate spreadsheet skills. So I will abandon my efforts. Sorry again for wasting everyone's time :oops: !
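One possible explanation, offered as an assumption rather than a diagnosis: if a spreadsheet lookup falls back to the nearest entry when there is no exact match, a word that is absent from the list silently picks up the rank of a neighbouring word. The sketch below mimics that behaviour in Python with `bisect` on a sorted key list and contrasts it with an exact-match lookup. The rank 49928 for 飞禽 is the one quoted earlier in the thread; the other ranks are invented, and the actual behaviour of the Numbers LOOKUP function is not verified here.

```python
from bisect import bisect_right


def exact_lookup(word, rank_table):
    """Return the rank only if the word itself is in the table, else None."""
    return rank_table.get(word)


def approximate_lookup(word, sorted_words, rank_table):
    """Mimic a nearest-match lookup: rank of the largest entry that sorts <= word."""
    i = bisect_right(sorted_words, word)
    if i == 0:
        return None  # word sorts before every entry
    return rank_table[sorted_words[i - 1]]


# 49928 for 飞禽 is quoted in the thread; the other two ranks are hypothetical.
rank_table = {"飞机": 1200, "飞禽": 49928, "飞行": 2500}
sorted_words = sorted(rank_table)

print(exact_lookup("飞禽走兽", rank_table))                      # no rank: word is absent
print(approximate_lookup("飞禽走兽", sorted_words, rank_table))  # rank of the neighbour 飞禽
```

With an exact lookup the missing word is flagged; with the nearest-match version it quietly inherits the rank of 飞禽, which is the kind of mismatch reported above.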

What I have found is that if you have sufficient input, most words that you think are obscure or rarely used will actually still occur in regular usage. What is sufficient input? I would probably say consuming half an hour to an hour of native material a day. I'm often surprised at how a word I think is completely useless appears in the least expected places (granted, that link is not a typical example, as I normally wouldn't have to wait 9-10 years before seeing the word again).

For my 2 cents there are some very common and some not so common words on the HSK 6 list. I think if someone is going to take the HSK 6 obviously they should study the whole list, but if they are just looking to build their general Chinese vocabulary it might be better to use other sources after the first 4-5 HSK levels. A lot of the Chengyu in particular on the HSK 6 list aren't that common and, as someone else said, there are plenty of common words which aren't on the list.

I'll be finishing level 5 soon and I'm debating whether or not to move on to level 6. For those of you who have studied level 6, is it worth it? If not, what's the proper course of action? Another deck? Build your own?

@jasonchina, after doing my flawed analysis and producing my dodgy spreadsheet I can at least say that Sigma is right - there are many very common words that do not appear in the HSK list. So personally, I am certainly heeding the advice of WestTexas and creamyhorror and not studying the HSK 6 list yet. My current strategy is to learn the most common 5000 words (based on corpus lists available on-line) first, around 1000 of which do also appear in the HSK 6 list.

I'll be finishing level 5 soon and I'm debating whether or not to move on to level 6. For those of you who have studied level 6, is it worth it? If not, what's the proper course of action? Another deck? Build your own?

I would definitely suggest the latter: build your own. If you look at the corpus, you'll see that for the first words the difference in frequency is obvious. After a few thousand, the difference becomes less and less obvious. This means the choice of which words to learn becomes less obvious too, as frequency often also depends on context and material. If you build your own, you learn the words that are most relevant to you.

Of course, it's hard to judge which of the words you come across to learn, as you can't learn them all, at least not when you're still at a fairly early stage. I decide on a mix of factors, of which gut feeling, how often I encounter a word, and the HSK are the important ones.

More of my ideas on the subject can be found here

Don't get me wrong, I'm not trying to start an argument. However, I have to suggest that if people spent more time studying and less time playing with computer statistics, they would have learnt more :P~

Just read stuff. If it's a useful word, it will come up over and over again and you will remember it well. If it's a useless, infrequent word, it won't come up that often and you'll forget it.

You're definitely right. It's better to study than to overanalyze things. Without any analysis, however, there's a fair chance you study the wrong thing :) A little analysis may also be (de)motivating if you see the progress you've made and what still has to be done...

The same is true of a lot of the discussions over here: finding the right material and tools (toys) for your study, all kinds of other 'irrelevant' discussions, etc. If we would, more generally, stop analyzing and discussing and decide to work instead, a lot more would be done, but whether more would be achieved I'm not so sure.

Well, the thing I am keen to work out is: would knowing how useful the words in XYZ word list are actually help anyone? If you already know the word lists from HSK 1-5, can't you just start reading stuff?

Find stuff to read that is not too hard, and progress from there. Words that are high frequency will be high frequency in your reading material. Words that are low frequency will be low frequency in your reading material.

would knowing how useful words in XYZ word list actually help anyone?

Yes, one has to know how useful a list of words is before deciding whether to study it or to choose another list.

Find stuff to read that is not too hard, and progress from there.

To find that stuff, some analysis is needed. I have looked at material and thought that it didn't look too hard, but it turned out to be when I actually tried to read it. From just a quick look, it's hard to judge how much you can actually read vocabulary-wise. Grammar-wise it's even harder to judge.
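For the vocabulary side, a rough estimate is possible before committing to a text: count what share of its running words you already know. The sketch below assumes the text has already been segmented into words (segmentation is its own problem, as noted earlier in the thread); the sample sentence and the known-word set are invented.

```python
from collections import Counter


def coverage(tokens, known_words):
    """Share of running words (tokens) in the text that are in known_words."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)


def unknown_words(tokens, known_words):
    """Unknown words with their counts, most frequent first."""
    return Counter(t for t in tokens if t not in known_words).most_common()


# Invented, pre-segmented sample text and known-word set.
tokens = "我 喜欢 踢 足球 我 喜欢 电池".split()
known = {"我", "喜欢", "足球"}

print(round(coverage(tokens, known), 2))
print(unknown_words(tokens, known))
```

The unknown-word counts also double as a study list for that particular text, which fits the "build your own" suggestion above.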

Words that are low frequency will be low frequency in your reading material.

It's not that simple. Word frequencies vary hugely depending on author, subject, region, etc. So it may very well be useful to choose carefully what to read, with a goal in mind.

Of course, you're right that after reaching a certain level it matters more that you read than what you read. Nevertheless, I think it's worthwhile to put a little effort into choosing the right material. The wrong level or just the wrong subject can have a huge effect on your motivation and consequently on your learning experience.

  • 3 months later...

Looking at this post, and as I'm also taking the level 5 in May, I think it would be good to compile a larger list of the most commonly used words for the HSK (something like 200 of the most common phrases and words).

If there is already one out there, could someone tell me? It would cut down on my time studying all 2500 new words. Instead of spending time on learning how to say "battery charger" or "to bore through wood", I could focus on the more important words and phrases.

What do you guys think?

Instead of spending time on learning how to say "battery charger" or "to bore through wood", I can focus on the more important words and phrases.

But aren't you preparing for the test? I'd think if getting a good mark is your goal you would focus on learning it all sufficiently to score well.

If day-to-day interaction is your goal, I think you probably already have an idea, while studying the list, of which words you expect to use more or less frequently, and if you're making a point of daily interaction or media viewing, the most important ones should be reinforced through 'natural SRS'. I'd just stick with studying the lists as they are and let other activities reinforce what is really important.

Actually, I think knowing "battery" or "battery charger" is a pretty reasonable expectation if you are at the HSK5 level (low advanced, supposedly). Useful, too.

  • 2 months later...

I agree with Icebear that you should train the skill that you want to have. The "natural SRS" approach has the advantage of delivering the most frequently used words to you in context, but the disadvantage that it delivers many infrequently used words too.

So for me, the role of the SRS system I use (Skritter) is to support my reading and use of the most frequent words, using the published academic frequency lists as my guide.

The reason I started this thread was to try and understand how this approach related to the HSK tests.

The key thing to understand about the new HSK is that it is a sample test. If you look at the new HSK words, many of them are quite infrequent, especially at higher levels, as others have noted above.

Although I messed up the detail of my report on the frequency of words vs their HSK level above, I still believe that, if you only study the top words by frequency, roughly speaking:

new HSK 4 (1.2k words) requires you to know the most frequent 4k words

new HSK 5 (2.5k words) requires you to know the most frequent 8k words

new HSK 6 (5k words) requires you to know the most frequent 16k words

Or, to put it another way, at any new HSK level, the word list only covers about one third of the words that you should really know at that level.
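The "about one third" figure follows directly from the rough numbers above; a quick check, using only those estimates (they are the poster's approximations, not measured values):

```python
# (words on the HSK list, estimated top-frequency words needed), per the post above
levels = {
    "new HSK 4": (1200, 4000),
    "new HSK 5": (2500, 8000),
    "new HSK 6": (5000, 16000),
}

# Share of the needed vocabulary that the official list covers at each level
ratios = {name: listed / needed for name, (listed, needed) in levels.items()}

for name, ratio in ratios.items():
    print(f"{name}: list covers {ratio:.0%} of the estimated vocabulary")
```

All three ratios come out at roughly 30%, a little under one third.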

By contrast, I believe both the old HSK and TOCFL test word lists cover about half of the words you should know, to be competent at a given level.

All of the lists tend to omit the large number of nouns, especially proper nouns that occur in normal language, as well as most informal language.
