Chinese-Forums

Splitting HSK vocab into smaller chunks?


Erbse

Hi guys,

I'm working on a mobile flashcard app and I'd like to use the HSK vocab.

They are already split into levels 1 to 4; however, each level has too much vocab in it. So I'd like to split these groups further, but I'm not sure how to do that in a meaningful way.

My idea is to create chunks of 100 words each. Is there any existing system for creating meaningful chunks of HSK vocab?

What do you think is the ideal number of flashcards per chunk? 20, 50, 100?

If you already know some vocabulary, you could screen the vocabulary for things that seem familiar / logical. This is what I did. I went through each new HSK level with a text editor and copied all the vocab where I could guess at the meaning, or I had seen the word before, or I knew all the characters. Then I learned those first.

It's time-intensive (took me many hours), but it can ensure that you tackle the "easy" part first and build a reasonable vocabulary quickly.

If you're starting from scratch, I'd just divide each level randomly into small chunks and go from there. You'll have to learn most of those sooner or later anyway.
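
If you go the random route, the split itself is only a few lines. A minimal Python sketch, assuming each level is available as a plain list of words; `random_chunks`, the chunk size and the fixed seed are illustrative choices, and the word list is just a short sample, not a full HSK level:

```python
import random

def random_chunks(words, chunk_size=20, seed=None):
    """Shuffle a copy of one HSK level's word list and cut it into chunks."""
    rng = random.Random(seed)  # a fixed seed gives the same chunks on every run
    shuffled = list(words)     # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    # The last chunk may be smaller than chunk_size.
    return [shuffled[i:i + chunk_size] for i in range(0, len(shuffled), chunk_size)]

level1 = ["我", "你", "好", "是", "不", "很", "吃", "喝", "书", "猫"]
for number, chunk in enumerate(random_chunks(level1, chunk_size=4, seed=1), start=1):
    print(number, chunk)
```

Seeding the shuffle means a user sees the same chunks every time the app starts, rather than a fresh random split on each launch.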

I think what would be very useful, and as far as I'm aware isn't currently available, is some kind of categorization.

Kitchen vocab: 筷子, 碗

Foods: 土豆, 青菜

Sports . . .

Places . . .

Etc . . .

Would take some work.

Edit: And moving.

Yes, it takes an enormous amount of time. There is a reason most dictionaries are sorted alphabetically. Nevertheless, it'd be a great thing to have.

There is a lot of teaching material like that available in Japanese. I've got two volumes, "2000 basic words" and "2000 additional basic words". The first one's Japanese title is 中国語基本単語2000. The words are ordered thematically. Is there something similar available in English?

For instance, "料理を作る" (cooking) has the following:

汤,炒饭,稀饭,炒面,咸菜,烧饼,馒头,包子,糕,饺子,烧卖,油条,火腿,腊肠,切,煮,炒,煎,烤,蒸

(don't have the advanced one here right now)

What might work is grouping them by the characters they contain...

Obviously you're going to have overlap issues, and some very small / very large groups, but...

菜 cài
白菜 báicài
蔬菜 shūcài
菠菜 bōcài
青菜 qīngcài
菜单 càidān
芹菜 qíncài
油菜 yóucài

火车 huǒchē
火 huǒ
火柴 huǒchái
灯火 dēnghuǒ
火箭 huǒjiàn, rocket
火力 huǒlì
火焰 huǒyàn
火药 huǒyào
点火 diǎnhuǒ
发火 fāhuǒ
火山 huǒshān
火灾 huǒzāi
烈火 lièhuǒ
恼火 nǎohuǒ
怒火 nùhuǒ
炮火 pàohuǒ

可能 kěnéng, might
能 néng
能够 nénggòu
力所能及 lìsuǒnéngjí
能干 nénggàn
能力 nénglì
能源 néngyuán
才能 cáinéng, talent, ability
功能 gōngnéng
技能 jìnéng
能歌善舞 nénggēshànwǔ
能量 néngliàng
性能 xìngnéng
本能 běnnéng
节能 jiénéng
能手 néngshǒu
太阳能 tàiyángnéng
无能为力 wúnéngwéilì
原子能 yuánzǐnéng
职能 zhínéng
只能 zhǐnéng
智能 zhìnéng, intelligent
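
Building these groups automatically amounts to an index from each character to the words that contain it. A minimal Python sketch, assuming the vocab is a plain list of words (`group_by_character` is a hypothetical helper name). Every word lands in one group per distinct character it contains, which is the overlap issue mentioned above:

```python
from collections import defaultdict

def group_by_character(words):
    """Map each character to every word in the list that contains it."""
    groups = defaultdict(list)
    for word in words:
        # dict.fromkeys drops repeated characters within a word, keeping order,
        # so a word like 常常 is only added to the 常 group once.
        for char in dict.fromkeys(word):
            groups[char].append(word)
    return dict(groups)

vocab = ["菜", "白菜", "蔬菜", "菜单", "火", "火车", "火山", "发火", "能", "可能", "能力"]
groups = group_by_character(vocab)
print(groups["菜"])  # ['菜', '白菜', '蔬菜', '菜单']
print(groups["火"])  # ['火', '火车', '火山', '发火']
```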

I'm mostly looking for chunks of increasing difficulty. Most common/beginner words first, then rarer words, similar to what you would find in a regular textbook. However, I couldn't find a list that presents the HSK vocab that way, though I have to admit that I don't have an HSK study book. How is the vocab sorted in those HSK books? Are there any HSK lists that split the vocab further than the well-known 4 levels?

@roddy,

I've thought about this style, but I think this is a very specific use case. It definitely makes sense, but I'd like to add the most desired use cases first.

@renzhe,

Good idea; that's basically what I want to do, but I can only do it with the vocab I'm currently studying myself, which excludes levels 3 and 4. This is going to be a commercial(*) product, so I can't wait until I reach levels 3 and 4 myself.

As a last resort, I could use a simple list of how common each word is, but I'd rather match the ordering to HSK books and to courses/schools that are based on the HSK vocab.

@chrix

Additional groups of words sorted by topic are planned for later, but I want to get the mainstream uses done first.

>>The words are ordered thematically. Is there something similar available in English?

I know of that kind of book for German-English. It's damn well organized and researched. I haven't found any for Chinese, though.

(*) A serious discount for Chinese-Forums users helping with their comments is definitely possible.

There might not be such lists available, but as you're a programmer it shouldn't be too difficult to take frequency data such as that found here and write a small script or program to sort the HSK level data by frequency. Although frequency is not necessarily a measure of difficulty, it will arrange the words from most common to least common.
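
A minimal sketch of such a script in Python, assuming the frequency data has been saved as lines of `word<TAB>count` (an assumed format; the real lists may be laid out differently) and using toy counts rather than real corpus numbers:

```python
def parse_frequency_lines(lines):
    """Parse lines of the assumed form 'word<TAB>count' into {word: count}.
    Lines that don't match (headers, comments) are skipped."""
    freq = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[1].strip().isdigit():
            freq[parts[0]] = int(parts[1])
    return freq

def sort_by_frequency(level_words, freq):
    """Most frequent first. Words missing from the frequency data go last,
    in their original order (sorted() is stable)."""
    return sorted(level_words, key=lambda w: (w not in freq, -freq.get(w, 0)))

sample = ["# word\tcount", "的\t300", "是\t200", "我\t100"]  # toy counts
freq = parse_frequency_lines(sample)
print(sort_by_frequency(["是", "猫", "我", "的"], freq))  # ['的', '是', '我', '猫']
```

Words that don't appear in the frequency data are kept at the end rather than dropped, so nothing from the level goes missing.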

Personally, rather than breaking them down into chunks, keeping a fixed-size "learning list" seems much more useful to me: as you learn a word, it leaves the list and the next one takes its place. Declan's flashcard program does this.

>>I'm mostly looking for chunks of increasing difficulty. Most common/beginner words first, then rarer words.
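
A generic sketch of such a fixed-size learning list in Python. It is not Declan's actual implementation; it assumes the words waiting to enter the list are already in some order, and `LearningList`, `size` and `mark_learned` are hypothetical names:

```python
from collections import deque

class LearningList:
    """A fixed-size set of active cards fed from an ordered backlog."""

    def __init__(self, ordered_words, size=10):
        self.backlog = deque(ordered_words)  # words not yet shown, in order
        self.active = []                     # words currently being learned
        self.size = size
        self._refill()

    def _refill(self):
        # Top the active list back up to `size` while the backlog has words.
        while len(self.active) < self.size and self.backlog:
            self.active.append(self.backlog.popleft())

    def mark_learned(self, word):
        self.active.remove(word)  # raises ValueError if the word isn't active
        self._refill()

cards = LearningList(["我", "你", "好", "是", "不"], size=3)
print(cards.active)  # ['我', '你', '好']
cards.mark_learned("你")
print(cards.active)  # ['我', '好', '是']
```

The backlog still has to be ordered somehow; this approach only limits how many new cards are active at once.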

Difficulty is not the same as frequency of use. Which do you mean?

>>There might not be such lists available, but as you're a programmer it shouldn't be too difficult to take frequency data such as that found here

Those lists seem to be by character (or character pairs), not words.

I think there were also some lists that included frequency data for words, but I might be mixing things up.

@Erbse: Sure, there are a lot of high-quality "Grundwortschatz" (basic vocabulary) publications on the German market for European languages. But for Asian languages there really isn't much in German, which is quite disappointing. Even compared with material written in English, I was amazed how much more high-quality material there is in Japanese for learning other Asian languages. This is especially true for Indonesian: there isn't much available in any Western language, whereas the Japanese market has a lot. For Mandarin, the English-language selection is somewhat better, but I still think you can find more in Japan. That's why I was wondering if there is any thematically grouped basic vocabulary book available for English-speaking learners of Mandarin.

The only major exceptions are reference grammars, which are usually written in English rather than Japanese (for Chinese, Li/Thompson or Pulleyblank come to mind), but that comes with the territory, what with English being the language of international linguistics.

>>What might work is grouping them by the characters they contain

I'd be a bit cautious about this. When I've done something similar, I've ended up confusing a whole bunch of words with similar meanings rather than knowing each of them well, and consequently used the wrong ones all the time.

@imron,

Thanks for the link. I didn't know about those bigram frequency lists until now.

@jbradfor

Yes, that sounds interesting, yet it creates a similar problem. If I mark one word as done, which word enters my list next? I still have to arrange them in some way beyond the split into levels 1 to 4.

>>Difficulty is not the same as frequency of use. Which do you mean?

Best: Sort the vocab the way the average HSK learner would expect it to be sorted.

Worst: By frequency.

@chrix

You're right. For many language combinations, thematically grouped basic vocabulary doesn't exist at the moment. An opportunity to start a business?

@realmayo

Lists sorted that way wouldn't be the first thing on the to-do list, so don't worry :)

>>Those lists seem to be by character (or character pairs), not words.

There are frequency lists for character pairs and I believe that there are some for character triples.

Most of the words in the HSK are two-character words anyway, so getting the frequency for the character pair would be a good measure. You simply ignore all the character pairs that are not in the HSK vocabulary. Leave the chengyu for last.
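
A minimal Python sketch of that idea, assuming you have a table of character-pair counts plus single-character counts for the one-character words (both tables here are toy numbers). Pairs that aren't HSK words are never looked up, so they are ignored automatically, and treating every four-character word as a chengyu is a rough approximation:

```python
def rank_hsk_words(hsk_words, pair_freq, char_freq):
    """Order HSK words by corpus frequency, chengyu last.

    pair_freq: character-pair (bigram) counts, used for two-character words.
    char_freq: single-character counts, used for one-character words.
    Counts from the two tables are compared directly, which assumes they
    come from the same corpus. Words of other lengths, or missing from
    both tables, score 0 and sort after the scored ones.
    """
    def score(word):
        if len(word) == 2:
            return pair_freq.get(word, 0)
        if len(word) == 1:
            return char_freq.get(word, 0)
        return 0

    # Four-character words (treated as chengyu) go after everything else.
    return sorted(hsk_words, key=lambda w: (len(w) == 4, -score(w)))

pair_freq = {"我们": 500, "们的": 900, "可能": 300, "火山": 20}  # toy counts
char_freq = {"我": 1000, "火": 50}                                # toy counts
hsk = ["火山", "力所能及", "我", "可能", "我们", "火"]
print(rank_hsk_words(hsk, pair_freq, char_freq))
# ['我', '我们', '可能', '火', '火山', '力所能及']
```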

There are also vocabulary decks for all popular Chinese textbooks: Integrated Chinese, (New) Practical Chinese Reader, etc. You could screen those as well.

Maybe calculate a score. You have the lesson in which the word appears in different textbooks (lower lesson number = easier), you have the frequency (higher = more important), and you have the HSK level. It must be possible to calculate a ranking based on these numbers.
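
One possible way to combine the three numbers, as a Python sketch. The weights, the log scaling of frequencies and the normalisation by the last lesson number are all assumptions (not an established formula), and the levels, lessons and counts in the example are made-up toy values:

```python
import math
from statistics import mean

def rank_words(words, hsk_level, lessons, freq, w_level=1.0, w_lesson=1.0, w_freq=1.0):
    """Sort words by a combined score; lower score = learn earlier.

    hsk_level: word -> HSK level (1-4)
    lessons:   word -> lesson numbers (starting at 1), one per textbook
    freq:      word -> corpus count
    """
    max_lesson = max((n for ls in lessons.values() for n in ls), default=1) or 1
    max_log_freq = math.log1p(max(freq.values(), default=0)) or 1.0

    def score(word):
        # Words that no textbook covers count as coming after the last lesson.
        avg_lesson = mean(lessons[word]) if lessons.get(word) else max_lesson + 1
        lesson_term = avg_lesson / max_lesson                     # earlier -> smaller
        freq_term = math.log1p(freq.get(word, 0)) / max_log_freq  # 0..1, common -> larger
        return w_level * hsk_level[word] + w_lesson * lesson_term - w_freq * freq_term

    return sorted(words, key=score)

# Toy values only, not real HSK levels, textbook lessons or corpus counts.
words = ["我", "猫", "能力", "火山"]
hsk_level = {"我": 1, "猫": 1, "能力": 4, "火山": 4}
lessons = {"我": [1, 1], "猫": [8], "火山": []}
freq = {"我": 1000, "猫": 50, "能力": 200, "火山": 5}
print(rank_words(words, hsk_level, lessons, freq))  # ['我', '猫', '能力', '火山']
```

Tuning `w_level`, `w_lesson` and `w_freq` shifts how much each source influences the ranking; weighting the HSK level much more heavily than the other two keeps words grouped by level while the other terms order words within each level.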

Personally, I just sat down and learned them while improvising along the way :)

>>Yes, that sounds interesting, yet it creates a similar problem. If I mark one word as done, which word enters my list next? I still have to arrange them in some way beyond the split into levels 1 to 4.

Why do you need to arrange them? Why not just pick one?

>>Best: Sort the vocab the way the average HSK learner would expect it to be sorted.

>>Worst: By frequency.

Well, I think you're seeing that there is no such expectation...

>>Yes, that sounds interesting, yet it creates a similar problem. If I mark one word as done, which word enters my list next? I still have to arrange them in some way beyond the split into levels 1 to 4.

>>Why do you need to arrange them? Why not just pick one?

Because a beginner doesn't expect just any random level 1 word to start with; they want words like 我, 你, 好, 是... at the very beginning. The same is true throughout the learning process. Users expect a certain ordering, especially if I want to advertise the program as "useful to accompany any HSK course".

I think I'd go for something similar to Roddy's suggestion: you could first rank the characters by the number of times they appear in the vocabulary list (so it's character frequency within the list rather than word frequency). Then you can add the characters one by one, starting from the most common one in the list, and see which words you can form with these characters.

For example:

Word 1: Char A + Char B

Word 2: Char A + Char C

Word 3: Char B + Char C

Word 4: Char A + Char D

Word 5: Char C + Char D

in which A is the most common character in the HSK list (and B+D doesn't exist as a word).
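
The procedure above can be sketched in a few lines of Python. This is a minimal sketch, assuming the vocab is a plain list of words; `unlock_order` is a hypothetical name, and breaking ties between equally common characters by first appearance is an assumption the post doesn't specify:

```python
from collections import Counter

def unlock_order(words):
    """Order words by unlocking characters one at a time.

    Characters are ranked by how often they occur in this vocab list
    (Counter.most_common keeps first-encountered order for equal counts).
    After each new character, every word whose characters are all
    unlocked is emitted.
    """
    char_counts = Counter(ch for word in words for ch in word)
    unlocked, ordered, remaining = set(), [], list(words)
    for ch, _count in char_counts.most_common():
        unlocked.add(ch)
        ordered.extend(w for w in remaining if set(w) <= unlocked)
        remaining = [w for w in remaining if not set(w) <= unlocked]
    return ordered

vocab = ["生日", "中学", "大学", "学生", "中", "大"]
print(unlock_order(vocab))  # ['学生', '中学', '中', '大学', '大', '生日']
```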

It's true that this will result in chunks that contain lots of synonyms and similar words (which are hard to learn all at once). Maybe you could include a rule that characters with the same definition can't appear in the same chunk, but that may be difficult if the definitions are not in the right format (synonyms may not have precisely the same definition in the HSK list). Or, for example, a rule that the same character appears as often as possible in the same chunk, but with a set maximum (so you get some synonyms, but not too many).

I usually learn words in chunks of 15 or 20, but maybe others prefer a different number.
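
A minimal Python sketch of the "as often as possible, but with a set maximum" rule, assuming the words already come in some learning order (for example a character-based one, which keeps related words adjacent). `capped_chunks`, `max_per_char` and the greedy strategy of deferring a word to the next chunk are illustrative assumptions, not an established method; the default `chunk_size=20` just mirrors the chunk size mentioned above:

```python
from collections import Counter

def capped_chunks(ordered_words, chunk_size=20, max_per_char=3):
    """Fill chunks in order, but defer a word to a later chunk if it would
    make any character appear more than `max_per_char` times in this chunk.
    The first word of a chunk is always accepted, so the loop terminates."""
    pending = list(ordered_words)
    chunks = []
    while pending:
        chunk, counts, deferred = [], Counter(), []
        for word in pending:
            word_counts = Counter(word)
            fits = len(chunk) < chunk_size and (
                not chunk
                or all(counts[c] + n <= max_per_char for c, n in word_counts.items())
            )
            if fits:
                chunk.append(word)
                counts.update(word_counts)
            else:
                deferred.append(word)
        chunks.append(chunk)
        pending = deferred
    return chunks

vocab = ["火", "火车", "火山", "发火", "能", "可能", "能力", "白菜"]
print(capped_chunks(vocab, chunk_size=4, max_per_char=2))
# [['火', '火车', '能', '可能'], ['火山', '发火', '能力', '白菜']]
```

It scans the whole remaining list for each chunk, which is quadratic but fine for a few thousand words.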
