Chinese-Forums

Memorizing vocabulary at the advanced level


songlei


I was thinking about this problem not too long ago. I have the impression from reading an article here and there that Chinese writing actually has a greater proportion of "in the middle" words that aren't common but also aren't rare enough to ignore. (See the Zipf-Mandelbrot Law.) Basically it's easy enough to learn enough words to obtain coverage of 80+% of texts, but beyond that the number of words you have to add on to gain another percentage point of coverage starts going up faster (and all those unknown words really get in the way of your reading comprehension). I think for English the curve might not be as steep.
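To make the coverage argument a bit more concrete, here is a minimal sketch that assumes word frequencies follow an idealized Zipf-Mandelbrot law, with the frequency of the word at rank r proportional to 1/(r + q)^s. The parameters s = 1.0 and q = 2.7, the 100,000-word vocabulary, and the function names are illustrative assumptions, not values fitted to any real Chinese or English corpus.

```python
from itertools import accumulate

def zipf_mandelbrot_weights(vocab_size, s=1.0, q=2.7):
    """Unnormalized Zipf-Mandelbrot weights 1/(rank+q)^s for ranks 1..vocab_size."""
    return [1.0 / (rank + q) ** s for rank in range(1, vocab_size + 1)]

def coverage_curve(weights):
    """Cumulative share of running text covered by the top-k words, k = 1..N."""
    total = sum(weights)
    return [c / total for c in accumulate(weights)]

def words_needed(curve, target):
    """Smallest k such that the top-k words cover at least `target` of the text."""
    for k, covered in enumerate(curve, start=1):
        if covered >= target:
            return k
    return None  # target not reachable with this vocabulary

# Hypothetical parameters; swap in a fitted s and q to model a real corpus.
curve = coverage_curve(zipf_mandelbrot_weights(100_000))
for target in (0.80, 0.85, 0.90, 0.95):
    print(f"{target:.0%} coverage needs about {words_needed(curve, target)} words")
```

Under these assumed parameters each further slice of coverage costs more words than the previous one, which is the "curve gets steeper" effect described above. How steep it really is for Chinese versus English depends on the actual corpus data.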

Anyway, I think at a certain point it's more effective to just read like mad and let unfamiliar words just soak in through sheer repeated exposure. If you have problems with any in particular, you can put them into an SRS for reinforcement. Don't bother memorising or SRSing everything you come across; just let the most commonly-occurring ones soak into your memory via osmosis. I think it's more time-efficient to do that.

I also think knowledge of characters comes into play - if you know 1 character in a 2-character word, you can guess the meaning of the word (since the characters are often synonyms). I often check the meanings of unfamiliar characters for this purpose - when I see them again in an unfamiliar word, I can make a good guess at the meaning.

Hope the advanced learners and native speakers can chime in on this issue!


Illuminating reply there, kdavid. Sometimes I try to move words and expressions from my passive to my active store, but it rarely works. The ability to correctly and elegantly deploy rarer words and fancier expressions in writing generally impresses me :P ...while in speech it might strike me as bombastic.

Anyway, for normal functioning in society I certainly agree we can stick to learning to recognise words rather than recall them. Sometimes I browse the dictionary and end up adding words to my SRS that I don't even know if I'll ever see in writing, and regret spending brainspace/time on them.


Very good question songlei, one I have been wondering about. I reckon there's about 1000 words in the HSK vocab list I have yet to learn, though there are plenty outside the list that I know. I use SRS every day, and until recently I was looking up words I didn't recognise and adding them to the SRS. But now that I've started really reading lots (mainly magazines) I'm getting way too many new words to add to the SRS: sometimes these are words whose meaning I can guess the first time I see them (given the context, and knowing the two characters that make up the word).

What I've ended up doing is working out that, at the moment, learning 30 new words a day is a reasonable quota for me. So the first 30 I come across (excluding some which I think are too rare to bother learning at this stage) I add to the SRS; the rest I don't learn and, once I've got my quota for the day, I might not even bother looking up words when reading certain articles (i.e. if I'm working on my reading speed).
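The quota rule above can be sketched as a small filter. This is only an illustration: the function name, the `too_rare` set, and the sample words are hypothetical, and the default of 30 simply mirrors the figure quoted above.

```python
def pick_daily_words(encountered, known, too_rare, quota=30):
    """Return up to `quota` new words in reading order, skipping known and too-rare ones."""
    picked = []
    seen = set()
    for word in encountered:
        if len(picked) >= quota:
            break  # quota reached: stop collecting and just keep reading
        if word in known or word in too_rare or word in seen:
            continue
        seen.add(word)
        picked.append(word)
    return picked

# Hypothetical reading session, with the quota lowered to 2 to keep the example short.
encountered = ["经济", "隼", "同为", "经济", "层出不穷", "媒体"]
today = pick_daily_words(encountered, known={"经济"}, too_rare={"隼"}, quota=2)
print(today)  # ['同为', '层出不穷']
```

Everything after the quota is reached is simply read past, which is the point of the approach: the cap protects reading time.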

The alternative, of earnestly underlining and then looking up and noting down and adding to the SRS ... was just overloading me, and I was spending much longer on vocab than on reading, which can't be right.

And I have found that, while SRS is good in its way, if there's a not-so-common word that I learned months and months ago and added to the SRS, and then see for the first time in a magazine, and remember it correctly, then it's at this stage that the word really seems burned into my brain, and I'm confident I'll remember it for a good long while. Which just makes it seem more obvious to me that I should never let the time I spend learning vocab get in the way of time spent reading.

But I'd definitely be very interested in hearing from others at a similar stage, and those who are some way beyond. (I should add, I'm talking here more about 'passive' than 'active' vocab.)


I'm also at a similar stage. I can read novels, although slowly, and know about 90% of the words in a chapter, when I count them. At this point, the unknown words are of such low frequency that I'm not enthusiastic about learning them. But what I have done just recently is to focus on characters more closely. I am using Skritter to finally learn to write the top 3000 characters. This focuses more attention on the components of the characters, and actually increases the speed of recognizing them for reading. Knowing words is preferable to knowing the component characters, but after knowing 8,000 words or so, studying the next 5,000 low-frequency words may not help reading ability as much as learning 1,000 unknown characters, which can at least give you a chance of guessing the meaning. Someday, I'll do the number-crunching to see if this is actually true.

This is really just a short-term fix. In the long run, there's no avoiding the need to learn more words to improve fluency. This is the same challenge in learning any language.

creamyhorror, the Zipf plots are similar for both English and Chinese. There is a slight difference at the lower frequency end, but with this log-log scale, that means word 10,000 is 5 occurrences per million versus 10.

[Attached image: Zipf plots of word frequency against rank for English and Chinese, log-log scale]


how nice to see so many interesting reactions already. it seems as if this is a commonly perceived problem. c_redman, i don't understand much of that graph you attached, but do i understand correctly that you're trying to measure the distribution of the frequency of certain words in languages? according to which standards are they ranked? i too have a hunch that at a certain point, an increased focus on individual characters becomes more feasible in the light of the war against numbers. in other words, if you have to choose between learning an extra say 30,000 words vs 5,000 more individual characters so that these new words become understandable in their context without the need for a dictionary, which option is more efficient? to me, it seems that the second approach is a more intelligent process, and therefore more promising in terms of building up a flexible understanding of how chinese is written and how meanings are composed, whereas the first is faster but more tedious. of course, there is no way of measuring this, but perhaps there are those who have experience with either of the two methods?


Ah, I see it now. If you knew the top 5,000 English words (lemmas are a better comparison to Chinese, but the list I have only goes to 6,300), you would know 89.5% of the BNC, but you would need to know 9,010 of the top Chinese words to know the same amount!

To know an extra 1% of the respective corpus (90.5%) by knowing the next most common words, you would need to know 809 additional English words, but 957 additional Chinese words. The amounts gained are similar because the frequencies of English words around rank 5,000 are about the same as those of Chinese words around rank 9,000.

This is all simplification, since nobody knows just the top N words and nothing else; it's a probability distribution, so that there's a chance of knowing even rare words. Also, the ability to "know" words by context or by guessing from the component characters is a factor. In addition, one can gain a specialized vocabulary in particular domains, so that you can be quite good at reading newspapers but struggle with novels.

edit: songlei, the word count of every word in the corpus is the basis for the rank. So, 的 is rank 1 with frequency 61,778 per 1 million words, 了 is #2 with 15,447 per million, etc.
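For anyone who wants to reproduce this kind of figure, here is a minimal sketch of the calculation, assuming a list of per-million frequencies already sorted by rank. The function names are made up for illustration, and the list holds only the two figures quoted above (的 and 了); a real list would run to thousands of entries.

```python
def coverage_of_top(freq_per_million, n):
    """Fraction of running text covered by the n most frequent words."""
    return sum(freq_per_million[:n]) / 1_000_000

def rank_needed(freq_per_million, target):
    """Smallest number of top-ranked words whose combined share reaches `target`.

    Returns None if the list is too short to reach the target.
    """
    covered = 0
    for rank, freq in enumerate(freq_per_million, start=1):
        covered += freq
        if covered / 1_000_000 >= target:
            return rank
    return None

# Frequencies per million words, sorted by rank: 的 first, then 了.
chinese_top = [61_778, 15_447]

print(f"{coverage_of_top(chinese_top, 1):.2%}")  # 6.18%
print(f"{coverage_of_top(chinese_top, 2):.2%}")  # 7.72%
print(rank_needed(chinese_top, 0.07))            # 2
print(rank_needed(chinese_top, 0.895))           # None, the list is too short
```

With full frequency lists, the "extra words for one more percentage point" figure would be `rank_needed(words, 0.905) - rank_needed(words, 0.895)` for each language.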

Edited by c_redman

I really enjoy the direction that this discussion is going in. I've been pondering similar thoughts for the last year. I guess the simple but difficult answer is to read as much as one can, and that's the strategy I've had the most success in following. I find that it's not too difficult to pick up an approximate meaning for a word through context, but it's the lack of a pronunciation for a character that's the most jarring. So in terms of dictionary and SRS use I find it more fruitful to focus on pinyin readings rather than on definitions, since definitions can for the most part be learned more easily through reading. A further strategy based on this which I've contemplated is something like the Heisig book two: just learn the readings for, say, the 3,000 to the 5,000 most infrequent characters.


The talk of percentages and corpuses and so on, I just have a feeling that while this might possibly have some use towards a quicker and more efficient learning of beginner-vocabulary, it gets a lot less useful as the percentages get smaller. You've also got to bear in mind that it can be much easier learning a brand new, rare, character which only appears in one word (say, 隼 = falcon) than remembering something like 同为 where both characters are very common and appear in lots of other words you already know but the meaning is not necessarily immediately intuitive, and therefore easily (at least in my case) forgotten.

Imron, I'm happy that my current way of doing things is similar to what has worked well for someone else, it means I can relax a bit about not learning all the new words I come across. Though if you've written about the quota idea before in the forums then I've almost certainly read it so perhaps that's why.... I've got a couple of months of just studying Chinese which is why I've ramped up to 30 a day.

One other small thing involves 成语: I hadn't really bothered with them until recently, but now I've started to learn some of the more common ones, I'm surprised about how often they crop up. Sure, sometimes it's obvious that a 4-character combo you don't understand is a 成语, but often they just seem to me like two sets of adjacent 2-character words which I don't really get ... until I realise it's a single piece of vocab I need to learn and recognise in the future.


What I've most recently been doing is reading a book and recording every single new word and Chengyu, even if I was able to easily guess its meaning. Then I ask a native speaker if there are any words on the list that are simply too rare or obscure for me to bother with right now. (This gets rid of about 15% of the unknown words.)

I then put everything into anki and have it add 40 words a day.

I also ask my native speaker if any of the words are especially important, then I can flag them in Anki so I know to find some example sentences and learn how to (hopefully!) actively use them.

This seems to work pretty well. We'll see in a few months.


I envy people that can learn that many new words in a day. I've always failed miserably when setting myself high rates, and often started out strong but then before long burnt out and wouldn't study much of anything for like two months :blink:

I'd certainly be interested in hearing back in a few months about the results, both in terms of being able to maintain that rate, and how effectively you feel you have learnt all those words.

I've posted previously about my belief that more is not always better. I wonder how that other poster is going now.


A previous poster mentioned active and passive vocabulary. I think vocabulary can actually be divided into several layers. There's the everyday vocabulary that you can use without even thinking about it. Then there's a level below this of words which you can recall fairly quickly and use naturally, and below this, a level of words that you know, but have to struggle for a few seconds before you can recall them. I guess all of these belong to active vocabulary. Passive vocabulary to a certain extent overlaps with active vocabulary, in the sense that the bottom layer of active vocabulary merges with the top layer of passive vocabulary. Then in the lower layers of passive vocabulary, you have those words that you can't recall but understand when exposed to, and at the bottom, the words that don't mean much when you see or hear them, but you have an inkling that you've learnt them some time a few years ago.

My problem is that my passive vocabulary is quite large, and the bottom layer of my active vocabulary is also fairly broad. But I'm finding it quite difficult to move vocabulary into the top layers of spontaneous recollection. Therefore, whilst my reading is quite good, and my writing also not bad, my spoken Chinese is nowhere near the level I would like it to be. After having lived in China for over 4 years, I would have hoped to be able to speak fairly fluently on a deep level. On some days I find my speaking is not bad, but on other days, I still get stumped by even the most basic phrases when I need to say them spontaneously. I'm not really sure how to overcome this, except just to keep on practicing.

As for reading novels, that is undoubtedly a good way of broadening vocabulary, but the precondition is that you enjoy reading. Personally, I'm not a big fan of reading novels, even in English, and therefore reading novels in Chinese is a bit of a chore for me, and not just because of the language.


Well, I am essentially a full-time student (I just work 15 hours a week), and I live in China, so I don't think 40 words a day is really that many.

Unfortunately, to be honest, this learning is largely passive. In the past few months I have not spent as much time as I should having Chinese conversations, though I've spent lots of time reading and SRSing.

I've increased my reading level somewhat quickly but still have a lot of problems activating my vocabulary when writing or speaking. I assume this has more to do with neglecting the exercise of active skills, and less to do with how many words I am studying daily.

I would expect this problem is not uncommon. I'm sure the simple/not-so-simple answer is to just practice more.

EDIT: Looks like anonymoose beat me to it, and what he wrote is probably more useful :)

Edited by valikor

The talk of percentages and corpuses and so on, I just have a feeling that while this might possibly have some use towards a quicker and more efficient learning of beginner-vocabulary, it gets a lot less useful as the percentages get smaller.

Really, the point behind my bringing up the frequency distribution for Chinese words was just a linguistic question. I was wondering why Chinese seems to require speakers to know significantly more words than English speakers to achieve the same level of corpus coverage. On the face of it, this indicates that Chinese vocabulary is just plain onerous to learn :unsure: But it could also be the result of numerous cross-links between words with overlapping characters that make word learning easier (to a greater extent than English word families simplify English learning).

c_redman's statistic that 89.5% coverage is achieved with 5,000 words in English versus 9,010 words in Chinese is surprising, if accurate. The gap's even bigger than I expected. And I expect it widens as you move further up the % coverage scale. If the stat is based on a fair comparison (lemmas to lemmas), then if you get to an advanced level in Chinese you may end up knowing even more words in Chinese than in your native language (and still not be more literate in Chinese versus the latter).


This is a great discussion.

I wanted to address one point that @realmayo made:

The talk of percentages and corpuses and so on, I just have a feeling that while this might possibly have some use towards a quicker and more efficient learning of beginner-vocabulary, it gets a lot less useful as the percentages get smaller.

While I agree that frequency data or percentages aren't extremely useful, corpora can be useful in lots of different ways. For instance, a corpus could tell you whether the word you just read in that book by 老舍 is still commonly used or not. Additionally, a good corpus can provide a large number of examples and collocations for a word, which is extremely helpful when trying to better understand when and how a word is used. I fully expect corpora will be a key component of the next set of advances in language learning.

Unfortunately, there aren't any easy-to-use corpora for learning languages (let alone Chinese). If you're interested in seeing the different things a corpus can do, I recommend getting trial access to SketchEngine, which has one of the best selections of Chinese corpora. The website Wordnik is also doing a lot of corpus-related work for English.


Volume of new words: I put them into my SRS once I've learned them, and while I'm doing that I copy that data into a spreadsheet ... which informs me that over the last 6 months I've been averaging 20 a day, though that includes individual characters. In the last 3 months it has risen to 35 a day: at one point I tried doing around 50 but it was way too much and yes, I think I took a week off following that meltdown! And Imron, I'm sure your vocab is much more extensive than mine, so you will see recently-learned words less often while reading than I will, given that I'll be including some more common words than you will.

Effectiveness: as I say, I use SRS to help me remember, but I often fail a bundle (maybe as much as a quarter? though that sounds too much...) at about the 1-month mark; however, with SRS if you fail a card once you start with it from new, and I think that the huge majority of these 1-month-failed cards I am then able to relearn well enough so they stick in my brain a great deal more firmly the second time around.
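As a rough illustration of the reset-on-failure behaviour described above, here is a minimal sketch of a doubling-interval scheduler. It is a deliberate simplification and not the algorithm used by Anki or any other SRS; the class name, the 1-day starting interval, and the doubling factor are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Card:
    word: str
    interval_days: int = 1  # days until the next review (assumed starting value)

    def review(self, remembered: bool) -> int:
        """Update the interval after a review and return the new value."""
        if remembered:
            self.interval_days *= 2  # success: push the next review further out
        else:
            self.interval_days = 1   # failure: start again as if the card were new
        return self.interval_days

card = Card("同为")
history = [card.review(True) for _ in range(5)]  # intervals grow: 2, 4, 8, 16, 32
history.append(card.review(False))               # failed at roughly the 1-month mark
history.append(card.review(True))                # relearned, interval grows again
print(history)  # [2, 4, 8, 16, 32, 1, 2]
```

In this toy model a card failed at about a month goes back to daily review and then climbs the same ladder again, which fits the experience of relearned cards sticking better the second time around.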

It looks like I'm spending around 90 mins every day on new words: that includes learning, testing (SRS) and admin (ie putting them into the SRS, finding example sentences for them and so on). I won't have time for that next year, which is why I'm trying to make the most of the next couple of months.

90 mins every day does seem a big commitment, but it equals 1000 words a month and it is only for a fairly limited time. I think I'll be halving it early in 2011.
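A quick back-of-the-envelope check of those numbers, using the 35-a-day average mentioned above and assuming a 30-day month:

```python
minutes_per_day = 90
words_per_day = 35    # recent daily average quoted earlier in this post
days_per_month = 30   # assumed month length

words_per_month = words_per_day * days_per_month
minutes_per_word = minutes_per_day / words_per_day
hours_per_month = minutes_per_day * days_per_month / 60

print(words_per_month)             # 1050
print(round(minutes_per_word, 1))  # 2.6
print(hours_per_month)             # 45.0
```

So "1000 words a month" is a fair rounding, at a cost of roughly two and a half minutes per word once learning, testing and admin are all counted.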

I imagine there are plenty of people who would laugh at all this heavy review of vocab, and advise more and more reading instead but I'm fairly happy with recent progress and, as I say, I am reading lots too. Though I do have my own doubts occasionally....
