Chinese-Forums

Reading Milestones


黄有光


On 9/9/2021 at 11:37 PM, 黄有光 said:

I expect I'll be reading 三体 by this time next year.

 

I don't want to disrupt your studying process, since you might be holding it out as a spur for improvement, which is invaluable...

 

But I suspect you can easily read 三体 now, if you wanted to.  It's not as hard as it might seem.  I read it as my 6th book, right after I got the "hang" of reading. 

 

I had originally planned to stop after a few chapters, and revisit it when I got better, but then I suddenly realized that I didn't need to quit.

 

It has a lot of vocab and a lot of proper nouns, but the structure of the sentences is straightforward. The author is an engineer by training, and other than using a lot of "灿烂" as an adjective, he's not that flowery. Also, much of the setting is in urban environments, with people living mostly western-style lives & jobs.

 

Since you've already read 2/3rds of it in English, and know the basic framework of the story, you'll have an easy time transitioning into it. 

 

If you do try it, Publius has already culled the proper names and vocab words of the first chapter in this thread (not the first chapter with the Cultural Revolution, but the first chapter with the police investigation).  I found it quite helpful when I started.

 

https://www.chinese-forums.com/forums/topic/58233-the-first-chapter-project-刘慈欣《三体》


On 9/9/2021 at 6:46 AM, phills said:

秘书长 - @realmayo calls it the political equivalent of 圈子圈套.  I am curious to read how the Chinese themselves describe the machinations of their politics.

If you do start this then let me know and I'll see if I can dig out my 'cast list' that I wrote to help me remember who is who and what all the job titles mean.


On 9/10/2021 at 3:06 AM, Woodford said:

Do I see obscure words with the 花 character at the end? It's a kind of flower, usually.

Oh god, flowers are the worst. And The Chronicles of Narnia is really bad about this sort of thing -- the author goes into detail about exact flowers, trees and animals all the time. Just yesterday, I was staring at Pleco trying to figure out what the hell a tapir is supposed to be.

 

Don't do this to me, C. S. Lewis. Be a bro and just....just say "a flower". I don't need to know the exact species.

 

(Yeah, I skip over them a lot, too)

 

One of the frustrating things, though, is that oftentimes there will be a word in English whose exact meaning is unknown to me, but which I have seen in so many contexts throughout my life that I do have some implicit knowledge. For example, I don't really know what an elm tree looks like, but I know they used to be common in Europe before a fungal plague started wiping them all out, and now they are slowly going extinct. Problems quickly arise when these kinds of words show up in my Chinese novels. Normally I would connect the word directly with imagery and concepts in my mind, but with words like "elm", if I try to memorize them, I'm stuck awkwardly linking the Chinese word to the English word...and that's just bad vocab acquisition hygiene.


On 9/10/2021 at 10:43 AM, 黄有光 said:

I'm stuck awkwardly linking the Chinese word to the English word...and that's just bad vocab acquisition hygiene.

 

And this is where a program like Anki shines, since you can use pictures rather than English translations. Perhaps this, like a lot of other stuff, will be introduced in Pleco 4.0.


On 9/10/2021 at 8:11 AM, phills said:

But I suspect you can easily read 三体 now, if you wanted to

You're right, I probably could. If I only looked up the words that were most important to reading comprehension, I could go ahead and start reading it now. I'd most likely be able to muddle through the story and generally understand most of it, depending on how much I was willing to interrupt the flow of reading.

 

I...don't like doing that though.

 

My current study involves using Chinese Text Analyser and Anki in concert to memorize essentially all of the vocabulary in a book on a chapter-by-chapter basis (barring some words which I may triage and skip), and then I read each chapter as I finish with it. That way, I get the benefit of super-intensive reading (learning lots of vocabulary) while also getting the pleasure of extensive reading (I am able to read each chapter smoothly). I started studying this way at the beginning of the year because after studying Chinese for years I was really, really tired of not being able to fluently read essentially anything at all. Just for once, I wanted to read a text and understand everything, even the details. And my study method allows me to do that. It keeps me motivated. And, even better, it has resulted in an explosion in my passive vocabulary. My passive vocabulary has gone from ~5000 words at the beginning of the year to nearly 13000 words now.

 

Currently, 三体 has about 3000 生词 for me, which is doable, but at my current study rate of 30 words per day it would take me 100 days to get through it. That's not horrible, but there are other books I can get through much faster right now. So I am going to work through those faster, easier books first, and get to 三体 in due time. Generally, I consider a book "within range" once the number of 生词 has dropped below 1500.
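For anyone who wants to run the same triage on their own reading list, the arithmetic above fits in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the defaults (30 words per day, a 1500-word cutoff) are simply the figures quoted in this post.

```python
import math


def days_to_learn(unknown_words: int, words_per_day: int = 30) -> int:
    """Days of study needed to cover a book's unknown words, rounded up."""
    return math.ceil(unknown_words / words_per_day)


def within_range(unknown_words: int, threshold: int = 1500) -> bool:
    """True once a book's unknown-word count has dropped below the cutoff."""
    return unknown_words < threshold


if __name__ == "__main__":
    # Figures from the post: about 3000 unknown words at 30 words per day.
    print(days_to_learn(3000))   # 100
    print(within_range(3000))    # False
    # A book right at the cutoff would still take 50 days at this pace.
    print(days_to_learn(1500))   # 50
```

Swapping in a different daily rate or cutoff is just a matter of passing other arguments.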

 

 

On 9/10/2021 at 10:49 AM, Insectosaurus said:

And this is where a program like Anki shines, since you can use pictures rather than English translations

I do do that, sometimes! It's just, adding pictures is very, very slow and...all of that effort for a word I don't really fully understand in English anyway?  Eh. I usually just skip it.


On 9/10/2021 at 10:54 AM, 黄有光 said:

all of that effort for a word I don't really fully understand in English anyway? 

 

There is a big chance the plant or animal you don't know in English is a common reference in other parts of the world. If you encounter "lingon" in a Swedish book and brush it aside when you see the translation (what the hell is a lingonberry?), you're making a big mistake. You would struggle to find a Swede who doesn't know precisely what a lingonberry is, what it tastes like and what it symbolizes. I'm guessing you didn't know what a kui was either before studying Chinese. This all obviously depends on how deep a vocabulary you want in the language you're studying. I think everything from "being able to order a taxi" to "knowing as many words as humanly possible" is a valid stance.

 

Personally I haven't studied vocabulary per se in some time now, only morphemes. Words seem to be quite a pointless way of looking at things. (I'm pretty sure CTA considers stuff like 知道了 and 知道 as separate words, but I think few of us would say that means you know two words. I know this is no fault of imron; it's just the way CDICT or whatever it's called works, and it doesn't stop CTA from being a very useful application.) From my own experience, C-E dictionaries are effective, and I prefer them a lot of the time (especially the New Century one), but the advantage of using a monolingual dictionary is that you internalize the actual morphemes and how they work as parts of a word, and in that way learn many words by just learning a few. To take a beginner-friendly example: adding 爱, 爱情, 情绪, 心情, 心绪 etc. is a huge waste of time that could be spent elsewhere.

 

I've even stopped studying new morphemes though, and now mostly try to catch up on my listening ability, which is lagging behind. The morphemes I don't know by now are either very rare or won't get nailed into my memory until I learn how to write the character and encounter it several times in my listening. To this day I'm still mixing up things like 捋 (luo1) and 捋 (lv3), even after studying them as flashcards (in context) several times.

 

I don't know how many morphemes I know by now, but it's almost all of the morphemes hiding behind around 4100 characters, which I'm guessing could be anything between 5000 and 8000. Perhaps even more, though that would probably be stretching it; it's hard to tell since I haven't been counting.

 

When I don't understand a sentence in a book, it's seldom because I'm lacking vocabulary, but rather because I screwed up the parsing, failed to understand a grammatical concept or failed to recognize a cultural reference (a common occurrence in Taipei People).


One of the reasons I don't triage words in the same way you do (even though it would save me a lot of time) is because quite often, I will encounter a word whose meaning would seem obvious from the component morphemes, but upon consulting a dictionary I find that its meaning is greater than the sum of its parts. Or worse, in some cases it means something completely different from what the morphemes would suggest. However, that being said, if I look up a word in the process of making flashcards and find that its meaning really is easily intuited based on its morphemes, I will usually skip over it. But I have a pretty high bar for this -- I have to guess pretty much the exact meaning of the word before I see the dictionary entry. 


On 9/10/2021 at 5:23 PM, Insectosaurus said:

Personally I haven't studied vocabulary per se in some time now, but only morphemes.

 

Interesting.  What's a morpheme?  I haven't really used C-C dictionaries yet (I'm not really good enough in Chinese to do it yet).

 

On 9/10/2021 at 4:43 PM, 黄有光 said:

Don't do this to me, C. S. Lewis.

 

I hear ya.  That's when I stopped trying to learn every character/word in every novel.  CS Lewis animals are British animals/plants, not Swedish lingonberries, and I don't need to know their name in Chinese!  I still love those books though, and they're short while telling a complete story, so the plusses outweigh the minuses.

 

But you guys have inspired me to take my vocab gathering a bit more seriously.  I had in my mind that 10-15k words is all a person really needs, if you don't aspire to be an "intellectual" in a foreign language.  More than HSK6 but not that much more.  Now I will set my sights a bit higher.

 

 


On 9/10/2021 at 11:38 AM, 黄有光 said:

but upon consulting a dictionary I find that its meaning is greater than the sum of its parts

 

If you can provide an example it would be interesting to discuss. From personal experience, there are a lot of such words, which can be separated into different categories.

 

1. Come to think about it, they actually *do* make sense. Example: 好容易 just being a variant of 好不容易, which makes total sense. If you could provide a few examples it would be easier to answer.

 

2. You think they don't make sense because you don't know the relevant morphemes. Example: 台甫 (used when asking someone for their name). Here we find two morphemes: 台 (敬詞, 用於稱對方或跟對方有關的事物、動作) + 甫 (古代對男子的美稱。多加在表字之後, 如孔丘表字的全稱是仲尼甫。後來尊稱別人的表字為“台甫”。) Or simply: 台 (you, your) 甫 (courtesy name). The Chinese definitions come from the Guifan (GF in Pleco), a dictionary I don't generally use for definitions but one that I have available on my PC where I am sitting right now.

 

Other examples of likely trip-ups: 姑 (for the time being).

 

3. They don't make sense, because they aren't a combination of two morphemes, but should rather be seen as separate morphemes. (Examples: 屎壳郎, 哈士蟆, 蚂蚱, 油葫芦, 扑棱, 蔻丹).

 

4. Perhaps they do make sense, but you still don't really understand how, because your level is not high enough to "get it". In these cases I just create a flashcard for it, since what counts as a morpheme and what doesn't is just a guideline for effectiveness, nothing else. There are no set rules you have to follow.

 

On 9/10/2021 at 11:45 AM, phills said:

Interesting.  What's a morpheme?  I haven't really used C-C dictionaries yet (I'm not really good enough in Chinese to do it yet).

 

In most posts on this forum where I've mentioned morphemes, I've tried to point out that they're not really morphemes per se; it's rather my way of looking at it. The reasoning comes from other languages (if you take a basic Latin course, your teacher is most likely going to mention morphemes). Take the word wildlife. No point adding wildlife to your flashcards, just make sure you understand it's a combination of the morphemes wild + life and what that combination actually means.

 

I stopped using CDICT ages ago. Lately my main dictionaries for consultation are Xiandai Hanyu Cidian (C-C), Xinhua Zidian (C-C) and New Century (C-E). I generally try to stay away from dictionaries that won't provide any context. I never rushed my transition to monolingual dictionaries and I still think it's very situational which of them I prefer. For animals and plants I always add pictures.

 

On 9/10/2021 at 11:45 AM, phills said:

Now I will set my sights a bit higher.

 

@Woodford would argue higher. I would argue lower. So far I've always made sure I know all morphemes of a novel before I read it, and nowadays it just takes a few days to do that (which is the reason I've basically stopped and instead prioritized other parts of the language).

 

Please note that there are no linguistic studies behind my reasoning; I'm just trying to be practical, and it's been working well for me personally. Whether we should call them morphemes, language families or something else is up for discussion and not very relevant to the actual method, which at its core is arbitrary. You might want to look up Paul Nation, who talks a lot about word families when studying English (I think he mentions around 9000 word families to read actual novels without effort).


On 9/10/2021 at 11:45 AM, phills said:

I had in my mind that 10-15k words is all a person really needs, if you don't aspire to be an "intellectual" in a foreign language.  More than HSK6 but not that much more.  Now I will set my sights a bit higher.

You may have seen me mention this elsewhere, but if you did not, I calculated that a passive vocabulary in the range of 40,000-50,000 words is necessary to understand every word or nearly every word in an average novel aimed at adults. Here's how I arrived at that number:

 

First, I used the data that I've collected so far in this spreadsheet to determine the average number of unknown words for several different books, and tracked how those numbers have changed over time for several months. (I am still collecting this information -- wheeeeee!) Here is the chart:

 

[Image: chart of unknown words per book, tracked over several months]

 

Next, with the help of my fiance, I did some fancy averaging math and extrapolated those curves, so I could see how long it would take all of the books in that list to fall to <1 生词 per page:

 

[Image: the same curves extrapolated to <1 生词 per page]

So the answer to that question is "by 2024" -- most books should contain fewer than one unknown word per page by 2024, for me personally.

 

Next it is a simple matter to take my current daily intake of vocabulary (30 words) and multiply that over the remaining time from now until 2024. There are 112 days left this year, plus 3 years until the end of 2024, so that's 30(112+3[365]) = 36,210 words. Then we just need to add the vocabulary I already have: 36,210+12,800= 49,010 words.
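The projection above can be re-run with other numbers using a small Python function. This is just a sketch of the same arithmetic; the function name and parameter names are invented for illustration, and it assumes a constant daily intake and 365-day years, exactly as in the calculation above.

```python
def projected_vocabulary(
    current_words: int,
    words_per_day: int,
    days_left_this_year: int,
    full_years: int,
    days_per_year: int = 365,
) -> int:
    """Current vocabulary plus a constant daily intake over the remaining days."""
    remaining_days = days_left_this_year + full_years * days_per_year
    return current_words + words_per_day * remaining_days


if __name__ == "__main__":
    # Figures from the post: 12,800 words now, 30 per day,
    # 112 days left this year plus 3 further years.
    print(projected_vocabulary(12_800, 30, 112, 3))  # 49010
```

Changing the daily intake shows how sensitive the final figure is to the study rate, since almost all of the total comes from the projected part.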


On 9/10/2021 at 5:23 AM, 黄有光 said:

First, I used the data that I've collected so far in this spreadsheet to determine the average number of unknown words for several different books, and tracked how those numbers have changed over time for several months.

 

I notice there are 20 books listed on this chart. You may have said something about this earlier, and I may have forgotten, but are these books the source of your vocabulary acquisition? Does this chart imply that 20 books will get you to that "less than 1 word per average page" level? If so, that sounds about right. I'm trying to reach 20 books completed (and then some) by this time in 2022. 8-10 books was enough to get me from about 6-8 words per page to 1-2 words per page (as my chart illustrated). Because of the exponential math at play here, it will probably take me another 10 books just to get that tiny improvement I'm looking for of 1-1.5 fewer words per page.

 

On 9/10/2021 at 5:23 AM, 黄有光 said:

Then we just need to add the vocabulary I already have: 36,210+12,800= 49,010 words.

 

I think this will be easier than you think! CTA tells me I know more than 30K words, and I've only actively memorized 18K of them (just over half). It's like @Insectosaurus said above--CTA counts things like 知道 and 知道了 as two words. In my experience, the number of words I have to learn in a given book can be as little as half of the CTA figure. On my current trajectory (and using my personal method of counting), I figure that the "golden number" of flashcards for me is 20,000-25,000, which, given what I just stated above, matches perfectly with your 49,010 number (it's about half). The last book I read was about American history, and CTA was giving me a very, very huge number of unknown words, because it counted "George Washington," "Abraham Lincoln," "Gettysburg," "Thomas Jefferson," "Richmond, Virginia," etc., as unknown vocab (which is technically correct, I admit). Well, after a while, you tend to get used to identifying phonetic spellings. I didn't add any of those to my flashcards. Books with a lot of names and places in them tend to register high on the CTA count, in general. Granted, sometimes those names will have new/obscure characters that I want to learn, but often not.
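The "about half" comparison above can be checked with a couple of lines of Python. This is a rough sketch only: the function names are made up, and the 0.5 share is the rule of thumb from this post (a book's flashcard count being as little as half of its CTA figure), not anything CTA itself reports.

```python
def memorized_share(memorized_words: int, cta_known_words: int) -> float:
    """Fraction of CTA's known-word count that was actually memorized as flashcards."""
    return memorized_words / cta_known_words


def flashcard_estimate(cta_target: int, share: float = 0.5) -> int:
    """Flashcards implied by a CTA-style word target, given an assumed share."""
    return round(cta_target * share)


if __name__ == "__main__":
    # Figures from the post: 18K memorized out of 30K+ known words.
    print(memorized_share(18_000, 30_000))   # 0.6
    # Half of the 49,010-word projection lands inside the 20,000-25,000 range.
    print(flashcard_estimate(49_010))        # 24505
```

With the 0.6 share instead of 0.5 the estimate comes out higher, so the result depends a lot on how aggressively you skip names and transparent compounds.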

 


On 9/10/2021 at 2:30 PM, realmayo said:

If you do start this then let me know and I'll see if I can dig out my 'cast list' that I wrote to help me remember who is who and what all the job titles mean.

 

Hit me up @realmayo  I'm going to read 秘书长 next, and it's always nice to preview a list of names / proper nouns to help me process the text.  Many thanks!


On 9/10/2021 at 5:16 PM, Woodford said:

I notice there are 20 books listed on this chart. You may have said something about this earlier, and I may have forgotten, but are these books the source of your vocabulary acquisition? Does this chart imply that 20 books will get you to that "less than 1 word per average page" level?

Yes, these books are the source of my vocab acquisition. However, they aren't the only books on my reading list by far. They just happen to be the ones that I am currently tracking. Here is my reading list in full -- it includes quite a lot of books (in excess of 50, I believe). I have no idea how many books I will have to read to reach the ~40,000-50,000 word boundary I've calculated. Absolutely no clue. But I expect it will be quite a lot. I think, the more books I read, and the larger my vocabulary gets, the fewer new words I'll learn from each new book. Eventually I'll have to read 5x the amount of material just to maintain the same rate of vocabulary acquisition. Which is something I'm kind of looking forward to, because right now I'm learning at a rate of maybe one short chapter per day -- not nearly enough to be able to get lost in any page-turners.


On 9/10/2021 at 4:43 PM, 黄有光 said:

Oh god, flowers are the worst.

Lol, totally!

I remember the nightmare I've been through while reading Watership Down. The heroes are rabbits. Herbivores that they are, they're obsessed with plants and birds. There were so many unknown words that I began to wonder whether I was reading a novel or Encyclopedia Botanica! At one point I was thinking of making an Anki deck with pictures pulled from Google Image. I never thought I would need a dictionary to read English novels. Well, Nabokov's maybe, but it was a frigging children's book. Very depressing.


  • 3 weeks later...

After looking at my progress and applying some mathematical nerdiness, I think I've roughly answered my own questions posed on this thread. I.e., what kind of effort will be required to reach certain vocabulary milestones? Well, of course, it follows an exponential curve. As you learn more vocabulary, your benefit decreases. 

 

Using my standard of word counting (the number of words I encounter in books whose meaning I can't guess, and I thus save them as flashcards), I've roughly worked out the following:

 

I am here assuming a book = 300 pages (I know that's arbitrary). I started out knowing the 5,000 HSK flashcards, so that's where I begin, and the rest of the sequence follows.


[Image: table of projected vocabulary milestones by number of books read]

 

Of course, this isn't an exact science, but it's a general sketch that indicates that I'm on the verge of the 20,000 mark (January/February 2022), and if I want to reach my original goal of an average of 1 word for every 3 pages, that will take me somewhere on the order of a couple years (20+ additional books). After that, the plateau effect utterly dominates. 89 books would take me many years to complete, and all for a very, very tiny improvement. 


Quote

an average of 1 word for every 3 pages

 

I am curious:  What would be the equivalent number for your own native language?  Wouldn't it be roughly similar?

 

I read more than 200 books a year in English and for many of them I encounter new words all the time!  And just about every day when reading the New York Times or Washington Post, I encounter a new word.

 

I am just wondering whether you're setting an unrealistic and unnecessary goal.

 

Besides, isn't there some pleasure in encountering a new word?  For me there is.  As long as it's not every other line!


On 9/30/2021 at 4:58 PM, Moshen said:

I am curious:  What would be the equivalent number for your own native language?  Wouldn't it be roughly similar?

 

Yes, you're likely right! And in my native language, I don't exhaust myself over learning every individual word. Because Chinese is my second language, and I'm determined to "learn" it, I'm probably pushing myself in ways that are really a bit excessive. So the time is coming soon when I have to turn my focus elsewhere and just relax. I mean, it would defeat the pleasure of reading if I'm only reading in order to reach a goal of X number of books. "377 down, 582 more to go!" Ha ha, that would be bad.


On 10/1/2021 at 6:07 AM, Woodford said:

I mean, it would defeat the pleasure of reading if I'm only reading in order to reach a goal of X number of books. "377 down, 582 more to go!" Ha ha, that would be bad.

 

I fight against that urge too, although for me, the goal is speed.  Speeding up seems to be (ha!) a slow grind, and according to various posts on here and my experience, it'll take on the order of 10 million+ characters (50-60 books) to get to a reasonable speed (250+ characters per minute).  It ends up about the same as your milestones.

So any way you slice it, I think you have to get through about the same amount of stuff (enough "data") to reach fluency.

As a form of mental self-defense though, I like to keep my ultimate goal nebulous.  My next goal is 200 characters per minute @ 5 million chars, or ~25-30 books.  That seems doable, and I already have enough interesting books in my pipeline to fill it (although I'll swap stuff out for even more interesting books :)).
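As a quick sanity check on those two milestones, here is a small Python sketch of the characters-per-book arithmetic. The function name is invented for illustration, and it only uses the totals and book counts quoted above; it says nothing about how long any real book is.

```python
def chars_per_book(total_chars: int, books: int) -> float:
    """Average characters per book implied by a total and a book count."""
    return total_chars / books


if __name__ == "__main__":
    # Long-term milestone: 10 million characters over 50-60 books.
    print(round(chars_per_book(10_000_000, 50)))  # 200000
    print(round(chars_per_book(10_000_000, 60)))  # 166667
    # Next goal: 5 million characters over 25-30 books.
    print(round(chars_per_book(5_000_000, 25)))   # 200000
    print(round(chars_per_book(5_000_000, 30)))   # 166667
```

Both milestones assume the same average book length, roughly 167,000 to 200,000 characters, so the nearer goal is simply the halfway point of the longer one.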

