Chinese-Forums

锵锵三人行 'corpus'!


realmayo


There's also a non-trivial amount of error introduced by CTA's segmenter (which is far from perfect), but even with that and the other errors in the transcripts, I think this sort of project still provides some very useful ballpark figures.
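Ballpark figures like these are typically text-coverage percentages: the share of running words (tokens, not distinct words) in a text that you already know. Below is a minimal sketch of that calculation. It assumes the transcript has already been split into words; in this project CTA's segmenter did that, with the errors noted above. The function name `coverage` and the token list and known-word set are hypothetical toy data, not output from the actual corpus.

```python
from collections import Counter

def coverage(tokens: list[str], known: set[str]) -> float:
    """Fraction of running tokens (not distinct words) that appear in `known`."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Each known word contributes every one of its occurrences, so frequent words dominate.
    known_tokens = sum(n for word, n in counts.items() if word in known)
    return known_tokens / len(tokens)

# Hypothetical segmented snippet; a real run would use the segmenter's output for the whole corpus.
tokens = ["我们", "今天", "聊", "一下", "我们", "的", "节目", "今天"]
known = {"我们", "今天", "的", "一下"}
print(f"{coverage(tokens, known):.1%}")  # 75.0%
```

Because coverage counts tokens, a handful of high-frequency words can push the figure high while many rare words remain unknown. This is why even a high percentage can still leave gaps.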


It's interesting to think that even at 98% coverage you're going to have a steady sense of 'wait, what...'. Plus, even the stuff you know well comes at you fast and furious and demands all your attention. Oh, and did we mention that this week we have heavy accents from Gaoxiong and Fujian... I think I'll go watch cartoons instead...


Take, for example, the corpus you put together: 338,000 characters. At a relatively slow reading speed of 150 cpm (characters per minute), it would take you just under 40 hours (338,000 ÷ 150 ≈ 2,250 minutes) to read through the entire transcript. At an hour a day, you would get through the above wordlist in a little over a month.
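The same arithmetic as a small sketch, assuming a constant reading speed and a fixed daily reading budget. The function names `reading_hours` and `days_needed` are hypothetical, and partial days are rounded up:

```python
import math

def reading_hours(num_chars: int, chars_per_minute: float) -> float:
    """Total reading time in hours, assuming a constant reading speed."""
    return num_chars / chars_per_minute / 60

def days_needed(total_hours: float, hours_per_day: float = 1.0) -> int:
    """Whole days of reading needed at a fixed daily budget (partial days round up)."""
    return math.ceil(total_hours / hours_per_day)

hours = reading_hours(338_000, 150)
print(f"{hours:.1f} hours, {days_needed(hours)} days at an hour a day")
# 37.6 hours, 38 days at an hour a day
```

The constant-speed assumption is optimistic. Every dictionary lookup eats into that 150 cpm, so the real figure will be higher.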

 

After a year of doing that, the counts will be a bit higher for quite a few of the words that appear only once in the corpus. Maybe you'd see them 2-3 times in the entire year, which in SRS terms is a review every 4-6 months. That's probably not so bad if you spend some time learning the word properly the first time.
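Converting "sightings per year" into an SRS-style review interval is just 12 months divided by the number of sightings. This sketch assumes the encounters are spread evenly over the year, which real reading won't guarantee. The function name `review_interval_months` is hypothetical:

```python
def review_interval_months(sightings_per_year: float) -> float:
    """Average gap in months between encounters, assuming they are spread evenly over the year."""
    if sightings_per_year <= 0:
        raise ValueError("need at least one sighting per year")
    return 12 / sightings_per_year

for n in (2, 3):
    print(f"{n} sightings a year -> roughly one every {review_interval_months(n):.0f} months")
# 2 sightings a year -> roughly one every 6 months
# 3 sightings a year -> roughly one every 4 months
```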

 

The catch is that this assumes you'll look up every word you don't know, which will heavily cut into that cpm rate!

 

However, thinking about it, I'm not so pessimistic. Yes, there appear to be diminishing returns after a certain point. But once you've got to that point, you'll have a good understanding of what different characters can mean in different contexts, and of the different ways two-character words can be put together.

 

This means it'll be much easier to either (a) guess the meaning correctly or (b) once you've looked it up, realise that it 'makes sense' and therefore remember it much more easily next time.

 

Of course there will still be words which don't 'make sense' but lots will. I read a definition somewhere of classical Chinese as something that is actually easy to understand, but only once someone's told you what it means. I think this is true of lots of vocabulary.

 

So again (!) this makes me think I should do more work on awkward but commonish words that I have forgotten or mix up, because much of the brand-new vocabulary I'm going to come across ought to be much easier to learn and remember than those.


There may be outright errors in the transcripts, but more often I've found that they are too 'correct', i.e. too literal. If you read someone's spoken words out of context (half-finished, interrupted, or referring to something he said the last time he opened his mouth), they're often incomprehensible even to a native speaker.


Uh, this might be a little off topic, so apologies for that, but I was wondering if there's a way to download qqsrx episodes to my computer to watch later. In fact, I'd like to do this with other TV shows too, but it seems most websites only let you stream.

