Chinese-forums.com
Learn Chinese in China

wibr

Graded Watching - TV series ranked by difficulty


Lu Jo dido

Wow, it's great! It'll be very helpful for all of us, for sure. Thanks a lot!!!! 👍😀

Weyland

Might be worth it to add genres.

I haven't watched any of these series, mainly because most of them are from Taiwan, but also because all the Chinese series are (ancient) period-focused.

Distinguishing between present-day and period series is, I would say, important, as period series include a lot of dated words you'd never hear nowadays.

Jan Finster
12 hours ago, wibr said:

a ranking based on the number of words, to find TV series at your level

 

Thank you for this project!

 

Just as feedback: I ran the vocabulary list of "Decoded" through CTA:

5950 words

2292 unique characters

but, apparently 68% (~4000) of the words are non-HSK words. 😲

 

Having a lot of non-HSK words is not all that surprising, but I wonder how your list really helps identify the difficulty level. Is it just the total number of words?

wibr

@Weyland

Genre would be nice to have, but I don't think it's worth the effort. Usually you can guess it from the name or check the linked Wikipedia page. You can sort the table by any column. While the majority of the shows are currently from Taiwan, I counted more than ten shows from China that are set in modern times and use normal language.

 

@Jan Finster

Keep in mind that the list does not include the basic 1000 words which are used by more than 90% of all shows and are available in a separate list. The HSK coverage for those should be higher, although also not 100%. HSK doesn't contain words like 閉嘴. I think in general HSK vocabulary is more oriented towards written language.

 

I would say the number of words you need to know is a good indicator for difficulty? The differences can be pretty large, especially if you look at the number of words per hour in the first four hours.
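
One way to read the ranking idea above: treat the words that occur in more than 90% of the shows as a shared core list, then rank each show by how many additional unique words it needs. The Python sketch below works under that assumption. The function names, the toy data and the exact threshold handling are illustrative and are not wibr's actual scripts; the input is assumed to be already segmented into words.

```python
from collections import Counter

def core_words(shows, threshold=0.9):
    """Words that occur in more than `threshold` of all shows."""
    counts = Counter()
    for words in shows.values():
        counts.update(set(words))  # count each word once per show
    return {w for w, n in counts.items() if n / len(shows) > threshold}

def rank_by_extra_words(shows, threshold=0.9):
    """Sort shows by the number of unique words outside the core list."""
    core = core_words(shows, threshold)
    extra = {title: len(set(words) - core) for title, words in shows.items()}
    return sorted(extra.items(), key=lambda item: item[1])

# Toy data: three hypothetical shows, already split into words.
shows = {
    "Show A": ["你", "好", "閉嘴"],
    "Show B": ["你", "好"],
    "Show C": ["你", "好", "解密", "密碼"],
}
ranking = rank_by_extra_words(shows)
print(ranking)
```

With real data the same ranking could be computed over the first few hours of each show only, which is closer to the "words per hour in the first four hours" comparison mentioned above.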

PerpetualChange

You're a god in my eyes now! 

 

Sorry to see how easy "On Children" was - I had an incredibly easy time with that one and chalked it up to the study paying off 😂

 

Same with A Sun, which I watched earlier this week 🤣

Jan Finster
On 2/9/2020 at 9:38 PM, wibr said:

If you have soft subs for more shows I'd be happy to include them.

 

How did you do your statistics? Did you manually download all subs and then use CTA?

Is there a way to automate this for the whole show (not just for single episodes)?

wibr

I manually downloaded the subtitles; the rest is automated using my own Python scripts and several Python libraries for word segmentation. If you want to analyze a whole season in CTA, I guess you could merge all the subtitle files into one and load it in CTA.
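
For anyone who wants to try the merge step without doing it by hand, here is a minimal Python sketch. It assumes the subtitles are SubRip (`.srt`) files in one folder; the function names and file names are made up for illustration, and this is not the script used for the ranking. It strips cue numbers and timestamps so that only the dialogue ends up in the merged file.

```python
import re
from pathlib import Path

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:02,500".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")

def srt_to_text(srt):
    """Return only the dialogue lines of one SRT document."""
    lines = []
    for line in srt.splitlines():
        line = line.strip()
        # Skip blanks, cue numbers and timing lines.
        # Caveat: a dialogue line consisting only of digits is dropped too.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        lines.append(line)
    return lines

def merge_subtitles(folder, output):
    """Merge every .srt file in `folder` into one text file; return the line count."""
    merged = []
    for path in sorted(Path(folder).glob("*.srt")):
        merged.extend(srt_to_text(path.read_text(encoding="utf-8")))
    Path(output).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged)
```

Calling `merge_subtitles("subs", "merged.txt")` on a hypothetical `subs` folder would produce a single file that can then be loaded into CTA. Files are merged in name order, so zero-padded episode numbers (`ep01`, `ep02`, ...) keep the episodes in sequence.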

imron
4 hours ago, Jan Finster said:

Is there a way to automate this for the whole show (not just for single episodes)?

Merging all subtitles into a single file is one way.  The other way would be to write a Lua script to process all files in a single directory (or similar).
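
The "process all files in a single directory" idea can also be sketched outside CTA. The Python example below is only an illustration of that approach, not a CTA Lua script: it walks a folder of plain-text subtitle files and reports, per file, the number of unique Chinese characters and the running total across files. The function names and the `*.txt` pattern are assumptions, and only the basic CJK Unified Ideographs block is counted.

```python
from pathlib import Path

def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def unique_chars_per_file(folder, pattern="*.txt"):
    """Return (file name, unique characters in file, cumulative unique characters)."""
    seen = set()
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        chars = {ch for ch in path.read_text(encoding="utf-8") if is_cjk(ch)}
        seen |= chars  # grow the running set across episodes
        rows.append((path.name, len(chars), len(seen)))
    return rows
```

The cumulative column shows how quickly new characters stop appearing as a series goes on, which a single merged file cannot show.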

Jan Finster
22 minutes ago, wibr said:

I manually downloaded the subtitles,

 

I thought so... That is a lot of work! 😞 I thought of doing it for 都挺好, but it has 45 episodes...😨

 

22 minutes ago, wibr said:

I guess you could merge all the subtitle files into one and load it in CTA.

 

18 minutes ago, imron said:

Merging all subtitles into a single file is one way.

 

This is the obvious solution for us mere mortals, who have no idea what Python or Lua is...😉

 

wibr

Where is 都挺好 available with Chinese subs? For shows on YouTube and many other websites (not Netflix) you can use youtube-dl to download a whole playlist, including subtitles. However, downloading 45 episodes shouldn't take long, even if you do it manually.
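
As a rough illustration of the youtube-dl route, the Python sketch below builds and runs a youtube-dl command that fetches only the subtitle files for a playlist. It assumes youtube-dl is installed and on the PATH. The helper names are made up, and the language code `zh-Hans` is an assumption: the code a given video uses may differ (running `youtube-dl --list-subs URL` shows what is available), and a video may not offer Chinese soft subs at all.

```python
import subprocess

def subtitle_command(playlist_url, lang="zh-Hans"):
    """Build a youtube-dl command that downloads subtitles but no video."""
    return [
        "youtube-dl",
        "--skip-download",   # do not download the video itself
        "--write-sub",       # write the uploaded subtitle track to disk
        "--sub-lang", lang,  # assumed language code, check with --list-subs
        "--yes-playlist",    # process the whole playlist, not just one video
        "-o", "%(playlist_index)s-%(title)s.%(ext)s",
        playlist_url,
    ]

def download_playlist_subtitles(playlist_url, lang="zh-Hans"):
    """Run the command; raises CalledProcessError if youtube-dl fails."""
    subprocess.run(subtitle_command(playlist_url, lang), check=True)
```

The output template prefixes each file with its playlist position, so the subtitle files sort in episode order.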

Jan Finster
32 minutes ago, wibr said:

Where is 都挺好 available with Chinese subs?

https://www.youtube.com/watch?v=YtzqsA-a8MM&list=PLQqbdnAgoRmYhfPJgYB9YQxDsNQ-ErQBd

 

Thanks for your tip, but I do not really understand how to run those GitHub scripts...

But, you are right, it should not take too long. I could use downsub.com. I will do it at some point and send you the merged files.

wibr

In the linked YouTube video I only see the option for English soft subs?

Jan Finster
1 hour ago, wibr said:

In the linked YouTube video I only see the option for English soft subs?

Yes, when you watch there are only English soft subs, but using Downsub.com or Lingq you get the Chinese subtitles too.

I will send it soon.

 

imron

Are these official subs or YouTube 'speech to text' subs?

Jan Finster
10 minutes ago, imron said:

Are these official subs or YouTube 'speech to text' subs?

I do not know. All I can say is that the subtitles do not always follow the speech word for word. Sometimes they use a different way to express what is being said. I guess this is not a feature of speech-to-text (?)

Jan Finster

OK here it is: 

All is Well 都挺好

46 Episodes

Available on YouTube: https://www.youtube.com/watch?v=YtzqsA-a8MM&list=PLQqbdnAgoRmYhfPJgYB9YQxDsNQ-ErQBd

Topic: family relationships, family conflicts

 

My CTA data:

Total words: 76839

Unique words: 5213

56.97% of unique words are non-HSK

23.76% of all words are non-HSK

2022 unique characters.
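
For readers wondering how figures like these relate to each other, here is a small Python sketch that computes the same kinds of numbers from a segmented word list. It is not how CTA works internally; the function name, the toy word list and the tiny "HSK" set are placeholders.

```python
def vocabulary_stats(tokens, hsk_words):
    """Compute CTA-style summary numbers for a list of segmented words."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    unique = set(tokens)
    non_hsk_unique = unique - hsk_words
    non_hsk_total = sum(1 for t in tokens if t not in hsk_words)
    return {
        "total_words": len(tokens),
        "unique_words": len(unique),
        "pct_unique_non_hsk": round(100 * len(non_hsk_unique) / len(unique), 2),
        "pct_total_non_hsk": round(100 * non_hsk_total / len(tokens), 2),
        "unique_characters": len(set("".join(tokens))),
    }

# Toy example: four running words, two of them in a placeholder HSK set.
stats = vocabulary_stats(["我", "喜歡", "我", "閉嘴"], {"我", "喜歡"})
print(stats)
```

The share of non-HSK words among unique words is usually much higher than among all running words, because the common HSK words repeat many times while rare words appear only once or twice. That matches the 56.97% versus 23.76% split above.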

 

 

I think this show is fantastic for learning the following vocabulary: everyday life, family relationships, food, and some business language!

I am going to binge watch the last 4 episodes today and then I am done :) 

AllisWell-complete1-46.txt

Lu Jo dido
On 2/22/2020 at 9:54 AM, Jan Finster said:

Yes, when you watch there are only English soft subs, but using Downsub.com or Lingq you get the Chinese subtitles too.

I will send it soon.

I didn't know that website, downsub.com; it's interesting to learn new things. It seems to download subtitles from different video platforms, and it lets you auto-translate them, but is that translation good? Any experience with that?

 

Thanks!

Jan Finster
13 hours ago, Lu Jo dido said:

I didn't know that website, downsub.com; it's interesting to learn new things. It seems to download subtitles from different video platforms, and it lets you auto-translate them, but is that translation good? Any experience with that?

 

All I can say is that the subtitles do not always follow the speech word for word. Sometimes they use a different way to express what is being said. I guess this is not a feature of speech-to-text (?) They are the same subtitles that can be extracted by Lingq.com, and I am quite sure Lingq.com does not extract auto-translations. I have not come across any nonsensical text in the subs of that show either. As far as I can assess the correctness of the grammar at my stage, it sounded OK.
