Chinese-forums.com
Learn Chinese in China

imron

Introducing Chinese Text Analyser


Yadang
On 6/26/2019 at 12:17 AM, murrayjames said:

what is the best indicator of the difficulty of a text in CTA, if you've never uploaded a list of your Known Words?

 

On 6/26/2019 at 2:35 AM, imron said:

The number of unique words is one potential indicator of difficulty, but I'd also look at the number of words it takes to get to 98% comprehension of the text and see how big a proportion of total words that is, and I'd also look at what percentage comprehension you get if you learnt every word that appeared more than once.

 

This is what I do. In addition, I look at the bottom few words on that list (the list of words I'd have to learn to get to x% comprehension; I use 95%) and at how often they appear in the text. If they don't appear at least 3 or 4 times, I consider the text too hard. If a word I've just learned only shows up once in the text, I'm probably not going to remember it, so it's less worth learning. Basically because of what imron says:

 

On 6/26/2019 at 5:57 PM, imron said:

If many of those unique words only appeared once or twice in total, but when combined made up a significant percentage of total words, then that would affect difficulty, because it would mean lots of words you need to put in work to learn, but that don't really lead to increased comprehension for the rest of the text.
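For anyone who wants to try these checks outside CTA, here is a rough Python sketch. It is not CTA's own algorithm, just one way to compute the same kind of numbers. It assumes the text has already been segmented into a list of words and that your known words are in a set; the function names and the `tail` and `min_count` parameters are made up for illustration.

```python
from collections import Counter


def words_to_reach(tokens, known, target=0.95):
    """List (word, count) pairs to learn, most frequent first,
    until the share of known tokens reaches `target`."""
    total = len(tokens)
    if total == 0:
        return []
    counts = Counter(tokens)
    covered = sum(c for w, c in counts.items() if w in known)
    # Unknown words, most frequent first (ties broken alphabetically).
    unknown = sorted(
        ((w, c) for w, c in counts.items() if w not in known),
        key=lambda wc: (-wc[1], wc[0]),
    )
    to_learn = []
    for word, count in unknown:
        if covered / total >= target:
            break
        to_learn.append((word, count))
        covered += count
    return to_learn


def coverage_if_repeated_learned(tokens, known):
    """Comprehension if you also learnt every word that appears more than once."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(c for w, c in counts.items() if w in known or c > 1)
    return covered / len(tokens)


def tail_is_worth_learning(to_learn, tail=3, min_count=3):
    """True if the last `tail` words on the list each appear at least `min_count` times."""
    return all(count >= min_count for _, count in to_learn[-tail:])
```

If `tail_is_worth_learning` returns False for a 95% target, the words at the bottom of the list are rare in that text, which is the situation described in the quote above: work to learn them, little gain in comprehension for the rest of the text.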

 

 

 

22 hours ago, drungood said:

Shouldn't it be possible to segment a txt file with a superior but slower segmentation library, save the segmented version, and have CTA use that?

It is possible, and this is what I do. The reason is that, as imron said, CTA's native segmentation is perfectly good for comparing texts and finding my next text to read, but less good at the sentence-by-sentence level, which is what I need in order to create cloze deletion flashcards of unknown words with their corresponding sentences.

 

I use the Stanford Word Segmenter described in this post. It separates words with spaces, so its output is perfectly compatible with CTA. After I export the cloze sentences, I use Excel to remove the spaces.
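If you'd rather not go through a spreadsheet, the space-removal step could also be done with a few lines of Python. This is only a sketch of an alternative: it assumes each exported sentence is available as a plain string, and the function name is made up. It removes whitespace next to any non-ASCII character (Chinese characters and full-width punctuation) and leaves spaces between Latin words alone.

```python
import re

# Whitespace that directly follows or directly precedes a non-ASCII character.
_SEGMENT_SPACE = re.compile(r"(?<=[^\x00-\x7f])\s+|\s+(?=[^\x00-\x7f])")


def remove_segment_spaces(sentence):
    """Strip the spaces a segmenter put between Chinese words."""
    return _SEGMENT_SPACE.sub("", sentence)
```

For example, `remove_segment_spaces("我 喜欢 学习 中文 。")` gives back the sentence without spaces, while a string like `"Chinese Text Analyser"` is left unchanged.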

 
