Introducing Chinese Text Analyser

On 6/26/2019 at 12:17 AM, murrayjames said:

what is the best indicator of the difficulty of a text in CTA, if you've never uploaded a list of your Known Words?


On 6/26/2019 at 2:35 AM, imron said:

The number of unique words is one potential indicator of difficulty, but I'd also look at the number of words it takes to get to 98% comprehension of the text and see how big a proportion of total words that is, and I'd also look at what percentage comprehension you get if you learnt every word that appeared more than once.
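As a rough sketch of those three indicators outside CTA (not how CTA itself computes them), assuming the text is already segmented into a list of words and that no words are known yet. The function name `coverage_stats` and the choice to express the proportion relative to unique words are assumptions made for illustration:

```python
from collections import Counter

def coverage_stats(tokens, target=0.98):
    """Difficulty indicators for a segmented text, assuming zero known words."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {"unique_words": 0, "words_for_target": 0,
                "share_of_unique": 0.0, "coverage_if_repeated_learned": 0.0}

    # Learn words from most to least frequent until the target coverage is met.
    covered = 0
    words_needed = 0
    for _, count in counts.most_common():
        covered += count
        words_needed += 1
        if covered / total >= target:
            break

    # Coverage if every word appearing more than once were learned.
    repeated = sum(c for c in counts.values() if c > 1)

    return {
        "unique_words": len(counts),
        "words_for_target": words_needed,
        "share_of_unique": words_needed / len(counts),
        "coverage_if_repeated_learned": repeated / total,
    }

if __name__ == "__main__":
    sample = ["我"] * 5 + ["你"] * 3 + ["他", "她"]
    print(coverage_stats(sample, target=0.75))
```

A text where `share_of_unique` is close to 1 needs almost all of its vocabulary learned before it becomes comfortable to read.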


This is what I do. In addition, I look at the bottom few words on that list (the list of words I'd have to learn to get to x% comprehension - I use 95%) and how often they appear in the text. If they don't appear at least 3 or 4 times in the text, I consider the text too hard. If a word I've just learned shows up only once in the text, I'm probably not going to remember it, so it's less worth learning. Basically because of what Imron says:


On 6/26/2019 at 5:57 PM, imron said:

If many of those unique words only appeared once or twice in total, but when combined made up a significant percentage of total words, then that would affect difficulty, because it would mean lots of words you need to put in work to learn, but that don't really lead to increased comprehension for the rest of the text.
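Both checks can be sketched in a few lines, again assuming a pre-segmented list of words and no known words. The names `rare_word_share` and `tail_of_learning_list` are made up for this example and are not CTA features:

```python
from collections import Counter

def rare_word_share(tokens, max_count=2):
    """Return (number of rare words, share of all tokens they account for)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    rare = [c for c in counts.values() if c <= max_count]
    return len(rare), sum(rare) / total

def tail_of_learning_list(tokens, target=0.95, tail=3):
    """Last few (word, count) pairs on the list needed to reach `target` coverage."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    needed = []
    for word, count in counts.most_common():
        if covered / total >= target:
            break
        needed.append((word, count))
        covered += count
    return needed[-tail:]

if __name__ == "__main__":
    sample = ["我"] * 5 + ["你"] * 3 + ["他", "她"]
    print(rare_word_share(sample))
    print(tail_of_learning_list(sample, tail=2))
```

If the counts returned by `tail_of_learning_list` are below 3 or 4, the text fails the rule of thumb described above.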




22 hours ago, drungood said:

Shouldn't it be possible to segment a txt file with a superior but slower segmentation library, save the segmented version, and have CTA use that?

It is possible, and this is what I do. The reason is, as Imron said, CTA's native segmentation is perfectly good for comparing texts and finding my next text to read, but less good for segmentation on a sentence by sentence level, which is what I need to create cloze deletion flashcards of unknown words with their corresponding sentences.


I use the Stanford Word Segmenter described in this post. It separates words with spaces, so its output is perfectly compatible with CTA. After I export the cloze sentences, I use Excel to remove the spaces.
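For anyone who would rather script that last step than use a spreadsheet, here is a minimal sketch. It assumes each sentence is a single line of space-separated words, as the segmenter produces; the function name `make_cloze` and the `____` blank marker are arbitrary choices for the example:

```python
def make_cloze(segmented_sentence, unknown_word, blank="____"):
    """Blank out `unknown_word` and strip the segmentation spaces.

    Returns None if the word is not a whole token in the sentence.
    """
    words = segmented_sentence.split()
    if unknown_word not in words:
        return None
    # Joining with "" removes every space, which is fine for pure Chinese
    # text but would also glue together any Latin-script words.
    return "".join(blank if w == unknown_word else w for w in words)

if __name__ == "__main__":
    print(make_cloze("我 喜欢 学习 中文 。", "学习"))
```

Matching on whole tokens rather than substrings is why sentence-level segmentation quality matters here: a badly segmented sentence will either miss the word or blank out the wrong span.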


I've just released a new version of CTA.


This is more of a maintenance release than a big new feature release.  The two main things it adds are:


1. Fixing a crash bug in macOS if the document contained characters that didn't exist in the current font, and

2. High DPI support for Windows (in both single and multi-monitor setups).


It also adds a bunch of minor bug fixes, plus minor new features such as drag-and-drop for opening files, and keyboard navigation - both with arrow keys and vi `hjkl` keys.  With keyboard navigation, you can also press `d` to show the definition of a word.


The full release notes are here.



Is there a way to export a word with its Chinese definition?
I tried to find a way, but without success.




Not easily, but it's something on my list of things to do.


By not easily, I mean currently you'd need to have a CEDICT formatted file containing Chinese definitions instead of English ones.  If you are able to create such a file, then making CTA use it is quite easy.
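For reference, a CEDICT entry is one line of the form `Traditional Simplified [pin1 yin1] /definition 1/definition 2/`. A rough sketch of writing such a file from your own data follows. The helper names, the output filename, and the sample Chinese definition are all invented for illustration, and how the file is then loaded into CTA is not covered here:

```python
def cedict_line(traditional, simplified, pinyin, definitions):
    """Format one entry in CEDICT's line layout."""
    joined = "/".join(definitions)
    return f"{traditional} {simplified} [{pinyin}] /{joined}/"

def write_cedict(entries, path):
    """Write (traditional, simplified, pinyin, definitions) tuples to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(cedict_line(*entry) + "\n")

if __name__ == "__main__":
    # Sample entry with a made-up Chinese-language definition.
    entries = [("學習", "学习", "xue2 xi2", ["通过阅读、听讲等获得知识或技能"])]
    write_cedict(entries, "chinese_definitions.u8")  # hypothetical filename
    print(cedict_line(*entries[0]))
```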
