On 6/26/2019 at 12:17 AM, murrayjames said:

what is the best indicator of the difficulty of a text in CTA, if you've never uploaded a list of your Known Words?


On 6/26/2019 at 2:35 AM, imron said:

The number of unique words is one potential indicator of difficulty, but I'd also look at the number of words it takes to get to 98% comprehension of the text and see how big a proportion of total words that is, and I'd also look at what percentage comprehension you get if you learnt every word that appeared more than once.


This is what I do. In addition, I look at the bottom few words on that list (the list of words I'd have to learn to get to x% comprehension; I use 95%) and how often they appear in the text. If they don't appear at least 3 or 4 times in the text, I consider the text too hard. If I only see a newly learned word once in the text, I'm probably not going to remember it, so it's not as worth learning. Basically because of what Imron says:


On 6/26/2019 at 5:57 PM, imron said:

If many of those unique words only appeared once or twice in total, but when combined made up a significant percentage of total words, then that would affect difficulty, because it would mean lots of words you need to put in work to learn, but that don't really lead to increased comprehension for the rest of the text.
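For anyone who wants to play with these numbers outside CTA, here is a rough Python sketch of the indicators discussed above. It is not how CTA computes anything internally; the function name `coverage_stats` and its parameters are made up for illustration, and it assumes the text has already been segmented into a list of words.

```python
from collections import Counter

def coverage_stats(tokens, known=frozenset(), target=0.98):
    """Rough difficulty indicators for an already-segmented text.

    tokens: list of words (hypothetical input, e.g. from a segmenter)
    known:  set of words you already know (empty if you have no list)
    target: comprehension level to reach, as a fraction of total words
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no words")

    known_tokens = sum(c for w, c in counts.items() if w in known)

    # Unknown words, most frequent first (ties broken alphabetically).
    unknown = sorted(
        ((w, c) for w, c in counts.items() if w not in known),
        key=lambda wc: (-wc[1], wc[0]),
    )

    # Learn words in frequency order until the target coverage is reached.
    covered = known_tokens
    to_learn = []
    for word, count in unknown:
        if covered / total >= target:
            break
        to_learn.append((word, count))
        covered += count

    # Coverage if you learnt every unknown word that appears more than once.
    repeated = known_tokens + sum(c for _, c in unknown if c > 1)

    return {
        "unique_words": len(counts),
        "words_to_target": to_learn,
        "coverage_if_repeated_learned": repeated / total,
    }
```

The last few entries of `words_to_target` are the "bottom few words" mentioned above: if their counts are only 1 or 2, reaching the target means learning a lot of words that barely recur in the text.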




22 hours ago, drungood said:

Shouldn't it be possible to segment a txt file with a superior but slower segmentation library, save the segmented version, and have CTA use that?

It is possible, and this is what I do. The reason is that, as Imron said, CTA's native segmentation is perfectly good for comparing texts and finding my next text to read, but less good for segmentation at the sentence-by-sentence level, which is what I need to create cloze deletion flashcards of unknown words with their corresponding sentences.


I use the Stanford Word Segmenter described in this post. It separates words with spaces, so its output is perfectly compatible with CTA. After I export the cloze sentences, I use Excel to remove the spaces.
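If you'd rather not do the space removal in Excel, something like the following Python sketch should work as well. The function name `unsegment` is just an illustration, and the character ranges are an assumption that covers common Chinese characters and punctuation rather than every CJK block. It only removes whitespace that sits between two Chinese characters, so spaces around Latin words are left alone.

```python
import re

# Common CJK ideographs, CJK punctuation and full-width forms.
# This range is an assumption and is not exhaustive.
_CJK = r"\u4e00-\u9fff\u3001-\u303f\uff00-\uffef"

# Whitespace with a CJK character on both sides.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])\s+(?=[{_CJK}])")

def unsegment(text):
    """Remove the spaces a segmenter inserted between Chinese words."""
    return _SPACE_BETWEEN_CJK.sub("", text)

print(unsegment("我 喜欢 学习 中文 。"))
```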


I've just released a new version of CTA.


This is more of a maintenance release than a big new feature release.  The two main things it adds are:


1. Fixing a crash bug in macOS if the document contained characters that didn't exist in the current font, and

2. High DPI support for Windows (in both single and multi-monitor setups).


It also adds a bunch of minor bug fixes, plus minor new features such as drag-and-drop for opening files, and keyboard navigation - both with arrow keys and vi `hjkl` keys.  With keyboard navigation, you can also press `d` to show the definition of a word.


The full release notes are here.

Is there a way to export a work with Chinese definitions?
I tried to find a way, but without success.



Not easily, but it's something on my list of things to do.


By not easily, I mean currently you'd need to have a CEDICT formatted file containing Chinese definitions instead of English ones.  If you are able to create such a file, then making CTA use it is quite easy.
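For reference, a CEDICT entry is one line of the form `Traditional Simplified [pin1 yin1] /definition 1/definition 2/`. A minimal Python sketch for producing such lines from your own data might look like this. The helper name `cedict_line` and the sample definition are purely illustrative, and where the Chinese definitions come from is up to you.

```python
def cedict_line(traditional, simplified, pinyin, definitions):
    """Format one dictionary entry as a CEDICT-style line.

    pinyin:      numbered pinyin, syllables separated by spaces
    definitions: list of definition strings (Chinese ones in this case)
    """
    if not definitions:
        raise ValueError("at least one definition is required")
    # '/' separates definitions in CEDICT, so swap any stray ones
    # inside a definition for the full-width slash.
    cleaned = [d.replace("/", "／").strip() for d in definitions]
    return f"{traditional} {simplified} [{pinyin}] /{'/'.join(cleaned)}/"

# Illustrative entry only; the definition text is made up for this example.
print(cedict_line("學習", "学习", "xue2 xi2", ["通过阅读、听讲等获得知识"]))
```

Writing one such line per word to a UTF-8 text file gives you a file in the CEDICT layout.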

