Chinese-Forums

Introducing Chinese Text Analyser


imron


@Imron: according to CTA, my current known vocabulary is at 11912 words, but I've been analysing both simplified and traditional texts, and if I'm not wrong traditional characters and their simplified counterparts are counted as different words, so the real figure must be lower. With the first novels I analysed, reaching 95% was easy (I chose Yu Hua and San Mao), but for instance, now I'm reading 玉米, by 畢飛宇, and I've had to learn around 200 words to go from 90% before starting it to 95%.


and if I'm not wrong traditional characters and their simplified counterparts are counted as different words

That's correct, because not all of them will be easily guessable if you know the other.

 

and I've had to learn around 200 words to go from 90% before starting it to 95%.

Thanks.  It's nice to hear some concrete figures.  At some point I'd like to add a 'tell me which words I need to learn to get up to X% of this text' feature.
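A rough sketch of how such a "which words do I need to reach X%" calculation could work. This is not CTA's actual implementation; it assumes the text has already been segmented into a list of words, and the function names `coverage` and `words_to_learn` are made up for illustration. The idea is simply to learn the most frequent unknown words first, since each one adds the most coverage.

```python
from collections import Counter


def coverage(tokens, known):
    """Fraction of word occurrences in the text that are already known."""
    if not tokens:
        return 0.0
    return sum(1 for w in tokens if w in known) / len(tokens)


def words_to_learn(tokens, known, target=0.95):
    """Most frequent unknown words needed to reach `target` coverage.

    `tokens` is an already-segmented list of words (an assumption; the
    segmentation step itself is not shown here).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return []
    covered = sum(n for w, n in counts.items() if w in known)
    # Sort unknown words by descending frequency, then by word for stable ties.
    unknown = sorted(
        ((w, n) for w, n in counts.items() if w not in known),
        key=lambda item: (-item[1], item[0]),
    )
    needed = []
    for word, n in unknown:
        if covered / total >= target:
            break
        needed.append(word)
        covered += n
    return needed


# Toy example: 10 word occurrences, 5 of them known.
tokens = ["我", "我", "我", "我", "喜欢", "喜欢", "看", "书", "玉米", "玉米"]
known = {"我", "看"}
print(coverage(tokens, known))
print(words_to_learn(tokens, known, target=0.8))
```

Note that simplified and traditional forms would count as separate words in a sketch like this too, since they are compared as plain strings.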


  • 2 months later...

Yes, I'm still not happy with that phrasing (feel free to suggest something else) but that's what it means.

 

At the moment the filter dropdown is a bit confusing because you only have one word list (not including <All> which is not a wordlist per se, but rather just all the words in the document).

 

CTA can actually support multiple word lists though, there's just not a GUI for it yet. The idea is that when exporting words from a document, you might want to see only words that are in HSK4, or only words that are in HSK5, and that's what the filter allows.  Likewise, excluding [Known] words gives you all unknown words.
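To make the include/exclude idea concrete, here is a minimal sketch of that kind of filtering. It is only an illustration, not how CTA does it internally: the function name `filter_words` is hypothetical, and the small `hsk4` set below is a stand-in, not real HSK data.

```python
def filter_words(document_words, include=None, exclude=None):
    """Unique document words, in first-seen order, filtered by word lists.

    include: if given, keep only words in this list (e.g. an HSK level).
    exclude: if given, drop words in this list (e.g. the known words).
    Passing include=None plays the role of <All>: every word in the document.
    """
    exclude = exclude or set()
    seen = set()
    result = []
    for word in document_words:
        if word in seen:
            continue
        seen.add(word)
        if include is not None and word not in include:
            continue
        if word in exclude:
            continue
        result.append(word)
    return result


doc = ["经济", "我", "发展", "经济", "环境"]
hsk4 = {"经济", "环境"}   # stand-in list, not real HSK4 contents
known = {"我", "环境"}

print(filter_words(doc, include=hsk4))                  # only words on the list
print(filter_words(doc, exclude=known))                 # all unknown words
print(filter_words(doc, include=hsk4, exclude=known))   # unknown words on the list
```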


More or less feature complete with the Windows version, but still needs some small tweaks to the GUI and to sort out the free trial.

I also need to make some website changes before the official release to handle downloads of different versions, and also add some documentation.

I hope to get it all finished up over the Chinese New Year, but if you already have a license I can send you an OSX version that will be more or less the same as the first official OSX release.


 

I hope to get it all finished up over the Chinese New Year, but if you already have a license I can send you an OSX version that will be more or less the same as the first official OSX release.

 

No rush; glad the release is coming so soon.


CTA can actually support multiple word lists though, there's just not a GUI for it yet

 

This would actually be useful, for instance, following from the other topic, having a second list of high frequency/HSK/priority/whatever unknown words you particularly want to learn. Then you paste in text from say parts of a novel that you plan to read over the next few days. Any words on the second list are then revealed and can be learned, in the sentence where they appear, and a little later will be encountered for real when reading. So you get the benefit of targeted learning-off-a-list, but only in a 'real-life' context!
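Something like the following is roughly what I have in mind, just as a sketch. It assumes a plain substring match rather than proper word segmentation (so it can over-match inside longer words), splits sentences only on 。！？, and the name `find_in_context` is made up.

```python
import re


def find_in_context(text, priority_words):
    """Map each priority word found in `text` to the sentences containing it.

    Uses naive substring matching, which is an assumption for illustration;
    a real tool would match against segmented words instead.
    """
    # Grab runs of text up to and including a sentence-ending mark.
    sentences = [s.strip() for s in re.findall(r"[^。！？]+[。！？]?", text)]
    hits = {}
    for sentence in sentences:
        if not sentence:
            continue
        for word in priority_words:
            if word in sentence:
                hits.setdefault(word, []).append(sentence)
    return hits


text = "她种玉米。他喜欢读书！我们走吧"
for word, found in find_in_context(text, ["玉米", "读书", "经济"]).items():
    print(word, found)
```

Words on the second list that don't appear in the pasted text (经济 here) are simply left out of the result.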


Just a quick heads up to say that the code for the initial release for OSX is now complete.

All that remains before an official release is some website changes to allow for downloading different versions, and (finally) writing some documentation.

I hope to finish all that in the next 2 weeks.

When that all goes live, I also plan to increase the cost from AUD$10 to AUD$15 so if you've been waiting for the OSX version and want to get a licence at the cheaper price now is the time to do it.

Existing licences for the Windows version will work just fine on OSX, and if anyone wants a pre-release copy of the OSX version, let me know and I'll provide you with a download link.


It's been a long time coming, but the native OS X version is now available: https://www.chinesetextanalyser.com/download

 

It's not quite feature complete with the Windows version, but it's feature complete enough.

 

There's also a new version available for Windows, which in addition to a number of minor bug fixes also has some new features including:

 

* Windows version can run as a portable application with --portable command line switch (only works with licensed copies, and you'll need to make your own shortcut)

 

* Configurable line spacing - there's no GUI to change this yet, but you can edit c:\users\<your username>\AppData\Local\ChineseTextAnalyser\data\config to adjust it

 

* Looked up words are now shown in a different colour from known and unknown words.  If desired, you can set this colour to black so they look just the same as 'known' words during the current session (internally the program will still treat them as unknown). Colours can be changed in C:\Users\Imron\AppData\Local\ChineseTextAnalyser\colour-schemes\default.colours.

 

* Windows installer no longer requires administrator permissions when installing to a non-protected location
 


(UPDATE: Never mind this post. Imron answered it earlier in the thread: "No need to uninstall previous versions, and installing over the top will keep all your old settings.")

 

Hi Imron, should I uninstall the version I'm using before installing the new one?


Imron,

 

I installed the new version for Windows!

 

The changes to color and line spacing hamper readability, for me at least. I noticed this especially with the color changes: having to discriminate between three different colors of text was distracting. The new default for line spacing is 1.5. It looks more like double spacing to me (using 26pt STKaiti font) and reminds me of writing papers for school :wall. I've set both back to the way they were before.

 

On my Windows 10 computer, the Font command no longer works. Nothing happens when I select it from the menu (Format > Font) or use the keyboard shortcut (Ctrl+Shift+F).

 

Also, congratulations on the OS X version :clap

