Chinese-forums.com
imron

Introducing Chinese Text Analyser

fabiothebest

I'm about to purchase Chinese Text Analyser. I have a PC with Windows, though I often use Linux and may buy a MacBook Air in the future. Did you test and confirm that the program also runs on Linux and Mac OS X using Wine? I'd be interested in a native app as well.
I read that the program lets you "Export word lists of known or unknown words for use in SRS or other programs". I'd be interested in creating word lists for Pleco. Is that easy to do? From my understanding I should generate a tab-separated list containing these fields: "characters{tab}Pinyin pronunciation{tab}definition". Pinyin and definition are optional; if not specified, they are provided by Pleco itself. If I purchase a license I can use it on multiple PCs, right? And do I get unlimited updates or not? I'm going to purchase the program and try it now.

imron

Yes, it works under Wine on Linux, though it has some minor graphical glitches (e.g. images for icons don't always display correctly), and I test on Linux through Wine before each release. I haven't tried Wine on OS X, but I have a MacBook Air myself and do all development and testing on a Windows virtual machine through VMware Fusion.

You can test it for yourself before purchase, as it has a free 14-day trial. Licences work across OSes and versions, and for personal use can be used on any computer where you are the primary user. This works on an 'honour system': I don't do any intrusive checking, and you can just copy the licence to whatever computer you like, but the licence files contain enough personal information to discourage sharing openly.

Word lists are trivial to export to Pleco. CTA exports tab-separated files for a range of fields, including all those supported by Pleco, and you simply choose which ones you need.
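
For anyone curious what such a file looks like, here is a minimal Python sketch of the tab-separated layout described in the question above (characters, then optional pinyin and definition). It is not CTA's export code; the function names and the sample entries are made up for illustration.

```python
import io

def format_pleco_line(word, pinyin="", definition=""):
    """Return one tab-separated line: characters, pinyin, definition.

    Trailing empty fields are dropped, so a bare headword is a valid line
    (per the question above, Pleco fills in missing pinyin/definitions).
    """
    fields = [word, pinyin, definition]
    while len(fields) > 1 and not fields[-1]:
        fields.pop()
    return "\t".join(fields)

def write_pleco_list(entries, out):
    """Write (word, pinyin, definition) tuples to a text stream, one per line."""
    for entry in entries:
        out.write(format_pleco_line(*entry) + "\n")

# Hypothetical sample data, not taken from a real export.
sample = [
    ("你好", "ni3hao3", "hello"),
    ("贡品", "", ""),  # headword only
]
buf = io.StringIO()
write_pleco_list(sample, buf)
print(buf.getvalue(), end="")
```

To write a real file, pass a handle opened with `open(path, "w", encoding="utf-8")` instead of the in-memory buffer.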

I recommend downloading the trial and checking it out. If you have any questions about usage, post them here and I'll do my best to answer them.

Xiao Kui

I bought CTA in April, but because I'm in grad school I'm only just now getting around to using it, and I'm enjoying it immensely. Is there a keyboard shortcut for marking a word as known when it's highlighted in the unknown words list in the right-hand window? Thank you!

imron

Not at the moment, but I should probably make that double clickable like the main text view. Will add it to my todo list, and it should be ready for the next version.

imron

Ahh, actually, I've just realised that double-clicking a word in the list searches for it in the text.  Hmm, might need to think of an appropriate shortcut.  Any preferred keystroke?

Xiao Kui

On second thought, I guess a keyboard shortcut is not entirely necessary. It hadn't occurred to me till now that I could use Control or Shift to select multiple entries at a time and mark them as known all at once, which is saving me a ton of time. Duh! :)

imron

Hopefully, as time goes by and Chinese Text Analyser develops a more accurate model of your vocabulary, bulk marking will also become less and less necessary.

character

Scanning of just a section of text - this would be useful with large documents. For example, in a current book I just want to scan the first part of chapter 1 (第一篇 贡品 1). Would be great if I had a way to select all text between the markers 第一篇 贡品 1 and 第一篇 贡品 2 and just scan that, without having to copy and paste in a separate document first. Also, if CTA could recognize some common chapter markers, such as those above, and split automatically that would be very convenient.

I was thinking about a more general version of this. For a document, the user could enter a number indicating the average number of lines per page. Then CTA could provide options to export unknown words ordered by their first appearance in a document, grouped by some number of pages, such as:

// pages 1-10

...

// pages 11-20

...

imron

Internally, everything in Chinese Text Analyser works on byte offsets from the beginning of the file.

It internally calculates a page size in bytes equal to the total number of bytes on the last visible page (so when you drag the thumb on the scrollbar to the end, it fits perfectly on the last page).

It probably wouldn't be too difficult to export based on this, however while such a page size works great for scrolling the UI, it might not be ideal when exporting text, especially if the last page doesn't have much on it (e.g. one or two characters per line) causing the page size to be a low number of bytes relative to other pages of text.

I could also add something to export from the current position in the file for X bytes/pages.
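
To make the byte-offset idea concrete, here is a rough Python sketch of exporting "from position X for N bytes". It assumes the file is UTF-8 (the thread doesn't say which encoding CTA uses internally), and `slice_utf8` is a hypothetical helper, not part of CTA. The only subtlety is that a raw byte range can cut a multi-byte Chinese character in half, so the range is trimmed to character boundaries first.

```python
def is_continuation(byte):
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80

def slice_utf8(data, start, length):
    """Decode data[start:start+length], trimmed so the slice never begins
    or ends in the middle of a multi-byte character."""
    end = min(start + length, len(data))
    # Move the start forward past any continuation bytes.
    while start < end and is_continuation(data[start]):
        start += 1
    # Move the end backward until it sits on a character boundary.
    while start < end < len(data) and is_continuation(data[end]):
        end -= 1
    return data[start:end].decode("utf-8")

# Each Chinese character below is 3 bytes in UTF-8; the space is 1 byte.
data = "第一篇 贡品".encode("utf-8")
print(slice_utf8(data, 0, 4))    # 4 bytes requested, only one whole character fits
print(slice_utf8(data, 1, 6))    # starts mid-character, so the first one is skipped
```

Trimming inward like this means a slice can come back slightly shorter than requested, which seems a safer default for export than emitting a broken character.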

character

^ Sounds like pages wouldn't necessarily be a good metric, then. Could you export unknown words ordered by their first appearance in a document, grouped by some number of words, such as:

// words 1-15

...

// words 16-30

...

Either way, the idea is to give the user chunks of the vocabulary they need to learn in the order they need to learn it, instead of a long, undifferentiated list of unknown words. Perhaps have an option to put low-frequency words into a separate group at the end of the list.
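
As a sketch of the kind of output meant here, the following Python groups unknown words by first appearance and writes a `// words a-b` marker before each group. It assumes the text has already been segmented into a list of words (segmentation is CTA's job and is not reproduced here); the function names and the sample words are hypothetical.

```python
def unknown_in_order(words, known):
    """Return the unique unknown words, ordered by first appearance."""
    seen = set()
    ordered = []
    for w in words:
        if w not in known and w not in seen:
            seen.add(w)
            ordered.append(w)
    return ordered

def grouped_lines(unknown, group_size):
    """Return output lines with a '// words a-b' marker before each group."""
    lines = []
    for i in range(0, len(unknown), group_size):
        chunk = unknown[i:i + group_size]
        lines.append(f"// words {i + 1}-{i + len(chunk)}")
        lines.extend(chunk)
    return lines

# Hypothetical, already-segmented text and known-word list.
words = ["我", "喜欢", "贡品", "我", "森林", "贡品", "竞技场"]
known = {"我", "喜欢"}

unknown = unknown_in_order(words, known)
for line in grouped_lines(unknown, group_size=2):
    print(line)
```

The last group is simply shorter when the word count does not divide evenly, and its marker reflects the actual range.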

imron
Could you export unknown words ordered by their first appearance in a document, grouped by some number of words, such as:

Do you mean that you'd just like markers inserted into the exported file? Otherwise you can sort of do this already. Just set:

'Word List': Unknown
'Sort By': First Occurrence (Ascending)
First: N words ordered by 'First Occurrence (Ascending)'

Where N is the number of words you want per group.

Then just make sure to mark exported words as known.

When you open the export dialog again, then because the previous N words are now 'known', the next group of N words will be from the next part of the document with unknown words.
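
The repeated-export workflow above can be simulated in a few lines of Python. This is only an illustration of the logic, not how CTA implements it: `export_next_batch` is a hypothetical function, and the text is assumed to be pre-segmented into a word list.

```python
def export_next_batch(words, known, n):
    """Return up to n unknown words in order of first occurrence, then add
    them to `known`, mimicking 'mark exported words as known'."""
    batch = []
    for w in words:
        if len(batch) == n:
            break
        if w not in known and w not in batch:
            batch.append(w)
    known.update(batch)
    return batch

# Hypothetical segmented text and starting vocabulary.
words = ["我", "贡品", "森林", "我", "竞技场", "猎人"]
known = {"我"}

first = export_next_batch(words, known, 2)   # earliest two unknown words
second = export_next_batch(words, known, 2)  # picks up where the first left off
third = export_next_batch(words, known, 2)   # nothing left to export
print(first, second, third)
```

Because each batch is marked as known, every call naturally moves on to the next part of the document that still contains unknown words.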

Perhaps have an option to put low-frequency words into a separate group at the end of the list.

Perhaps an option to ignore words below a certain frequency?

character

Do you mean that you'd just like markers inserted into the exported file?

The idea was to have it break up the list into separate categories for Pleco, but I guess one could use Pleco's Splitting function on the entire list instead.

imron

That should be relatively easy to add.  Will put it on the todo list.

DanielW
Several weeks ago, Imron generously gave me a license even though I am rather new to this forum. This review comes from a beginner-intermediate/lower-intermediate learner, and I hope it will be of use to others.

It is, as advertised, really fast! I loaded several novels in a fraction of a second. I do hope that better recognition of names can be done, though, because I do not want to mark the component characters of names as known if I don't know them well enough. Together with Pleco flashcards, this app has helped me improve my Chinese very quickly. Thank you Imron!

hedwards

FYI, Benny Lewis is recommending your software in his fi3mplus premium package. And I can't say I disagree with him there. While there's a ton of resources out there for pay, I think this one is more than worth the cost.

imron
I do hope that better recognition of names can be done

Better name recognition is on my list of improvements for the segmenter, but currently segmenter improvements are low down on the priority list while I get the rest of the application in place.  I'll look to see if I can come up with an interim solution - maybe explicitly marking something as a name.

Benny Lewis is recommending your software in his fi3mplus premium package

I'm glad to hear he likes it and thinks it's worth recommending. Do you have a link to anything specific? A quick google search doesn't turn up anything mentioning Chinese Text Analyser.

I think this one is more than worth the cost.

I agree :mrgreen:

hedwards

I can't provide a link. Well, I can, but it's behind his paywall. I suppose that's a compliment, but it's probably not as much exposure as if it were on his regular site or in the non-premium portion of the site. I wouldn't expect you to see much difference in the number of sales. The premium members are more serious about language learning, but only a subset are going to be interested in Chinese.

imron

Ok, no problem.  I might drop him a line separately.

imron

Version 0.99.4 is now released.  New features include:

A 'recent files' menu item
Remembering the position in the file for recently opened files
Search history
Improved wordlist management that allows for revision history of wordlists to be stored.  This will be expanded upon in future releases with support for multiple wordlists and the ability to restore previous versions of a wordlist.

One annoying bug I've also just spotted is that if you install while the application is running, then it won't install the new executables (even though it will say it is running the newer version, it will still be the older executable).  Therefore if you're upgrading, make sure to exit Chinese Text Analyser completely before installing (I'll be addressing this problem properly in the next release).

imron

Version 0.99.5 is now out, and fixes the install problem.
