Introducing Chinese Text Analyser

May 6, 2014 at 11:17 PM

In keeping with Chinese Forums' giveaway month, Roddy has kindly offered to sponsor 10 licences for Chinese Text Analyser. To get one, download the program and from the main menu, select: Help -> Send Feedback and send me a message stating why you'd like a licence.

Also include your name, email and physical address. I never use your physical address, I just add it to the licence to discourage people from sharing them :-)

(Note: consideration will only be given to requests sent through the Send Feedback dialog of the program).

Also mention your Chinese Forums username, because as with the book giveaway, you'd be expected to write a review, so if more than 10 people ask, consideration will be given to how long you've been a member, the posts you've made and so on.

Update: These are now all gone.

May 8, 2014 at 12:45 AM

Imron was kind enough to give me a license in exchange for a review.

I wasn't sure what to expect, or how useful it would be to me, but I dusted off my ancient Tablet PC(!) with Windows Vista(!!) and installed it.

I'm impressed. The program runs quickly, even on a machine on which everything else is pretty slow. It's very handy to be able to quickly go from a bunch of text to the list of words you need to learn to understand that text. You can then export the list of those words (and the list of words you know in the text, if you wish).

I understand the program is still in development; I noticed a few rough edges:

- At least on my ancient PC, the display of the Chinese characters was pretty rough in the program, a bit rougher than shown here. It was OK to use the program to segment text, but as it stands I could not use the program to read a document. I tried a couple different fonts and increasing the font size.

- There needs to be a way to manually change segmentation. Hard to emphasize enough how important a feature this is.

- I would really like to see a way to export to Pleco flashcard format.

So in summary, I think right now it's a very fast, useful utility. Give it a try; it won't take long to see what it can do.

May 8, 2014 at 02:45 AM

I'm impressed. The program runs quickly, even on a machine on which everything else is pretty slow.

High performance has been one of the core design goals from the very beginning, so it's good to hear it holds up even on older machines.

I would really like to see a way to export to Pleco flashcard format.

It exports to a tab-separated file, which is supported by Pleco. See here. Just make sure you export the fields that Pleco needs.

the display of the Chinese characters was pretty rough in the program,

Can you please post a screen shot so I can see what you mean? Can you also maybe find an example (e.g. using another program) of what you consider to be acceptable so I can try and figure out a way to bring it up to that level.

There needs to be a way to manually change segmentation. Hard to emphasize enough how important a feature this is.

In the future I plan to support a range of different segmenters as well as manual segmentation. Probably not until the 1.0.0 release though.

May 8, 2014 at 10:06 AM

Can you please post a screen shot so I can see what you mean? Can you also maybe find an example (e.g. using another program) of what you consider to be acceptable so I can try and figure out a way to bring it up to that level.

I've tried to attach examples. I realize Word is probably a high standard to meet, but perhaps something can be done.

It exports to a tab-separated file, which is supported by Pleco. See here. Just make sure you export the fields that Pleco needs.

Sorry I missed that. You might want to consider adding named export options with preset fields.

May 8, 2014 at 10:36 AM

I realize Word is probably a high standard to meet, but perhaps something can be done.

I have quite high standards :-) So I'll try to see what I can figure out, because the text looks really bad there - it doesn't look anywhere near that bad on my machine. What fonts are you using in both Word and in Chinese Text Analyser?

You might want to consider adding named export options with preset fields.

Yes, I plan to do something like that eventually, it's just a matter of trying to figure out how to fit it in nicely with the dialog.

May 8, 2014 at 01:25 PM

It sounds like it might be a perfect tool for me! I'm a self learner who loves to learn by reading, but since I'm still at the beginning of my Chinese adventure, it's really difficult to find texts to read at an appropriate level (other than in textbooks). I'm definitely going to download it and check it out, and maybe get a free license if they're still available.

May 8, 2014 at 02:35 PM

I'm a little conflicted.

I wish I'd had the tool 10-15 years ago. Then again, there wasn't that much Chinese text available online then, nor were computers powerful enough to have the performance required.

Still, I'm at the point (whether skill-wise or free-time-wise) that every hour spent using study tools is an hour I'm not reading novels or watching television serials. I have about 10 novels and at least 30 television serials stacked up (including 3-4 I want to re-watch) and waiting, so I really need to prioritize my time.

On the other hand, I always want to improve my Chinese, and I have discovered that my current tendency to recognize characters/pronunciation by context rather than actual composition is both lazy and dangerous (potentially embarrassing, at least). Not to mention, there are now at least a dozen characters that I have learned the meaning of via reading that I have no idea what the pronunciation is (I often read away from any electronic device, resulting in it being inconvenient to look up characters merely for pronunciation), and I expect that to grow as I continue reading.

So it is quite possible this tool is perfect for me.

What other tools are required to maximize the usefulness of your Text Analyser? Do I need Pleco? Anki? Skritter?

I assume that these other tools aren't required, but I want to maximize the utility and success of using it.

Or to put it another way: what is the best suite of applications to use with the Text Analyser?

What is the cost for full versions of the best combination of supplemental applications?

May 8, 2014 at 03:21 PM

Hi. I'm an upper intermediate level self study student. I'm a heritage learner, so my spoken Chinese is much better than my reading. I think this tool would be a great asset in helping me acquire better reading skills. I anticipate using it with Pleco.

I downloaded your trial and requested participation in your license giveaway as a technologically illiterate guinea pig! I'm trying to get started, but I'm actually confused on the first step of importing lists of known words. This is technically a Pleco question, not a question directed towards your text analyzer, imron, but do you know how to only export a list of words from Pleco that are "known"? In Pleco I define "known" as having a score above a certain value.

Like I said, I'm not computer savvy, but I am willing to write a review from the stupid user's point of view, lol.

May 8, 2014 at 10:01 PM

What fonts are you using in both Word and in Chinese Text Analyser?

Good news: both were using SimSun, but CTA was 12 point, while Word was 10 point. When I changed CTA to 10 point, the characters looked fine. So the issue is much less severe than it seemed last night, when I tried different fonts and increasing the font size in CTA.

May 8, 2014 at 10:31 PM

Never mind my previous post. I figured out how to export the known cards from Pleco. I ended up making a new category, performing a search with a score restriction (>1000), and putting all the results in that one category. Then I exported just that category as a word list. And I successfully imported my known words into the text analyzer!

May 8, 2014 at 10:48 PM

Still, I'm at the point (whether skill-wise or free-time-wise) that every hour spent using study tools is an hour I'm not reading novels or watching television serials.

Chinese Text Analyser is still useful even if you only intend to read paper books and watch TV series. The reason being that you don't actually need to use Chinese Text Analyser to read the texts. CTA is used to identify unknown words in texts, so simply find the electronic version of the book, or subtitles of the TV series, load it in, find the top X number of words that you don't know sorted by frequency and first occurrence, and then prelearn those words so that you already know them when you come across them in the book. Repeat as often as necessary - say with the top 10 unknown words daily. This works especially well if you have an electronic version split in to chapters as you can just load a chapter at a time and prioritise that vocabulary. You could end up spending less than a minute a day in CTA if all you did was open the current chapter and export the next 10 unknown words (though obviously you'd then still need to spend time to learn those words).
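As a rough illustration of the prelearning workflow described above, here is a minimal Python sketch. It is not CTA's actual code: the function name `top_unknown_words` is hypothetical, and it assumes the text has already been segmented into a list of words (segmentation itself is not shown).

```python
from collections import Counter


def top_unknown_words(tokens, known_words, limit=10):
    """Return up to `limit` unknown words, most frequent first.

    `tokens` is an already-segmented list of words (an assumption of this
    sketch). Ties in frequency are broken by first occurrence in the text.
    """
    counts = Counter()
    first_seen = {}
    for position, word in enumerate(tokens):
        if word in known_words:
            continue
        counts[word] += 1
        # Remember only the earliest position of each unknown word.
        first_seen.setdefault(word, position)
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))
    return ranked[:limit]


if __name__ == "__main__":
    chapter = ["我", "喜欢", "读", "小说", "小说", "很", "有趣", "读"]
    print(top_unknown_words(chapter, known_words={"我", "很"}, limit=3))
```

Running this once per chapter with `limit=10` mirrors the "top 10 unknown words daily" routine.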

What other tools are required to maximize the usefulness of your Text Analyser? Do I need Pleco? Anki? Skritter?

I assume that these other tools aren't required, but I want to maximize the utility and success of using it.

Chinese Text Analyser helps you identify words in a piece of text that you don't know. It doesn't provide any way to study those words separately. As such, to get the most amount of benefit from it you should have some way of studying the lists of words that you export from the program.

This can be Anki or Pleco, or it can be an Excel spreadsheet, or you can print out the lists and manually look them up in a dictionary. CTA just helps you prioritise lists of words to learn based on context. However you normally learn words, you should then use the exported lists to learn them. You can export a large variety of different fields depending on your needs, including the word itself, pinyin, English definitions, sentences with the words, cloze deletion sentences containing the words and so on.

If you don't have Pleco, I would recommend getting it regardless of whether you purchase CTA or not. At a minimum, you should purchase the flashcard module and because you are at a level where you can read Chinese novels, also the Xiandai Hanyu Guifan Cidian Chinese-Chinese dictionary. Again, this is regardless of whether you purchase CTA or not. I'd also recommend the ABC C-E English dictionary as a backup because it has a larger number of words than Guifan.

From Pleco's site, Flashcards + Guifan will cost about USD 30. ABC will be USD 20 on top of that. It's well worth exploring if there are any other options you might like. Personally, I have the professional bundle + Guifan, which will set you back USD 100. This is more than worth it, especially considering you can upgrade across devices and versions free of cost. I've been using Pleco now for 7 years so currently it's cost me ~$0.04 a day (actually a little bit more, because prices used to be higher - Guifan originally cost $60).

CTA is then AUD $10 on top of that (or free if you get a licence while they are still on offer).

If the price of Pleco still seems a bit steep, then Anki also provides a good way to revise words, and maybe it's something you can download first to see if the workflow works for you. The problem with Anki is that it doesn't have a built in dictionary so flashcard creation has to be done manually - which is time consuming and a pain. CTA will take most of that pain away, because you can simply import in to Anki words exported from CTA, however CTA currently only provides definitions from CEDICT, which is not as comprehensive as the paid dictionaries available in Pleco.

Pleco is also great because it's quick and easy to look up a word, and then (this is the key), trivial to add looked up words to a list for later revision. The problem with this approach is that it's still a pain to break from reading to look up words and so you end up not always looking up words that you should, or maybe you don't bother to add these words to a list for review because you 'already knew the word, but just wanted to check' (i.e. you didn't really know the word but you've pretended to yourself that you did).

CTA helps get around this by allowing you to prioritise and prelearn words in a given piece of text that you intend to read. It also gives you a good indication as to whether or not a piece of text is appropriate for your current level. For example, say you have 10 books to read, but you're not sure which one to read next. The best one to read first is the one that has the lowest number of new words because as you read that, you'll pick up a whole bunch of new vocabulary which will then make the next book easier, and the next book easier, and so on. Assuming you've been using CTA for a while and have a reasonably accurate list of known words, then with CTA, you can load up each book and see which one has the lowest number of unknown words and then read that one.
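The "read the easiest book first" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each book is assumed to be available as an already-segmented list of words, and the names `unknown_word_count` and `easiest_first` are hypothetical, not part of CTA.

```python
def unknown_word_count(tokens, known_words):
    """Number of distinct words in `tokens` that are not in `known_words`."""
    return len(set(tokens) - set(known_words))


def easiest_first(books, known_words):
    """Order book titles by how many distinct unknown words each contains.

    `books` maps a title to its already-segmented list of words.
    """
    return sorted(
        books,
        key=lambda title: unknown_word_count(books[title], known_words),
    )


if __name__ == "__main__":
    shelf = {
        "A": ["我", "读", "书"],
        "B": ["我", "读"],
        "C": ["天", "地", "人", "我"],
    }
    print(easiest_first(shelf, known_words={"我", "读"}))
```

After finishing the first book, the newly learned words would be added to the known list and the ranking rerun, since the order of the remaining books may change.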

When I changed CTA to 10 point, the characters looked fine

It seems really strange that decreasing the font size improved the clarity! Still, if it works, then great.

Then I exported just that category as a word list. And I successfully imported my known words into the text analyzer!

Excellent! Glad to hear it works.

May 8, 2014 at 11:54 PM

Thank you for that comprehensive reply.

I'll probably download it soon.

When I changed CTA to 10 point, the characters looked fine

It seems really strange that decreasing the font size improved the clarity! Still, if it works, then great.

I've actually seen that before with one of the fonts (in word, I think...maybe NJ Star). It comes from the pixel-mapping (or vector line point definitions?) being designed for one size...if it gets too big, the shapes distort, so it can actually look better in a smaller font.

I'm probably not explaining it correctly, but the point is, I have actually seen that happen before with Chinese characters and font sizes.

May 9, 2014 at 12:53 AM

~~I can't get any of the links to your program to work.~~

Okay, I'm in.

May 9, 2014 at 01:49 AM

Downloaded, but I'm having a difficult time finding novels I can download.

May 9, 2014 at 01:59 AM

The 圈子圈套 thread has links to that novel.

May 9, 2014 at 02:15 AM

Chinese Text Analyzer review:

I had the pleasure of reviewing Imron's new Chinese Text Analyzer program upon receiving a free license courtesy of Chinese Forums. You can download the program here.

I'm an upper intermediate level self study student. I'm a heritage learner, so my spoken Chinese is much better than my reading. I thought this tool would be a great asset in helping me acquire better reading skills. I planned on using it with Pleco, so this review will discuss the integration of both applications.

So I have a desktop running Windows and a MacBook Pro running Mac OS X, and I installed the Chinese Text Analyzer on both. I used Wine to install Chinese Text Analyzer on the Mac, and there were no problems with installation. The Windows installation was much easier, as the program runs natively on Windows. It was fast, and I had no issues. It takes literally two button clicks to install on Windows. Most of this review will be based on my experience with the program in the Windows setting.

So the first step the program recommends is for you to import a list of your known words so the text analyzer would be able to identify known and unknown words appropriately.

I use Pleco for all of my study and flashcards, so it has a complex list of all the words that I've been tested on and know. In Pleco you can define your known words however you like. I define my "known" words as words that have a score of greater than 1000 points.

So the first thing I did was figure out how to export my list of known words from Pleco into the Text Analyzer. In Pleco, I did this by going to Organize Cards, and making a New Category called "Known". I then used the search function in Organize cards to search for score >= 1000, then I batch added all the cards that came up to the Known category.

I then used the Import/Export selection, changed Export cards to "cards in categories" instead of "all cards", and selected my Known category. I exported as a text file in UTF-8, and exported words only (no definitions), as I think Text Analyzer only requires a list of words.

I think the only down side to doing it this way is that you have to do a manual search to add new cards to your Known category and manually export the known card list into Text Analyzer each time if you want to keep your known words updated. I don't know if there is an easier way of exporting your known words out of Pleco, but this way worked for me.

Anyway, I then used the File Manager in Pleco to upload the file via wifi to my computer. (I love this feature of Pleco). The exported file is just a txt file that you can then import into the Chinese Text Analyzer.
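For anyone curious what such a word-list file amounts to, here is a small Python sketch of reading one into a set of known words. It rests on assumptions not stated in this thread: one entry per line, the word in the first tab-separated column, and lines starting with `//` treated as category headers to be skipped. The function name `parse_word_list` is hypothetical, and this is not how CTA itself imports files.

```python
def parse_word_list(lines):
    """Collect words from an exported word list, one entry per line.

    Assumptions of this sketch: blank lines and lines starting with "//"
    (taken here to be category headers) are skipped, and only the first
    tab-separated column of each remaining line is kept.
    """
    words = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("//"):
            continue
        words.add(entry.split("\t")[0])
    return words


if __name__ == "__main__":
    sample = ["//Known\n", "你好\n", "\n", "学习\txue2xi2\n"]
    print(sorted(parse_word_list(sample)))
```

To read a real file, something like `with open(path, encoding="utf-8-sig") as f: known = parse_word_list(f)` should work, where `path` is wherever the export was saved; the `utf-8-sig` codec also tolerates a leading byte order mark.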

When you first install the Chinese Text Analyzer, it has a popup that says "Welcome to Chinese Text Analyzer! Before you begin you should import lists of words that you already know. Chinese Text Analyzer can read files exported by popular flashcard programs such as Pleco and Anki, or you can import words from pre-made lists of HSK vocabulary. Later on you can manually add words while you are reading Chinese content."

In this window, you can either click "Import..." or you can import words using File --> Import...

I imported my list of known words from Pleco, and it imported, but I would have liked to see a success message or something to let me know it worked ok. Rather, it just took me back to a blank screen, and I wasn't sure if anything had happened. I was able to get confirmation by going to Word Lists --> View Known and seeing a list of words there.

I then tried opening some reading practice files. By going to File --> Open, I was able to find my txt files and they open very very quickly. I was even able to shift & select an entire folder's worth of files and open them at once. Even opening 10+ files, the program was very very snappy. If you open multiple files at once, they open in individual tabs in the program which is very nice. I did try to overload the program with a bunch of longest texts I have, and it was still amazingly fast to analyze. However, I did notice that if you have more than about 7 or so tabs open, you will be unable to maneuver to the tabs on the right, since there is no way to access tabs that don't fit on the screen. I don't know how important this is, as people theoretically won't be reading 10 books at once, but I thought I would note this finding.

I am very impressed by how fast the program opens and analyzes the documents. Here are some well known novels that I've tested with their processing time in seconds (I opened all 4 at the same time using shift & select in the file--> open box):

Journey to the West - 0.14 seconds

Red Chamber - 0.17 seconds

The Three Kingdoms - 0.11 seconds

Water Margin - 0.09 seconds

The processing time is taken from the upper left statistics window which I will describe in more depth later. It probably does vary based on your computer specs, and I have to admit my computer is pretty decked out for photo and video processing. But I imagine the program will run pretty fast on all computers, and I think the segmenting a novel in under 1 second claim is definitely true.

The default font that the program uses is ok, but not my favorite. You can go to Format --> Font, and there are a few other font options that you may like more. I'm not sure where the program gets its fonts from - if it is using pre-installed fonts on your computer, or if the program has a set of fonts that comes with it - but I went through the font options that I had, and there are quite a lot of font options that do not display Chinese characters correctly or at all (white boxes). Given that the sole purpose of this program is to display Chinese text, I think it would be really helpful if you curated the available fonts to only those that display Chinese text. I didn't go through all of the options, but in my cursory look I would say that 80-90% of the font options are not suitable for Chinese characters. Again, I'm not sure if this varies based on what fonts you have pre-installed on your computer or not.

I don't know if this is an option, but it might be nice if you could include a more brush script-y type font. I like the FZKaiTi font available as an add on in Pleco.

Now on to the statistics windows on the right side. The top window appears to have statistics for the entire document, including total number of words, total known words, percent known words, number of unique words. I noticed that the headings "Known" and "Percent Known" are used under the "Total" and "Unique" categories, and I recommend you make a clearer division between the "Total" section and the "Unique" section. Otherwise, it might look like the "Known" and "Percent Known" are duplicates, but they have different numbers.

The program also lists some character statistics and File statistics. I'm not really sure how important the File Statistics are, but I guess it doesn't hurt to have them there. I probably would never really look at it though in real usage.

One additional statistic that I think would be good to have is Number of Unknown words in a document. This way you could get an idea of how many words are left to learn for any particular text. I guess you could always calculate this yourself with number Unique minus number Known unique, but it shouldn't be hard to implement the Number of Unknown as well, which may be more helpful than the number of known words.
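To make the Total versus Unique distinction and the suggested unknown count concrete, here is a minimal Python sketch. The function and key names are hypothetical and the input is assumed to be an already-segmented word list; it only illustrates the arithmetic, not how CTA computes its statistics.

```python
def word_statistics(tokens, known_words):
    """Compute total and unique word counts, plus known/unknown breakdowns."""
    known = set(known_words)
    unique = set(tokens)
    total = len(tokens)
    total_known = sum(1 for word in tokens if word in known)
    unique_known = len(unique & known)
    return {
        "total": total,
        "total_known": total_known,
        "total_percent_known": 100 * total_known / total if total else 0.0,
        "unique": len(unique),
        "unique_known": unique_known,
        # The statistic suggested above: unique minus known unique.
        "unique_unknown": len(unique) - unique_known,
        "unique_percent_known": 100 * unique_known / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    text = ["我", "读", "书", "我", "读", "报"]
    for name, value in word_statistics(text, {"我", "读"}).items():
        print(name, value)
```

The example shows why the two "Percent Known" figures differ: frequent known words push the Total percentage above the Unique one.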

The bottom right window has statistics broken down for each word. For each word, it lists Frequency, % Frequency, Cumulative % Frequency, and First Occurrence. I think the Frequency and % Frequency columns are the most important, especially if you want to prioritize vocabulary studying. You can very easily sort words by frequency.

I'm honestly not sure what "Cumulative % Frequency" means, and I was not able to figure it out.

I'm not sure how helpful the "First Occurrence" column is either. I haven't determined a use for it.

I did notice that if you double click anywhere in the row for a word in the bottom right window, it will automatically take you to the first occurrence of that word and will highlight all other occurrences in pink. I think an additional feature I would like to see would be a set of left right arrows so you can go to the next occurrence of the word of interest fairly easily if you are in a long document, and see each place the word is used in context. I think you can use the Edit ---> Find feature for this as well, but it would be nicely streamlined if a set of left/right arrows popped up when you double clicked a row in the word statistics window, without having a window blocking your text. Or even better, have the left and right keyboard keys move between each instance of the word.

There are three tabs on the bottom of the window to look at All words, Known words only, or Unknown words only. I have no issues with that layout. There is also a search field, which I have not used extensively. I think it only works if you type the characters. Maybe one future feature could be allowing pinyin search as well.

Now my review of the reading experience. I imported a few documents, and there were quite a few words marked in red as unknown that I already knew, perhaps I just never made Pleco cards for them. I found it very annoying to have to right click a known word and mark it as known. I think it would be nice if there was a keyboard shortcut for marking words as known - maybe hitting the spacebar or enter key or something to make this process easier and less intrusive on the reading experience. I just don't like having to right click and select from a text list to mark words as known, it really does take some of the flow away from reading that I think hitting a keyboard key would improve.

I'm not sure if Imron had in mind designing the Chinese Text Analyzer as just a tool to aid in picking which books/texts to read, or as a stand alone reader, or both. But I've been spending some time with it, and I find that it is very helpful in determining how appropriate a text is to your vocabulary level. This is of course assuming you update your known words list periodically, which may be kind of a hassle.

However I'm not sure if I will spend most of my time doing dedicated reading on it. I have to admit that I miss having a pop up dictionary feature. I understand that Imron left this out purposefully to discourage bad habits. I'm sure over time I can get used to not having a pop up dictionary and studying the unknown words independently, but as of now I'm finding it hard to give up the crutch. I think especially in cases where not knowing a few key words in a sentence completely prevents you from understanding the meaning of the sentence. I do find it more challenging to read without a pop up dictionary, and there is somewhat of a mental block knowing that you don't have something convenient to fall back on.

One thought that I had for people who may choose to use the Chinese Text Analyzer as a dedicated reader is the fact that there is no Bookmark feature in the program. Especially for longer novel length books, it would be immensely helpful to have a bookmark feature so you don't have to find your place again if you stop reading and close the program. It may also be helpful to have an option to record notes in certain sections of the books or mark up places that you had difficulty reading and may want to go back and re-read after studying the vocabulary in that section.

Now I'll review the Export settings. I have tried the File --> Export --> To File settings. I have never tested the To Email, because I think it works through Microsoft Outlook, and I do not use Outlook.

When you go to File --> Export --> To File, a dialog box opens up with two tabs. The Document tab is first, and is sectioned into Document, Paragraph, and Word sections with "Pre" and "Post" under each section with a text box field. I actually do not know what these options do, as it was entirely unclear in the program. I think there should be a sentence or two of explanation here. I left all the fields blank, and it exported the entire document I had open with no changes. I do not know what the Pre and Post mean and what that tab is meant to do.

The second tab under Export is labeled Word List. This tab seemed much more intuitive. You can export All words, Known words, or Unknown words. I personally think that the default should be set to Unknown instead of All, as I think that is how most people will be using the program. I for one intend to use it to identify unknown words that need further study in Pleco, and I found that I very easily accidentally exported "All" words instead of "Unknown" words since All is currently the default. You can sort by Frequency, First Occurrence, or Word in ascending or descending order. I think the Frequency (Descending) as the default is appropriate for this one. You can select to export All rows, or the Top X number of rows (in case you want to just study the most frequently used 100 words in a novel for example). I think this is a very useful feature.

There are lots of fields available for export: Word, Simplified, Traditional, Simplified[Traditional], Pinyin (Tones), Pinyin (Numbers), English Definition, Sentence, Cloze Sentence, Frequency, % Frequency, Cumulative Frequency, First Occurrence. And you have the option of selecting as many fields as you want to export, so there is a lot of flexibility.

I'm not really sure what the difference between Word and Simplified is, since I exported both fields and they are the same in my test set. Perhaps it depends on what format the original document that the word came from uses. All of my texts were imported in Simplified.

Most of the other fields seem self explanatory. I'm not sure what dictionary is used, but each word has several of the most common definitions separated by "/". My test set seemed to import fine into Microsoft Excel as a tab-delimited file.

One very interesting part of the Chinese Text Analyzer is its ability to export Sentences where your word is found. It seems to be exporting the sentence that has the first occurrence of the word. I did notice that when I exported the "Sentences" and "Cloze Sentences" fields, some of the fields exported with the previous sentence's period preceding it. An example:

众人

。许多道[...]等，送到后山，指与路径。

Other than the leading period, it seems to parse the sentences well. Not all of the 100 words I exported in my test set had a leading period, but the majority of them did. It may have something to do with the source document I used, so this may vary for other people, I don't know.
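One way to avoid a stray leading period is to treat the sentence-ending mark as part of the sentence it closes. The Python sketch below shows that idea together with a simple cloze replacement. It is only an illustration with invented sample text; the names are hypothetical, it handles just 。！？ and line breaks, and it is not a description of how CTA picks sentences.

```python
import re

# A sentence is a run of characters up to and including one terminator.
# Line breaks also end a sentence but are not kept.
SENTENCE = re.compile(r"[^。！？\n]+[。！？]?")


def split_sentences(text):
    """Split text so each terminator stays with the sentence it ends."""
    pieces = (match.group().strip() for match in SENTENCE.finditer(text))
    return [piece for piece in pieces if piece]


def first_sentence_with(word, text):
    """Return the first sentence containing `word`, or None."""
    for sentence in split_sentences(text):
        if word in sentence:
            return sentence
    return None


def cloze(sentence, word, blank="[...]"):
    """Replace every occurrence of `word` in `sentence` with a blank."""
    return sentence.replace(word, blank)


if __name__ == "__main__":
    sample = "他来了。我们走吧！\n好"
    sentence = first_sentence_with("走", sample)
    print(sentence)
    print(cloze(sentence, "走"))
```

Closing quotation marks that follow a terminator are not handled here and would end up at the start of the next sentence, so a real implementation would need a few more cases.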

I also tested the export function with both the Sentences field and Cloze Sentences field exported. I am not sure why, but some of the rows imported oddly into Excel: the Cloze sentence was cut off and put into a second row. I don't know whether this is caused by tabs in the actual text or not.

This example was exported with the fields: Word, Simplified, Traditional, English Definition, Sentence, Cloze Sentence. You can see that the Cloze sentence got put on the second row.

走

/to walk/to go/to run/to move (of vehicle)/to visit/to leave/to go away/to die (euph.)/from/through/away (in compound verbs, such as 撤走)/to change (shape, form, meaning)/

楔子　张天师祈禳瘟疫　洪太尉误走妖魔

楔子　张天师祈禳瘟疫　洪太尉误 [...] 妖魔

This is the sentence in context of the actual text. Note that it is not an actual sentence, there are no periods (and no leading period) but it is followed by a return.

水浒传

楔子　张天师祈禳瘟疫　洪太尉误走妖魔

纷纷五代乱离间，一旦云开复见天！草木百年新雨露，车书万里旧江山。寻常巷陌陈罗绮，几处楼台奏管弦。天下太平无事日，莺花无限日高眠。

I think the problem occurs when there is an Enter/return at the end of the sentence that the program picks. I haven't investigated this extensively, but I thought I would let you know there may be a slight bug with exporting of sentence fields. This of course is probably dependent on the quality of the text you are deriving the sentences from, I understand. But I think it might be worth investigating and seeing if these small issues are repeatable and can be fixed before the big release.
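If the guess above is right, one possible fix is to strip line breaks and tabs out of each field before writing a row. The Python sketch below shows that approach; it is an assumption about the cause and a hypothetical remedy, not CTA's export code, and the function names are made up for illustration.

```python
import re


def clean_field(text):
    """Replace tabs and line breaks so a field stays inside one TSV cell."""
    return re.sub(r"[\t\r\n]+", " ", text).strip(" ")


def to_tsv_row(fields):
    """Join already-stringified fields into a single tab-separated line."""
    return "\t".join(clean_field(field) for field in fields)


if __name__ == "__main__":
    heading = "楔子　张天师祈禳瘟疫　洪太尉误走妖魔\n"
    row = to_tsv_row(["走", heading, heading.replace("走", " [...] ")])
    print(row)
```

Only ASCII tabs, carriage returns and newlines are replaced, so the full-width spaces inside the chapter heading are left untouched.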

Overall I think the program is pretty useful and seems good. These were just some comments that I had while extensively exploring the program for a full day or so. I will try some more extensive reading using the program in the next few weeks and I'll give an update if necessary.

Thanks

Kikosun

May 9, 2014 at 02:21 AM

Hmm, some of my formatting disappeared when I pasted the above post. I originally had the exported results in tables to demonstrate the export issue I found. I'm not sure how to get my tables back, sorry. Imron, I'll email a copy to you directly so you can better see what I mean.

May 9, 2014 at 04:55 AM

Thanks for the comprehensive review!

It takes literally two button clicks to install on windows

I put in a lot of effort to make it only take two clicks! Nice to see this appreciated.

"I think the only down side to doing it this way is that you have to do a manual search to add new cards to your Known category and manually export the known card list into Text Analyzer each time if you want to keep your known words updated.

Now that you have Chinese Text Analyser set up, going forward, the alternative is to export words from Chinese Text Analyser, and then import them in to Pleco. When you export, there is an option to automatically mark exported words as known (with the expectation that if you don't know them yet, you will after importing them to another program like Pleco) and this way the lists should stay relatively in sync. Once every few weeks or months you could then do an export from Pleco just to catch any you had added outside of CTA.

"I imported my list of known words from Pleco, and it imported, but I would have liked to see a success message or something to let me know it worked ok. Rather, it just took me back to a blank screen, and I wasn't sure if anything had happened."

Added to my list of things to do.

"However, I did notice that if you have more than about 7 or so tabs open, you will be unable to maneuver to the tabs on the right, since there is no way to access tabs that don't fit on the screen."

Improving this is on my list of things to do. In the meantime, you can use Ctrl-Tab and Ctrl-Shift-Tab to cycle between tabs (even those off-screen).

"I think it would be really helpful if you curated the available fonts to only those that display Chinese text."

Windows has a mechanism for doing this, however it can be a little too strict, and sometimes results in no fonts shown. Fixing this up is on my list of things to do, but for the pre-1.0.0 releases I figured it was better to have too many fonts than too few. Remembering the last font used across sessions is also on my list of things to do.

"but it might be nice if you could include a more brush script-y type font."

Unfortunately, due to font licensing costs, including a nicer font is probably not really doable at this price point - at least not until there's a much higher volume of users.

"I'm not really sure how important the File Statistics are, but I guess it doesn't hurt to have them there."

Not really important, but possibly interesting for some people.

"I recommend you make a clearer division between the "Total" section and the "Unique" section"

Already on my list of things to do.

"One additional statistic that I think would be good to have is Number of Unknown words in a document."

Added to my list of things to do.

"I'm honestly not sure what "Cumulative % Frequency" means, and I was not able to figure it out."

It's similar to column 4 of the Jun Da frequency lists. It's just the sum of all frequencies for that word and all words more frequent than it.
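To make the definition concrete, here is a minimal Python sketch of how such a column could be computed. It is not CTA's actual code: the function name `cumulative_frequency` and the tiny word list are made up for illustration, and the input is assumed to be already segmented into words.

```python
from collections import Counter

def cumulative_frequency(words):
    """Return rows of (word, count, % frequency, cumulative % frequency),
    ordered from most to least frequent."""
    counts = Counter(words)
    total = sum(counts.values())
    rows = []
    running = 0  # occurrences of this word plus all more frequent words
    # Sort by count (descending), then by word so ties come out in a stable order.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        running += count
        rows.append((word, count, 100.0 * count / total, 100.0 * running / total))
    return rows

# Hypothetical, already-segmented sample text.
sample = ["的", "的", "的", "的", "我", "我", "我", "书", "书", "看"]
rows = cumulative_frequency(sample)
for row in rows:
    print(row)
```

Read this way, the cumulative figure for a word tells you what share of the text you would cover if you knew that word and every word more frequent than it.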

"I'm not sure how helpful the "First Occurrence" column is either. I haven't determined a use for it."

First occurrence is actually really useful for prioritising words. For example, you can export the top 100 unknown words by frequency, and then sort them by first occurrence. That way you can learn the most frequent words in the order that they will appear in the text you are reading.
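As a rough illustration of that workflow, the Python sketch below picks the most frequent unknown words and then reorders them by where they first appear. The function `prioritise`, its parameters, and the sample data are hypothetical and only stand in for what you would do with CTA's export and a spreadsheet or script.

```python
def prioritise(words, known, top_n=100):
    """Pick the top_n most frequent unknown words, then order them
    by the position of their first occurrence in the text."""
    counts = {}
    first_seen = {}
    for position, word in enumerate(words):
        counts[word] = counts.get(word, 0) + 1
        first_seen.setdefault(word, position)  # keeps only the earliest position
    unknown = [w for w in counts if w not in known]
    # Step 1: most frequent first (ties broken by earlier first occurrence).
    top = sorted(unknown, key=lambda w: (-counts[w], first_seen[w]))[:top_n]
    # Step 2: reorder the chosen words by first occurrence.
    return sorted(top, key=lambda w: first_seen[w])

# Hypothetical segmented text and known-word list.
text_words = ["我", "喜欢", "看", "书", "我", "喜欢", "书", "电影", "书"]
study_order = prioritise(text_words, known={"我"}, top_n=2)
print(study_order)
```

With `top_n=2`, the two most frequent unknown words are 书 and 喜欢, and they come back in reading order, so 喜欢 is listed first.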

"I think an additional feature I would like to see would be a set of left right arrows so you can go to the next occurrence of the word of interest fairly easily if you are in a long document, and see each place the word is used in context."

Double click again, and it will take you to the next instance and so on. If you have used the Edit->Find dialog, then 'F3', 'Ctrl-G' or 'n' will take you to the next occurrence of the word without needing to have the dialog open (no shortcuts for previous words yet; that is on my list of things to do).

"there is no Bookmark feature in the program."

Good idea. Added to my list of things to do.

"It may also be helpful to have an option to record notes in certain sections of the books or mark up places that you had difficulty reading and may want to go back and re-read after studying the vocabulary in that section."

Added as a low-priority to do item.

"Now I'll review the Export settings. I have tried the File --> Export --> To File settings. I have never tested the To Email, because I think it works through Microsoft Outlook, and I do not use Outlook."

The program just opens your default email application as specified by the OS. Actually, this feature is not that useful because it only works with small amounts of data, and I may end up removing it altogether.

"I do not know what the Pre and Post mean and what that tab is meant to do."

A document has one or more paragraphs, a paragraph has one or more words. Pre and Post allow you to add things before (Pre) or after (Post) each document, paragraph, and word during the export process. So for example, you could set:

document pre: <html><head><title>CTA Export</title><meta charset="UTF-8"><style>.word:hover {color:red;}</style></head><body>

document post: </body></html>

paragraph pre: <p>

paragraph post: </p>

word pre: <span class="word">

word post: </span>

Then you'll get an HTML file split into paragraphs and words, with the color changing to red when you hover over a word. Or you could just have them all empty, except have a single space for 'word post', and then you'd get a segmented file with words separated by spaces, and so on.
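For anyone who finds it easier to read as code, the Pre/Post idea can be sketched in Python roughly as follows. This is only an illustration of the wrapping logic described above, not how CTA is implemented: `export_text` and its parameter names are made up, the input is assumed to be pre-segmented, and details such as whether CTA adds line breaks of its own are not covered.

```python
def export_text(paragraphs, doc_pre="", doc_post="",
                para_pre="", para_post="",
                word_pre="", word_post=""):
    """Join segmented text, wrapping each document, paragraph and word
    in its pre/post strings."""
    parts = [doc_pre]
    for paragraph in paragraphs:
        parts.append(para_pre)
        for word in paragraph:
            parts.append(word_pre + word + word_post)
        parts.append(para_post)
    parts.append(doc_post)
    return "".join(parts)

# Hypothetical pre-segmented input: a list of paragraphs, each a list of words.
paragraphs = [["我", "喜欢", "看书"], ["你", "呢"]]

# HTML-style export, similar to the settings listed above (head section omitted).
html = export_text(
    paragraphs,
    doc_pre="<html><body>", doc_post="</body></html>",
    para_pre="<p>", para_post="</p>",
    word_pre='<span class="word">', word_post="</span>",
)

# Everything empty except a single space for 'word post'.
spaced = export_text(paragraphs, word_post=" ")
print(html)
print(spaced)
```

The second call shows the space-separated case: every word is followed by one space and nothing else is added.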

"I think it would be nice if there was a keyboard shortcut for marking words as known"

Unfortunately, I can't know where your eyes are looking, and I don't want the reading process to require the user to keep going next, next, next, next with arrow keys or something. What I will probably do is make double click toggle the known/unknown status.

"I'm not sure if Imron had in mind designing the Chinese Text Analyzer as just a tool to aid in picking which books/texts to read, or as a stand alone reader, or both"

The hint is in the name: Chinese Text Analyser, not Chinese Text Reader. I also have plans for a separate reader program, but Chinese Text Analyser can work as a basic reader. Actually, the plan has always been to develop the reader, but the reader required a segmenting engine, so I wrote that first and thought it was useful enough to release as a standalone program in the interim.

"I have to admit that I miss having a pop up dictionary feature."

Popup definitions will be in the next release (0.99.3), with looked up words automatically being marked as unknown.

"but as of now I'm finding it hard to give up the crutch"

It might be an indication that you need to look at easier texts. One of the design goals of CTA was to make shortcomings in your ability obvious, rather than letting you gloss over them. The logic being that by making such things obvious, you can know what you need to focus on to improve.

So if there is a point of pain, then it is a possible indication of something in your learning that you need to address. CTA will not coddle you and is meant to give you an accurate view of your real ability.

"I personally think that the default should be set to Unknown instead of All"

Added to my list of things to do. Note, however, that currently the program will save your last choices, so once you set the list as 'Unknown' it will remember this the next time you export.

"I'm not really sure what the difference between Word and Simplified..."

Word is the actual word as it appears in the text. If your text is all in Simplified, then Word and Simplified will be the same. If however your text was all in Traditional, then Word and Simplified would be Traditional and Simplified respectively.

"I'm not sure what dictionary is used,"

Currently CEDICT, but with the possibility of supporting other dictionaries later.

"some of the fields exported with the previous sentence's period preceding it."

This is a bug. It should not be doing this, and should stop on full stops and/or newlines. Can you please provide me with a complete paragraph of text with the problem (or send me the source text via email), and I'll look into it.
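For reference, the intended behaviour can be sketched in Python along these lines. This is an assumption-laden illustration rather than CTA's real code: the names `sentence_around` and `cloze` are invented, and the terminator set is a guess (only full stops and newlines are mentioned above; ！ and ？ are added here as an assumption).

```python
# Assumed sentence boundaries: Chinese full stop and newline (from the post),
# plus ！ and ？ as an extra assumption.
TERMINATORS = "。！？\n"

def sentence_around(text, index):
    """Return the sentence containing the character at `index`,
    without the previous sentence's punctuation or any newline."""
    start = index
    while start > 0 and text[start - 1] not in TERMINATORS:
        start -= 1
    end = index
    while end < len(text) and text[end] not in TERMINATORS:
        end += 1
    # Keep this sentence's own closing punctuation, but never a newline.
    if end < len(text) and text[end] != "\n":
        end += 1
    return text[start:end].strip()

def cloze(sentence, word, blank="____"):
    """Blank out the target word to make a cloze sentence."""
    return sentence.replace(word, blank)

# Hypothetical source text with a line break ending the second sentence.
text = "第一句。第二句有生词\n第三句。"
sentence = sentence_around(text, text.index("生词"))
print(sentence)
print(cloze(sentence, "生词"))
```

Here the extracted sentence neither starts with the previous sentence's full stop nor carries the trailing newline into the next field, which is the behaviour the export should have.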

"You can see that the Cloze sentence got put on the second row."

Likewise a bug that I will look at if you send through example source text.

Thanks again for such detailed feedback. I'll try to address many of those issues in the next release.

May 9, 2014 at 05:41 AM

Imron - I'm occasionally using it as a reader at work and finding it very helpful. One feature that I'd like to see (which perhaps I'm missing somehow) is double click to change the status of a word. Currently it seems only the right-click menu can be used to change a word from unknown to known (or vice versa). Out of habit I find myself frequently double clicking - any chance of adding this in the future? [Edit: just noticed this is coming, from your comments above - great!]

Also, I'll write a review myself in the near future.

May 9, 2014 at 07:22 AM

Wow, great job! Works so quickly! So much attention is given to different use cases (browsing through the selected words is one of them). I'm so happy that developers prioritize performance, it is so crucial. It works great on Wine.

Yes, I agree with someone's comment about the good tools coming so late.

Question: How are the word combinations analyzed? Is it based on CC-CEDICT? If so, is it on your roadmap to allow users to choose how the combinations are analyzed? Just a thought for now. I'll probably use it for classical Chinese, so I assume an additional combination list will be needed.

My two cents, together with other comments on this thread:

1) I understand your argument about "bad practices" in looking up every word. However, adding pinyin and a translation (or a link to ZDIC or another online dictionary) as an optional column would do the job and make it more powerful.

2) The ability to a) create a new document in the software, b) save it, or c) edit an existing document would be a great addition. Right now the clipboard is a bit limiting - I usually scrape texts from the internet and would like to drop them directly into CTA.

Kudos!
