imron

Introducing Chinese Text Analyser


Pall
18 hours ago, imron said:
On 2/25/2020 at 6:36 AM, Pall said:

It would be fine to add also a function that can mark characters from HSK1, 2, 3, 4, 5, 6

I get that people are interested in things like this, but CTA aims to subtly push people away from thinking in terms of HSK.  In fact I only provide the HSK statistics to drive home the point that for most native content, the vocabulary for the HSK doesn't give you very much at all, and you're better off using frequently occurring words in what you are reading.

 

On 3/25/2020 at 10:24 AM, Pall said:

It would be especially great if CTA could also mark and count 'head' characters even though they're of a conditional nature.

I had a look through your link, but I'm still not entirely sure what you mean by head characters.

I understand your point. It's true that HSK5 is not enough even for reading newspapers. But the idea is to learn some basis exceptionally well, to be able to feel and observe it in one's mind, and that will make things much easier when encountering a new character, since it can be fitted into the firm HSK5 framework. As for me, based on the first three poems, for B-P-M-F, D-T-N-L and Z-C-S, I've learnt all the characters from HSK5 very confidently. However, let's assume I sometimes doubt whether I know a new character whose pinyin I looked up, and it happens to be one of the learnt syllables. I check it in the Table and... (1) I see it's there. Now I'm sure it's very unlikely that I'll hesitate to recognize it next time. (2) It's not there. OK, I just add it to a certain card and cell in the Table, marking it in green. In this case it's also very likely that I'll memorize it much more quickly than if there were no firm basis in the form of the Table and the cards (two types, 'intermediate poem presentation cards' and 'head character cards').

Head characters are just characters selected to represent an entire syllable in a given tone. For instance, in HSK5 there are three characters pronounced fáng: 房, 防, 妨. We take one of them as the representative. I selected 房. Head characters are also selected for 'fang' in the other tones: 方 for the 1st tone, 访 for the 3rd, and 放 for the 4th. They're all 'head' characters. We also select one meaning of each to use in formulas (this applies to all characters): 'corner' for 方, 'building' for 房 (though 'flat' might be the more common meaning), 'explore' for 访 and 'advertise' for 放. Head characters are used in 'head character cards'. One side of the card shows the head character, and the other characters with the same pinyin are on the reverse, arranged in a special order for better memorization; see the picture (at the bottom of it). In the 'head character cards' the other characters are linked to the 'head' one by a 'horizontal' formula: a phrase connecting their meanings one after the other (in the same word order), with other words added as necessary, of course. The meanings of the characters are the 'target' words.
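For readers who think better in code, the grouping described above could be sketched roughly like this in Python. The `(character, syllable, tone)` tuple layout, the `HEADS` mapping and the function name `build_head_cards` are assumptions made for illustration only; the characters and the chosen heads are the 'fang' examples from this post.

```python
from collections import defaultdict

# Sample data from the post: HSK5 characters read "fang", with their tones.
# The (character, syllable, tone) layout is an assumption for illustration.
CHARACTERS = [
    ("方", "fang", 1),
    ("房", "fang", 2),
    ("防", "fang", 2),
    ("妨", "fang", 2),
    ("访", "fang", 3),
    ("放", "fang", 4),
]

# Head characters as chosen in the post, one per (syllable, tone) pair.
HEADS = {
    ("fang", 1): "方",
    ("fang", 2): "房",
    ("fang", 3): "访",
    ("fang", 4): "放",
}


def build_head_cards(characters, heads):
    """Group characters by (syllable, tone) and split each group into
    the head character (front of the card) and the rest (reverse)."""
    groups = defaultdict(list)
    for char, syllable, tone in characters:
        groups[(syllable, tone)].append(char)

    cards = {}
    for key, chars in groups.items():
        head = heads[key]
        cards[key] = {"head": head, "others": [c for c in chars if c != head]}
    return cards


cards = build_head_cards(CHARACTERS, HEADS)
```

Here `cards[("fang", 2)]` has 房 on the front and 防, 妨 on the reverse, while the other three tones have an empty reverse side because each is represented by a single character.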

 

The possible number of head characters is about 1200-1300 for the whole language, and within HSK5 it's 880 (just to begin with). But the number of head character cards required may be much smaller, since hundreds of syllables are represented by a single character, while others are represented by several characters with the same pinyin. For example, for HSK5 we need only about 350 head character cards.

 

Then one of the 'head' characters is selected as the 'key' character to represent one of the 400 syllables without regard to tone. I chose 方. One of the meanings of each 'key' character is used in 'poems' composed according to the 声母 (initial) vs 韵母 (final) correspondence. All head characters, including key characters, are presented in 'intermediate poem presentation' cards; see the picture (at the top of it). Head characters of the same syllable (in different tones) are linked to one another by 'vertical' formulas. I managed to compose these formulas in English.

 

Thus, one has to learn only 120 'intermediate poem presentation cards' and some hundreds of 'head character cards' (for HSK5 only 350) to know all the characters at one's level. In the Dictionary of Contemporary Mandarin that I have, 20,000 words are based on only 4,500 characters. So, starting from the HSK5 basis, one can move towards the goal of 4,500 by adding new characters marked in green.

IMG_0366.jpg


Pall
16 hours ago, roddy said:

Did you ever know that you're my hero
And everything I would like to be?
I can fly higher than an eagle
For you are Imron beneath my wings

 

Good poetry! I'm sure you could manage to compose even long 'horizontal' formulas in English, linking a number of characters in a given order.

 

Jan Finster

@Imron: is there a way to export a [word list] to txt or xls?

 

When I go to [menu: word lists] and [manage...] it lets me delete words etc, but I cannot select all words and copy/paste them elsewhere...

imron

There isn't a way to export them, but there is a way to access them.

 

Assuming you are on Windows you can open file explorer and go to:

 

C:\Users\USERNAME\AppData\Local\ChineseTextAnalyser\wordlists\cache

 

Where USERNAME should be replaced with your computer username.

 

Each file in that directory corresponds to one wordlist, and will be a .txt file with one word per line.

 

NOTE: You should copy/open these files after closing CTA as recent changes might not have been written out to disk.  Also note, these are not the actual saved wordlists themselves, just a cached copy that CTA uses so it doesn't have to rebuild the full list from the actual saved format.  If you edit these files, those changes will be overwritten when CTA detects changes have been made and recreates the cached copy.
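If you want the words in something a spreadsheet can open, a small script along these lines might do it. This is only a sketch, not a CTA feature: the function names `default_cache_dir` and `export_wordlists` are made up for illustration, and it assumes the cached files are UTF-8 text (with or without a BOM), which is not confirmed above. Run it only after closing CTA, as noted.

```python
import csv
from pathlib import Path


def default_cache_dir(username):
    """Build the cache path given in the post for a Windows username."""
    return (Path("C:/Users") / username / "AppData" / "Local"
            / "ChineseTextAnalyser" / "wordlists" / "cache")


def export_wordlists(cache_dir, out_csv):
    """Copy every cached wordlist (.txt, one word per line) into one CSV.

    Returns the number of words written. The cache files are only read,
    never modified.
    """
    rows = []
    for txt in sorted(Path(cache_dir).glob("*.txt")):
        # Assumption: files are UTF-8; "utf-8-sig" also tolerates a BOM.
        with txt.open(encoding="utf-8-sig") as f:
            for line in f:
                word = line.strip()
                if word:  # skip blank lines
                    rows.append((txt.stem, word))

    # Writing with a BOM helps Excel detect UTF-8 when opening the CSV.
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["wordlist", "word"])
        writer.writerows(rows)
    return len(rows)
```

A call such as `export_wordlists(default_cache_dir("USERNAME"), "wordlists.csv")`, with USERNAME replaced as described above, would produce one row per word, with the file name (minus `.txt`) in the first column.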

murrayjames

Imron, question for you. Today I updated CTA to the latest version (0.99.18). After updating, I noticed that the known word % of texts I was reading had fallen 0.50–1.00%. Any idea why this happened?

imron

The latest version included an update to use the latest version of cedict, which means there’s a bunch of extra words, which means some things will be segmented differently. 
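To illustrate why extra dictionary words can lower the known-word percentage, here is a toy sketch. It uses simple forward longest-match segmentation, which is an assumption chosen for illustration and not necessarily how CTA segments text; the sample sentence, dictionaries and function names are likewise made up.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward longest-match segmentation (toy example only)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def known_percent(words, known):
    """Percentage of segmented words that appear in the known-word set."""
    return 100 * sum(w in known for w in words) / len(words)


text = "我喜欢北京大学"
known = {"我", "喜欢", "北京", "大学"}

old_dict = {"喜欢", "北京", "大学"}
new_dict = old_dict | {"北京大学"}  # the updated dictionary gains one entry

old_words = segment(text, old_dict)  # 我 / 喜欢 / 北京 / 大学
new_words = segment(text, new_dict)  # 我 / 喜欢 / 北京大学
```

With the old dictionary every segment is a known word; with the new one, 北京大学 becomes a single word that is not yet marked as known, so `known_percent` drops even though nothing about the reader has changed.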

murrayjames

Ah, that would do it! I was slightly sad when those percentages went down. 

imron

Keep learning and they'll go back up!
