Chinese-forums.com
jiaojiao87

At what known-word % should I start reading a book?


imron
3 hours ago, ZhangKaiRong said:

What does the Chinese Text Analyser analyse?

It aims to analyze at the word level because that's the basic unit learners should be using.  It also tracks very basic character statistics such as total character count, but it doesn't track known/unknown characters, or provide frequency information at the character level, only the word level.

 

It has a dictionary of words (including polysyllabic ones and chengyu) and matches against that.

 

I say 'aims' to work with words because the segmenter is fairly basic and so doesn't always get the word boundaries correct, but 'words' is the basic unit it works with.
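
As an illustration only: one common way to do dictionary-based segmentation is greedy forward maximum matching. The thread doesn't say which algorithm CTA actually uses, so the function name, the `max_len` limit and the toy dictionaries below are all assumptions made for this sketch.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching (illustrative, not CTA's actual code)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionaries, just large enough for the two examples.
toy_dictionary = {"我们", "学校", "运动会", "运动", "参加"}
print(segment("我们参加学校运动会", toy_dictionary))

ambiguous_dictionary = {"研究", "研究生", "生命", "起源"}
print(segment("研究生命起源", ambiguous_dictionary))
```

On the second string the greedy matcher grabs 研究生 and strands 命, which is the kind of wrong word boundary a basic dictionary matcher can produce.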

 

5 hours ago, Dawei3 said:

A fascinating perspective.....

It's one that most people overlook (including myself at the time) until they start getting into reading.


Dawei3
6 hours ago, Moshen said:

post the ISBN number

Moshen, I still have the book. When I get home, I'll be glad to check its ISBN. My friend bought it for me in China. It seemed like a legitimate copy because of its quality.

Dawei3
7 hours ago, Moshen said:

ISBN number?

The ISBN is 978-7-5613-4411-8
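
For anyone who wants to sanity-check a number like this before looking it up: an ISBN-13 ends in a check digit computed from the first twelve digits with alternating weights 1 and 3. The sketch below (the function name is my own, not from any library) only tests that the number is well-formed; it says nothing about whether a particular copy is legitimate.

```python
def isbn13_is_well_formed(isbn):
    """Check the ISBN-13 check digit; hyphens and spaces are ignored."""
    digits = [int(ch) for ch in isbn if ch in "0123456789"]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


print(isbn13_is_well_formed("978-7-5613-4411-8"))
```

For the number above the weighted sum of the first twelve digits is 112, so the expected check digit is 8, which matches.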

 

In the back of the book, it gives [email protected]  and also suggests their website. However, I got the book ~10 years ago. 

 

 

Wurstmann
8 hours ago, imron said:

I say 'aims' to work with words because the segmenter is fairly basic and so doesn't always get the word boundaries correct, but 'words' is the basic unit it works with.

Did you write your own parser or are you using something like jieba?

imron

I wrote my own parser. CTA predates jieba by a couple of years. The parser is faster but less accurate. I've been working on a more accurate (but still fast) parser, but life has been busy the last couple of years and I haven't been able to put as much time into it as I'd like.

Pall

Sorry, I don't know how to reply with quotes and refer to several forum contributors simultaneously.

 

First, I believe speaking is the most valuable skill in learning foreign languages, and not only for communication. If you can speak a language, you are confident that you know it, and even reading becomes easier. Nevertheless, the bulk of new words comes from reading. The problem is that when you just read texts in a foreign language and look up new words, you mainly train your ability to recognize those words in the future. In speaking or writing they are not easily recalled; they are not “on the tip of your tongue”. I'm a native Russian speaker and I can read almost everything in English without a dictionary, but when it comes to speaking or writing it becomes much more difficult. It's a well-known thing, of course. Therefore it's necessary to learn words not only "from the language" but also "to the language", in order to place them in one's active memory, not only the passive one. However, we need to know how to use the words, so we can't manage without reading.

 

Second, in the case of Chinese, one's active and passive vocabularies are very close to each other, since we are confident we know a Chinese word only if we can write it by hand. Recognition is not enough. So even to read we need to learn how words are written, i.e. we learn them "to the language", not "from the language".

 

These two reasons prompted me to use a technique whose idea I developed on the basis of the Russian-Chinese pidgin language, 中俄混合语, which existed in some areas of Siberia and Northern China from the late 19th century to the first half of the 20th century. It was based on Chinese grammar but mainly Russian vocabulary. Russian is very flexible: almost any word order is possible without changing the meaning, it's easy to construct new words that will be understandable, etc.

 

So, here is what I do. I open a text of a reasonable length (a chapter, say) in the CTA program, where a list of the words I know (meaning I can write them by hand) is already loaded. I see that known words make up 70%, or even 50% or less (it doesn't matter); the share of unique known words is about half as large, of course. Then I mark an additional 50-60 words as known. They can be the next words by frequency, the words that appear more than 3 times, or words picked by some other criterion.

Then I write in Russian what I call a "gateway superscript" of the Chinese text (not a translation): word by word I copy the Chinese, replacing Chinese words with Russian ones, so the sentence structure and punctuation remain Chinese. With Russian this is possible, and the sentence remains easily understandable. If the Mandarin text has one word for a notion that Russian can express only in several words, these are joined with the symbol “+”. If Russian requires an additional word (e.g. a preposition), it is put in brackets. Where, on the contrary, the Chinese text has an additional word, empty brackets are put in the Russian. In braces I give other variants of the superscript, and in square brackets, comments. I also mark with color or underline the words that I know and the words that I'm going to make known (those marked in the CTA window).

After that, reading the superscript, I replace the marked words with characters, whereas the unmarked segments (unknown words that I'm not going to learn now) I just rewrite in Russian. I enclose a picture of how it looks in my copybook. I repeat the exercise several times at intervals (hours or days). When all the words are known, I add more words in the same text or go to another text (the next chapter or a completely different text). In this way words are learnt in use, in an integral text (which is better than separate examples), and by "layers" according to frequency. I can stop at any proportion of known words and switch to another text.
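
The percentages mentioned here can be reproduced outside CTA with a few lines. This is a rough sketch, assuming the text has already been segmented into a list of words; `coverage` and `next_words` are hypothetical helper names, not CTA functions, and the sample tokens are made up.

```python
from collections import Counter


def coverage(tokens, known):
    """Return (share of running words known, share of unique words known)."""
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    known_tokens = sum(n for word, n in counts.items() if word in known)
    known_unique = sum(1 for word in counts if word in known)
    return known_tokens / len(tokens), known_unique / len(counts)


def next_words(tokens, known, limit=50, min_count=1):
    """Most frequent unknown words: candidates to mark as known next."""
    counts = Counter(word for word in tokens if word not in known)
    return [word for word, n in counts.most_common(limit) if n >= min_count]


# Made-up, pre-segmented sample text and known-word list.
tokens = ["我们", "参加", "学校", "的", "运动会", "我们", "的", "运动会", "很", "难忘"]
known = {"我们", "的", "学校"}

by_token, by_unique = coverage(tokens, known)
print(f"known running words: {by_token:.0%}, known unique words: {by_unique:.0%}")
print(next_words(tokens, known, limit=50, min_count=2))
```

Raising `min_count` corresponds to the "words that appear more than 3 times" criterion, while `limit` corresponds to taking the next 50-60 words by frequency.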

 

I understand my post is not helpful for those who don't know Russian. However, since this way of learning is applicable to any foreign language, not only Chinese, they may consider learning Russian for this purpose.

IMG_0114.jpg

Pall

In addition, I learn especially difficult characters with the Mnemosyne SRS program (similar to Anki), loading a video of the stroke sequence into it.

Lu

@Pall, I took the liberty of adding paragraphs to your text, hope that is OK.

Pall

The technique described above is best suited for (1) beginners like me, (2) those who want to acquire writing skills, and (3) advanced learners when they need texts that don't exist in a simpler form, such as professional texts or newspapers. But mainstream learners at the intermediate and advanced levels will be bored by having to write common words again and again, which they can already write very well, just to introduce a few dozen new words, and they will also run into obvious limitations with larger texts. For them another kind of gateway would be more helpful, which can be called the "inverted gateway". It doesn't require writing a superscript for the whole text in advance, and it doesn't need handwriting.

 

But in this case it's necessary to find texts suitable for the learner's current level, because the objective is to learn almost all the words in them. One then replaces with Russian superscript only those segments that one can easily understand and also express in Chinese (spoken or written). Unknown words and grammatically difficult parts are left in characters. I'll show this with a simple sample.


Школьная运会(ученические 运动会)


В+один+год 一度( )школьная运会снова началась ( )!С  раннего утра мы уже одевались (в) 整齐 ( ) ученическую форму пришли (на) школьную ( ) 运动场。
运动会 начала до,школьный директор перед нами держал ( ) речь。Он говорил,мы (в) соревнований ( ) время,должны достичь “友谊 во-первых,состязания во вторых”。Соревнования начались ( )。(В) нашем классы был одноклассник 参加跳远,он исключительно сильный 厉害,一下子сразу прыгнул 中间 перешел ( )。Мы поддержали ( ) его 喝彩。跳绳 ( )соревнования самые 难忘,потому что я сам 参加 ( )。

 

The Chinese words left in characters are supposed to be unknown. Then we start reading the text aloud as follows:

 

Xiao4yun4hui4 (xue2xiao4 yun4dong4hui4)

Yi4nian2 yi2du4 de xiao4yun4hui4 you4 dao4 le...

 

We read both the Russian (in Mandarin, replacing the superscript with Chinese words in our mind) and the characters. If we don't know how to pronounce a Chinese word or don't know its meaning, we look it up and read on. In a day or two it's necessary to repeat the reading. So new words are learnt in respect of recognition only, but efficiently.
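
The substitution that produces an inverted superscript like the sample above can be sketched as a simple lookup. It assumes the text is already segmented and that `glosses` (a hypothetical dictionary from known Chinese words to their Russian superscript) is maintained by the learner. It does not reorder anything and does not add bracketed extra words beyond what is stored in the glosses.

```python
def inverted_gateway(tokens, glosses):
    """Replace known words with their gloss; leave unknown words in characters."""
    return " ".join(glosses.get(word, word) for word in tokens)


# Hypothetical glosses for the words the learner already knows.
glosses = {"的": "( )", "比赛": "соревнования", "最": "самые"}

# Pre-segmented phrase from the sample; 跳绳 and 难忘 are still unknown.
tokens = ["跳绳", "的", "比赛", "最", "难忘"]
print(inverted_gateway(tokens, glosses))
```

Keeping the glosses in one dictionary means the same known-word list can be reused from text to text, with only the unknown words showing up in characters.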

 

Another way to work with the text is to type it on a computer, replacing the Russian superscript with characters.

 

In this way we learn new words in context, concentrating on them (SRS programs are less needed), and we also train our ability to speak and write in Mandarin. Of course, there can be places where synonyms could be used, but that doesn't hinder much.

 

As soon as I approach intermediate level I'll use mainly this technique, I think.

 

Pall

I've checked how the inverted gateway works on a text in which only 10% of the words (15% of unique words) are unknown to me by recognition, and I can already make some observations (it was a different text, not the one in the sample).
 

As I understand it, ideally we should be able to write by hand some 1500-3000 of the most common words. Until we reach this level, the direct gateway is a very efficient method of learning new words. It is not boring, in fact; it's a true pleasure for me to write characters, as long as it is in real phrases and not just repeating them in a line. However, to move on faster it is better to switch to the inverted gateway and not to strive for the ability to write new words by hand. We will acquire only the skills to (1) recognise the new words with their meaning, (2) remember their pinyin, and (3) type them on a computer, which requires recognising them in the pop-up menu and knowing their pinyin, at least without tones, to open the menu.

 

The inverted gateway works very well. It involves the following steps. First, write a superscript in Russian for a Chinese text, but only for those Chinese words that we know (in the sense of recognising their meaning and knowing their pinyin; forget handwriting skills), leaving unknown words in characters as in the sample in the previous post. Second, try to read the inverted superscript aloud in Mandarin. In doing so we memorise the pinyin of the new words (memorising their meaning is easier since the words are in context) and train our ability to convert the Russian superscript to Chinese. We may make mistakes in the latter, of course, but it's not necessary to check whether we say it right at this stage. Third, as soon as we have remembered the meaning and the pinyin of the new words, we start typing the text on a computer in characters, looking at the inverted superscript (which is in the sample above). At this stage we can also check whether we convert the Russian to Chinese correctly.
 校运会(学校运动会)
一年一度的校运会又到了!一大早我们就穿着整齐的校服来到学校的运动场。
运动会开始前,校长给我们讲了话。他说,我们比赛的时候,应该做到“友谊第一,比赛第二”。比赛开始了。我们班有同学参加跳远,他非常厉害,一下子就跳到中间去了。我们都为他喝彩。跳绳的比赛最难忘,因为我自己参加了..........
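
For the checking step, the typed text can be compared with the original automatically. A minimal sketch using Python's standard `difflib`; the function name and the sample strings are made up for illustration.

```python
import difflib


def typing_mistakes(original, typed):
    """Return (expected, typed) pairs for every span where the texts differ."""
    matcher = difflib.SequenceMatcher(None, original, typed, autojunk=False)
    return [
        (original[i1:i2], typed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


original = "比赛开始了。"
typed = "比赛开使了。"  # made-up typo: 使 picked from the pop-up menu instead of 始
print(typing_mistakes(original, typed))
```

An empty list means the typed text matches the original exactly. A legitimate synonym would still show up as a difference, so the output is a prompt to look again rather than a verdict.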
 

Pall

The ability to read aloud and the ability to write by hand are very important skills that form a basis for developing other language skills. I saw how my Chinese students did both. Once I gave them an article on world political geography in Chinese, a very complicated one, containing, for example, the names of dozens of small island countries. And they were able to read it aloud without interruption! So they knew how to pronounce all the rare characters in the text. Another day I gave them the task of translating short Russian texts into Chinese, writing on a sheet of paper, and some of them did it very fast, covering the paper with handwritten characters that I couldn't make out. The others couldn't do it, but that was because they didn't know Russian well enough. Logically, this means that foreigners who want to learn Chinese should master the same skills, at least for the few thousand most used words, 1500-3000 in my opinion. Admittedly, the texts in the latter exercise were not complicated, since I had written them for the students on purpose, using only words from the 1000 most used Russian words. The thing is, when I realised that many Chinese students in the group to which I lectured (human geography) couldn't speak Russian well enough to understand me, I made videos on learning Russian by Zamyatkin's matrix method (by the way, it's good for developing good pronunciation and overcoming the initial language barrier in any language, including Mandarin; I started with it) and in addition composed a series of short texts (essays, stories, dialogues) containing only words from the 1000 most used Russian words. It's a pity that only a couple of the students took the opportunity, but those have shown good progress in mastering Russian so far.
It may be interesting for some here, so I give a link to the main video about Zamyatkin's matrix method, using Russian as the example (though it's suitable for all languages), and to the most-used-words tool that I developed for foreign students who want to learn Russian. One version is in English, the other in Mandarin. In these videos (in the description and in the video itself) there are links to other videos. As for Zamyatkin's method, I have all the audio (37 dialogues) and related materials (in Russian, though) for learning Mandarin, which I can upload here for free, since Zamyatkin gave me the right to distribute the Chinese matrix free of charge.
https://youtu.be/8wKIBJgzdeo

https://youtu.be/u6KHTi20LI8
