Chinese-forums.com
Learn Chinese in China

chijyh

Total Number of Chinese characters


smalldog
How does it work?

The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits this curve to your test results and returns W as an estimate of the number of characters you know.
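
As a rough illustration of the fitting step described above, here is a minimal sketch in Python. It is not the program's actual code: the function names (`p_known`, `log_likelihood`, `fit`), the maximum-likelihood criterion, the grid search, and the default ranges are all assumptions made for the example.

```python
import math

def p_known(x, A, W):
    """Model probability of knowing the character at frequency rank x."""
    z = A * (x - W)
    # Clamp the exponent so math.exp cannot overflow for extreme ranks.
    z = max(-700.0, min(700.0, z))
    return 1.0 / (1.0 + math.exp(z))

def log_likelihood(data, A, W):
    """Log-likelihood of (rank, known) test results under the model."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, known in data:
        p = min(max(p_known(x, A, W), eps), 1.0 - eps)
        total += math.log(p) if known else math.log(1.0 - p)
    return total

def fit(data, max_rank=8000, w_step=50):
    """Hypothetical fitting routine: brute-force grid search over A and W."""
    # Candidate steepness values from 1e-4 up to 1e-1, log-spaced.
    steepness = [10 ** (e / 4) for e in range(-16, -3)]
    best = None
    for A in steepness:
        for W in range(0, max_rank + 1, w_step):
            ll = log_likelihood(data, A, W)
            if best is None or ll > best[0]:
                best = (ll, A, W)
    return best[1], best[2]

if __name__ == "__main__":
    # Made-up test results: (frequency rank, did the user know it?)
    sample = [(10, True), (800, True), (1900, True), (2100, False),
              (2500, True), (2800, False), (5000, False)]
    A, W = fit(sample)
    print("steepness A =", A, "estimated characters known W =", W)
```

A grid search is used here only because it is easy to follow; a real implementation would more likely use a proper optimiser.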

The simple answer... well, I don't have a simple explanation. :)

Do take the results with a pinch of salt! I need to test the validity of my model by getting some volunteers to wade through a large number of characters, but I haven't written any code to record that data yet.


gato
The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits this curve to your test results and returns W as an estimate of the number of characters you know.

Wow, fancy statistics!

What is A? Is there a name for this function, so I can look it up?

in_lab

Nice program! I don't know how accurate it was, but I was happy with the range that was coming up. :D But please, we want to learn the fancy statistics.

smalldog

Wow, didn't expect so much interest in this program. :D

The function is a logistic function. http://en.wikipedia.org/wiki/Logistic_function . I'm just using it because it looks like what I expect, not because there's any model of language learning that suggests this is correct.

graph.gif

In this graph the x axis represents Chinese characters in order of frequency, so x=1 is 的 and x=8000 is a character you will probably never come across. The y axis represents the probability of you knowing each character. The blue blobs show characters that you have been tested on; for those we are certain that you either know them (y=1) or don't (y=0). The pink line is the equation above fitted to the blue blobs. W is the point at which the line crosses y=0.5, and is also a good approximation of the total area under the curve, which is the estimated total number of characters you know. The parameter A is just a measure of how steep the curve is.
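
The claim that W approximates the area under the curve can be checked numerically. The sketch below sums the model probability over ranks 1 to 8000 and compares the total with W. The parameter values and the helper name `expected_known` are made up for illustration and are not taken from the program.

```python
import math

def p_known(x, A, W):
    """Model probability of knowing the character at frequency rank x."""
    z = max(-700.0, min(700.0, A * (x - W)))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))

def expected_known(A, W, max_rank=8000):
    """Area under the curve: sum of probabilities over all ranks."""
    return sum(p_known(x, A, W) for x in range(1, max_rank + 1))

if __name__ == "__main__":
    W = 2400  # hypothetical fitted midpoint
    # A fairly steep curve: the area should land very close to W.
    print("steep:  ", expected_known(0.003, W))
    # A shallow curve: the area drifts noticeably away from W.
    print("shallow:", expected_known(0.0005, W))
```

With a reasonably steep curve the two numbers nearly coincide; the approximation gets worse as A shrinks, because a shallow curve is cut off at rank 1 on the left while its tail stretches far to the right.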

roddy

Smalldog, why don't you make a new topic introducing your program, either in Textbooks and Resources (if you think it's finished and useful) or Computing (if you want some help with designing / programming it). I think a lot of people would be interested.

Roddy

gato

Cool, is this what's called logit regression? Maybe we should call you bigdog from now on. :mrgreen:

smalldog

Ok Roddy, I've started a new thread here in the computing forum. I want to make some improvements and check my model before 'releasing' it.

Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 [I am your big, big dog, you are my bone] doesn't sound so good. 8)
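
For readers wondering how the curve relates to logit (logistic) regression: taking the log-odds of the formula posted earlier in the thread gives a straight line in x, which is the form a one-predictor logistic regression fits. The symbols \beta_0 and \beta_1 below are notation introduced for this note, not anything from the program.

```latex
\[
p(x) = \frac{1}{1 + e^{A(x - W)}}
\quad\Longrightarrow\quad
\ln\frac{p(x)}{1 - p(x)} = -A(x - W) = \beta_0 + \beta_1 x,
\]
\[
\beta_1 = -A, \qquad \beta_0 = AW, \qquad W = -\frac{\beta_0}{\beta_1}.
\]
```

So, under this reading, W is the rank at which the fitted log-odds cross zero, which is the same point where the curve crosses y=0.5.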

gato
Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 [I am your big, big dog, you are my bone] doesn't sound so good. 8)

骨头就骨头吧。 [Fine, a bone it is.] :D How's the teaching going? I checked out your cugb forum and took that English test you linked to. :wink: Your test's shorter and therefore better. I predict it'll be a hit.

woodcutter

I second what Roddy said about it, except that I don't see how anyone could bear to take that test for more than 5 characters. Do I know that character? Yeah....think so! I want to check!

錢 勇 龍

Wow Gato, your English is excellent! I hope one day I will be as good in Traditional Chinese as you are in English!

Sgt_Strider

Gato,

Most of the links in your post are dead. Do you think you can update them for us? I know this thread is old, but it contains useful information.
