Chinese-Forums
Total Number of Chinese characters


chijyh

How does it work?

The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits data to this curve and returns W as an estimate of the number of characters you know.

The simple answer... well, I don't have a simple explanation. :)

Do take the results with a pinch of salt! I need to test the validity of my model by getting some volunteers to wade through a large number of characters, but I haven't written any code to record that data yet.
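
For anyone who wants to play with the idea, here is a minimal Python sketch of fitting that curve. It is not the program's actual code: the post doesn't say how the fit is done, so the coarse maximum-likelihood grid search, the function names (`p_known`, `log_likelihood`, `fit`), the candidate values of A, the 8000-character ceiling on W, and the sample test results are all assumptions made up for illustration.

```python
import math

def p_known(x, A, W):
    """Probability of knowing the character at frequency rank x (formula from the post)."""
    # Clamp the exponent so math.exp cannot overflow for ranks far from W.
    z = max(-700.0, min(700.0, A * (x - W)))
    return 1.0 / (1.0 + math.exp(z))

def log_likelihood(samples, A, W):
    """samples is a list of (rank, known) pairs, where known is True or False."""
    eps = 1e-12  # keeps log() finite when p rounds to exactly 0 or 1
    total = 0.0
    for x, known in samples:
        p = p_known(x, A, W)
        total += math.log(p + eps) if known else math.log(1.0 - p + eps)
    return total

def fit(samples, max_rank=8000):
    """Coarse grid search for the (A, W) pair with the highest likelihood (assumed method)."""
    best = None
    for W in range(0, max_rank + 1, 50):
        for A in (0.0005, 0.001, 0.002, 0.005, 0.01, 0.02):
            ll = log_likelihood(samples, A, W)
            if best is None or ll > best[0]:
                best = (ll, A, W)
    return best[1], best[2]

# Made-up test results: (frequency rank, did the user know the character?)
samples = [(100, True), (500, True), (1200, True), (1800, True), (1900, False),
           (2300, True), (2600, False), (3500, False), (5000, False), (7000, False)]
A, W = fit(samples)
print(f"A = {A}, estimated number of characters known W = {W}")
```

A grid search is crude but has no dependencies and cannot diverge; a proper optimiser or a logistic regression routine would do the same job with finer resolution.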

The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x-W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits data to this curve and returns W as an estimate of the number of characters you know.

Wow, fancy statistics!

What is A? Is there a name for this function, so I can look it up?

Wow, didn't expect so much interest in this program. :D

The function is a logistic function (http://en.wikipedia.org/wiki/Logistic_function). I'm just using it because its shape looks like what I expect, not because there's any model of language learning that suggests this is correct.

[Image: graph.gif]

In this graph the x axis represents Chinese characters in order of frequency, so x=1 is 的 and x=8000 is a character you will probably never come across. The y axis represents the probability of you knowing each character. The blue blobs show characters that you have been tested on; for those we are certain that you either know them (y=1) or don't (y=0). The pink line is the equation above fitted to the blue blobs. W is the point at which the line crosses y=0.5, and is also a good approximation of the total area under the curve, which is the expected total number of characters you know. The parameter A is just a measure of how steep the curve is.
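
Those two claims about W can be checked numerically with a few lines of Python. This is only a sketch: the values of A, W, and the list length N are illustrative assumptions rather than figures from the program, and `p_known` is just a hypothetical name for the equation above.

```python
import math

def p_known(x, A, W):
    """Probability of knowing the character at frequency rank x."""
    # Clamp the exponent so math.exp cannot overflow for ranks far from W.
    z = max(-700.0, min(700.0, A * (x - W)))
    return 1.0 / (1.0 + math.exp(z))

A, W, N = 0.003, 2500, 8000  # illustrative values, not taken from the thread

# At x = W the exponent is zero, so the curve passes through y = 0.5 there.
print(p_known(W, A, W))

# Area under the curve: sum the probability over every rank in the list.
expected_known = sum(p_known(x, A, W) for x in range(1, N + 1))
print(round(expected_known), "expected known characters, compared with W =", W)
```

With a reasonably steep curve and W well away from both ends of the list, the sum comes out very close to W; a shallow curve or a W near zero would make the approximation worse.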

Smalldog, why don't you make a new topic introducing your program, either in Textbooks and Resources (if you think it's finished and useful) or Computing (if you want some help with designing / programming it)? I think a lot of people would be interested.

Roddy

Ok Roddy, I've started a new thread here in the computing forum. I want to make some improvements and check my model before 'releasing' it.

Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I am your big, big dog, you are my bone") doesn't sound so good. 8)

Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I am your big, big dog, you are my bone") doesn't sound so good. 8)

骨头就骨头吧。 ("Bone it is, then.") :D How's the teaching going? I checked out your cugb forum and took that English test you linked to. ;) Your test's shorter and therefore better. I predict it'll be a hit.
