smalldog Posted May 12, 2005 at 01:26 AM How does it work? The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x - W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits data to this curve and returns W as an estimate of the number of characters you know. The simple answer... well, I don't have a simple explanation. Do take the results with a pinch of salt! I need to test the validity of my model by getting some volunteers to wade through a large number of characters, but I haven't written any code to record that data yet.
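A minimal sketch of one way to fit that curve, assuming the test results are just pairs of (frequency rank, known or not). The post doesn't say how the program actually does its fitting, so this swaps in plain maximum likelihood solved with Newton's method; the function names `p_know` and `fit_logistic` and the example data are hypothetical. One caveat: if the answers are perfectly separated (every tested character below some rank known, every one above it unknown), the best-fitting A runs off towards infinity, and this sketch does nothing to guard against that.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_know(x: float, A: float, W: float) -> float:
    """Curve from the post: P(know the character at rank x) = 1 / (1 + exp(A*(x - W)))."""
    return sigmoid(-A * (x - W))


def _loglik(b0: float, b1: float, ts, ys) -> float:
    # Bernoulli log-likelihood of p = sigmoid(b0 + b1*t): sum of y*z - log(1 + e^z)
    total = 0.0
    for t, y in zip(ts, ys):
        z = b0 + b1 * t
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def fit_logistic(ranks: Sequence[int], known: Sequence[int],
                 scale: float = 1000.0, iters: int = 100) -> Tuple[float, float]:
    """Hypothetical helper: return (A, W) maximising the likelihood of the answers.

    Assumption: ranks are the frequency ranks tested, known[i] is 1 or 0.
    Fits p = sigmoid(b0 + b1*t) with t = rank/scale (scaling only helps
    conditioning), then converts back: A = -b1/scale, W = b0/A.
    """
    ts = [x / scale for x in ranks]
    b0 = b1 = 0.0
    ll = _loglik(b0, b1, ts, known)
    for _ in range(iters):
        # Gradient g and (negated) Hessian h of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y in zip(ts, known):
            p = sigmoid(b0 + b1 * t)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        if det <= 1e-300:
            break
        # Newton direction: solve [[h00, h01], [h01, h11]] d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the log-likelihood does not decrease
        step = 1.0
        while True:
            new_ll = _loglik(b0 + step * d0, b1 + step * d1, ts, known)
            if new_ll >= ll or step < 1e-8:
                break
            step /= 2
        b0, b1, ll = b0 + step * d0, b1 + step * d1, new_ll
        if abs(step * d0) + abs(step * d1) < 1e-10:
            break
    A = -b1 / scale
    return A, b0 / A


# Made-up test results: every 100th rank, mostly known up to ~1900
ranks = list(range(100, 4001, 100))
known = [1 if x <= 1400 or x in (1600, 1800, 2100, 2300) else 0 for x in ranks]
A, W = fit_logistic(ranks, known)
print(f"A = {A:.4f}, estimated characters known W = {W:.0f}")
```

Rescaling the ranks by 1000 doesn't change the answer; it just keeps the two fitted numbers of similar size so the 2×2 solve stays well conditioned.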
gato Posted May 12, 2005 at 01:52 AM Quoting smalldog: "The complicated answer is... it assumes that the probability of you knowing a character is 1/(1 + exp(A*(x - W))), where x is the position of the character in a list of all Chinese characters ordered by frequency. The program fits data to this curve and returns W as an estimate of the number of characters you know." Wow, fancy statistics! What is A? Is there a name for this function, so I can look it up?
in_lab Posted May 12, 2005 at 03:28 AM Nice program! I don't know how accurate it was, but I was happy with the range that was coming up. But please, we want to learn the fancy statistics.
smalldog Posted May 12, 2005 at 05:15 AM Wow, didn't expect so much interest in this program. The function is a logistic function (http://en.wikipedia.org/wiki/Logistic_function). I'm just using it because it looks like what I expect, not because there's any model of language learning that suggests this is correct. In this graph the x axis represents Chinese characters in order of frequency, so x=1 is 的 and x=8000 is a character you will probably never come across. The y axis represents the probability of you knowing each character. The blue blobs show characters that you have been tested on; for those we are certain that either you know them (y=1) or you don't (y=0). The pink line is the equation above fitted to the blue blobs. W is the point at which the line crosses y=0.5. As long as the curve is reasonably steep, W is also a good approximation of the total area under the curve, which is the total number of characters you know. The parameter A is just a measure of how steep the curve is.
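A quick numerical check of the two claims about W. Integrating the curve from 0 to infinity gives exactly W + ln(1 + exp(-A*W))/A, so the area is close to W when A*W is much larger than 1 and drifts above W when the curve is flat. The sketch below sums the probabilities over ranks 1 to 8000 (the end of the x axis mentioned above); the values of A and W are made up for illustration, not taken from the program, and the function names are hypothetical.

```python
import math


def p_know(x: float, A: float, W: float) -> float:
    """1 / (1 + exp(A*(x - W))), written to avoid overflow in exp."""
    z = A * (x - W)
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def expected_known(A: float, W: float, n_chars: int = 8000) -> float:
    """Expected number of known characters: sum of p over ranks 1..n_chars."""
    return sum(p_know(x, A, W) for x in range(1, n_chars + 1))


def area_closed_form(A: float, W: float) -> float:
    """Exact integral of the curve from 0 to infinity: W + ln(1 + exp(-A*W)) / A."""
    return W + math.log1p(math.exp(-A * W)) / A


W = 2000
for A in (0.005, 0.001):  # illustrative steep and flat curves
    print(f"A={A}: p(W)={p_know(W, A, W)}, sum={expected_known(A, W):.1f}, "
          f"integral={area_closed_form(A, W):.1f}")
```

With A = 0.005 (A*W = 10) the sum lands within one character of W; with the flatter A = 0.001 (A*W = 2) the long tail pushes it more than a hundred characters above W.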
roddy Posted May 12, 2005 at 05:20 AM Smalldog, why don't you make a new topic introducing your program, either in Textbooks and Resources (if you think it's finished and useful) or Computing (if you want some help with designing / programming it). I think a lot of people would be interested. Roddy
gato Posted May 12, 2005 at 05:35 AM Cool, is this what's called logit regression? Maybe we should call you bigdog from now on.
smalldog Posted May 12, 2005 at 06:21 AM Ok Roddy, I've started a new thread here in the computing forum. I want to make some improvements and check my model before 'releasing' it. Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I'm your big big dog, you're my bone") doesn't sound so good.
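gato's guess can be checked by rearranging the curve: the log-odds of knowing a character come out as a straight line in x, which is exactly the form a logistic (logit) regression fits. The symbols β₀ and β₁ are the usual regression intercept and slope, introduced here for illustration.

```latex
p(x) = \frac{1}{1 + e^{A(x - W)}}
\quad\Longrightarrow\quad
\frac{p}{1 - p} = e^{-A(x - W)}
\quad\Longrightarrow\quad
\ln\frac{p}{1 - p} = AW - Ax = \beta_0 + \beta_1 x,
\qquad \beta_0 = AW,\quad \beta_1 = -A,
\qquad\text{so}\quad A = -\beta_1,\quad W = -\frac{\beta_0}{\beta_1}.
```

So any standard logistic regression of "known" on frequency rank gives A and W directly from its fitted intercept and slope.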
gato Posted May 12, 2005 at 06:47 AM Quoting smalldog: "Gato, I've never heard of logit regression before but it does seem to be similar... need to do some more reading. Stick to smalldog... 我是你的大大狗,你是我骨头 ("I'm your big big dog, you're my bone") doesn't sound so good." 骨头就骨头吧 (Bone it is, then). How's the teaching going? I checked out your cugb forum and took that English test you linked to. Your test's shorter and therefore better. I predict it'll be a hit.
woodcutter Posted May 18, 2005 at 03:34 AM I second what Roddy said about it, except that I don't see how anyone could bear to take that test for more than 5 characters. Do I know that character? Yeah... think so! I want to check!
錢 勇 龍 Posted August 6, 2006 at 02:11 PM Wow Gato, your English is excellent! I hope one day I will be as good in Traditional Chinese as you are in English!
Sgt_Strider Posted March 24, 2008 at 12:14 AM Gato, most of the links in your post are dead. Do you think you can update them for us? I know this thread is old, but it contains useful information.
amego Posted March 24, 2008 at 05:03 PM In the literal sense, the Zhonghua Zihai records a staggering 85,568 single characters, although even this fails to list all known characters. http://en.wikipedia.org/wiki/Chinese_character