jrolle Posted December 29, 2005 at 11:37 AM
I am looking for an electronic list of chengyu and other 4-character fixed expressions for the purpose of determining the most commonly used. I have a list of about 14,000, but I know there are over 40,000. Does anyone have a list or know where I could find one? Thanks
yan420honggg Posted January 1, 2006 at 10:32 AM
http://www.kingsnet.biz/asp/chengyu/
I think this will be helpful to you.
jrolle Posted January 1, 2006 at 01:41 PM
Thank you. I am familiar with this website and, although useful, it is not really in a format that I can easily make use of.
roddy Posted January 1, 2006 at 01:56 PM
I'm not aware of anything available electronically. However, the owners of Chengyu.info and OneaDay.org might be aware of something. Incidentally, if you can make that list of 14,000 idioms public, I'm sure plenty of people on here would be interested. If so, there's an attachment function, or you can email it to me at admin@chinese-forums.com and I'll make it available. If not, never mind. Roddy
self-taught-mba Posted January 1, 2006 at 02:51 PM
14,000, that is pretty impressive. If you care to share it, that would definitely be greatly appreciated, as Roddy mentioned.

But I'm not sure that finding more of them will meet your stated goal: "determining the most commonly used". For this purpose, may I suggest using a lexical database or one of the many linguistic evaluation tools that can analyze large quantities of text. From there, you can do statistical analysis to determine the frequency of usage. That way you can prioritize and focus on learning the most useful ones first. I have been working on something similar in my own content production.

I have a good friend who has two master's degrees in linguistics. She may be willing to give you more specific advice, as she has helped me. I can't guarantee it, because she is terrifically busy, but if you PM me your contact information I can pass it on to her. She is better at some of these things than I am. An introductory data mining course might also be of some use to you.

You might also want to try to identify the authors of similar compilations. One good example is here:
http://kamares.ucsd.edu/~arobert/hanziData.html
And this site: A Review of Chinese Word Lists Accessible on the Internet
The main page here also has good sources:
http://technology.chtsai.org/
These people have obviously used data mining techniques in the past, so they may be able to assist you, and you can learn a little more about the subject from them. Furthermore, I believe SourceForge has a tool that you can use.

Good luck, and if you are willing to share it, I'd love to see your list.

OK, just as I was about to post, I realized that you may be planning to use your compiled list as input data for a lexical scanner that looks for matches. So maybe never mind. But maybe this helps, I don't know. Good luck,
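The idea raised in the post above, scanning a body of text for matches against a compiled list and ranking by frequency, could look roughly like the sketch below. It is only an illustration, not a tool anyone in the thread mentioned: the function names `count_idioms` and `rank_idioms`, the three sample idioms, and the sample sentence are all made up for the example, and a real run would load the 14,000-entry list and a large corpus from files instead.

```python
from collections import Counter


def count_idioms(idioms, corpus_text):
    """Count occurrences of each known idiom in corpus_text.

    Slides over the text one character at a time and checks every
    substring whose length matches some idiom in the list.
    """
    wanted = set(idioms)
    lengths = sorted({len(idiom) for idiom in wanted})
    counts = Counter()
    for start in range(len(corpus_text)):
        for n in lengths:
            candidate = corpus_text[start:start + n]
            # Near the end of the text the slice can be shorter than n.
            if len(candidate) == n and candidate in wanted:
                counts[candidate] += 1
    return counts


def rank_idioms(idioms, corpus_text):
    """Return the idioms sorted from most to least frequent in the text."""
    counts = count_idioms(idioms, corpus_text)
    # Ties (including idioms that never occur) are broken by the idiom itself.
    return sorted(set(idioms), key=lambda idiom: (-counts[idiom], idiom))


if __name__ == "__main__":
    # Hypothetical sample data, standing in for a real list and corpus.
    sample_idioms = ["一石二鸟", "画蛇添足", "马马虎虎"]
    sample_text = "他做事马马虎虎，结果画蛇添足。马马虎虎可不行。"
    for idiom in rank_idioms(sample_idioms, sample_text):
        print(idiom, count_idioms(sample_idioms, sample_text)[idiom])
```

The resulting ranking only reflects whatever text is fed in, so the choice of corpus (newspapers, novels, spoken transcripts) would matter at least as much as the size of the idiom list.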