Far East Chinese-English Dictionary Database Help


Kobo-Daishi

At one time the chinalanguage.com web site had its CCDICT database available for free download. Its content is essentially the head character entries from the Far East Chinese-English Dictionary (seen on quite a few forumites' shelfies; I also have a copy). So the CCDICT is like the zidian portion of the Far East: all zi and no ci.

 

In Chinese, as in most if not all languages, most characters have two or more definitions.

 

I'm trying to bring my Chinese to another level by learning the different definitions for each character.

 

I have a copy of the Far East, but my eyes are not what they used to be and I don't relish the thought of holding a magnifying glass to read through it.

 

This thread links to a file with the data:

 

http://www.chinalanguage.com/forums/viewtopic.php?f=8&t=2005&sid=b3251d98e314007a1d833177d34ef030&start=30

 

The Perl repository where the file is located:

 

http://search.cpan.org/~drolsky/Lingua-ZH-CCDICT-0.05/lib/Lingua/ZH/CCDICT.pm

 

Unfortunately, the characters are given as Unicode code points rather than as the characters themselves.

 

I'm not a programmer, so I know next to nothing about coding.

 

My question is: how do you turn the code points into characters so that the file is useful to a layman?

 

Kobo.


i gave the data a quick and cursory overhaul; you can download it as https://raw.githubusercontent.com/loveencounterflow/ccdict/master/Lingua-ZH-CCDICT-0.05-transformed.txt (repo at https://github.com/loveencounterflow/ccdict). i converted the U+XXXX notations to characters (encoded as UTF-8) and also replaced the character references in the glosses. have fun.
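
for anyone who wants to try this kind of conversion themselves, here is a minimal sketch in Python. it is not the script used for the transformed file above; it only assumes that code points appear in the text as `U+` followed by four to six hexadecimal digits. the function names and the file paths are made up for illustration.

```python
import re

# Matches "U+" followed by 4 to 6 hex digits (e.g. U+4E2D or U+20000),
# as long as no further hex digit follows directly.
CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})(?![0-9A-Fa-f])")


def codepoints_to_chars(text: str) -> str:
    """Replace every U+XXXX notation in text with the character it names."""

    def replace(match: re.Match) -> str:
        value = int(match.group(1), 16)
        # Leave values that are not valid characters (beyond the Unicode
        # range, or surrogates) exactly as they were written.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return match.group(0)
        return chr(value)

    return CODEPOINT.sub(replace, text)


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert a whole file line by line (hypothetical helper).

    Assumes the source file can be read as UTF-8; the result is
    written as UTF-8 as well.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(codepoints_to_chars(line))
```

calling `codepoints_to_chars` on a line of the data replaces each notation with its character and leaves everything else on the line alone; `convert_file("input.txt", "output.txt")` (placeholder names) would do the same for a whole file.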



Okay, it's more than just the Far East head character entries, although the definitions mostly come from there.

 

They've also added extra character entries, probably derived from Unicode's Unihan, with radical and stroke count and Cantonese pronunciation, plus their own Hakka pronunciations from several sources.

 

Now to try to get the information into a format that can be used with StarDict and GoldenDict.  :)

 

Kobo.
