Chinese-Forums

CC-CEDICT dictionary convert to Excel


chinesemadrush


Hi everyone,

CC-CEDICT is a Chinese-English dictionary that you can search for words in. It is available as a text file here: https://www.mdbg.net/chindict/chindict.php?page=cedict

Unfortunately, I don't know how to parse the text file into neat columns in Excel. Does anyone know how to do this?

I am aware that Excel has a Text to Columns function, but it doesn't seem advanced enough for the file structure CC-CEDICT uses, since the fields are separated by spaces and the pinyin and definitions contain spaces too.

Thanks,

Kevin


Yeah, it should be TSV, but when I import it into Excel, it looks like it's delimited by spaces, which is problematic... You could probably write a function that splits each line based on where the pinyin starts (it's enclosed in square brackets)... Let me know if you need help.
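A minimal Python sketch of the split-where-the-pinyin-starts idea above, assuming every entry line looks like `Trad Simp [pin1 yin1] /gloss 1/gloss 2/`. The function name `split_entry` is hypothetical. It cuts the line at the first `[` and the next `]` instead of at every space, so the spaces inside the pinyin and the definition survive.

```python
def split_entry(line):
    """Split one CC-CEDICT entry into (traditional, simplified, pinyin, definition).

    Assumes the layout 'Trad Simp [pin1 yin1] /gloss 1/gloss 2/'.
    Returns None for comment lines starting with '#' and for lines that don't fit.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    # The pinyin starts at the first '[' and ends at the next ']'.
    start = line.find("[")
    if start == -1:
        return None
    end = line.find("]", start)
    if end == -1:
        return None
    # Everything before the '[' should be exactly two headwords.
    headwords = line[:start].split()
    if len(headwords) != 2:
        return None
    traditional, simplified = headwords
    pinyin = line[start + 1:end]          # brackets dropped
    definition = line[end + 1:].strip()   # keeps the surrounding slashes
    return traditional, simplified, pinyin, definition


print(split_entry("中國 中国 [Zhong1 guo2] /China/"))
# ('中國', '中国', 'Zhong1 guo2', '/China/')
```

Calling it on each line of the file and joining the non-None results with tabs gives a file Excel can split on tabs.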


It's not TSV. The format is specified here.

Do you have access to an editor that handles regular expressions? If not, download Notepad++.

Then open the CC-CEDICT file.

Then go to Search > Replace (Ctrl+H).

Set the 'Search Mode' to 'Regular expression'.

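In the 'Find what' field type: ^([^ ]+) ([^ ]+) (\[[^\]]*\]) (.*)$

(It's probably best to copy/paste this from this post.)

This regular expression matches four fields: Traditional, Simplified, Pinyin and Definition. The pinyin part, \[[^\]]*\], stops at the first closing bracket; a greedy \[.*\] could run on into the definition, because definitions contain bracketed pinyin of their own in cross-references such as 個|个[ge4].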
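Before running Replace All over the whole file, the pattern can be checked in Python, whose `re` module should treat these constructs the same way Notepad++ does. This sketch compares a greedy bracket group `\[.*\]` with one that cannot cross a closing bracket, `\[[^\]]*\]`, on a made-up entry (hypothetical, for illustration) whose definition contains its own bracketed pinyin followed by a space.

```python
import re

# A made-up sample entry (hypothetical): its definition contains a second
# bracketed pinyin followed by a space, which is where the two patterns differ.
SAMPLE = "一 一 [yi1] /one/also pr. [yao1] for clarity/"

# Greedy: \[.*\] runs on to the LAST ']' that is followed by a space.
GREEDY = re.compile(r"^([^ ]+) ([^ ]+) (\[.*\]) (.*)$")
# One pair: \[[^\]]*\] cannot cross a ']', so it stops at the first one.
ONE_PAIR = re.compile(r"^([^ ]+) ([^ ]+) (\[[^\]]*\]) (.*)$")

print(GREEDY.match(SAMPLE).groups())
# ('一', '一', '[yi1] /one/also pr. [yao1]', 'for clarity/')
print(ONE_PAIR.match(SAMPLE).groups())
# ('一', '一', '[yi1]', '/one/also pr. [yao1] for clarity/')
```

With the greedy version, the third column would swallow part of the definition for entries like this one; the one-pair version keeps the pinyin column clean.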

 

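In the 'Replace with' field type:

\1\t\2\t\3\t\4

This replaces each matching line with the individual fields separated by tab characters. Lines that don't match, such as the comment lines starting with # at the top of the file, are left as they are.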
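For anyone without Notepad++, the same search-and-replace can be sketched in Python. This is an illustrative equivalent, not part of the steps above; the function names `convert_lines` and `convert_to_tsv` are hypothetical, and the bracket group uses the one-pair form `\[[^\]]*\]` so bracketed pinyin inside definitions doesn't shift the columns.

```python
import re

# Four-field pattern: Traditional, Simplified, [Pinyin], Definition.
# The pinyin group stops at the first ']' so it cannot run into the definition.
ENTRY = re.compile(r"^([^ ]+) ([^ ]+) (\[[^\]]*\]) (.*)$")


def convert_lines(lines):
    """Yield each line with matching entries rewritten as tab-separated fields.

    Mirrors the \\1\\t\\2\\t\\3\\t\\4 replacement: lines that don't match
    (such as the '#' comments) come through unchanged.
    """
    for line in lines:
        text = line.rstrip("\r\n")
        match = ENTRY.match(text)
        yield "\t".join(match.groups()) if match else text


def convert_to_tsv(src_path, dst_path):
    """Read a CC-CEDICT text file and write a tab-separated copy (both UTF-8)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for text in convert_lines(src):
            dst.write(text + "\n")
```

Usage, assuming the unzipped download is named `cedict_ts.u8` (adjust the hypothetical paths as needed): `convert_to_tsv("cedict_ts.u8", "cedict.tsv")`.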

 

Then hit Replace All, wait 10-20 seconds, and you should be good to go. Just save the file and import it directly into Excel.
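If the Chinese characters come out garbled after the import, the likely cause is the encoding: the CC-CEDICT file is UTF-8, and Excel may assume a different code page for plain text unless UTF-8 is chosen as the file origin in the import dialog. One alternative, sketched here in Python under that assumption, is to write a CSV with a UTF-8 byte-order mark and a header row. The function names, column headers and file names are hypothetical.

```python
import csv
import re

# Four fields; this variant captures the pinyin without its brackets.
ENTRY = re.compile(r"^([^ ]+) ([^ ]+) \[([^\]]*)\] (.*)$")
HEADER = ["Traditional", "Simplified", "Pinyin", "Definition"]


def entry_rows(lines):
    """Yield (traditional, simplified, pinyin, definition) for each entry line."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip the comment lines at the top of the file
        match = ENTRY.match(line.rstrip("\r\n"))
        if match:
            yield match.groups()


def write_excel_csv(src_path, dst_path):
    """Write the entries as CSV with a header row; returns the number of entries.

    encoding='utf-8-sig' adds a byte-order mark, which Excel generally uses to
    recognise the file as UTF-8. csv.writer quotes any field containing commas
    or double quotes, so each definition stays in one column.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(HEADER)
        for row in entry_rows(src):
            writer.writerow(row)
            count += 1
    return count
```

For example, `write_excel_csv("cedict_ts.u8", "cedict.csv")` produces a file that can usually be opened in Excel by double-clicking, without going through the text import wizard.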
