Chinese-forums.com
Learn Chinese in China

Convert the CC-CEDICT dictionary to Excel

chinesemadrush

Hi everyone,

CC-CEDICT is a searchable Chinese-English dictionary. It is available as a text file here (https://www.mdbg.net/chindict/chindict.php?page=cedict).

Unfortunately, I do not know how to parse the text file into nice columns in Excel. Does anyone know how to do this?

I am aware that Excel has a Text to Columns feature, but it doesn't seem flexible enough for the file structure CC-CEDICT uses.
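
Here is a rough Python illustration of why a single delimiter isn't enough. It mimics a space-delimited split, not Excel itself. The three lines are made-up examples in the CC-CEDICT layout (Traditional Simplified [pinyin] /definitions/); because the pinyin and the definitions contain spaces too, each line breaks into a different number of columns.

```python
from typing import List

# Made-up lines in the CC-CEDICT layout, for illustration only:
# Traditional Simplified [pinyin] /definition 1/definition 2/
sample_lines = [
    "人 人 [ren2] /person/people/",
    "電腦 电脑 [dian4 nao3] /computer/",
    "謝謝 谢谢 [xie4 xie5] /to thank/thanks/",
]


def naive_split(line: str) -> List[str]:
    """Split on every space, roughly what a space-delimited Text to Columns does."""
    return line.split(" ")


for line in sample_lines:
    fields = naive_split(line)
    # The column count depends on how many spaces the pinyin and definitions contain.
    print(len(fields), fields)
# 4, 5 and 6 fields respectively
```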

Thanks,

Kevin

Yadang

Yeah, it should be TSV, but when I import it into Excel it looks like it's delimited by spaces, which is problematic. You could probably write a function that splits each line at the point where the pinyin starts (the pinyin is enclosed in square brackets). Let me know if you need help.
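
A minimal Python sketch of that approach, assuming each entry line looks like Traditional Simplified [pinyin] /definitions/, that the headwords themselves contain no spaces, and that lines starting with # are comments. The function name split_entry is hypothetical. It takes the headwords from the first two spaces, then cuts at the first "] " to separate the pinyin from the definitions.

```python
from typing import List, Optional


def split_entry(line: str) -> Optional[List[str]]:
    """Split one dictionary line into [traditional, simplified, pinyin, definition].

    Returns None for comment lines (starting with '#') and for lines that do not
    follow the assumed 'Trad Simp [pinyin] /defs/' layout.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    # Assumes headwords contain no spaces, so the first two spaces separate them.
    parts = line.split(" ", 2)
    if len(parts) != 3:
        return None
    traditional, simplified, rest = parts
    # The pinyin is the bracketed block right after the headwords.
    if not rest.startswith("["):
        return None
    close = rest.find("] ")  # first closing bracket followed by a space
    if close == -1:
        return None
    pinyin = rest[: close + 1]
    definition = rest[close + 2:]
    return [traditional, simplified, pinyin, definition]


print(split_entry("電腦 电脑 [dian4 nao3] /computer/"))
# ['電腦', '电脑', '[dian4 nao3]', '/computer/']
```

Joining the returned list with "\t" gives one tab-separated row that Excel can import directly.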

imron

It's not TSV. The format is specified here.

Do you have access to an editor that handles regular expressions? If not, download Notepad++.

Then open the CC-CEDICT file.

Then go to Search->Replace (Ctrl+H).

Set the 'Search Mode' to 'Regular expression'.

In the 'Find what' field type: ^([^ ]+) ([^ ]+) (\[[^\]]*\]) (.*)$

(probably best to copy/paste this from this post).

This regular expression matches four fields on each line: Traditional, Simplified, Pinyin, and Definition.
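
To check what each of the four groups captures, here is a rough Python equivalent using the standard re module. In this version the pinyin group is written as \[[^\]]*\] (an opening bracket, then anything except a closing bracket) rather than the greedy \[.*\], so the match always ends at the first closing bracket. With the greedy form, a definition that contains a bracketed cross-reference followed by a space can pull part of the definition into the pinyin column, as the comparison below shows. The sample line is made up for illustration.

```python
import re

# Pinyin group uses [^\]]* so it cannot run past the first closing bracket.
ENTRY = re.compile(r"^([^ ]+) ([^ ]+) (\[[^\]]*\]) (.*)$")
# The same pattern with a greedy .* inside the brackets, for comparison.
GREEDY = re.compile(r"^([^ ]+) ([^ ]+) (\[.*\]) (.*)$")

# Made-up line whose definition contains a bracketed reference followed by a space.
line = "那 那 [na4] /that/see 那個|那个[na4 ge5] and 那些[na4 xie1]/"

m = ENTRY.match(line)  # this sample line matches, so m is not None
traditional, simplified, pinyin, definition = m.groups()
print(pinyin)      # [na4]
print(definition)  # /that/see 那個|那个[na4 ge5] and 那些[na4 xie1]/

g = GREEDY.match(line)
print(g.group(3))  # [na4] /that/see 那個|那个[na4 ge5]
```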

In the 'Replace with' field, type:

\1\t\2\t\3\t\4

This replaces each matching line with the individual fields separated by a tab character.

Then click Replace All, wait 10-20 seconds, and you should be good to go. Just save the file and import it directly into Excel.
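
If you would rather script the same Find/Replace, here is a minimal Python sketch. It assumes the downloaded dictionary has been unzipped to a UTF-8 text file; the file names cedict_ts.u8 and cedict.tsv and the script name cedict_to_tsv.py are hypothetical, so substitute your own. Lines starting with # are skipped, and the output is written as UTF-8 with a byte-order mark (utf-8-sig), which usually helps Excel recognise the Chinese characters.

```python
import re
import sys
from typing import Iterable, Iterator

# Four fields, as in the Notepad++ regex: Traditional, Simplified, [pinyin], /definitions/.
# The pinyin group stops at the first closing bracket.
ENTRY = re.compile(r"^([^ ]+) ([^ ]+) (\[[^\]]*\]) (.*)$")


def to_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield one tab-separated row per dictionary entry, skipping comment lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            continue  # header/comment lines start with '#'
        m = ENTRY.match(line)
        if m is None:
            continue  # skip lines that don't fit the expected layout
        yield "\t".join(m.groups())


def convert(src_path: str, dst_path: str) -> int:
    """Convert a CC-CEDICT text file to a TSV file; return the number of rows written."""
    count = 0
    # Reading with utf-8-sig also works if the input has no byte-order mark.
    with open(src_path, encoding="utf-8-sig") as src:
        # Writing with utf-8-sig adds a byte-order mark for Excel.
        with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
            for row in to_rows(src):
                dst.write(row + "\n")
                count += 1
    return count


# Run as: python cedict_to_tsv.py INPUT OUTPUT
if __name__ == "__main__" and len(sys.argv) == 3:
    print(convert(sys.argv[1], sys.argv[2]), "entries written")
```

For example, python cedict_to_tsv.py cedict_ts.u8 cedict.tsv produces a tab-delimited file that Excel can open or import. Non-matching lines are silently dropped, which keeps the sketch short; you may want to count or print them if the row total looks off.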
