marcoesposito

How to have a list of unique characters in a document?


I have Dim Sum Chinese Tools. It can show a list of all the "unique characters" in a document, but it's impossible to copy that list (to transfer it to a Word/Excel document) or print it :-(. Does anybody know a program or method to do that? I'm trying to use http://lingua.mtsu.edu/chinese-computing/vp/index.php?CNTEXT_Session=7e955985e909f25ad52ee49d05b783e6 but it doesn't work with very long documents... :-( Thank you!
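For very long documents, a small script that never loads the whole file at once can be an option. The Python sketch below assumes the document is saved as UTF-8 plain text and that "Chinese characters" means the CJK ideograph ranges listed in the code; both are assumptions, and the function names are hypothetical. Memory use grows with the number of distinct characters rather than with the length of the document, and the result is an ordinary text file that can be pasted into Word or Excel or printed.

```python
from typing import Iterable, Iterator

# Assumption: these Unicode blocks cover the characters you care about.
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
)


def is_cjk(ch: str) -> bool:
    """Return True if the single character ch lies in one of CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def unique_chars(lines: Iterable[str]) -> Iterator[str]:
    """Yield each CJK character the first time it appears.

    Input is consumed one line at a time, so only the set of characters
    already seen is kept in memory.
    """
    seen = set()
    for line in lines:
        for ch in line:
            if is_cjk(ch) and ch not in seen:
                seen.add(ch)
                yield ch


def write_unique_chars(in_path: str, out_path: str) -> int:
    """Write one unique CJK character per line to out_path; return how many."""
    count = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for ch in unique_chars(src):
            dst.write(ch + "\n")
            count += 1
    return count
```

Usage: `write_unique_chars("document.txt", "unique.txt")`, then open `unique.txt` in any editor.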


Hi marcoesposito,

I also quickly made one for you in C# to cover your basic need: input a .txt file in an encoding that supports Chinese (such as Unicode or UTF-8), press the Count button, and the output is a CSV file that you can open and read with Excel.

Get the attached zip file, unzip it, and remove the '.removeme' suffix from the file name. Then run it; it has a Windows UI.

Hope this can help

LaoJian

WindowsFormsApplication1.exe.removeme.zip
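LaoJian's tool is a compiled Windows program and its source isn't posted in the thread. For readers not on Windows, here is a rough Python sketch of the same input/output shape (text file in, "character,count" CSV out). It is not LaoJian's code; the function names and the choice to skip whitespace are assumptions.

```python
import csv
from collections import Counter
from typing import List, Tuple


def char_frequencies(text: str) -> List[Tuple[str, int]]:
    """Count every non-whitespace character, most frequent first.

    Ties keep the order in which the characters first appear.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    return counts.most_common()


def write_frequency_csv(in_path: str, out_path: str, encoding: str = "utf-8") -> None:
    """Read in_path (decoded with `encoding`) and write 'character,count' rows."""
    with open(in_path, encoding=encoding) as f:
        text = f.read()
    # Output is written as UTF-8 without a BOM; the csv module expects
    # files it writes to be opened with newline="".
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["character", "count"])
        writer.writerows(char_frequencies(text))
```

For example, `write_frequency_csv("document.txt", "counts.csv")`; for a file saved as "Unicode" in Windows Notepad (UTF-16), pass `encoding="utf-16"`.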


If you have Word and Excel, it's easy to do. First, paste the text into Word. Go to "Find and Replace" and use these settings:

Save.png

That will put each character on its own line. From there, copy and paste it into Excel. With the data still selected in Excel, apply an Advanced Filter as in the image below, and when you click "OK" you'll have a list of all the unique characters in the document.

filter.png
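The two Word/Excel steps above can be mirrored in a few lines of Python, which may help when the document is too long to handle comfortably in Word. This is a hypothetical sketch of the same idea, not a description of what Word or Excel do internally; dropping whitespace is an assumption.

```python
from typing import List


def one_char_per_line(text: str) -> str:
    """Step 1 (Word's Find and Replace): put each character on its own line.

    Whitespace is dropped so that no blank lines are produced.
    """
    return "\n".join(ch for ch in text if not ch.isspace())


def unique_records(column: str) -> List[str]:
    """Step 2 (Excel's Advanced Filter, unique records only):
    keep the first occurrence of each line, in the original order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(column.splitlines()))
```

Chaining them, `unique_records(one_char_per_line(text))`, gives the unique characters in order of first appearance.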

ALSO, I built a free tool a while ago that gives you unique words in a document (with a bunch of other killer stuff):

Trevor's Chinese Reader

You just have to paste in the text and press the button. Then go to Tools --> Download Word List. It's pretty neat.

TrevorsChineseReader.png

Here is what the output looks like when you open it in Excel. Gives you some interesting information to direct your study.

TCR_WordListOutput.png
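The post doesn't say how Trevor's Chinese Reader splits text into words. One common textbook approach is forward maximum matching against a dictionary, sketched below in Python purely as an illustration; it is not a description of Trevor's tool. The tiny dictionaries in the tests and usage example are hypothetical, a real tool would need a full word list, and anything not covered by a dictionary word (including punctuation) comes out as a single-character token.

```python
from collections import Counter
from typing import List, Set, Tuple


def fmm_segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position, take the longest
    dictionary word (up to max_len characters) starting there, or a
    single character if no dictionary word matches."""
    max_len = max(1, max_len)  # guard against a non-positive window
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


def word_list(text: str, dictionary: Set[str]) -> List[Tuple[str, int]]:
    """Unique words with their counts, most frequent first
    (ties stay in order of first appearance)."""
    return Counter(fmm_segment(text, dictionary)).most_common()
```

Usage: `word_list("我们学习中文，我们", {"我们", "学习", "中文"})` returns `[("我们", 2), ("学习", 1), ("中文", 1), ("，", 1)]`. Greedy matching is simple and fast but can mis-split ambiguous strings, which is why larger tools often add statistics on top.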

Best of luck with all your studies!


T-Revor, congratulations, your reader is a wonderful tool. It works very smoothly. No doubt I'll be using it in the future. Thanks for making it available.

I suppose the csv file is encoded in UTF-8? I have an oldish version of Excel (2002?) which, apparently, does not understand UTF-8. Do you know how I can force Excel to properly display the file I downloaded?
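One possible workaround on the reader's side: Excel's own "Unicode Text" save format is tab-delimited UTF-16, and versions that misread UTF-8 CSV files are often reported to open that format correctly. This has not been verified on Excel 2002. The Python sketch below re-saves a downloaded UTF-8 CSV in that form; the function names are hypothetical.

```python
import csv
import io


def to_unicode_text(csv_text: str) -> bytes:
    """Turn CSV text into tab-delimited UTF-16 bytes.

    Python's "utf-16" codec writes a byte-order mark (BOM) first, in the
    machine's native byte order (little-endian on typical PCs).
    """
    rows = csv.reader(io.StringIO(csv_text))
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\r\n").writerows(rows)
    return buf.getvalue().encode("utf-16")


def convert_file(csv_path: str, out_path: str) -> None:
    """Read a UTF-8 CSV file and write it out as tab-delimited UTF-16 text."""
    # "utf-8-sig" also accepts files that already start with a UTF-8 BOM.
    with open(csv_path, encoding="utf-8-sig", newline="") as f:
        data = to_unicode_text(f.read())
    with open(out_path, "wb") as f:
        f.write(data)
```

Usage: `convert_file("wordlist.csv", "wordlist.txt")`, then open `wordlist.txt` from Excel's File > Open dialog.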

Another question: I suppose the HSK classification you use (1-4) corresponds to the old HSK?


Yes, unfortunately, the reader is in bad need of my attention, but I'm trying to finish up some other projects before I get back to it. The HSK levels are from the old HSK, unfortunately.

As for the UTF-8 issue, I've heard people have had problems with it, but I don't know whether it's an Excel version thing, an OS setting, an Excel setting or something else. Again, something that badly needs my attention.

If you have any insight as to why it's not showing up or how I would fix it, please let me know. I'll move it up on my priority list and see what I can do.
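On the generating side, a commonly cited fix is to start the UTF-8 CSV with a byte-order mark (BOM), which newer Excel versions generally use to detect UTF-8 when a CSV is opened directly; older versions such as Excel 2002 may still not honour it. The thread doesn't say what language the reader is written in, so this is only a Python sketch with a hypothetical function name.

```python
import csv
import io
from typing import Iterable, Sequence


def csv_bytes_with_bom(rows: Iterable[Sequence[object]]) -> bytes:
    """Serialize rows as CSV and encode them as UTF-8 with a leading BOM.

    The "utf-8-sig" codec prepends the three bytes EF BB BF.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    return buf.getvalue().encode("utf-8-sig")
```

The returned bytes can be served as the download body (e.g. with a `text/csv; charset=utf-8` content type) unchanged.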
