Chinese-forums.com

How to have a list of unique characters in a document?

marcoesposito

I have Dim Sum Chinese Tools. It can produce a list of all the "unique characters" in a document, but it's impossible to copy that list (to transfer it into a Word/Excel document) or print it :-(. Does anybody know a program or method to do that? I'm trying to use http://lingua.mtsu.edu/chinese-computing/vp/index.php?CNTEXT_Session=7e955985e909f25ad52ee49d05b783e6 but it doesn't work with very long documents... :-( Thank you!

LaoJian

Hi marcoesposito,

I quickly made one for you in C# that covers your basic requirement: give it a txt file in an encoding that supports Chinese (such as Unicode or UTF-8), press the Count button, and it outputs a CSV file that you can open in Excel.

Download the attached zip file, unzip it, and remove the '.removeme' suffix from the file name. Then run it; it has a Windows UI.

Hope this helps

LaoJian

WindowsFormsApplication1.exe.removeme.zip
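
For readers who would rather not run an attached executable, the same idea (text file in, CSV of characters and counts out) can be sketched in a few lines of Python. This is not LaoJian's C# program, only an illustration of the approach described above. The function names, the file names in the usage note, and the choice to count only the main CJK block plus Extension A are all assumptions made for this sketch.

```python
import csv
from collections import Counter


def is_cjk(ch: str) -> bool:
    """Rough test for a Chinese character (assumption: only the main CJK
    Unified Ideographs block and Extension A are treated as Chinese)."""
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def count_characters(text: str) -> Counter:
    """Count how often each Chinese character occurs in the text."""
    return Counter(ch for ch in text if is_cjk(ch))


def write_counts(counts: Counter, csv_path: str, encoding: str = "utf-8-sig") -> None:
    """Write 'character,count' rows to a CSV file, most frequent first."""
    with open(csv_path, "w", encoding=encoding, newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["character", "count"])
        writer.writerows(counts.most_common())


def count_file(txt_path: str, csv_path: str) -> None:
    """Read a UTF-8 text file and write its character counts as CSV."""
    with open(txt_path, encoding="utf-8") as f:
        write_counts(count_characters(f.read()), csv_path)
```

Calling `count_file("input.txt", "characters.csv")` (both names are placeholders) reads the whole file at once, which should be fine for long documents since the work grows only linearly with the text length.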

T-revor

If you have Word and Excel, it's easy to do. First, paste the text into Word. Go to "Find and Replace" and use these settings:

Save.png

That will put each character on its own line. From there, copy and paste it into Excel. With the data still selected in Excel, apply an advanced filter like in the image below; when you click "OK" you'll have a list of all the unique characters in the document.

filter.png
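
The two steps above (split the text into one character per line, then filter out duplicates) can also be mirrored in code. The sketch below is only an analogy to the Word and Excel procedure: the exact Find and Replace and filter settings are in the screenshots, so it assumes they behave as described in the text. The function name `unique_characters` is made up for this example.

```python
def unique_characters(text: str) -> list[str]:
    """Return each non-whitespace character once, in order of first appearance."""
    # Step 1 (the Word step): break the text into one character per item.
    one_per_line = [ch for ch in text if not ch.isspace()]
    # Step 2 (the Excel step): keep only the first copy of each item.
    # A dict preserves insertion order, so the original order survives.
    return list(dict.fromkeys(one_per_line))
```

Like the Word method, this keeps punctuation and Latin letters in the list; they would have to be filtered out separately if only Chinese characters are wanted.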

Also, I built a free tool a while ago that gives you unique words in a document (with a bunch of other killer stuff):

Trevor's Chinese Reader

You just have to paste in the text and press the button. Then go to Tools --> Download Word List. It's pretty neat.

TrevorsChineseReader.png

Here is what the output looks like when you open it in Excel. Gives you some interesting information to direct your study.

TCR_WordListOutput.png

Best of luck with all your studies!

laurenth

T-Revor, congratulations, your reader is a wonderful tool. It works very smoothly. No doubt I'll be using it in the future. Thanks for making it available.

I suppose the csv file is encoded in UTF-8? I have an oldish version of Excel (2002?) which, apparently, does not understand UTF-8. Do you know how I can force Excel to properly display the file I downloaded?

Another question: I suppose the HSK classification you use (1-4) corresponds to the old HSK?

T-revor

Yes, unfortunately, the reader is in bad need of my attention, but I'm trying to finish up some other projects before I get back to it. The HSK is the old HSK unfortunately.

As for the UTF-8 issue, I've heard people have had problems with it, but I don't know if it's an Excel version thing, an OS setting, an Excel setting, or what. Again, something that badly needs my attention.

If you have any insight as to why it's not showing up or how I would fix it, please let me know. I'll move it up on my priority list and see what I can do.
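
One commonly suggested cause, offered here as a possibility rather than a diagnosis: when a CSV file is opened by double-clicking, Excel tends to assume a legacy code page unless the file starts with a UTF-8 byte order mark (BOM). A minimal sketch of writing the file with a BOM follows. It is written in Python purely for illustration (the reader's actual implementation is not shown in this thread), the function name `write_csv_for_excel` is hypothetical, and whether a version as old as Excel 2002 honors the BOM is not confirmed here.

```python
import csv


def write_csv_for_excel(rows, path):
    """Write rows as UTF-8 CSV with a leading byte order mark (BOM)."""
    # The "utf-8-sig" codec emits the bytes EF BB BF before the data,
    # which gives spreadsheet programs a hint that the file is UTF-8.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)
```

If the BOM does not help on an older Excel, importing the file through the Text Import Wizard (where the version offers an encoding choice) or exporting UTF-16 tab-delimited text are other options worth trying.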
