Chinese-Forums

How to count unique characters in a document?


fredrik_w


Does anyone know how to count the number of unique characters in a document?

In Word, I can see how many characters my document contains, but I want to know how many unique characters there are. Can I do this in Word, or are there any online tools I could use?


I don't think Word can do it, and I'm not sure if there are online tools for this either (it wouldn't really be practical to paste huge chapters into an online text box).

I simply wrote a tiny program which counts the number of unique characters in a Unicode document. It's not very user-friendly, though.


I did something similar to what renzhe is talking about. I usually use OpenOffice to work on Chinese texts. I saved the file as plain Unicode text and then ran a Python script on it to count the unique characters and their frequencies, showing the results in order of decreasing frequency.
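For anyone who wants to try this approach, here is a minimal sketch of what such a script might look like. It is not the script mentioned above; the names char_frequencies and report are made up for illustration, and the file is assumed to be saved as UTF-8 (pass a different encoding, e.g. "utf-16", if your export differs). Whitespace is skipped, but punctuation is counted like any other character.

```python
from collections import Counter


def char_frequencies(text):
    """Count every non-whitespace character in the text (hypothetical helper)."""
    return Counter(ch for ch in text if not ch.isspace())


def report(path, encoding="utf-8"):
    """Print the unique and total character counts, then each character
    with its count, most frequent first. The encoding is an assumption."""
    with open(path, encoding=encoding) as f:
        counts = char_frequencies(f.read())
    print(f"{len(counts)} unique characters, {sum(counts.values())} in total")
    for ch, n in counts.most_common():
        print(f"{ch}\t{n}")
```

Calling report("mytext.txt") on a saved plain-text file would print the summary line followed by one tab-separated line per character. If you only want Chinese characters, you would need to filter out punctuation and Latin letters as well.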


You might find this vocabulary profiler to be useful...

http://lingua.mtsu.edu/chinese-computing/vp/index.php

It can provide the total character count, the unique character count, the frequencies of those characters, and whether each character is in the HSK (or other) lists. It can also analyse the incidence of bigrams, trigrams, etc.


  • 2 years later...

I think this Excel macro may help.

It basically compares whatever text is highlighted against a "reference" list and then gives a count of each character that occurs, split by whether it's in the reference list or not. If you don't want to compare against "known" characters, then simply delete the reference list.

If you are happy running the macro directly, rather than just pressing the analyse button, you simply select the text in your sheet and run the macro via Alt+F8. (The macro is called "Main". I apologise!) Otherwise, you need to cut and paste your text into the "Source" sheet of the macro workbook and select the text you want analysed.
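For readers without Excel, the comparison described above can be roughly sketched in Python. This is not the macro's code, only an illustration of the same idea; split_by_reference is a hypothetical name, and both the text and the reference list are assumed to be plain strings.

```python
from collections import Counter


def split_by_reference(text, reference):
    """Count each non-whitespace character in text, split by whether it
    appears in the reference string (hypothetical helper)."""
    known_chars = {ch for ch in reference if not ch.isspace()}
    counts = Counter(ch for ch in text if not ch.isspace())
    known = {ch: n for ch, n in counts.items() if ch in known_chars}
    unknown = {ch: n for ch, n in counts.items() if ch not in known_chars}
    return known, unknown
```

Passing an empty reference string puts every character in the second group, which corresponds to deleting the reference list.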

Hope this helps. Let me know if you have problems.

