Chinese-Forums

Download more Chinese characters in Windows XP?


Mark Yong


I currently use Windows XP's Chinese (Taiwan) setting for inputting Chinese characters (either by pinyin or, for characters whose pinyin I do not know, using the sketch pad), and have a couple of questions:

1. Does anyone know how many Chinese characters (particularly Traditional Chinese) are encoded in the standard Windows XP set of fonts? Last I read, it was in the region of 12,000.

2. Does anyone know if it is possible to download any Windows XP compatible packages that include more characters beyond the standard set of fonts covering the 12,000 characters (particularly for obscure dialect characters)? Or does Microsoft provide updates for this?


The number of characters is not limited by XP, but rather is related to the individual fonts you have on your system.

MS Arial Unicode has almost 39,000 characters, and should come with Windows. There are other fonts that contain more. See here for a list.


Hi, imron,

Thanks for the information. Question: How I normally test whether a character is available in my fonts list is like this: As and when I am unable to input a character via pinyin, I go to the IME pad in Chinese (Taiwan) setting, and hand-write it out as accurately as I can - if the character does not appear in the list, then I assume it is not part of the character set. Is this the correct/best way to exhaustively look up a character?

E.g. I tested this using the "4-dragons" character (the 康熙字典 classifies it under the 龍 radical and defines it as 龍行也). This character does not appear, which is surprising if MS Arial Unicode has 39,000 characters (unless I was unlucky enough to choose the odd one that is not included!).

In general, how do I view the full list of characters available in my character font set(s)?

(Sorry, I am not very IT-savvy, so please bear with my rather rudimentary questions! :))
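The IME pad only shows what the IME knows, so it cannot tell you what your fonts contain. A more direct test is to ask the font file itself whether its character map (cmap) includes the code point. Below is a minimal Python sketch, assuming the third-party fontTools package is installed; the font paths in the usage comment are assumed XP locations, and the 4-dragons character is taken to be U+2A6A5, the code point that comes up later in this thread.

```python
import sys

FOUR_DRAGONS = 0x2A6A5  # the "4-dragons" code point discussed in this thread


def covers(cmap, codepoint):
    """Return True if a font's cmap (code point -> glyph name) maps the code point."""
    return codepoint in cmap


def load_cmap(path, font_number=0):
    """Read the preferred Unicode cmap of a .ttf/.ttc file (needs fontTools)."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(path, fontNumber=font_number)
    try:
        return font.getBestCmap() or {}
    finally:
        font.close()


if __name__ == "__main__":
    # Usage (paths are assumptions): python covers.py C:\WINDOWS\Fonts\simsun.ttc C:\WINDOWS\Fonts\mingliu.ttc
    for path in sys.argv[1:]:
        status = "has" if covers(load_cmap(path), FOUR_DRAGONS) else "lacks"
        print(f"{path}: {status} U+{FOUR_DRAGONS:04X}")
```

A .ttc file such as simsun.ttc bundles several fonts in one file; `font_number` picks which one is read.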


One way to see all the characters in a font is to open MS Word, select Insert - Symbol, then choose Font. All the characters are displayed.

I'm sure someone will come along with something better, though.

imron wrote:

MS Arial Unicode has almost 39,000 characters

Is that 39,000 Chinese characters, though?
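Whether a figure like 39,000 means Chinese characters can be checked by grouping a font's mapped code points by Unicode block. The Python sketch below counts only the CJK ideograph blocks it lists (a partial list, assumed sufficient for fonts of the XP era); the font file name in the usage comment is an assumption, and reading a real font relies on the third-party fontTools package.

```python
import sys

# Partial list of Unicode blocks holding CJK ideographs (inclusive ranges).
CJK_BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Extension A": (0x3400, 0x4DBF),
    "Extension B": (0x20000, 0x2A6DF),
    "Compatibility Ideographs": (0xF900, 0xFAFF),
    "Compatibility Ideographs Supplement": (0x2F800, 0x2FA1F),
}


def count_cjk(codepoints):
    """Count how many of the given code points fall in each listed CJK block."""
    counts = {name: 0 for name in CJK_BLOCKS}
    for cp in codepoints:
        for name, (start, end) in CJK_BLOCKS.items():
            if start <= cp <= end:
                counts[name] += 1
                break
    return counts


if __name__ == "__main__":
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    # Usage (file name is an assumption): python count_cjk.py C:\WINDOWS\Fonts\ARIALUNI.TTF
    font = TTFont(sys.argv[1], fontNumber=0)
    cmap = font.getBestCmap() or {}
    counts = count_cjk(cmap)  # iterating the cmap dict yields its code points
    for name, n in counts.items():
        print(f"{name}: {n}")
    print(f"Total CJK ideographs: {sum(counts.values())} of {len(cmap)} mapped code points")
```

Since Arial Unicode MS also covers many non-Chinese scripts, the CJK total would be the number to compare with the 39,000 figure, rather than the font's overall count.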


liuzhou wrote:

One way to see all the characters in a font is to open MS Word, select Insert - Symbol, then choose Font. All the characters are displayed.

Thanks for the tip. I just tried it with two of the Chinese fonts on my PC, namely PMingLiu and TW-Kai. Of course, there is no practical way to count the number of Chinese characters in the list! :lol:

Also, the characters do not appear to be listed in any sensible order (for TW-Kai, they appear to be listed by radical, but not 100% consistently), which means finding and selecting a character from the list is not a practical way to go.

So, unless I (1) know the pinyin for the character, (2) know its Cangjie/Wubi/etc. code, or (3) write it out accurately using my mouse/stylus, there is really no way I will be able to locate and select a particular (obscure) character that I want. :cry:

Actually, there is one way: if I go to the IME pad, select 部 (radical) as the input, and then search through the whole list of characters categorised under that radical by residual stroke count, I should be able to locate the character, assuming I get the Kangxi radical and residual stroke count correct.
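The radical/residual-stroke search described above can also be done offline against the Unihan data files from unicode.org, where each character's radical and residual stroke count are stored in a `kRSUnicode` field. A rough Python sketch, assuming the tab-separated `U+XXXX<TAB>field<TAB>value` layout of those files (which file holds `kRSUnicode` has moved between Unicode versions); the sample lines are illustrative, not copied from a particular release, and `lookup`/`parse_rs` are hypothetical helper names.

```python
import sys


def parse_rs(value):
    """Split a kRSUnicode value such as "212.48" or "212'.0" into (radical, residual) pairs."""
    pairs = []
    for item in value.split():
        radical, _, residual = item.partition(".")
        # An apostrophe marks a simplified form of the radical; keep only the number.
        pairs.append((int(radical.rstrip("'")), int(residual)))
    return pairs


def lookup(lines, radical, residual=None):
    """Return characters whose kRSUnicode entry matches the radical (and residual strokes)."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t", 2)
        if field != "kRSUnicode":
            continue
        if any(rad == radical and residual in (None, res) for rad, res in parse_rs(value)):
            hits.append(chr(int(code[2:], 16)))  # "U+2A6A5" -> the character itself
    return hits


SAMPLE = [
    "# illustrative lines in the Unihan tab-separated layout",
    "U+52FF\tkRSUnicode\t20.2",
    "U+9F8D\tkRSUnicode\t212.0",
    "U+9F98\tkRSUnicode\t212.32",
    "U+2A6A5\tkRSUnicode\t212.48",
]

if __name__ == "__main__":
    # Usage: python rs_lookup.py <path to the Unihan file containing kRSUnicode>
    with open(sys.argv[1], encoding="utf-8") as f:
        print(lookup(f, 212, 48))  # dragon radical (212), 48 residual strokes
```

Radical 212 is 龍 in the Kangxi numbering, so `lookup(SAMPLE, 212)` lists the dragon-radical characters in the sample, and adding a residual stroke count narrows the list the same way the IME pad does.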

Question: would the number of characters differ between font sets? If so, which character set does the IME list in Windows XP represent? Say XP came with 39,000 characters and I download a new font set with 45,000: would that update the IME list on my PC with the additional characters that were not originally there? How does it work?


It could just mean that the handwriting pad doesn't know to look for that character. Pleco recognized what I'm assuming is the character you mean (three dragon characters, one on top of two others, as though in a pyramid?), but I'm not sure whether any of the free fonts I have at home will have it.


Probably the best place to go would be the Unihan database.

First select the number of strokes in the radical. In the case of 龍 it's 16, so choose that one, and from the resulting page, select the dragon radical. On this page, be sure to tick the "use utf-8" checkbox before hitting Submit. Checking it makes the next page use the fonts installed on your computer; otherwise, it will use images.

This will then show a list of all characters that use that radical (or you can limit it by the number of strokes). If you see a question mark, it means that your fonts don't support that character. You can click on the individual characters to view a more detailed version, or click on the question marks to see an image of the character they actually represent. You can also copy and paste the character into another program such as Word.

Anyway, it seems


imron wrote:

If you can see it too (instead of just a large ?), then it means that it is perhaps an issue with your IME, rather than with your fonts.

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=2A6A5&useutf8=false (I clicked on the character to go to this page). In the "The Unicode Standard" column, I see the character displayed as an image. However, in the "Your Browser" column, I get a large ? where the character should be. I guess that means it is an issue with my IME. Is there any way to increase the number of characters my IME recognises?

imron wrote:

The IME is separate from the font system. If the IME doesn't support the character, then there is no way to type it, even if you have a font with that character available. In these situations, your best bet (even though it's significantly slower) will be to copy/paste it from the Unihan page I linked to above.

That means I will be copying-and-pasting the character as an image, right? :cry:


No, the '?' means that you need a new font. And you'll still be copy/pasting, but you won't be copy pasting the image. There is a fairly large difference :).

There'll be fonts out there that will display it. I'm a little surprised the default Windows ones don't. (Not so surprised about the default Linux ones :)).


Hi, ipsi(),

Okay, so if I understand you correctly, I now need to find, download and install a new font set that includes this character, if I am to view it correctly on the screen instead of seeing a large '?' in its place. Any recommended places where I can download a good set of fonts?

Hi, imron,

If I understand you correctly, downloading the necessary font will allow me to display the character, but this does not mean I will be able to input it, because it is not supported by my IME? If so, is there any way I can enable my IME to support new characters not already in its database?

On a separate note: I have heard that it is possible to input a character just by keying in its Unihan code / Big-5 code / etc. How does that work? Where does one key in the code to generate the character (if, let's say, I knew the code for the character)?


Mark, while I'm afraid I can't answer your questions, you at least understand correctly :).

It can be possible to input characters directly via their UTF-8 (UTF-16, etc.) code, but I'm not sure how to do so on Windows. I think it can be done in Word with something like [alt]+, or something. Not entirely sure, sorry.
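The idea behind typing a character "by its code" is that a Unicode code point, or a legacy code such as Big-5, identifies the character uniquely, so software can turn the typed number straight into text. The Python sketch below shows only that mapping, not any Windows or Word shortcut; `from_hex` and `to_big5` are hypothetical helper names. It also shows why a Big-5 code is no help for a character like the 4-dragons one: Big-5 simply has no code for it.

```python
def from_hex(code):
    """Turn a hex code such as "2A6A5" or "U+2A6A5" into the character itself."""
    code = code.strip().upper()
    if code.startswith("U+"):
        code = code[2:]
    return chr(int(code, 16))


def to_big5(ch):
    """Return the Big-5 bytes for ch, or None if Big-5 has no code for it."""
    try:
        return ch.encode("big5")
    except UnicodeEncodeError:
        return None
```

For example, `from_hex("U+9F8D")` gives 龍, which `to_big5` turns into a 2-byte Big-5 code, while `to_big5(from_hex("2A6A5"))` returns `None`.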

As ipsi() mentioned, the large ? means you just don't have the font. Actually, you could copy and paste that large question mark into a post, or a word document, and anyone who did have the correct font would see the character and not the question mark.

Also, when talking about copying and pasting, that was what I meant, to copy the text and not copy/paste the image.

Most IMEs have a way to add characters to their dictionary. I just tested it with the Google IME. I can add the 4-dragon
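To make the IME/font distinction concrete: an IME only offers candidates from its own dictionary, so installing a font adds nothing to it, whereas a user-dictionary entry does. This is a toy Python model for illustration only; `ToyIME` is hypothetical and does not reflect how the Google IME or Microsoft's IMEs actually store their dictionaries, and the readings used are assumptions.

```python
class ToyIME:
    """A toy IME: it can only offer characters listed in its own dictionary."""

    def __init__(self, dictionary):
        # reading (e.g. pinyin) -> list of candidate characters
        self.dictionary = {reading: list(chars) for reading, chars in dictionary.items()}

    def candidates(self, reading):
        return list(self.dictionary.get(reading, []))

    def add_word(self, reading, ch):
        """Like a user-dictionary entry: append ch as a candidate for reading."""
        entries = self.dictionary.setdefault(reading, [])
        if ch not in entries:
            entries.append(ch)


ime = ToyIME({"long": ["\u9f8d", "\u9686"]})
print(ime.candidates("zhe"))   # nothing yet, whatever fonts are installed
ime.add_word("zhe", "\U0002a6a5")  # assumed reading for the 4-dragons character
print(ime.candidates("zhe"))
```

Whether the added candidate then displays properly is a separate question, answered by the fonts.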


If you don't need to share the document electronically with others but are satisfied with storing and printing it, the four dragons and lots of interesting non-standard character varieties are included in the Mojikyo fonts. I suppose it would even be possible to make a PDF file from your document, but I haven't tried that yet.


To get more info about a font file, install the MS Font properties extension:

http://www.microsoft.com/typography/truetypeproperty21.mspx

You can also install the trial version of FontLab Studio or AsiaFont Studio (saving is crippled, but reading and browsing work fine):

http://www.fontlab.com/

Lastly, if you really want to edit Asian fonts, you can install FontForge, but that's not easy:

http://fontforge.sourceforge.net/

Very few fonts will have your 4-dragons character, because it lies outside the 2-byte range (it takes 4 bytes in both UTF-8 and UTF-16) and most fonts only cover part of the 2-byte range. On my computer, only Simsun Founder Extended (sursong.ttf) has it; normal Simsun does not.
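To make the "byte range" point concrete: U+2A6A5 lies outside the Basic Multilingual Plane (U+0000 to U+FFFF, the range that fits in one 16-bit code unit), so it needs a surrogate pair in UTF-16 and four bytes in UTF-8. On the font side, a TrueType cmap subtable in format 4 can only map BMP code points; covering supplementary characters requires a format 12 subtable. The short Python sketch below (Python 3.8+ for `bytes.hex` with a separator; `describe` is a hypothetical helper) shows those encodings for 龍 and for the 4-dragons character.

```python
def describe(ch):
    """Show where a character sits in Unicode and how many bytes it takes."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    return {
        "codepoint": f"U+{cp:04X}",
        "in_bmp": cp <= 0xFFFF,  # the BMP is the "2-byte range"
        "utf8": utf8.hex(" ").upper(),  # one group per byte
        "utf16": utf16.hex(" ", 2).upper(),  # one group per 16-bit code unit
        "bytes": (len(utf8), len(utf16)),
    }


for ch in ("\u9f8d", "\U0002a6a5"):
    print(describe(ch))
```

龍 comes out as one UTF-16 code unit (9F8D) and 3 UTF-8 bytes, while the 4-dragons character needs the pair D869 DEA5 in UTF-16 and F0 AA 9A A5 in UTF-8.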

If you want to try some new fonts, search Google for 字体 ("font") :)


1 year later...

Reviving a thread that I started, but did not quite get down to the answer I needed...

Using the same "4-dragons" example again: I looked it up on the Unihan website as imron suggested above. The page displays the link to the character as a graphic, but when I click the link, where the character should be there is just a box containing its Unicode code point, i.e. 02A6A5. Yet when I copy-and-paste that 'box' into Google and search on it, I do get results for the "4-dragons" character (the first link is zh.wiktionary's definition page for it).
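The box label is just the character's code point written as six hexadecimal digits, and the text you copy from the box is still the character itself, which is why pasting it into Google worked. A small Python sketch of the notations involved; `notations` is a hypothetical helper, and the Unihan URL pattern is copied from the link posted earlier in the thread and may no longer match the current site.

```python
# URL pattern copied from the Unihan link earlier in this thread; the site may have changed since.
UNIHAN_URL = "http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint={cp:X}&useutf8=false"


def notations(ch):
    """Common ways of writing a character's code point."""
    cp = ord(ch)
    return {
        "unicode": f"U+{cp:04X}",
        "box_label": f"{cp:06X}",  # six hex digits, as in the box shown for a missing glyph
        "unihan_url": UNIHAN_URL.format(cp=cp),
    }


print(notations("\U0002a6a5"))
```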

Now, I suppose this means that my Windows XP does not have this character in the character set, so it cannot be displayed. So, back to my original question: What can I do in order to be able to get them into my OS's character database, such that I can:

1. View such characters correctly in my browser

2. Generate/type them out

I realise I am phrasing the question in rather non-tech and laymen's terms. :oops:

(BTW, I am using the "4-dragon" character just as an example - I have other far more useful characters - mostly dialect ones - that I want to generate, but cannot.)


What web browser are you using?

This is purely a font issue. As long as you have a Unicode font that contains the character, it should display if you select that font (and in fact even if you don't, because the OS should perform font substitution for you).
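Font substitution can be pictured as a fallback search: the program tries the selected font first, then other installed fonts, and uses the first one that maps the character; only if none does do you see a '?' or a box. The Python sketch below is a simplified model of that idea, not how Windows or Firefox actually implement it; the coverage sets are made-up toy data, and only the font names come from this thread.

```python
def pick_fonts(text, fonts):
    """For each character, return the first font (in preference order) whose cmap covers it.

    fonts is a list of (name, covered_code_points) pairs; None means no font covers it.
    """
    chosen = []
    for ch in text:
        name = next((n for n, cmap in fonts if ord(ch) in cmap), None)
        chosen.append((ch, name))
    return chosen


# Toy coverage data: the selected font first, then a fallback.
FONTS = [
    ("PMingLiU", {0x9F8D, 0x9F98}),
    ("sursong", {0x9F8D, 0x9F98, 0x2A6A5}),
]
print(pick_fonts("\u9f8d\U0002a6a5", FONTS))
```

With only the first font installed, the 4-dragons character would map to `None`, which is the case where the browser falls back to a '?' or a box.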


I am running Mozilla Firefox 3.5.2.

Okay, let's say I do have the Unicode font that includes a particular character. Does it mean that if, let's say, I:

1. Open up MS Word

2. Go to Insert > Symbol

3. Select the font (SimSun, PMingLiu, TW-Kai, etc.)

4. Search for the character I want (for simplicity, let's use the "4-dragons" again, so I scroll down to all the characters with the dragon radical and search for it)

... I should then be able to locate it, if that character is in my character set?

Using the "4-dragons" case, I searched through all three fonts I listed above. They had the 3-dragons character, but not the 4-dragons one.

So, what you mean is that I do not have a Unicode font that contains the "4-dragons" character (amongst others). Where can I download and install one, then?

Okay, to put things in context: I am trying to generate the character for the Hokkien word bue, which is a fusion of 勿 and 會. I have seen it printed in books before, but have no idea how to generate it in Windows XP, so I wonder how the printers did it. Strangely, I have never seen it displayed on any website either.


OK, it turns out I have only one font on my system that contains the four dragons character. The font is called 宋体-方正超大字符集 and the filename is sursong.ttf. A Google search for sursong.ttf should provide clues on how to obtain it.

Regarding the other characters, how are 勿 and 會 combined (left/right, top/bottom, etc.)? I did a quick search on the Unihan page and couldn't find the appropriate character (though there are quite a few others with 勿 as the radical).

It's also quite possible that the publishers of books with this in it created their own specialised font to print it.

