Chinese-Forums

GB2312 character gone in UTF-8


Elrin


Hi there!

I'm trying to help my wife type all the languages we share (Chinese, Japanese, English, Afrikaans and Korean) on our computers.

So far, I've had great success with both Windows XP (IME) and Linux (SCIM smart pinyin). However, both of them suffer from the same odd problem...

It's about getting this (image: gb_28.jpg) and not this:

When typing her family name 崔 (Cui), the character is almost right, but not quite. There is a mark below the 山 radical that should slant to the left, as in the right-hand part of 凗, for instance.

We can find web pages with the correct character (encoded in GB2312), but when we copy and paste it into any UTF-8 application (like MS Word), it automatically gets changed to the wrong one. (The character you see above is actually a bitmap picture, and sadly not a text character.)

It would be fantastic if anybody here knows of a way to get the correct character, especially using the pinyin input systems. Is there perhaps another UTF-8 character in the set that we don't know how to access? I'm not so sure it is a font problem, since I tried all of the fonts I could find, and in UTF-8 they all have the mark slanting to the right, but I'm open to suggestions.
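A quick way to test whether the copy and paste really changes the character is to compare code points. This Python sketch assumes Python 3.8+ and its built-in gb2312 codec (neither is mentioned in the thread). It encodes 崔 both ways and decodes it back:

```python
# Minimal check, assuming Python 3.8+ and its built-in "gb2312" and
# "utf-8" codecs. Compares the code point recovered from GB2312 bytes
# with the one recovered from UTF-8 bytes for the family name 崔 (Cui).

name = "崔"

gb_bytes = name.encode("gb2312")    # how a GB2312 web page stores it
utf8_bytes = name.encode("utf-8")   # how a UTF-8 application stores it

from_gb = gb_bytes.decode("gb2312")
from_utf8 = utf8_bytes.decode("utf-8")

print(f"GB2312 bytes: {gb_bytes.hex(' ').upper()}")
print(f"UTF-8 bytes:  {utf8_bytes.hex(' ').upper()}")
print(f"Code point via GB2312: U+{ord(from_gb):04X}")
print(f"Code point via UTF-8:  U+{ord(from_utf8):04X}")

# Both byte sequences decode to the same abstract character, so any
# visible change after pasting has to come from the font, not the conversion.
assert from_gb == from_utf8
```

If both lines report U+5D14, the GB2312-to-UTF-8 step kept the character intact, which points back at the font rather than the encoding.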

I am fairly computer-literate, and my wife is born-and-bred Chinese, but I can't read Chinese yet, and she is still learning computers... Since there is little overlap, it's proving tricky to search for answers the way we usually do.

Any help will be really appreciated! :help

Jaco Krüger


When I type Cui in my Richfind text, I get the right character because I use the GB2312 encoding.

But when I type Cui in my email (Outlook Express 6), I get the wrong character, because Outlook Express 6 is always set to open with the Western European (ISO) encoding.

However, you can change this encoding.
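For comparison, here is roughly what a genuine encoding mismatch looks like, assuming the mail client reads GB2312 bytes as ISO-8859-1 (Western European). This is a hypothetical Python sketch, not Outlook Express's actual code path:

```python
# Sketch, assuming the email body holds GB2312 bytes but the client
# interprets them as Western European (ISO-8859-1 / Latin-1).
name = "崔"
raw = name.encode("gb2312")

wrong = raw.decode("latin-1")   # what a Western European view would show
right = raw.decode("gb2312")    # what a GB2312 view would show

print(repr(wrong))   # two unrelated Latin-1 characters, not a hanzi
print(repr(right))
```

A mismatch like this would typically produce two unrelated Latin characters rather than a slightly different 崔, so switching the encoding back to GB2312 fixes garbled text, while the slant of the dot is a separate matter decided by the font.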


Quote: "When typing her family name 崔 (Cui), the character is almost right, but not quite."
The image and the character in your post appear identical to me. Is the image correct? Could you give us an image of an incorrect character?

My UTF-8 PoV shows the typed character to have a dot that falls from right to left, whereas the photo has a dot that falls from left to right. When I ask tong wen tang for a simplified character, it shows one identical to the bitmap. I assume this is just a UTF font issue as zhongwen.com doesn't show any variant to cui1.


Here's what I'm seeing, on Chinese XP, Firefox, in Beijing :mrgreen:

I have some notes somewhere for questions I meant to ask on font variants, which I suspect may be related. Will try to find them tomorrow.


This is just a subtle difference between fonts. I've made a quick check in MS Word, checking several fonts, and the tiny dot stroke in 隹 is slanted to the right (like this: \ ) with, for example, SimSun and MS Song fonts, while it slants to the left (like this: / ) with a MingLiu font. This difference doesn't just affect the character 崔, but any character with that "bird" component (隹).

The form with the \ stroke is closer to the usual handwritten form, and seems to be used consistently by simplified fonts. The other version, with the / stroke, seems to be the more common one in traditional fonts.

That's why only those who use traditional characters by default are likely to see any difference, albeit slight, between the two characters in the original post.


What I don't understand, Jaco, is why you find the variant form incorrect at all. Does your Chinese wife find any fault with the tiny dot slanting from left to right or from right to left? I wouldn't expect Chinese people to find the two forms of the characters different at all. It's the same as if you change a roman letter from say Arial to Verdana. The letter won't look exactly the same, but we would still recognise it as the same letter.


Thank you for all your fantastic replies! I did not expect to see so much reaction to be honest - I am pleasantly surprised!

Quote: "Has any consideration been given to changing her name? :wink:"

Not really. She is considering adopting a western middle name though, mostly for the ease of people here in Canada. Apparently my South African sound-lexicon allows me and my family to say all the names correctly, but most monolingual Canadians have some trouble.

Quote: "The image and the character in your post appear identical to me. Is the image correct? Could you give us an image of an incorrect character?"

Roddy also seems to have the same problem. I wish I had the same as on your screen! Hehe... Here's a snapshot of the same thing in my browser: (Firefox on English XP)

myscreen.jpg

Quote: "This is just a subtle difference between fonts. I've made a quick check on MS Word, checking several fonts, and the tiny dot stroke in 隹 is slanted to the right (like this: \ ) with, for example, SimSun and MS Song fonts, while it slants to the left (like this: / ) with a MingLiu font. This difference doesn't just affect the character 崔, but any character with that "bird" component (隹)."

That is fantastic! I downloaded the SimSun font, and there it is! Now I know how to get the correct character, and I can at least reproduce it on printed documents. (By the way, Jose, you have left and right swapped, but no matter, I got what you mean.) The difference is still a bit odd, but it is a workaround that will solve the problem in most cases. Thanks! I am in your debt, Jose!

What perplexes me, though, is that the character with the left-slanting mark is clearly there somewhere in the other fonts too; I just can't access it via UTF-8 or Unicode! If I use GB2312, it displays correctly in all the fonts that display it at all, but if I use Unicode, it only displays correctly in some fonts. Perhaps it is mapped incorrectly?

Maybe some of you could tell me: is this character written differently in Simplified Chinese and Traditional Chinese? Maybe it was assumed the difference was small enough to merge the two ways of writing into one when the Unicode mappings were created. The Japanese also write it slanting to the right. Hmmm...

I can fix the mapping, but looking through all those tables for the right character is not a task to be taken on lightly... :shock:
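One way to test the "mapped incorrectly" theory is a round trip over every hanzi in GB2312: decode each two-byte code, pass it through UTF-8, and encode it back. This Python sketch assumes Python's built-in gb2312 codec matches the tables the IMEs use, which is an assumption, not something the thread confirms:

```python
# Sketch: check that GB2312 -> Unicode (UTF-8) -> GB2312 is lossless for
# every hanzi slot in GB2312 (lead bytes 0xB0-0xF7, trail bytes 0xA1-0xFE),
# using Python's built-in codec as a stand-in for the IME tables.
import unicodedata

round_trips = 0
for lead in range(0xB0, 0xF8):
    for trail in range(0xA1, 0xFF):
        raw = bytes([lead, trail])
        try:
            ch = raw.decode("gb2312")
        except UnicodeDecodeError:
            continue  # a few slots in the table are unassigned
        # Go through UTF-8 and back, the way a copy and paste would.
        back = ch.encode("utf-8").decode("utf-8").encode("gb2312")
        assert back == raw, f"mapping changed {raw.hex()}"
        round_trips += 1

print(f"{round_trips} GB2312 hanzi round-trip unchanged")

# Unicode names the character by code point only; there is no separate
# "slant left" or "slant right" entry to pick from.
print(unicodedata.name("崔"))   # CJK UNIFIED IDEOGRAPH-5D14
```

If every code comes back unchanged, there is nothing in the mapping to fix. Unicode's Han unification gives 崔 a single code point, U+5D14, and leaves the exact stroke shape to the font, which fits what Jose found with SimSun and MingLiu.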

And lastly,

Quote: "What I don't understand, Jaco, is why you find the variant form incorrect at all. Does your Chinese wife find any fault with the tiny dot slanting from left to right or from right to left? I wouldn't expect Chinese people to find the two forms of the characters different at all. It's the same as if you change a roman letter from say Arial to Verdana. The letter won't look exactly the same, but we would still recognise it as the same letter."

I also thought it was a difference just like the font differences between Arial and Verdana, but I have been thoroughly made to understand by a semi-not-impressed wife that it is just not right the other way around. Apparently it not only slants in the wrong direction, but the stroke direction is also different. Since I'm not a native user of Chinese characters, I'm not going to argue on that one for sure...

By the way, I also have a family name (Krüger) that does not always display correctly, and when people tell me to drop the umlaut on the "u", I tell them the exact same thing, so I understand her point quite well...

Once again, thanks for all the great replies! You guys have been a great help!

SimSun is a Chinese font; that is why you get the Cui with \ .

MS Mincho, for example, is a Japanese font, and that is why you get the Cui with / .

UTF-8 is an encoding for all languages, including those that don't use the Latin alphabet, for example Arabic, Hindi and Korean, but also Chinese and Japanese. Since many Chinese and Japanese characters overlap, such as Cui, they use the Japanese version, because it saves them from entering thousands of extra characters: Japanese characters were already computerised, so they only had to add a few dozen Chinese characters.

But you can always change the encoding.


Quote: "I notice in some Mainland published dictionaries:

If the character is WRITTEN, the cui is with \

If the character is PRINTED, the cui is with /"

Well, this made me think about radical number 163/172 (for fowl): it is always PRINTED with /, isn't it?

How is that radical normally WRITTEN? With / or with \ ?


Quote: "What perplexes me though, is that clearly the character with the left-slanting mark is there somewhere in the other fonts too - I just can't access it via UTF-8 or Unicode!"

The official Unicode table shows the 隹 in code point 5D14 slanted left.

Font designers make the choice about how it appears, in many cases slanted right.

When I convert from Unicode to UTF-8, I get this one as well:

http://www.pzlabs.com/soleri/s.php?q=%E5%B4%94

Please check my table and see what I actually get (left- or right-slanted):

http://www.pzlabs.com/soleri/zwbrowse.php?q=46
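The %E5%B4%94 in the first link is simply the UTF-8 form of code point 5D14. As a sketch in Python (utf8_three_byte is a hypothetical helper written only for illustration), the three-byte UTF-8 rule can be applied by hand:

```python
# Hand-rolled UTF-8 encoder for code points U+0800..U+FFFF (the 3-byte
# form 1110xxxx 10xxxxxx 10xxxxxx), used to check %E5%B4%94 against U+5D14.
from urllib.parse import quote

def utf8_three_byte(cp: int) -> bytes:
    """Hypothetical helper: encode one BMP code point as 3 UTF-8 bytes."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("needs a non-surrogate code point in U+0800..U+FFFF")
    return bytes([
        0xE0 | (cp >> 12),          # top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits
        0x80 | (cp & 0x3F),         # low 6 bits
    ])

encoded = utf8_three_byte(0x5D14)
print(encoded.hex(" ").upper())   # E5 B4 94
print(quote("崔"))                # %E5%B4%94
assert encoded == "崔".encode("utf-8")
```

UTF-8 only packs the bits of 5D14 into three bytes, so the encoding itself carries no information about which way the dot slants.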


  • 3 weeks later...

It's like what Jose has said:

cui1 with / , traditional Chinese

cui1 with \ , simplified Chinese, although many dictionaries published in Mainland also used / (in print)

One can see this side by side if one fills in cui1 at the link below (how do they do it? It is in text format with a UTF-8 encoding setting!):

http://www.yellowbridge.com/language/worddict.html

You can also fill in the radical 163 SC / 172 TC, zhui1, just to compare.
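It isn't clear how yellowbridge does it. One plausible approach, assumed here, is to render the same UTF-8 character twice with different lang attributes and font-family settings, since browsers typically use both when picking a CJK font. This Python sketch prints such a test page; the font names come from this thread, and whether they are installed depends on the viewer's machine:

```python
# Sketch: build a small UTF-8 HTML page showing the same code point,
# U+5D14, twice with different language tags and fonts. Font choices
# (SimSun, MingLiU) are assumptions taken from this thread.
from html import escape

SAMPLES = [
    ("zh-CN", "SimSun", "simplified-style glyph (expected \\ dot)"),
    ("zh-TW", "MingLiU", "traditional-style glyph (expected / dot)"),
]

def comparison_page(char: str) -> str:
    """Return an HTML document with one large sample of char per font."""
    rows = []
    for lang, font, note in SAMPLES:
        rows.append(
            f'<p><span lang="{lang}" style="font-family: {font}; font-size: 48px">'
            f"{escape(char)}</span> {escape(lang)}, {escape(font)}: {escape(note)}</p>"
        )
    return (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
        "<title>Cui glyph comparison</title></head>\n<body>\n"
        + "\n".join(rows)
        + "\n</body>\n</html>\n"
    )

page = comparison_page("崔")
print(page)
```

Save the output as an .html file and open it in a browser; if both fonts are present, the two copies of 崔 should differ only in the dot on 隹, even though they are the same character in the file.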


Yes, they are NOT two different characters. They are the same character with different ways of writing it, just as the grass radical on top can be written in two different ways. The Chinese authorities in mainland China have decided on a standard way of writing characters, that's all.
