Chinese-Forums

best way to represent Chinese characters online


4fingers


Hi,

I hope you don't mind if I ask a technical question about the best way to represent Chinese characters on websites.

I see that some websites use Unicode, e.g. &#20844;, while others use the actual Chinese symbol, e.g. 火.

Does anyone know what the difference is between these two methods, or have any tips on making sure visitors are able to view the characters?

Thanks
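To see that the two forms end up as the same text once the page has been parsed, here is a small sketch that uses Python's standard-library `html.parser` as a stand-in for a browser's parser. That substitution is an assumption: browsers have their own parsers, but the HTML rules for decoding character references are the same. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects a document's text content, with character references already decoded."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser turn &#20844; into 公
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(markup: str) -> str:
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


escaped = visible_text("<p>&#20844;</p>")  # numeric character reference
direct = visible_text("<p>公</p>")          # the character typed directly
print(escaped, direct, escaped == direct)   # 公 公 True
```

Both versions produce the same character; the difference is only in how the page source is written.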


IMHO, it's best to use unicode-encoded characters and state that in the document header so that browsers know which encoding to use.

I don't know what you mean by "actual Chinese symbol e.g. 火". Every (non-ASCII) character is encoded in some way. The 火 you posted is actually a Unicode character; your browser simply interprets it correctly and displays it.
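To illustrate that every non-ASCII character is stored as some sequence of bytes, this Python sketch encodes 火 with the three encodings mentioned in this thread. The byte values differ between encodings, which is why the browser has to be told which one a page uses. The helper name `encoded_forms` is hypothetical.

```python
def encoded_forms(text: str, encodings=("utf-8", "gb2312", "big5")) -> dict[str, bytes]:
    """Return the raw bytes used to store `text` in each of the given encodings."""
    return {enc: text.encode(enc) for enc in encodings}


for enc, data in encoded_forms("火").items():
    # Same character, different bytes depending on the encoding.
    print(f"{enc:>7}: {data.hex(' ')} ({len(data)} bytes)")
```

In UTF-8, 火 is the three bytes `e7 81 ab`; GB2312 and Big5 each use two (different) bytes for it.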


Depending on your target audience, you might also wanna go with either the simplified (GB) or traditional (Big5) encoding rather than Unicode, since many Chinese-speaking users don't use Unicode. Though, as renzhe suggests, it could just be a case of crappy coding: the websites where I have to manually change the encoding in my browser might simply not have been coded properly.


No, go with unicode and make sure you specify that in the header of the webpage. The browser will then be able to correctly detect the encoding and adjust the display accordingly. There is no reason to use GB or Big5 encodings nowadays, and the sooner people stop using them, the better.
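A minimal sketch of what "use Unicode and specify it in the header" can look like in practice: the page declares UTF-8 in a `<meta>` element and is saved with that same encoding. The function names are hypothetical; `<meta charset="utf-8">` is the HTML5 form, and older pages often use the equivalent `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`.

```python
import html
from pathlib import Path


def render_page(title: str, body: str) -> str:
    """Build a minimal HTML page that declares UTF-8 in its <head>."""
    # html.escape only rewrites & < > " and ', so Chinese text stays as plain characters.
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{html.escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"  <p>{html.escape(body)}</p>\n"
        "</body>\n"
        "</html>\n"
    )


def write_page(path: Path, title: str, body: str) -> None:
    """Save the page; the bytes on disk must match the charset the page declares."""
    path.write_text(render_page(title, body), encoding="utf-8")
```

For example, `write_page(Path("index.html"), "火", "公")` would create a file whose Chinese characters are stored as UTF-8 bytes and can be edited directly in any UTF-8-aware editor.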


Yes, it would make more sense this way...

So if you code the website correctly, a visitor will never see 亂碼 (garbled text) at all? I understand that the 亂碼 I occasionally run into online might be due to bad coding on the website, e.g. when a site uses GB or Big5 but doesn't tell my browser, which is set to Unicode... But don't some people force their browsers to use a particular character set? Wouldn't they see 亂碼 if the website was in Unicode?
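The 亂碼 effect can be reproduced in a few lines: encode text as UTF-8, then decode those bytes as if they were GBK (a superset of GB2312), which is roughly what happens when a browser guesses, or is forced to use, the wrong encoding. `errors="replace"` turns invalid byte sequences into U+FFFD instead of raising an exception; the exact garbage a real browser shows may differ. The helper name `misread` is hypothetical.

```python
def misread(text: str, saved_as: str, read_as: str) -> str:
    """Encode text with one encoding, then decode the bytes with another."""
    raw = text.encode(saved_as)
    # errors="replace" substitutes U+FFFD for byte sequences that are invalid in read_as.
    return raw.decode(read_as, errors="replace")


original = "亂碼"
print(misread(original, "utf-8", "gbk"))    # garbled: UTF-8 bytes read as GBK
print(misread(original, "utf-8", "utf-8"))  # 亂碼: the encodings match
```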


In most (all?) browsers, the forced-encoding setting only applies to the current page/tab. So you could force a Unicode page to display as GB (and get 乱码) after it has loaded; however, once you open a new tab/page, the browser will pick the encoding based on what is specified in the header of that page, or on the default for the OS language if no appropriate header is present.
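A heavily simplified sketch of the decision order described above. It is not the real browser algorithm (the HTML standard also checks for a byte order mark, limits how far into the document it looks for a `<meta>`, and may fall back to heuristics), and it works on already-decoded text for simplicity. The function name, its parameters and the `"gbk"` fallback are assumptions chosen for illustration.

```python
import re
from typing import Optional

_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([\w-]+)', re.IGNORECASE)
_META_CHARSET = re.compile(r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def choose_encoding(
    content_type: Optional[str],
    head_markup: str,
    user_override: Optional[str] = None,
    os_default: str = "gbk",
) -> str:
    """Pick an encoding roughly the way this post describes (simplified)."""
    # A manual "force encoding" choice only applies to the page it was made on,
    # so it is passed in per page rather than stored globally.
    if user_override:
        return user_override.lower()
    # 1. The HTTP Content-Type header, e.g. "text/html; charset=utf-8".
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. A <meta> declaration inside the page itself.
    match = _META_CHARSET.search(head_markup)
    if match:
        return match.group(1).lower()
    # 3. No declaration at all: fall back to a locale-dependent default.
    return os_default
```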

So, your best choice is always to use utf-8, and always set the header. If you see a page with 乱码, it's almost certain that either they are not using utf-8, or they have not set the header, or both.
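On the server side, "always set the header" can also mean sending the charset in the HTTP `Content-Type` header, not only in the page. A minimal sketch as a Python WSGI application; the app and its content are hypothetical, and every web server or framework has its own way of setting this header.

```python
from typing import Callable, Iterable

PAGE = (
    "<!DOCTYPE html>\n"
    '<html><head><meta charset="utf-8"><title>火</title></head>\n'
    "<body><p>公</p></body></html>\n"
)


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app that serves a UTF-8 page and says so in the HTTP header."""
    body = PAGE.encode("utf-8")  # the bytes sent must match the declared charset
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it.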


So, your best choice is always to use utf-8, and always set the header.

So would an example of that be this:


Then there is the question of how I should represent the characters: either as escaped Unicode, e.g. &#20844;, or directly encoded:

IMHO, it's best to use unicode-encoded characters

Although imron seems to suggest that there is no difference between the two, and popular sites like www.google.cn use the symbols themselves.

After some reading, it seems that numeric character references are only used when the characters can't be directly encoded in the HTML document while it is being edited:

http://en.wikipedia.org/wiki/Numeric_character_reference

Numeric character references (NCR) are typically used in order to represent characters that are not directly encodable in a particular document. When the document is interpreted by a markup-aware reader, each NCR is treated as if it were the character it represents.
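The Wikipedia description can be demonstrated with Python's `xmlcharrefreplace` error handler: when text is saved in an encoding that cannot represent a character, that character is written as an NCR instead, while characters the encoding can represent are stored directly. Latin-1 is used here only as an example of an encoding that has é but no Chinese characters.

```python
import html

text = "Café 公"

# Latin-1 can store "é" directly but has no code for 公, so 公 becomes &#20844;.
latin1_bytes = text.encode("latin-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'Caf\xe9 &#20844;'

# UTF-8 can store every character directly, so no NCRs are needed.
utf8_bytes = text.encode("utf-8", errors="xmlcharrefreplace")
print(b"&#" in utf8_bytes)  # False

# A markup-aware reader turns the NCR back into the character it represents.
print(html.unescape(latin1_bytes.decode("latin-1")) == text)  # True
```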

I assume there aren't any bugs in web browsers that cause an NCR to be displayed as a different character from the one it represents.


I think that there is a trend towards unicode for Chinese sites nowadays, anyway, so it's less likely to be a problem in the future. For example, verycd.com and many other sites use utf-8.

In this day and age, one should only use something else if there is a strong reason, especially if the site uses more than one language.
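A quick way to see why mixing languages favours UTF-8 is to check which encodings can represent a given string at all. In this sketch, Korean Hangul serves as an example of text that neither GB2312 nor Big5 covers; the helper name `can_encode` is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)  # strict mode raises on unrepresentable characters
    except UnicodeEncodeError:
        return False
    return True


samples = {"Chinese": "火", "Korean": "한국어", "Mixed": "火 / 한국어"}
for label, text in samples.items():
    support = {enc: can_encode(text, enc) for enc in ("gb2312", "big5", "utf-8")}
    print(label, support)
```

Only UTF-8 handles the mixed string, so a page in GB2312 or Big5 would have to fall back to NCRs (or lose characters) for the Korean part.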


@4fingers, yes that is the correct way to specify the header.

I think renzhe and I are in agreement; there is perhaps just some confusion on your part due to the slightly incorrect terminology you used at first. Normally, when someone says "use unicode", as you did in your original post, they don't mean numeric character references such as &#20844;, but rather a direct encoding of Chinese characters in Unicode (as opposed to GB2312, Big5, etc.). So when renzhe said "it's best to use unicode-encoded characters", he was actually agreeing with the second option in your question, using "the actual Chinese symbol e.g. 火". (Side note: Chinese doesn't have symbols, it has characters.)
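To make the terminology concrete: a character has one Unicode code point; a numeric character reference is just that number written out in ASCII (in decimal or hex); an encoding such as UTF-8 is how the code point is stored as bytes. A small sketch for 公 (the helper name `three_views` is hypothetical):

```python
def three_views(ch: str) -> dict[str, object]:
    """Show a single character as a code point, as NCRs, and as UTF-8 bytes."""
    code_point = ord(ch)
    return {
        "code point": f"U+{code_point:04X}",
        "decimal NCR": f"&#{code_point};",
        "hex NCR": f"&#x{code_point:X};",
        "UTF-8 bytes": ch.encode("utf-8"),
    }


for name, value in three_views("公").items():
    print(f"{name:>12}: {value}")
```

Here 公 is U+516C, which can be written as `&#20844;` or `&#x516C;`, and is stored in UTF-8 as the bytes `e5 85 ac`.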

Anyway, the answer, as you correctly deduced, is to just use plain, directly encoded Unicode characters. There is no need to use escaped character references. I don't know of any sites that do this, except those demonstrating that such a thing is possible.


Yes. What I meant is that you should type/cut'n'paste/enter Chinese characters (using a unicode locale, not GB or Big5), and indicate the proper encoding in the header (like the example you posted).
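If you already have pages saved as GB2312 or Big5, one way to follow this advice is to re-encode them as UTF-8 and update the charset declaration at the same time. A minimal sketch, assuming you know each file's original encoding and that its declaration has the form `charset=...`; the function name is hypothetical.

```python
import re

_CHARSET_DECL = re.compile(r'(charset\s*=\s*["\']?)[\w-]+', re.IGNORECASE)


def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode a legacy GB2312/Big5 page as UTF-8 and fix its charset declaration."""
    # Raises UnicodeDecodeError if the bytes are invalid in source_encoding;
    # a wrong guess can still decode without error, so check the result.
    text = data.decode(source_encoding)
    # Point every charset=... declaration at the new encoding.
    text = _CHARSET_DECL.sub(r"\1utf-8", text)
    return text.encode("utf-8")
```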

You definitely shouldn't write the whole webpage in numeric codes like &#20844;. Such a webpage would display fine, but it would be almost impossible to edit and update.
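If you inherit a page written entirely in numeric codes, you can turn them back into ordinary characters so the source becomes editable again. Python's standard `html.unescape` would also decode `&lt;`, `&amp;` and similar references, which could break the markup if run over a whole file, so this sketch only expands numeric references and leaves alone those for markup-significant characters. The function name and that policy are assumptions.

```python
import re

_NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")
# Characters that must stay escaped in HTML source, so their references are kept.
_KEEP_ESCAPED = {"&", "<", ">", '"', "'"}


def expand_ncrs(source: str) -> str:
    """Replace numeric character references with the characters they stand for."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)  # not a valid character; leave it untouched
        ch = chr(code_point)
        return match.group(0) if ch in _KEEP_ESCAPED else ch

    return _NCR.sub(replace, source)
```

After conversion, save the result as UTF-8 with a matching charset declaration.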
