Chinese-Forums

is everyone just using Unicode now?!


geek_frappa


  • 2 weeks later...

Unicode is far and away better than BIG5 or GB***.

When dealing with Chinese, the main reason for this is that you can happily intermix Traditional and Simplified characters with Unicode, but with the other encodings it's usually an either/or kind of thing.
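To make the either/or point concrete, here is a minimal Python sketch. It assumes Python's built-in `gb2312` and `big5` codecs are fair stand-ins for the legacy encodings discussed here; the sample string and the `can_encode` helper are made up for illustration.

```python
# One string mixing Traditional 廣東 with Simplified 话.
mixed = "廣東话"

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# UTF-8 takes the whole string; GB2312 has no 廣, Big5 has no 话.
results = {enc: can_encode(mixed, enc) for enc in ("utf-8", "gb2312", "big5")}
print(results)
```

Only the Unicode encoding should accept the mixed string; each legacy codec rejects the characters belonging to the other character set.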

In addition, Unicode lets you intermix many other languages, e.g. Thai, Korean, Arabic and Hebrew, and even historical scripts such as Runic. (Tolkien's invented scripts, Tengwar and Cirth, have been proposed for encoding but are not formally part of the standard.) Basically the aim of Unicode is to be all-encompassing, and currently it does a better job of that than anything else out there.

See this page for a nice introduction: http://www.unicode.org/standard/WhatIsUnicode.html

Now, there are those who will say, "Aha, but the recent-ish GB18030 standard has both Traditional and Simplified forms, so you can just use that, and in addition, like Unicode, GB18030 also supports all these other languages." However, if you do a bit of research on these encodings, you will find that GB18030 is essentially a mapping table from GB code points to Unicode. In fact, the GB18030 standard is defined mostly in terms of Unicode code points.
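A quick way to see the "GB18030 is defined in terms of Unicode" point, sketched in Python with its built-in `gb18030` codec (the sample text is arbitrary, and this only shows the round trip, not the standard's internals):

```python
# Mixed Traditional, Simplified, Thai, Korean, plus a character outside the BMP.
sample = "廣话 ไทย 한글 \U0001F600"

# Every Unicode character has a GB18030 byte sequence, so nothing is lost.
encoded = sample.encode("gb18030")
round_trip = encoded.decode("gb18030")

# Characters already in the older GB2312 set keep their old two-byte codes.
legacy_same = "中".encode("gb2312") == "中".encode("gb18030")

print(round_trip == sample, legacy_same)
```

The round trip is lossless for any Unicode text, while the old GB2312 bytes stay valid, which is the backwards-compatibility story in a nutshell.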

A great introduction to the issues involved for programmers can be found here:

http://oss.software.ibm.com/icu/docs/papers/gb18030.html

And an xml mapping table for the GB18030 standard can be found here:

http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml?rev=1.4&content-type=text/plain

However, rather than using this mapping table directly to convert between codesets, you're better off using the conversion support of your OS. Under Linux this can be done with iconv (libiconv), and under Windows you can use the WideCharToMultiByte and MultiByteToWideChar API calls.
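The same "let the platform do the conversion" idea, sketched in Python rather than with iconv or the Win32 calls. This assumes Python's bundled codecs are an acceptable substitute for the OS converters; `transcode` is a hypothetical helper name, not a library function.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert bytes between encodings by going through Unicode text."""
    # Decode to Unicode first, then encode to the target; no hand-rolled table.
    return data.decode(source).encode(target)

# Pretend this came from a legacy Big5 file.
big5_bytes = "中文".encode("big5")

utf8_bytes = transcode(big5_bytes, "big5")
gb_bytes = transcode(utf8_bytes, "utf-8", "gb18030")

print(utf8_bytes.decode("utf-8"), gb_bytes.decode("gb18030"))
```

Unicode acts as the pivot in the middle, which is also how iconv and the Windows wide-character calls are typically used.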

Anyway, put simply, Unicode is a global standard, and GB18030 is China's way of maintaining backwards compatibility with previous GB standards, while still remaining future-compatible with the rest of the world. Plus they get the "face" of maintaining their own encoding system.

I'm not too sure on recent developments with the BIG5 encoding, so I can't really say too much on that. However separate encodings for separate languages are a remnant of an unconnected world, and don't really fit in with how things are today.

Personally, I think the sooner the rest of the world starts using Unicode, the better off we'll all be.

Link to comment
Share on other sites

Agree with Imron.

Big5 is inferior to GB, and GB is inferior to Unicode, in most situations. Unless your visitors are all from Hong Kong/Taiwan, or all from mainland China, you should use Unicode.

I'm so happy that I can now mix everything in my writing: Traditional Chinese and Japanese kanji, then some kana and Vietnamese. 用統一码, 打廣東話又得, 写普通话也行, 日本語もできる、tiếng Việt cũng được. (With Unicode, typing Cantonese works, writing Mandarin works, Japanese works, and Vietnamese works too.) It's a dream finally come true, and I am so glad to have seen this day with my own eyes! haha.

Link to comment
Share on other sites

If you're using the Traditional Chinese IME 2002a under Windows XP, is there some way you can configure it to output in UTF-8? I ask because I don't like using NJStar Communicator, even though it can do Unicode.

Link to comment
Share on other sites

"If you're using the Traditional Chinese IME 2002a under Windows XP, is there some way you can configure it to output in UTF-8? I ask because I don't like using NJStar Communicator, even though it can do Unicode."

The Windows XP IME always outputs Unicode. If you save the text to a file, then depending on the program you can save it in different encoding schemes (UTF-8, Big5, GB, etc.). If you submit the text on a web page (like this forum, for instance), it is submitted in the encoding scheme that the web page uses (which for this site, as roddy says, is UTF-8).
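To illustrate that the text itself is Unicode and the encoding is only chosen when it gets written out, here is a small Python sketch. The codec names are Python's, and `gbk` is assumed here as the "GB" option.

```python
text = "中文"  # the same two characters, whatever the IME produced them as

# Encode the identical text under several schemes and compare the bytes.
encodings = ("utf-16-le", "utf-8", "big5", "gbk")
encoded = {enc: text.encode(enc) for enc in encodings}
sizes = {enc: len(data) for enc, data in encoded.items()}

for enc, data in encoded.items():
    print(f"{enc:10} {data.hex(' ')}")
```

Each scheme gives different bytes for the same characters, which is why the program saving the file (or the web page receiving the form) has to say which one it is using.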

Link to comment
Share on other sites

And since Win2k, the Microsoft OSes have been built upon Unicode. For all the web pages I make, whether I am using them on my site or on my PocketPC, I stick with UTF-8. I agree with the posts above: UTF-8 in web pages is the way to go. Big5 vs. GB is just a headache that will eventually have to come to an end.

K

Link to comment
Share on other sites

"And since Win2k, the Microsoft OSes have been built upon Unicode"

Actually, all versions of WinNT were Unicode too. It's only the 'home' OSes (3.xx, 95, 98, Me, etc.) that didn't use it. Win2k and WinXP have it because they are basically built on top of WinNT.

Link to comment
Share on other sites
