
Simplified / Traditional variants


westmeadboy


bit of a vague title there but...

Suppose I have a traditional char A with simplified form B (not the same)

1. Can I reasonably (say, 90% sure) assume that B is not part of the Traditional character set?

2. Also, can I reasonably assume that A is not part of the Simplified character set?


1. No, because in many cases several traditional characters were merged into a single simplified form during the simplification process. For example, 只 in simplified script can be 只, 隻 or 祇 in traditional.

2. Also no, this time because the GB character set was eventually extended to include most traditional characters: http://en.wikipedia.org/wiki/GB_18030


Thanks very much.

When I talk about the character set, I really mean the set of characters actually used in Traditional Chinese writing rather than a computer-style character set, if that makes sense...

A) If I have a traditional character (which has a simplified variant), could it possibly match any characters in a piece of text written in Simplified Chinese?

B) And vice-versa?

From your post I assume the answer to both is no.

EDIT: Actually, from your example it looks like 只 has a traditional variant 隻, but 只 also appears in Traditional Chinese.

That would mean the answer to (A) is no, but the answer to (B) is yes.


In that case I misunderstood you.

There is no ambiguity in mapping from traditional to simplified characters. So if the traditional character 祇 has the simplified form 只, you can be sure that 祇 can't be found in simplified text (unless by mistake).

The "vice-versa" case is always possible, but if 90% confidence is really all you want, then consider this:

About two-thirds of the reasonably frequent traditional characters look the same in simplified Chinese. Of all simplified characters, 82 have more than one traditional variant (according to data in the Unihan database). So if you consider all 2593 simplified characters that have at least one traditional variant, your confidence is about 97%.
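The 97% figure follows from the two counts quoted in this post. A quick check of the arithmetic, using only those numbers (the variable names are just for illustration):

```python
# Counts quoted in the post (from the Unihan database, per the poster).
simplified_with_variant = 2593  # simplified characters with at least one traditional variant
ambiguous = 82                  # of those, characters with more than one traditional variant

# Share of simplified characters that map back to exactly one traditional form.
unambiguous_share = 1 - ambiguous / simplified_with_variant
print(f"{unambiguous_share:.1%}")
```

This comes out just under 97%, matching the estimate. Note that it counts characters, not occurrences in running text, so frequent ambiguous characters such as 只 would pull the per-occurrence figure lower.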

Can you tell me why this is important anyhow?


Those stats are exactly what I wanted, thanks.

I'm writing some code to search the CC-CEDICT dictionary. I want to be able to search with either simplified or traditional chars and get the same results/entries, if that makes sense.
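One way to get the same entries from either script is to index every entry under both of its headwords. The following is a minimal sketch, not the poster's actual code: it relies on the CC-CEDICT line layout (traditional, simplified, pinyin in brackets, slash-separated glosses), and the sample lines use abbreviated glosses made up for illustration. The names `build_index` and `SAMPLE` are hypothetical.

```python
import re
from collections import defaultdict

# CC-CEDICT entry layout:  Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

def build_index(lines):
    """Map every headword (traditional or simplified) to its entries."""
    index = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and header comments
            continue
        m = ENTRY.match(line)
        if m is None:
            continue                           # ignore lines that do not parse
        trad, simp, pinyin, glosses = m.groups()
        entry = (trad, simp, pinyin, glosses.split("/"))
        for key in {trad, simp}:               # a set, so identical forms are indexed once
            index[key].append(entry)
    return index

# Illustrative sample only; glosses are abbreviated, not copied from the dictionary.
SAMPLE = [
    "# sample lines in CC-CEDICT layout",
    "隻 只 [zhi1] /classifier for birds and certain animals/",
    "只 只 [zhi3] /only/merely/",
    "祇 只 [zhi3] /only/merely/",
]

index = build_index(SAMPLE)
print(len(index["只"]), len(index["隻"]))
```

Searching for 只 returns all three sample entries, while searching for 隻 returns only its own, which is exactly the asymmetry discussed above: a simplified query may need to fan out to several traditional forms.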


I just discovered something. There are seven traditional characters that have two simplified variants each: 瀋, 畫, 鍾, 靦, 餘, 鯰, 鹼. This is important because until now I was convinced that no such cases existed.

Here is the mapping:

瀋 -> 沈 渖

畫 -> 划 画

鍾 -> 钟 锺

靦 -> 腼 䩄

餘 -> 余 馀

鯰 -> 鲇 鲶

鹼 -> 硷 碱
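For a search tool, a mapping like this is easiest to handle as a one-to-many table that can also be inverted. A small sketch using only the seven pairs listed above; the helper names `invert` and `search_keys` are made up for illustration:

```python
from collections import defaultdict

# Traditional -> simplified candidates, copied from the list in this post.
TRAD_TO_SIMP = {
    "瀋": ["沈", "渖"],
    "畫": ["划", "画"],
    "鍾": ["钟", "锺"],
    "靦": ["腼", "䩄"],
    "餘": ["余", "馀"],
    "鯰": ["鲇", "鲶"],
    "鹼": ["硷", "碱"],
}

def invert(mapping):
    """Build the simplified -> traditional direction from the table above."""
    inverse = defaultdict(set)
    for trad, simps in mapping.items():
        for simp in simps:
            inverse[simp].add(trad)
    return inverse

def search_keys(ch, trad_to_simp, simp_to_trad):
    """All forms worth looking up for ch: itself plus every listed variant."""
    keys = {ch}
    keys.update(trad_to_simp.get(ch, []))
    keys.update(simp_to_trad.get(ch, ()))
    return keys

SIMP_TO_TRAD = invert(TRAD_TO_SIMP)
print(sorted(search_keys("餘", TRAD_TO_SIMP, SIMP_TO_TRAD)))
```

Expanding a query character into all of its listed variants before the dictionary lookup avoids having to decide which single conversion is "the" right one.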


Westmeadboy, at the moment frequencies are drawn from here http://www.dataparksearch.com/, the one that's called Traditional.freq.

Glenn & chrix, you are probably right. Using data from CEDICT, I could confirm such a split for only four of the seven characters I found in Unihan. The other three, 畫, 鯰 and 鹼, map to 画, 鲇 and 碱 respectively.


"There is no ambiguity in mapping from traditional to simplified characters. So if traditional character 祇 has simplified form 只, you can be sure that 祇 can't be found in simplified text (unless by mistake)."

This is also incorrect, although very many people seem to believe it. Many people also choose to use traditional characters on the computer because they think that they won't lose information if they convert to simplified character-for-character. Even Wikipedia is wrong on this.

There is a small number of characters where the mapping from traditional to simplified is not unique. This was obviously not by design; the original (traditional) character set was already so rich in variants and meanings that a perfect mapping was not always possible.

For example, the character 於 is simplified into 于 when used as a preposition, but is still written as 於 when used as a surname (N.B. 于 is also a surname, but a different one!)

The character 矇 is almost always simplified in the word 蒙胧, but is written as 矇 when it means "blind". You cannot always simplify it without looking at the word structure, because 蒙 has several meanings, and 矇 doesn't, so you might be introducing ambiguity. On the other hand, 矇胧 and 蒙胧 mean exactly the same thing. As does 濛扠, which is simplified to 蒙扠 and pronounced exactly the same. 朦胧, on the other hand, is pronounced the same, but means something different, and can't be simplified to 蒙胧. Everything clear? :)

I can't think of any other now, but I remember coming across a few more.


Also, one of my pet peeves is the assumption that 祇 and 纔 are commonly used in traditional: they aren't. Most traditional texts use the simplifications 只 and 才, even though originally 祇 and 纔 might have been the way to write them.

Just one thing I wanted off my chest (I'm not saying that anyone on this thread made this assumption).


I also find that, despite a de-facto standard for both traditional and simplified character sets, the issue is much more complex than "simplified uses these characters here, and traditional uses those".

Phonetic loans, shorthands and other kinds of simplification have always been a part of the Chinese language. The PRC carried out a rather sweeping reform in the late 50s/early 60s, which some people found excessive and wanton, not without reason. But even so, many of the "new" characters were in fact old variants, handwritten variants and common loans.

他 was split off into 他 and 她 very recently (Lu Xun still used 他 for women), which is accepted in all regions, but 你 stayed only 你 on the mainland (妳 is the correct address for women in HK and Taiwan). 它 is considered the correct neutral pronoun, deprecating 牠 in HK (not sure about Taiwan). 那 used to be a synonym for 哪 but 哪 was split off in the 20th century. 台 is a common shorthand for 臺 even in Taiwan. We've recently had a thread where an archaic form 翫 of the character 玩 popped up, but 玩 is used in modern traditional characters. The list is endless.

So, regardless of how one feels about the political circumstances surrounding the simplification process in the PRC, it is simply more productive to see all these characters as variants and to use the correct variants in the proper context -- because you'll have to deal with this even if you only ever use one of the two standards.


"妳 is the correct address for women in HK and Taiwan"

Not necessarily true, at least not for Taiwan. It varies by person and by situation. Join Facebook and have a look; you'll be surprised.

Some people even use 他 to refer to women, but this strikes me as kind of archaic or overly formal usage.


"but 你 stayed only 你 on the mainland (妳 is the correct address for women in HK and Taiwan). 它 is considered the correct neutral pronoun, deprecating 牠 in HK (not sure about Taiwan)."

I think it really depends on one's preference. I don't use 妳, but some people do. I don't use 牠 (though I was taught to use it for animals), and I don't think it is widely used nowadays. I am not sure what is taught in schools today, though.


"There is no ambiguity in mapping from traditional to simplified characters. So if traditional character 祇 has simplified form 只, you can be sure that 祇 can't be found in simplified text (unless by mistake)."

Not necessarily by mistake. 祇 exists in the simplified text as in 神祇.
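Cases like this are why a converter usually has to look at words before single characters. A minimal sketch of greedy longest-match conversion, assuming tiny hypothetical tables (`CHAR_MAP`, `WORD_MAP`) built only from the examples in this thread; a real tool would load far larger ones:

```python
# Hypothetical, tiny tables for illustration only.
CHAR_MAP = {"祇": "只", "隻": "只"}  # default character-level simplification
WORD_MAP = {"神祇": "神祇"}          # word-level exception: 祇 is kept in this word

def simplify(text, word_map=WORD_MAP, char_map=CHAR_MAP):
    """Greedy longest-match: try word-level entries before single characters."""
    longest = max(map(len, word_map), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest multi-character chunk first, down to length 2.
        for size in range(min(longest, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in word_map:
                out.append(word_map[chunk])
                i += size
                break
        else:
            # No word-level match: fall back to the character table.
            out.append(char_map.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(simplify("神祇"))
print(simplify("祇有"))
```

With these tables, 神祇 passes through unchanged while the 祇 in 祇有 is converted to 只. Greedy matching is a simplification in its own right; it does not resolve cases that depend on grammar or meaning, such as 於 used as a surname.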

