Chinese-forums.com
Learn Chinese in China

westmeadboy

Simplified / Traditional variants


westmeadboy

Bit of a vague title there, but...

Suppose I have a traditional character A whose simplified form B is different from A.

1. Can I reasonably (say, 90% sure) assume that B is not part of the Traditional character set?

2. Also, can I reasonably assume that A is not part of the Simplified character set?


cababunga

1. No, because in many cases several traditional characters were merged into a single simplified form during the simplification process. For example, 只 in simplified script can correspond to 只, 隻 or 祇 in traditional.

2. Also no, this time because the GB character set was eventually extended to include most traditional characters: http://en.wikipedia.org/wiki/GB_18030
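Both points can be checked quickly in Python. This is a minimal sketch: `SIMP_TO_TRAD` is a hypothetical one-entry table holding only the 只 example above (real data would come from a source such as Unihan or CC-CEDICT), and the encoding check relies only on Python's built-in `gb2312` and `gb18030` codecs.

```python
# Point 1: one simplified form can stand for several traditional ones.
# Hypothetical sample table; only the example from this post is included.
SIMP_TO_TRAD: dict[str, list[str]] = {"只": ["只", "隻", "祇"]}


def traditional_candidates(ch: str) -> list[str]:
    """Traditional forms a simplified character may stand for.

    Characters missing from the table are assumed to be identical in both scripts.
    """
    return SIMP_TO_TRAD.get(ch, [ch])


# Point 2: GB 18030 covers traditional characters that GB2312 lacks.
def can_encode(ch: str, codec: str) -> bool:
    """True if the character is representable in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(traditional_candidates("只"))
for ch in "国國":
    print(ch, "gb2312:", can_encode(ch, "gb2312"), "gb18030:", can_encode(ch, "gb18030"))
```

On a standard Python build, 國 should fail under `gb2312` but encode under `gb18030`, while 国 encodes under both.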

westmeadboy

Thanks very much.

When I say "character set", I really mean the set of characters used in Traditional Chinese rather than a computer character set (encoding), if that makes sense...

A) If I have a traditional character (which has a simplified variant), could it possibly match any characters in a piece of text written in Simplified Chinese?

B) And vice-versa?

From your post I assume the answer to both is no.

EDIT: Actually, from your example it looks like 只 has a traditional variant 隻, but 只 itself also appears in Traditional Chinese.

That would mean the answer to (A) is no, but to (B) is yes.

cababunga

In that case I misunderstood you.

There is no ambiguity in mapping from traditional to simplified characters. So if traditional character 祇 has simplified form 只, you can be sure that 祇 can't be found in simplified text (unless by mistake).
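If one accepts this rule (later replies in the thread point out exceptions), the characters whose traditional and simplified forms differ can be used to guess which script a text is written in. A minimal heuristic sketch; `TRAD_ONLY` and `SIMP_ONLY` are hypothetical hand-picked samples, not real data.

```python
# Hypothetical hand-picked samples; real sets would come from a variant table.
TRAD_ONLY = set("國語說們隻")  # traditional forms that differ from their simplified form
SIMP_ONLY = set("国语说们")    # simplified forms that differ from their traditional form


def guess_script(text: str) -> str:
    """Heuristic: count script-specific characters and pick the majority."""
    trad = sum(ch in TRAD_ONLY for ch in text)
    simp = sum(ch in SIMP_ONLY for ch in text)
    if trad > simp:
        return "traditional"
    if simp > trad:
        return "simplified"
    return "unknown"  # no distinguishing characters, or a tie


print(guess_script("我們說國語"))
print(guess_script("我们说国语"))
```

Characters shared by both scripts (like 只 or 我) carry no signal, so short texts may come out as "unknown".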

"Vice-versa" case is always possible, but if 90% confidence is really all you want, then consider this:

About 2/3 of reasonably frequently used traditional characters look the same in simplified Chinese. Now, out of all simplified characters, 82 have more than one traditional variant (according to the data in the Unihan database). So your confidence is about 97% if you consider all 2593 simplified characters that have at least one traditional variant.
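The 97% figure is simply 1 - 82/2593 ≈ 0.968. The same calculation over a whole simplified→traditional table might look like this; the dict format is an assumption (for example, a table built from Unihan), and the sample entries are only illustrative.

```python
def ambiguity_stats(simp_to_trad: dict[str, list[str]]) -> tuple[int, int, float]:
    """Return (ambiguous, total, confidence) for a simplified -> traditional table.

    ambiguous:  simplified characters with more than one traditional variant
    total:      simplified characters in the table (each has at least one variant)
    confidence: share of characters that map back to exactly one traditional form
    """
    if not simp_to_trad:
        raise ValueError("empty table")
    total = len(simp_to_trad)
    ambiguous = sum(1 for forms in simp_to_trad.values() if len(forms) > 1)
    return ambiguous, total, 1 - ambiguous / total


# Illustrative sample, not a complete table.
sample = {"只": ["只", "隻", "祇"], "发": ["發", "髮"], "国": ["國"], "人": ["人"]}
print(ambiguity_stats(sample))  # (2, 4, 0.5)

# The figures quoted in the post.
print(f"{1 - 82 / 2593:.1%}")  # 96.8%
```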

Can you tell me why this is important anyhow?

westmeadboy

Those stats are exactly what I wanted, thanks.

I'm writing some code to search the CC-CEDICT dictionary. I want to be able to search with either simplified or traditional chars and get the same results/entries, if that makes sense.
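A minimal sketch of that idea, assuming the standard CC-CEDICT line layout `Traditional Simplified [pin1 yin1] /gloss/gloss/`: each entry is indexed under both of its headwords, so a query in either script reaches the same entry. The two sample lines are hypothetical (glosses written for illustration, not copied from the dictionary).

```python
import re
from collections import defaultdict

# CC-CEDICT line layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
LINE_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def build_index(lines):
    """Index each entry under both its traditional and simplified headword."""
    index = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        m = LINE_RE.match(line)
        if not m:
            continue  # skip malformed lines
        trad, simp, pinyin, glosses = m.groups()
        entry = (trad, simp, pinyin, glosses.split("/"))
        index[trad].append(entry)
        if simp != trad:
            index[simp].append(entry)  # same entry, reachable from either script
    return index


sample = [
    "# hypothetical excerpt in CC-CEDICT format",
    "隻 只 [zhi1] /classifier for birds and certain animals/",
    "只 只 [zhi3] /only/merely/",
]
index = build_index(sample)
for entry in index["只"]:
    print(entry)
```

This also shows the asymmetry discussed above: searching 只 returns both entries, while 隻 returns only its own.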

cababunga

I just discovered something. There are seven traditional characters that each have two simplified variants: 瀋, 畫, 鍾, 靦, 餘, 鯰, 鹼. This matters because until now I was convinced there were no such cases.

Here is the mapping:

瀋 -> 沈 渖

畫 -> 划 画

鍾 -> 钟 锺

靦 -> 腼 䩄

餘 -> 余 馀

鯰 -> 鲇 鲶

鹼 -> 硷 碱
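Lists like this can be pulled from Unihan's variants data, where each line has the form `U+XXXX<TAB>kSimplifiedVariant<TAB>U+YYYY ...`. A minimal parsing sketch; it assumes that tab-separated layout (worth double-checking against the Unihan documentation, UAX #38), and `unihan_line` is a hypothetical helper that builds sample lines in code instead of reading the real file.

```python
def parse_simplified_variants(lines):
    """Map traditional characters to their kSimplifiedVariant forms."""
    result: dict[str, list[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank or comment line
        code, field, value = line.split("\t", 2)
        if field != "kSimplifiedVariant":
            continue
        # Values are "U+XXXX" code points; drop any "<source" annotation defensively.
        targets = [chr(int(v.split("<")[0][2:], 16)) for v in value.split()]
        result[chr(int(code[2:], 16))] = targets
    return result


def one_to_many(mapping):
    """Entries with more than one distinct simplified form."""
    return {t: s for t, s in mapping.items() if len(set(s)) > 1}


def unihan_line(trad: str, simps: str) -> str:
    """Build a sample line in the assumed Unihan layout (hypothetical helper)."""
    targets = " ".join(f"U+{ord(s):04X}" for s in simps)
    return f"U+{ord(trad):04X}\tkSimplifiedVariant\t{targets}"


sample = ["# sample built in code", unihan_line("瀋", "沈渖"), unihan_line("國", "国")]
print(one_to_many(parse_simplified_variants(sample)))
```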

westmeadboy

When I search in YellowBridge, it only shows one variant. Is that because YellowBridge is not geared up to show multiple variants?

westmeadboy

Thanks for the useful link.

Totally unrelated question: any idea where they get their frequency data?

I've been looking for a good source for a while now...

Glenn
Quoting cababunga: "畫 -> 划 画"

Is that right? I thought that 画 was from 畫 and 划 was from 劃.

chrix

I agree with Glenn. Also, 划 exists in traditional as well, mostly meaning "to row" (huá).

cababunga

Westmeadboy, at the moment the frequencies are drawn from http://www.dataparksearch.com/, from the file called Traditional.freq.

Glenn & chrix, you are probably right. Using CEDICT data, I could only confirm such a split for four of the seven characters I found in Unihan. The other three, 畫, 鯰 and 鹼, map only to 画, 鲇 and 碱 respectively.
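That cross-check can be expressed as a small filter: keep a Unihan split only if at least two of its simplified forms are attested as single-character (traditional, simplified) headword pairs in CEDICT. A sketch with hypothetical sample data; how the pairs are extracted from CEDICT is left out.

```python
def check_splits(unihan_splits, cedict_pairs):
    """Separate Unihan one-to-many candidates into confirmed and unconfirmed.

    unihan_splits: traditional char -> list of simplified forms (from Unihan)
    cedict_pairs:  set of (traditional, simplified) single-character headwords
    """
    confirmed, unconfirmed = {}, {}
    for trad, simps in unihan_splits.items():
        attested = [s for s in simps if (trad, s) in cedict_pairs]
        if len(attested) > 1:
            confirmed[trad] = attested  # CEDICT also shows the split
        else:
            unconfirmed[trad] = attested  # CEDICT knows at most one form
    return confirmed, unconfirmed


# Hypothetical sample data.
splits = {"瀋": ["沈", "渖"], "畫": ["划", "画"]}
pairs = {("瀋", "沈"), ("瀋", "渖"), ("畫", "画")}
print(check_splits(splits, pairs))
```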

renzhe
Quoting cababunga: "There is no ambiguity in mapping from traditional to simplified characters. So if traditional character 祇 has simplified form 只, you can be sure that 祇 can't be found in simplified text (unless by mistake)."

This is also incorrect, although a great many people seem to believe it. Many people also choose to type traditional characters on the computer because they think no information is lost when converting to simplified character by character. Even Wikipedia gets this wrong.

There is a small number of characters where the mapping from traditional to simplified is not unique. This was obviously not by design; the original (traditional) character set was simply so rich in variants and meanings that a perfect mapping was not always possible.

For example, the character 於 is simplified into 于 when used as a preposition, but is still written as 於 when used as a surname (N.B. 于 is also a surname, but a different one!)

The character 矇 is almost always simplified in the word 蒙胧, but is written as 矇 when it means "blind". You cannot always simplify it without looking at the word structure, because 蒙 has several meanings, and 矇 doesn't, so you might be introducing ambiguity. On the other hand, 矇胧 and 蒙胧 mean exactly the same thing. As does 濛扠, which is simplified to 蒙扠 and pronounced exactly the same. 朦胧, on the other hand, is pronounced the same, but means something different, and can't be simplified to 蒙胧. Everything clear? :)

I can't think of any other now, but I remember coming across a few more.
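Because the right choice sometimes depends on the word, converters typically check a phrase table before falling back to character-by-character mapping (roughly how tools such as OpenCC approach it). A minimal longest-match sketch; the tables are tiny hypothetical samples, and 乾 is added as a well-known illustration alongside 於 (乾 is simplified to 干 in 乾淨 but kept in the reign name 乾隆).

```python
# Hypothetical sample tables for a longest-match converter.
PHRASES = {
    "乾隆": "乾隆",  # reign name: 乾 (qián) is kept in simplified text
}
CHARS = {
    "乾": "干",  # default simplification, e.g. 乾淨 -> 干净
    "淨": "净",
    "於": "于",  # preposition; 於 as a surname would need its own phrase entries
}
MAX_PHRASE = max(map(len, PHRASES))


def to_simplified(text: str) -> str:
    """Traditional -> simplified, trying the longest phrase match first."""
    out = []
    i = 0
    while i < len(text):
        # Try phrases from longest to shortest (length >= 2) at position i.
        for n in range(min(MAX_PHRASE, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched: fall back to the single-character table.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_simplified("乾淨"), to_simplified("乾隆"), to_simplified("在於"))
```

A phrase table only helps where the context is a fixed word; cases like 於 as a surname show why some conversions still need a human (or much more context).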

chrix

Also, one of my pet peeves is the assumption that 祇 and 纔 are commonly used in traditional texts. They aren't: most traditional texts use the simpler forms 只 and 才, even though 祇 and 纔 may originally have been the way to write them.

Just something I wanted to get off my chest (I'm not saying anyone in this thread made this assumption).

renzhe

I also find that, despite de facto standards for both the traditional and simplified character sets, the issue is much more complex than "simplified uses these characters here, and traditional uses those".

Phonetic loans, shorthands and other kinds of simplification have always been part of the Chinese language. The PRC carried out a rather sweeping reform in the late 1950s and early 1960s, which some people found excessive and wanton, not without reason. But even so, many of the "new" characters were in fact old variants, handwritten forms and common loans.

他 was split off into 他 and 她 very recently (Lu Xun still used 他 for women), which is accepted in all regions, but 你 stayed only 你 on the mainland (妳 is the correct address for women in HK and Taiwan). 它 is considered the correct neutral pronoun, deprecating 牠 in HK (not sure about Taiwan). 那 used to be a synonym for 哪 but 哪 was split off in the 20th century. 台 is a common shorthand for 臺 even in Taiwan. We've recently had a thread where an archaic form 翫 of the character 玩 popped up, but 玩 is used in modern traditional characters. The list is endless.

So, regardless of how one feels about the political circumstances surrounding the simplification process in the PRC, it is simply more productive to see all these characters as variants and to use the correct variants in the proper context -- because you'll have to deal with this even if you only ever use one of the two standards.
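For a search tool like the one discussed earlier in the thread, one practical way to treat these characters as variants is to fold them to a single form before matching. A minimal sketch; the folding table is hypothetical and uses only pairs mentioned in this thread. 祇 is deliberately left out, since it is also an independent character (as in 神祇).

```python
# Hypothetical folding table built from pairs mentioned in this thread:
# each variant on the left is replaced by the search key on the right.
FOLD = str.maketrans("妳牠臺翫纔", "你它台玩才")


def fold(text: str) -> str:
    """Replace each listed variant with its canonical search form."""
    return text.translate(FOLD)


def matches(query: str, text: str) -> bool:
    """Variant-insensitive substring search."""
    return fold(query) in fold(text)


print(matches("台灣", "我住在臺灣"))
print(matches("妳好", "你好嗎"))
```

Folding trades precision for recall: a search for 妳 will also hit every 你, which is usually what a dictionary lookup wants but not always.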

chrix
Quoting renzhe: "妳 is the correct address for women in HK and Taiwan"

Not necessarily true, at least not for Taiwan. It varies by person and by situation. Join Facebook and have a look, you'll be surprised.

Some people even use 他 to refer to women, but this strikes me as kind of archaic or overly formal usage.

renzhe

See, it's even more difficult than I imagined!

Quoting chrix: "Join Facebook and have a look"

NEVER!

skylee
Quoting renzhe: "but 你 stayed only 你 on the mainland (妳 is the correct address for women in HK and Taiwan). 它 is considered the correct neutral pronoun, deprecating 牠 in HK (not sure about Taiwan)."

I think it really depends on one's preference. I don't use 妳, but some people do. I don't use 牠 (but I was taught to use it on animals), and I don't think it is widely used nowadays. I am not sure what is taught in schools nowadays, though.

chrix

Oh right, there is also 祂, used for God, though I'm not sure Christians use it consistently...

skylee
Quoting cababunga: "There is no ambiguity in mapping from traditional to simplified characters. So if traditional character 祇 has simplified form 只, you can be sure that 祇 can't be found in simplified text (unless by mistake)."

Not necessarily by mistake: 祇 also appears in simplified text, as in 神祇.
