Chinese-Forums

User's manual for the advanced interface


hlk123


Unless you're dealing with traditional characters, the only thing you will need to change is the "Style" button, which is set to Adsotate by default. The other options are:

Pinyin --> convert to pinyin

Toneover --> display the tone mark above the character

Pintone --> display the tone mark and pinyin above the character

Vocab List --> output a CSV list of all vocabulary in the text

Sounds --> clickable links to sound files with the pronunciation of each character

Echo Chinese --> just output the Chinese (useful for converting between simplified and traditional forms)

We can make this a thread to answer any specific questions people have though.


Thank you.

I also check/use the Numeric Pinyin option.

What are the Conjugate and Grammar buttons for?

How about a "literal translation" option? (On literal translation, see Charles Li's Mandarin Chinese and Yip's Essential Grammar.) I think the probability of getting a correct literal translation is much higher than that of getting a 100% correct English translation. :)

Example:

不是我的文法不好,是文法太难了

Adsotrans: translate

is my grammar is not good , are grammar too had difficulty

Literal translation:

Not BE I of grammar not good, BE grammar too difficult


Conjugate --> turns on and off verb conjugation.

Grammar --> runs the text through a grammar parser. Generally improves translation quality but increases processing time.

If you turn off conjugation and grammar you will be closer to a literal translation.


  • 4 weeks later...

How much text can Adsotrans accept in the input field? There seems to be some sort of limit.

I'm trying to generate a vocabulary list for a document several pages long. What do you think the most efficient way to do that would be?


(1) If you'd like to forward me the document, I can process it for you. Alternatively, (2) put the text online somewhere and feed it into the engine as a remote webpage. Be sure to provide the full URL ("http://...") so that the engine recognizes and processes it as a remote webpage. There is still a limit on text length, but it is quite high.

Example using Baidu:

http://www.adsotrans.com/new/traditional.pl?study=on&url=http%3A%2F%2Fwww.baidu.com&service=adsotate&conjugation=on&grammar=on&encoding=GB2312&encoding_out=GB2312&quality=high
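For anyone who wants to build these links for other pages, here is a minimal Python sketch. The endpoint and parameter names are copied straight from the example URL above; the function name `build_adsotrans_url` is made up for illustration, and I am only assuming that the engine accepts the same parameters for other pages.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the example URL in this thread.
ADSOTRANS_ENDPOINT = "http://www.adsotrans.com/new/traditional.pl"


def build_adsotrans_url(page_url, service="adsotate",
                        encoding="GB2312", encoding_out="GB2312"):
    """Return a request URL asking the engine to process a remote webpage.

    Hypothetical helper: it only reproduces the query string shown above.
    """
    params = {
        "study": "on",
        "url": page_url,          # must be the full URL, including "http://"
        "service": service,
        "conjugation": "on",
        "grammar": "on",
        "encoding": encoding,
        "encoding_out": encoding_out,
        "quality": "high",
    }
    # urlencode percent-escapes the remote URL so it survives as one parameter.
    return ADSOTRANS_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_adsotrans_url("http://www.baidu.com"))
```

Characters such as ":", "/" and "@" in the remote address get percent-escaped automatically, which matters for addresses that contain unusual characters.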


I've been trying to use this on the Chinese Radio site:

http://gb.chinabroadcast.cn/chinese_radio/index.htm

http://gb.chinabroadcast.cn/1321/2006/04/03/542@975260.htm

Unfortunately, results so far have been limited to either no response or gibberish. I would like to generate vocabulary lists from Chinese Radio broadcasts. I haven't yet tried cutting and pasting all the relevant content into a text file and posting it.

Am I doing something wrong?


Did you remember to set the encoding to GB2312?

Just out of curiosity... For this site I noticed that they actually hard-coded the encoding they used in the source code (which is great), but a lot of sites don't do that. Do you have any idea how the encoding can be detected in that case, other than by trial and error?

I've noticed that when I wrote some HTML with Chinese in it, the encoding you actually type the source code in (or cut and paste from) is what seems to stick for that piece of text. I sometimes have trouble going back afterwards to work out what encoding some text was written in.

Any thoughts?

Keith


Hey Keith,

I *think* you can check the encoding of a page in most browsers by clicking "View-->Encoding", although I'm using a Chinese version of IE right now, so I'm not sure whether that works on an English operating system, if that's what you're using.

Most mainland webpages use GB2312 and most international webpages use Unicode, though, so just knowing where a website is hosted is usually enough. In a jam, check whether the website has an ICP license from the MII at the bottom of the page. If it does, the content will almost certainly be in GB2312.

The easiest thing is probably just to ask the software to guess the encoding by selecting the "Guess" option on the advanced page. If the software gets it wrong, send me a note with the link and I'll try to improve the encoding recognition algorithm. What we have should be pretty good at differentiating between GB2312 and Unicode, though. The tough part is occasionally figuring out whether a text is in simplified or complex (traditional) characters.
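To give a rough idea of what guessing an encoding can look like, here is a small Python sketch. It is not the algorithm the site actually uses, just one plausible approach: trust a charset declared in the page if there is one, otherwise try strict UTF-8 and fall back to GB2312. The names `declared_charset`, `guess_encoding` and `fits_gb2312` are invented for this example, and the last one is only a crude hint for the simplified-versus-traditional question, since many characters are identical in both forms.

```python
import re


def declared_charset(raw):
    """Return the charset named in the page's markup, or None if absent."""
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_-]+)',
                      raw[:4096], re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None


def guess_encoding(raw):
    """Guess the encoding of raw page bytes (heuristic, can be wrong)."""
    declared = declared_charset(raw)
    if declared:
        return declared
    # Strict UTF-8 decoding rejects most GB2312 byte sequences, so try it
    # first. Pure ASCII input will also be reported as "utf-8".
    for candidate in ("utf-8", "gb2312"):
        try:
            raw.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    return None


def fits_gb2312(text):
    """True if every character exists in GB2312 (a hint of simplified text)."""
    try:
        text.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    sample = "文法太难了"
    print(guess_encoding(sample.encode("gb2312")))
    print(guess_encoding(sample.encode("utf-8")))
    print(fits_gb2312(sample), fits_gb2312("文法太難了"))
```

A text made up only of characters shared by both scripts will pass `fits_gb2312` even if it came from a traditional source, so treat a positive result as a hint rather than proof.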


  • 1 month later...

Input is UTF-8 because that supports both complex (traditional) and simplified characters, and in some browsers the text is handed to the server using the encoding of the page.

There's no real reason to have the output default as GB2312 if it is causing problems. Is there any reason to switch? There were a couple of punctuation marks that didn't translate well, but I tried to take care of those....
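If anyone wants to check which characters in a text would be lost when UTF-8 input is written out as GB2312, a quick Python sketch along these lines should do it. This is only an illustration using Python's built-in codecs, not what the server does, and the function name `find_unencodable` is made up.

```python
def find_unencodable(text, encoding="gb2312"):
    """List the distinct characters in text that the target encoding lacks."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Keep first-seen order and skip repeats.
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    # The euro sign is used here as a character known to be absent from GB2312.
    print(find_unencodable("文法太难了€"))
```

Anything the function reports would have to be replaced or dropped before output in GB2312, whereas UTF-8 output would keep it intact.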

