Frequently Asked Questions


trevelyan

1. I'm trying to annotate/translate a remote webpage, but it comes out as gibberish!

Be sure you are specifying the encoding of the Chinese characters properly -- Adso expects GB2312 by default. Please also note that Adso currently supports only the simplified Chinese character set. We are looking for volunteers to help with the conversion to the complex character set.
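To see why a mismatched encoding produces gibberish, here is a minimal Python sketch. It is not part of Adso; the helper name decode_page is made up for illustration, and it only shows what happens when GB2312 bytes are read with the wrong codec.

```python
def decode_page(raw: bytes, encoding: str = "gb2312") -> str:
    """Decode raw page bytes (hypothetical helper, not an Adso function).

    Bytes that cannot be decoded become U+FFFD instead of raising an error.
    """
    return raw.decode(encoding, errors="replace")


# Simulate a page that was served as GB2312.
raw = "中文网页".encode("gb2312")

good = decode_page(raw)            # decoded with the encoding Adso expects
bad = decode_page(raw, "latin-1")  # decoded with the wrong encoding

print(good)
print(bad)
```

The second print shows eight unrelated Latin characters, one per byte, which is the kind of garbage you get when the declared encoding does not match the page.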

2. Adso segments my sentence incorrectly.

The most common reason is that Adso does not recognize a word as a semantic unit. Our database is growing daily due to contributions from volunteer annotators, but there are still many words and common phrases not present. If you notice one, please help us by adding it.

3. Adso segments a word incorrectly, yet all of the necessary words *are* in the database. What is happening?

The algorithm we use to segment Chinese text occasionally makes mistakes too. The usual cause is that it mistakenly matches a long Chinese word that straddles the boundary between two correctly segmented words. As time passes, our segmentation algorithm is growing more sophisticated and better able to handle these kinds of issues.
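The sketch below illustrates this kind of mistake. It uses plain forward longest-match segmentation, which is an assumption made for the example and not necessarily the algorithm Adso uses; the lexicon and the function name are also made up.

```python
def segment_longest_match(text, lexicon, max_len=4):
    """Greedy forward longest-match segmentation (illustrative only)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicon: every word needed for the right answer is present.
LEXICON = {"研究", "研究生", "生命", "起源"}

tokens = segment_longest_match("研究生命起源", LEXICON)
print(tokens)  # ['研究生', '命', '起源']
```

The intended reading is 研究 / 生命 / 起源 ("research the origin of life"), but the greedy matcher grabs the longer word 研究生 ("graduate student") first and leaves 命 stranded, even though all the correct words are in the lexicon.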

4. Adso segments a word correctly, but assigns it an incorrect part of speech.

Many words in Chinese function as different parts of speech in different contexts. After Adso segments a sentence into discrete units, it looks for grammatical patterns that fit together logically. One problem may be that the Chinese sentence is ambiguous. Another may be that the words in our backend database are not classified properly. A third might be a limitation in the way our software analyses Chinese grammar itself.

5. Adso identifies the word correctly, but provides an archaic, technical or stupid gloss.

Some of the entries in the Adso database have come from other Chinese-English word lists. In some cases it may seem like the most obscure definition was used for any given word. In other cases the gloss is simply incorrect. Please do not hesitate to edit the database to provide the correct modern gloss for these entries.

6. Adso identifies the word or phrase with a meaning that is close, but not entirely correct. I want the software to provide a second definition.

Many Chinese words vary in meaning according to context, even as they maintain the same part of speech. Subtle shades of meaning are difficult for human translators to tease out and more problematic for machine translation. Because of this, we prefer to keep the most general translation of any word as the default entry.

For example, the word 精神 can be translated in different contexts as spirit, mind, consciousness, essence, gist, heart, energy, and so on. The Adso database currently contains only the "spirit" definition, since it applies in a general sense to people, ideas, and documents. Larger phrases in which 精神 usually has a slightly different translation may be added to the database separately; 精神世界 is in the database with the translation "mental world".

But should a word have multiple entries, it is possible to instruct the machine when to use different definitions. During translation, every entry in the database is associated with a numeric weight representing its likelihood. Every entry can be provided with word-specific markup to adjust these weights in certain contexts. When the word 天 is preceded by a number, for instance, Adso leans towards the noun "day" rather than "sky".
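As a rough illustration of the idea, the sketch below gives each entry a base weight and lets one context rule adjust it. The Entry class, the weights, the numeral list, and the size of the boost are all assumptions for the example; Adso's real markup and numbers are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    gloss: str
    pos: str
    weight: float  # base likelihood; the values below are invented


# A small, incomplete set of numerals, enough for the example.
NUMERALS = set("0123456789一二三四五六七八九十两几")

TIAN_ENTRIES = [
    Entry("sky", "noun", 0.6),
    Entry("day", "noun", 0.4),
]


def choose_gloss(word, previous, entries):
    """Pick the entry with the highest weight after context adjustments."""
    best_gloss, best_score = None, float("-inf")
    for entry in entries:
        score = entry.weight
        # Hypothetical rule: 天 directly after a numeral favours "day".
        if (word == "天" and entry.gloss == "day"
                and previous and previous[-1] in NUMERALS):
            score += 0.5
        if score > best_score:
            best_gloss, best_score = entry.gloss, score
    return best_gloss


after_number = choose_gloss("天", "三", TIAN_ENTRIES)  # as in 三天
after_other = choose_gloss("天", "蓝", TIAN_ENTRIES)   # as in 蓝天
print(after_number, after_other)  # day sky
```

Without the rule the higher base weight wins and 天 comes out as "sky"; the rule only tips the balance when the preceding token is a number.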

If you would like to help the software select the correct definition, ask yourself "how do I know that this word should be interpreted with this meaning?" If you can come up with a relatively simple reason, we can implement that in the code. There are a number of existing rules you may use when starting off. A guide to these rules is online here. Suggestions on new rules for the software to implement are also welcome.

7. Why do the Chinese characters appear in the translation/annotation rather than the English?

This typically occurs when a database entry is missing information for either part of speech or the English definition. The reason these units are in the database is that they help the parser segment sentences into correct semantic units. Seeing the Chinese characters pop up for these words is a visual reminder for users to flesh out the entry.

8. A character is popping up in the pinyin section.

This occurs when a character lacks a pinyin reading or is missing from the database altogether. Some characters are only found in the database as parts of other words, and as such have not been given entries of their own. Others have yet to have pinyin added, usually because they are relatively obscure. If you notice this problem, please take the time to add the pinyin to the database. This problem should gradually disappear.

9. A character has incorrect pinyin.

Given the prevalence of duoyinci (characters and words with more than one reading) in the Chinese language, ensuring the correct pinyin readings for all characters and words is a non-trivial task. We are slowly correcting errors with duoyinci as we notice them. If you spot a character that is transcribed incorrectly, please call the issue to the attention of one of the developers. We review and correct all suspect character readings on a monthly basis.
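A small sketch of why this is tricky: the same character can need a different reading depending on the word it sits in. The two lookup tables and the function below are invented for illustration and say nothing about how Adso stores its data; the fallback to the raw character simply mirrors the symptom described in question 8.

```python
# Hypothetical word-level readings, which take priority.
WORD_PINYIN = {
    "银行": "yin2 hang2",  # 行 read as hang2 in "bank"
    "行人": "xing2 ren2",  # 行 read as xing2 in "pedestrian"
}

# Hypothetical default reading for single characters.
CHAR_PINYIN = {"行": "xing2", "银": "yin2", "人": "ren2"}


def pinyin_for(token):
    """Return pinyin for a segmented token.

    Word-level readings win; otherwise fall back character by character,
    leaving any character without a known reading as it is.
    """
    if token in WORD_PINYIN:
        return WORD_PINYIN[token]
    return " ".join(CHAR_PINYIN.get(ch, ch) for ch in token)


print(pinyin_for("银行"))  # yin2 hang2
print(pinyin_for("行"))    # xing2
print(pinyin_for("银鑫"))  # yin2 鑫
```

If 银行 were missing from the word table, the character-level fallback would produce "yin2 xing2", which is exactly the sort of incorrect reading that needs to be reported.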

10. I love this software. How can I help?

The most appreciated thing you can do is join the development community, adding and correcting database content from which we can all benefit. Developers interested in grappling with the software itself should begin by reading how the database handles part-of-speech and other meta tags, as well as our explanation of supported markup for the CODE field. Monetary donations are always welcome to help pay for server costs as well.

11. How can I add a new word to the database?

The easiest way to add a new word to the database is through the QuickAdd interface. This invites you to enter the Chinese characters, English translation and (if you wish) your own name. All words added in this way are automatically classified as nouns; you may go into the database to change this if you wish, or you can leave it for the database review at the end of each month.

12. Is there any way to edit existing entries?

More control over dictionary content is possible through the main dictionary interface. If you try to add an existing word via QuickAdd, it will prompt you to load the main dictionary interface. Once there, you merely add the missing definition and press "Update". Add multiple definitions by using the "Add Chinese word" box in the upper right-hand corner. Entering a Chinese word in this box will create an empty record which can be fleshed out manually.

13. If anyone can add content in real time, what quality control exists over the database?

Although user changes take effect in real time, a bilingual panel reviews all submissions monthly before updating the master database. We have been very lucky to date in attracting an exceptionally talented pool of bilingual, native-English-speaking users.

14. When will Adso support the complex character set?

We would like to be able to start supporting the complex character set, but will need volunteers willing to help us review the conversions for words where there is more than one possibility. If you would like to help with this effort, please drop a line.
