Chinese-Forums

Official Guidelines on Data Entry


trevelyan


New Users

We recommend that new users start interacting with the database through the quick entry form. The advantage of this is that it logs your contributions in a way which makes it very easy for curators to review and tag entries grammatically. This entry system will recognize personal names, nouns and verbs (if provided in the infinitive) and tag them accordingly.

1. Submit nouns without definite or indefinite articles (the/a) unless absolutely necessary. Follow standard English conventions in capitalization, and provide a word in singular form unless it is always used in the plural. If an entry is plural, please tag it as such by adding the flag "PLUR" to the flag field in the database. The proper flag for a plural noun like 孩子们 is therefore "NOUN PLUR".

2. Submit verbs in the infinitive. Verbs paired with objects and adverbs are fine, but do not split the infinitive ("to translate happily" is acceptable but "to happily translate" is not). This isn't prescriptivist grammar pedantry: the software needs entries in a standard form in order to manipulate and conjugate them. Place bundled adverbs before or after the entry using your own judgment.

3. Avoid adding inferred meaning from the Chinese text. 克林顿 should be translated as "Clinton" not "Bill Clinton". 埃及总统 is likewise "Egyptian President" not "Nasser".

4. Let the software handle numbers, dates, times and other parts of speech that are predictable and can be handled by machines. If you see a problem with these sorts of units, please post a notice pointing it out to the developers. Improving the handling of these units is best done through adjustments to the source code rather than by adding words to the database.

5. For maximum effectiveness, tag words semantically after you add them. A full list of the ontological and thematic tags supported by the database is available here. The Quick Add form will attempt to generate the appropriate tags automatically for personal names and plural nouns. Remember to provide all tags in CAPITAL LETTERS.

6. Pinyin is automatically generated by the software for entries missing pinyin in the backend database. If you wish to provide pinyin, please write it in plain letters with the tone given as a number after each syllable (one syllable per character). 你好 is ni3hao3 and 女朋友 is nv3peng2you3 (note that ü is typed as v). Neutral tones are acceptable and should be marked as the fifth tone (hai2zi5). Please segment pinyin according to common use.

7. Submit one definition per entry. If there are multiple possible definitions for a word, use the Advanced Editing form to create a second definition. This may seem a bit redundant for annotation, but it makes for a real improvement in the machine translation functions. The correct entry for the verb form of 是 is "to be", not "is/are/to be" or "is, are", etc. Let the software conjugate the verb, and inform us if any verb conjugates incorrectly so we can fix the issue in our verb database.

8. Feel free to ask questions below if you're not sure of anything.

9. Feel free to complain about sentences which do not translate/annotate correctly.
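
For anyone who wants to check their pinyin before submitting, the numeric format in guideline 6 is easy to test mechanically. The following is only a rough sketch, not the site's own code: the function name split_numeric_pinyin is made up for illustration, and it assumes that spaces are what separate words when you segment pinyin.

```python
import re

# Assumption: a word is one or more syllables, each written as
# letters followed by a single tone digit 1-5 (5 = neutral tone).
_WORD = re.compile(r"(?:[A-Za-z]+[1-5])+")
_SYLLABLE = re.compile(r"[A-Za-z]+[1-5]")


def split_numeric_pinyin(pinyin):
    """Return the syllables of a numeric-tone pinyin string.

    Raises ValueError if any word lacks a tone number or contains
    characters other than letters and the digits 1-5.
    """
    syllables = []
    for word in pinyin.split():
        if not _WORD.fullmatch(word):
            raise ValueError("not numeric-tone pinyin: %r" % word)
        syllables.extend(_SYLLABLE.findall(word))
    if not syllables:
        raise ValueError("empty pinyin string")
    return syllables


print(split_numeric_pinyin("ni3hao3"))
print(split_numeric_pinyin("nv3peng2you3"))
print(split_numeric_pinyin("hai2zi5"))
```

A string with no tone numbers at all, such as "nihao", raises ValueError. The check only looks at the shape of the string; it does not know whether a syllable is valid Mandarin.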

Experienced Users

Experienced users can edit database content directly through the advanced editing form. Please be careful if you do this because entries which do not fit these guidelines will not be handled correctly by the software.
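
If you edit entries directly, it may help to run a quick sanity check against the guidelines first. The sketch below is one hypothetical way to do that and is not part of the database software. The function name check_entry and the "VERB" flag are assumptions made for illustration ("NOUN" and "PLUR" come from guideline 1), and the checks are heuristics that will sometimes flag legitimate entries, for example a noun that genuinely needs its article.

```python
ARTICLES = {"the", "a", "an"}


def check_entry(definition, flags):
    """Return a list of guideline problems for one entry (empty if none).

    definition: the English definition, e.g. "to be"
    flags: space-separated flags, e.g. "NOUN PLUR"
    """
    problems = []
    tags = flags.split()

    # Guideline 5: tags must be in capital letters.
    for tag in tags:
        if tag != tag.upper():
            problems.append("flag not in capitals: " + tag)
    roles = {tag.upper() for tag in tags}

    words = definition.split()
    if not words:
        return problems + ["empty definition"]

    # Guideline 1: nouns without articles unless absolutely necessary.
    if "NOUN" in roles and words[0].lower() in ARTICLES:
        problems.append("noun starts with an article")

    # Guideline 2: verbs in the infinitive ("VERB" is an assumed flag name).
    if "VERB" in roles and words[0].lower() != "to":
        problems.append("verb is not in the infinitive")

    # Guideline 7: one definition per entry (heuristic on separators).
    if any(sep in definition for sep in "/,;"):
        problems.append("looks like more than one definition")

    return problems


print(check_entry("children", "NOUN PLUR"))
print(check_entry("is/are/to be", "VERB"))
```

An entry that comes back with an empty list can still be wrong; the function only catches the mechanical mistakes listed above.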
