Traditional Support


trevelyan

We've fixed the issues with automatic traditional character recognition that character pointed out in another thread. The updated code (and database) is available for download. Anything from version v5.022 onward should work:

http://adsotrans.com/downloads/adso-v5.022.tar.gz

We have also updated our "advanced editing page" so that traditional characters can be edited. Right now, parsing fails for any traditional word that is not in our database, even if its simplified counterpart is. This is all about maintaining the integrity of the database.

Suggestions on how to improve the system for users/contributors who want to deal mostly with traditional Chinese are welcome. Do we need separate editing and annotating pages? I'm not sure but would like to make whatever changes are necessary to get the fanti crowd more involved.

More details on the Adso blog.

Right now we will fail to parse traditional words if they do not exist in our database, even if the simplified counterpart does. All about maintaining the integrity of the database.

Automatic conversion seems dauntingly difficult: http://www.cjk.org/cjk/c2c/c2cbasis.htm

I guess the internet could be harnessed to see if traditional "matches" exist for simplified phrases. The results could be reviewed before inclusion in Adso.
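A rough sketch of the "generate candidates, then review" idea, in Python. The mapping table here is a tiny hypothetical stand-in (it is not Adso's database, and `traditional_candidates` is a made-up name); the point is only that one simplified character can map to several traditional ones, so each word yields a list of candidates for a human to check.

```python
from itertools import product

# Hypothetical one-to-many table (simplified -> possible traditional forms).
# Illustrative only: a real table would be far larger.
S2T_CANDIDATES = {
    "发": ["發", "髮"],
    "头": ["頭"],
    "面": ["面", "麵"],
}

def traditional_candidates(simplified):
    """Return every traditional spelling the table allows for a simplified word."""
    # Characters missing from the table are assumed to be unchanged.
    options = [S2T_CANDIDATES.get(ch, [ch]) for ch in simplified]
    return ["".join(combo) for combo in product(*options)]

# Each candidate would still need manual review (or a corpus/web lookup)
# before being accepted as the real traditional form.
for candidate in traditional_candidates("头发"):
    print(candidate)
```

For "头发" this yields two candidates, of which only 頭髮 is the actual word, which is exactly why the review step matters.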

The academic team at ChinesePod is using some Adso-related tools to help with lesson preparation, which is helping us flag some of the issues that still exist with duoyinci (words with more than one reading) and pushing the project forward.

Manual review is definitely critical. The best solution is really to find some people who are interested in this sort of thing and are coming at text analysis from a fanti perspective. Then religiously fixing the problems they complain about. :)

Then religiously fixing the problems they complain about.

Going entirely to Apache licensing would be my favorite. :wink:

---------------

./adso -f file1.txt --code --extra-code " AND " > file2.txt

This produces an empty file. Do I need to be using the non-latin database for this to work?

Until this is fixed, is there any chance of an enhanced vocab mode which includes the pinyin in addition to everything else it outputs?

-----------------

./adso -f file1.txt -ie utf8 -is traditional -oe utf8 -os traditional --vocab > file2.txt

1) Wenlin says file2.txt has ~1200 UTF-8 format violations

2) Wenlin seems to be saying that the "U+3000 Ideographic space" in the input is processed into "U+FFFD Replacement character" (which displays as a control character).
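For anyone wanting to check this without Wenlin, here is a minimal Python sketch that scans raw bytes for invalid UTF-8 sequences and counts U+3000 and U+FFFD. The function name `scan_utf8` and the sample bytes are made up for illustration; nothing here is part of Adso.

```python
def scan_utf8(data: bytes):
    """Report byte offsets of invalid UTF-8 and count two characters of interest."""
    violations = []   # byte offsets where decoding failed
    chunks = []       # successfully decoded pieces
    pos = 0
    while pos < len(data):
        try:
            chunks.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice being decoded.
            chunks.append(data[pos:pos + err.start].decode("utf-8"))
            violations.append(pos + err.start)
            pos += err.end  # skip the bad bytes and keep going
    text = "".join(chunks)
    return {
        "violations": violations,
        "ideographic_spaces": text.count("\u3000"),
        "replacement_chars": text.count("\ufffd"),
    }

# In-memory sample: one ideographic space, then one stray 0xFF byte.
sample = "中\u3000文".encode("utf-8") + b"\xff"
print(scan_utf8(sample))
```

Running it on both the input and the output (for example on `open("file2.txt", "rb").read()`) should show whether the ideographic spaces in file1.txt turn into replacement characters or into outright invalid bytes.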

I'm generally happy to let people use the adso materials commercially provided they attribute the materials and contribute back to the project. I don't think it's onerous to send an email asking for permission.

On the traditional side, can you mail me the file you're using so that I can take a look at it myself? My email address is david.lancashire at google.com. I think the command is working for me, so I'd like to replicate things exactly. You are compiling from source, right?

  • 2 weeks later...

Thanks to pressure from Mark at toshuo.com, the annotation engine is now outputting popups in traditional characters (when input is traditional characters). Will be working on hooking up the editing functionality for the traditional stuff later this week and will post when that's done.
