Chinese-Forums
build your own machine translation system (Chinese->English)


trevelyan

I'm looking for volunteers to help work on building rulesets and making suggestions for the next version of the Adso system. We have working code here:

http://www.adsotrans.com/downloads/v5/

Right now we're in the early stages of teaching the software to recognize simple compounds: times, dates, personal names, and other basic compounds. These are the building blocks for more complex grammar analysis. Basic processing is still done by the backend engine, but a lot of the advanced functionality is being moved into external XML files that are (optionally) read by the engine.

I'll be iterating this file quite quickly as errors get fixed. If you're interested in helping out, send me an email and I'll add you to the list. I'm not sure if this forum is the proper place for ongoing and somewhat technical discussions. It all depends on what priorities people have.
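
For anyone wondering what "rules in an external XML file" could look like in practice, here is a minimal Python sketch of the general idea. The rule layout (`<rule>` elements with `name` and `pattern` attributes) and the function names are invented for illustration and are not Adso's actual format or code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule file: this layout is an assumption, not Adso's real format.
RULES_XML = """
<rules>
  <rule name="DATE" pattern="[0-9]{4}年[0-9]{1,2}月[0-9]{1,2}日"/>
  <rule name="TIME" pattern="[0-9]{1,2}点([0-9]{1,2}分)?"/>
</rules>
"""

def load_rules(xml_text):
    """Parse the XML and compile each rule's pattern into a regex."""
    root = ET.fromstring(xml_text)
    return [(r.get("name"), re.compile(r.get("pattern")))
            for r in root.iter("rule")]

def tag_compounds(text, rules):
    """Return (rule name, matched text) pairs in order of appearance."""
    found = []
    for name, pattern in rules:
        for m in pattern.finditer(text):
            found.append((m.start(), name, m.group(0)))
    return [(name, matched) for _, name, matched in sorted(found)]

rules = load_rules(RULES_XML)
print(tag_compounds("会议在2006年5月12日下午3点30分开始", rules))
```

Keeping the patterns in data rather than in the engine means a rule can be fixed or added without touching or rebuilding the processing code.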

Notes:

This is BETA software. The output quality is not as good as what you'll find on the main site (http://www.adsotrans.com), because the old system has a lot of hand-coded rules that need to be rewritten, re-evaluated, and moved to the new system. For instance, the new version does not yet have automatic recognition of place names or verb conjugation.

On the other hand, this new system is infinitely cooler from an architectural perspective. It is also a lot more flexible. If you wanted to print a list of all of the Proper Nouns in a document, for instance, you could take care of that with a command like:

./adso -f [input] --extra-code " AND "

  • 1 month later...

Wow! You're really doing a great project.

Every time we click "teach me", we have to re-enter the simplified Chinese characters, and there's no way to enter the part of speech. That's quite inconvenient.

Also, when a volunteer adds an entry, it would be great to ask them for an example sentence too. That could help enrich the dictionary later on.

Anyway, this project is amazing, both in its idea and its technology.

Zozzen,

I edit the POS entries when reviewing contributions, so it isn't that big a deal if someone doesn't provide them. The "quick add" script attempts to guess the POS from the English definition anyway (i.e. enter verbs in the infinitive). If anyone wants to make bulk contributions, just send me the data somehow and I'll bulk add them. If there's a good way to automate this, I'm open to any and all suggestions.
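
To show why the infinitive matters, here is a minimal Python sketch of guessing POS from an English definition. Only the "infinitive means verb" cue comes from the description above; the function name, the tag names, and the noun fallback are assumptions, and this is not the actual "quick add" script.

```python
def guess_pos(definition):
    """Guess a part of speech from an English gloss (hypothetical heuristic)."""
    gloss = definition.strip().lower()
    # An infinitive gloss such as "to run" is treated as a verb.
    if gloss.startswith("to "):
        return "VERB"
    # Assumed fallback: anything else is treated as a noun and left for review.
    return "NOUN"

for gloss in ["to translate", "word", "To study"]:
    print(gloss, "->", guess_pos(gloss))
```

A guess like this is only a starting point, which fits with the entries being reviewed by hand afterwards.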

We have space in the database for sample sentences, but I don't think it makes too much sense to ask people to provide them. For the ChinesePod dictionary we've just indexed all of the lesson content using Lucene and are outputting matching sentences for searches automatically. It works pretty well. We could get better results for news texts just by indexing tons of Xinhua materials, or classical terminology by indexing books like Dream of the Red Chamber.
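
The idea of indexing a corpus and returning matching sentences can be sketched in a few lines of Python. This is a toy character-level inverted index standing in for a real search library such as Lucene; the function names and sample sentences are made up for the example.

```python
from collections import defaultdict

def build_index(sentences):
    """Map each character to the set of sentence numbers containing it."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for ch in set(sentence):
            index[ch].add(i)
    return index

def find_examples(query, sentences, index):
    """Return sentences containing the query as a contiguous string."""
    if not query:
        return []
    # Narrow down to sentences that contain every character of the query...
    candidates = set.intersection(*(index.get(ch, set()) for ch in query))
    # ...then confirm the characters actually appear together and in order.
    return [sentences[i] for i in sorted(candidates) if query in sentences[i]]

corpus = ["我喜欢学习中文。", "他在北京学习。", "今天天气很好。"]
index = build_index(corpus)
print(find_examples("学习", corpus, index))
```

Swapping the corpus for news text or a classical novel changes which example sentences come back without changing the lookup code.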

  • 2 weeks later...

I'm downloading the beta code to see if I can contribute (though it seems too technical for me).

And this link is dead: http://www.adsotrans.com/downloads/v5/adso-v5.004.tar.gz

It seems the "teach me" function doesn't allow users to rewrite the definition of a word. Let's say 字 is currently defined in the dictionary as "word". If I want to add another definition (i.e. a floor), the system doesn't accept the edit.

Zozzen,

This is the old dictionary editing interface:

http://www.adsotrans.com/adso/uniedit.pl

Changes can also be made through the ChinesePod dictionary - they'll filter back to the project. One small note: new entries added manually through this form may not be recognized by the system immediately. The reason is that the content is added in GB2312 rather than UTF-8, and the dictionary needs to be updated before all of the data is copied into the appropriate tables.
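
To illustrate the encoding mismatch, here is a minimal Python sketch showing that the same character is stored as different bytes in GB2312 and UTF-8, so data has to be converted before the two can be compared. The helper name is hypothetical and this says nothing about how the actual update job is written.

```python
def gb2312_to_utf8(raw):
    """Re-encode GB2312 bytes as UTF-8 bytes."""
    return raw.decode("gb2312").encode("utf-8")

entry = "字"
gb_bytes = entry.encode("gb2312")       # how the form would store the entry
utf8_bytes = gb2312_to_utf8(gb_bytes)   # what a UTF-8 table would expect

# The raw bytes differ, so a byte-level lookup fails until conversion.
print(len(gb_bytes), len(utf8_bytes), gb_bytes == utf8_bytes)
print(utf8_bytes.decode("utf-8") == entry)
```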

Thanks for the link to the new treebank. Hadn't heard of it and am checking it out now...
