Chinese-Forums

Update to Comments by Kudra


trevelyan

Kudra posted a comment here, in an older thread, about some errors. I didn't want to bog that thread down with technical details, but I did want to list the fixes that have been made (just now) and explain what is happening, in case anyone is curious.

Please note that the changes discussed have been implemented in the development branch of the software, and are not yet running on the live Adsotrans engine. The public site will get these changes the next time I review the database, probably in late December or early January.

1. 汉皇重色思 comes out as all 1 "word" 汉皇思重色. Notice the 思 has changed position! There is no english -- just the hanzi

The character reordering was a bug, now fixed. The software thinks this is a foreign name but does not recognize it, which is why no definition is provided. Adding "Han Emperor" as an entry for 汉皇 would have solved the problem as well.

2. 天生 gets stuck together, but with no gloss

The English definition had a trailing newline in it that prevented the popup from displaying. I'll have to find out why that happened; I've seen it in a few other cases.
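For anyone curious what a cleanup pass for this might look like, here is a rough sketch in Python. It is not the Adsotrans code: the `ENTRIES` table, its sample glosses and both function names are made up for illustration. It just flags definitions with stray whitespace at either end and strips them.

```python
# Hypothetical dictionary rows: headword -> English definition.
# The data and names here are illustrative, not the real Adsotrans schema.
ENTRIES = {
    "天生": "innate; inborn\n",  # trailing newline, as in the report
    "一笑": "a smile",
}

def find_dirty_definitions(entries):
    """Return headwords whose definition has leading/trailing whitespace."""
    return [word for word, gloss in entries.items() if gloss != gloss.strip()]

def clean_definitions(entries):
    """Return a copy with surrounding whitespace stripped from every gloss."""
    return {word: gloss.strip() for word, gloss in entries.items()}
```

Running the finder first gives a list of affected entries to look at, which may help track down where the newlines came from before they are stripped.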

3. 回眸一笑百媚生, here 百 gets pinyin of yibai, not bai

Pinyin for numbers is normally reverse-generated from the actual numeric value rather than from the characters, so 百 is read as 100 and comes out as "yibai". I have changed the code so that this does not happen for single number characters (i.e. 百, 千, etc.).
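To make the 百 case concrete, here is a small sketch in Python of how I understand the behaviour. The tables and function names are invented for this example and are not taken from the Adsotrans source, and the parser only copes with simple numerals up to 千. The point is only that generating pinyin from the value 100 yields "yibai", while a single number character should keep its own reading.

```python
# Illustrative tables: character -> (value, toneless pinyin).
DIGITS = {"一": (1, "yi"), "二": (2, "er"), "三": (3, "san"),
          "四": (4, "si"), "五": (5, "wu"), "六": (6, "liu"),
          "七": (7, "qi"), "八": (8, "ba"), "九": (9, "jiu")}
UNITS = {"十": (10, "shi"), "百": (100, "bai"), "千": (1000, "qian")}

def parse_value(hanzi):
    """Turn a simple Chinese numeral into an integer (bare 百 becomes 100)."""
    total, digit = 0, 0
    for ch in hanzi:
        if ch in DIGITS:
            digit = DIGITS[ch][0]
        else:
            total += (digit or 1) * UNITS[ch][0]
            digit = 0
    return total + digit

def pinyin_from_value(n):
    """Reverse-generate pinyin from a number, e.g. 100 becomes 'yibai'."""
    digit_names = {value: name for value, name in DIGITS.values()}
    parts = []
    for value, name in sorted(UNITS.values(), reverse=True):
        count, n = divmod(n, value)
        if count:
            parts.append(digit_names[count] + name)
    if n:
        parts.append(digit_names[n])
    return "".join(parts)

def number_pinyin(hanzi):
    """Single number characters keep their own reading; longer ones go by value."""
    if len(hanzi) == 1:
        return {**DIGITS, **UNITS}[hanzi][1]
    return pinyin_from_value(parse_value(hanzi))
```

With this, `pinyin_from_value(parse_value("百"))` reproduces the reported "yibai", while `number_pinyin("百")` takes the single-character branch and returns "bai".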

4. 恩泽 has no english

This is usually a sign the dictionary does not know the word. Try adding it if you know it.

5. 日 gets translated as "Japan is". OK, adso is tuned for prose, not tang shi. no big deal.

Yup. Disabling the "grammar" option on the advanced page will speed things up slightly and reduce the addition of things like "is". Poetry is particularly problematic for NLP for a few reasons.

6. 遊 doesn't get adsotated. It gets converted into 閬� if the output is set to guobiao. Don't know what's going on here.

The character 遊 was missing from our GB2312 tables, so the software is handling it in binary format and not converting it across encodings. I'll need to add it as part of the next database review. I'll also need to check whether it is actually in the GB2312 (or GB18030) encoding, and map it to 游 if not.
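The check-and-map step could look something like the sketch below. It uses Python's built-in `gb2312` and `gb18030` codecs rather than the Adsotrans tables, and the `VARIANT_FALLBACK` table and function names are assumptions made for the example.

```python
# Hypothetical fallback table: character -> form to use when the original
# cannot be represented in the target encoding.
VARIANT_FALLBACK = {"遊": "游"}

def is_encodable(ch, encoding):
    """True if the character can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def to_guobiao(text, encoding="gb2312"):
    """Encode text, swapping in a fallback form for unencodable characters."""
    out = []
    for ch in text:
        if not is_encodable(ch, encoding):
            ch = VARIANT_FALLBACK.get(ch, ch)
        out.append(ch)
    # Anything still unencodable becomes '?' instead of leaking raw bytes.
    return "".join(out).encode(encoding, errors="replace")
```

Calling `is_encodable("遊", "gb2312")` answers the "is it actually in the encoding" question directly, and the fallback only kicks in when the answer is no.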

I'm not having the problem with pinyin generation across lines, but the software on the main site was updated about 2 weeks ago, and that may have already fixed the problem.
