Chinese-Forums

Basic Python module for adso


imron

Stemming from the discussion in this thread, here is a basic python module that will perform web-based queries against the adsotrans website and return the results as a list of tuples.

There are 3 files:

adso.py - the main module

adsotatepage.py - class that handles processing of the adsotrans webpage

test.py - simple test harness

If anyone is interested, it probably wouldn't be too hard to add a translatepage or a pinyinpage class that would process and return the results from a translate or pinyin query.

To use the module, import it, and create an object of the Adso class.

I decided to write an Adso class rather than just having functions in the module, so that all the different adso options (conjugation, grammar, encoding, encoding_out, numeric_pinyin and quality) can easily be preserved across multiple calls. These values are set in the constructor, and are simply strings that correspond to the values passed to the adso url.

Default values are:

conjugation='on'

grammar='on'

encoding='UTF-8S'

encoding_out='UTF-8S'

numeric_pinyin='off'

quality='high'
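
To illustrate the idea of keeping the options in one object across calls, here is a minimal sketch. It is not the code from adso.py: the class name AdsoOptions, the method query_string, and the assumption that each option maps directly onto a URL parameter of the same name (plus a parameter called text) are all hypothetical.

```python
from urllib.parse import urlencode


class AdsoOptions:
    """Hypothetical holder for the options listed above (not the real Adso class)."""

    def __init__(self, conjugation='on', grammar='on', encoding='UTF-8S',
                 encoding_out='UTF-8S', numeric_pinyin='off', quality='high'):
        # Store the options once so every later query reuses them.
        self.options = {
            'conjugation': conjugation,
            'grammar': grammar,
            'encoding': encoding,
            'encoding_out': encoding_out,
            'numeric_pinyin': numeric_pinyin,
            'quality': quality,
        }

    def query_string(self, text):
        # Assumption: the site accepts these names as URL parameters,
        # with the text to process passed as 'text'.
        params = dict(self.options)
        params['text'] = text
        return urlencode(params)


opts = AdsoOptions(numeric_pinyin='on')
print(opts.query_string('hello'))
```

Only the options you want to change need to be passed to the constructor; the rest keep their defaults.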

Once you have an Adso object, call its adsotate member function with the text that you want processed:

from adso import Adso

adso = Adso()

result = adso.adsotate( '你好世界' )

result will be a list of tuples containing the values (chinese, pinyin, translation), with one tuple per segment of text, in the same order that the segments appear in the original text. For example, the code above produces:

[ ( '你好', 'nǐhǎo', 'hello' ), ( '世界', 'shìjiè', 'world' ) ]
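
Since the result is an ordinary list of 3-tuples, it can be unpacked directly in a loop. A small sketch using the sample result above (the helper name format_segments is just for illustration and is not part of the module):

```python
# Sample result copied from the example above; no network access needed.
result = [('你好', 'nǐhǎo', 'hello'), ('世界', 'shìjiè', 'world')]


def format_segments(segments):
    """Turn (chinese, pinyin, translation) tuples into readable lines."""
    return [f'{chinese} ({pinyin}): {translation}'
            for chinese, pinyin, translation in segments]


for line in format_segments(result):
    print(line)

# Join just the pinyin of each segment into one string.
pinyin_only = ' '.join(pinyin for _, pinyin, _ in result)
print(pinyin_only)
```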

Note: the encoding of the text you pass in should match the encoding you provided when creating the Adso object (defaults to utf-8).

Anyway, it's all pretty basic at the moment, and doesn't do anything more advanced than generate a query to the main adsotrans webpage and then parse the resulting html file. There's also very little in the way of error checking, so you'll get exceptions if, for example, you can't connect to the internet. It was done more as a proof-of-concept than anything else. Is this the sort of thing you had in mind, Kudra?
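
Until the module does its own error checking, callers can wrap the call themselves. This is only a sketch: safe_adsotate is a hypothetical helper, and it assumes that connection failures surface as OSError (which is the case for urllib.error.URLError in Python 3, but I haven't confirmed what adso.py actually raises).

```python
def safe_adsotate(adso, text):
    """Return adso.adsotate(text), or an empty list if the request fails."""
    try:
        return adso.adsotate(text)
    except OSError as exc:
        # Assumption: network problems show up as OSError or a subclass of it.
        print(f'adsotate request failed: {exc}')
        return []
```

Returning an empty list keeps the return type the same as a successful call, so code that loops over the segments still works when the site is unreachable.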

BTW speaking of errors, I don't know if this is of interest to you Trevelyan, but the python HTMLParser says the output generated by Adso has malformed start tags at various places in the html. The w3.org validator reports errors in the same lines/columns, but it seems to be because it's treating the adso.zip


@trevelyan -- that would be convenient. In my experience of parsing Yahoo pages, it is always a pain when they change the html format. If you essentially provide an API, neither you nor we Python (or other language) programmers will have to worry when you change stuff around in the html.


@bogleg - go for it, it's not even 100 lines of code, so I can't imagine it'd take too long. Though you might want to wait until trevelyan can produce a page with a more streamlined output.

@trevelyan - yeah, a more suitable format would be nice, and would certainly be more future-proof. Maybe just a simple XML file along the lines of:

你好

nǐhǎo

hello

(or less verbosely

:) )

You could of course add any extra other info that was relevant/useful (part of speech, simplified/traditional conversion etc). All of which (including the 3 listed above) could be toggled by parameters.

This format would also lend itself nicely to the other styles of queries (translation/pinyin), which would simply just have one segment containing the entire body of text with the appropriate pinyin/translation.
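
If the output were XML along these lines, parsing it from Python would only take a few lines with the standard library. The element names below (adso, segment, chinese, pinyin, translation) are made up for this sketch; the real format would be whatever trevelyan decides on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SAMPLE = """<adso>
  <segment><chinese>你好</chinese><pinyin>nǐhǎo</pinyin><translation>hello</translation></segment>
  <segment><chinese>世界</chinese><pinyin>shìjiè</pinyin><translation>world</translation></segment>
</adso>"""


def parse_segments(xml_text):
    """Return a list of (chinese, pinyin, translation) tuples, one per segment."""
    root = ET.fromstring(xml_text)
    return [
        (seg.findtext('chinese', ''),
         seg.findtext('pinyin', ''),
         seg.findtext('translation', ''))
        for seg in root.iter('segment')
    ]


print(parse_segments(SAMPLE))
```

A translate or pinyin query would then just be a document with a single segment, and the same function would handle it.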


  • 3 months later...

Ok. First file here takes in GB2312. The second takes in UTF8. Because of the need to support both simplified and traditional, both files return content in UTF8.

http://www.adsotate.com/adso/api-gb2312.pl?text=TEXT

http://www.adsotate.com/adso/api-utf8.pl?text=TEXT

There's no guarantee these files will stay online here. So if you set up anything using them send me an email so I can notify you if they move.
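
For anyone wanting to try these from Python, the query URL can be built like this. The sketch only constructs the URL and does not fetch or parse anything, since the response format isn't described here. The helper name build_api_url is hypothetical, and it assumes the text should be percent-encoded using the byte encoding each script expects.

```python
from urllib.parse import quote

# The two endpoints given above, keyed by the input encoding they accept.
API_URLS = {
    'gb2312': 'http://www.adsotate.com/adso/api-gb2312.pl',
    'utf-8': 'http://www.adsotate.com/adso/api-utf8.pl',
}


def build_api_url(text, encoding='utf-8'):
    """Build the query URL for the given text and input encoding."""
    # Assumption: the script wants the text percent-encoded in its own encoding.
    encoded = quote(text, safe='', encoding=encoding)
    return f'{API_URLS[encoding]}?text={encoded}'


print(build_api_url('你好'))
print(build_api_url('你好', encoding='gb2312'))
```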

