Attention Software Developers

January 4, 2009 at 10:22 AM

We've just released a new version of the Adso software with better support for fellow software developers. Specifically, you can now easily incorporate Chinese-English translation, text analysis, segmentation and hanzi-to-pinyin conversion functionality in your own applications. The word easy doesn't actually begin to describe how painless this process is. We're talking about the ability to invoke a machine translation engine in a single line of code.

I'd encourage anyone interested in Chinese software development to check it out. Usage is free for Chinese-English translation, hanzi-to-pinyin conversion and text segmentation - everything you need to build incredible next-gen learning and reference applications. The software is available for download at:

http://adsotrans.com/downloads/

A quick write-up to get you started is at the link below. I'd recommend starting there if you don't have any experience with Adso:

http://adsotrans.com/blog/developer-corner-adso-with-your-own-cc-application/

Feedback and questions are always welcome here, by email or at Popup Chinese. If you aren't a programmer but want to help us in our effort to produce great, free NLP software for students and developers, the best way is to help spread the word about what we're doing and contribute missing content to our ever-expanding linguistic database through the online Popup Chinese dictionary.

January 28, 2009 at 10:43 AM

adso-v5.058.tar.gz doesn't build for me, I get:

~/Download/adso-v5.058/scripts/compile_binaries> ./run
g++ -o database.o -c ghost_database.cpp

In file included from ghost_database.cpp:2:

ghost_database.h:19: warning: ‘typedef’ was ignored in this declaration

g++ -c adso.cpp

In file included from adso.cpp:1:

adso.h:21: warning: ‘typedef’ was ignored in this declaration

In file included from adso.cpp:11:

ghost_database.h:19: warning: ‘typedef’ was ignored in this declaration

adso.cpp: In member function ‘int Adso::UTF8_C_word_lookup(std::string)’:

adso.cpp:269: error: ‘strcmp’ was not declared in this scope

adso.cpp: In member function ‘int Adso::UTF8_S_word_lookup(std::string)’:

adso.cpp:309: error: ‘strcmp’ was not declared in this scope

adso.cpp: In member function ‘int Adso::word_lookup(std::string)’:

adso.cpp:358: error: ‘strcmp’ was not declared in this scope

adso.cpp: In member function ‘std::string Adso::wordstr_lookup(std::string)’:

adso.cpp:372: error: ‘strcmp’ was not declared in this scope

make: *** [adso.o] Error 1

February 2, 2009 at 09:26 AM

Trevelyan,

I've looked at this a bit further. The code for 5.058 is not compiling at all; did you post the correct source?

For example:

- lines of the form 'typedef struct xyz;' do not compile; it seems the 'typedef' needs to be removed.
- several standard headers, such as cstring and cstdlib, are not included where their functions are used.
- multiple parameters share the same identifier in a method definition, for example in code.h.

There are probably more things, but if the code does not compile then probably it needs to be debugged as well.
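
For reference, here is a minimal sketch of the kind of change the three points above call for. It is not Adso's code: `Entry`, `matches` and `parse_frequency` are made-up names used only for illustration. The facts relied on are that `strcmp` is declared in `<cstring>`, that `atoi` is declared in `<cstdlib>`, and that a forward declaration of a struct needs no `typedef`.

```cpp
#include <cstdlib>  // std::atoi: include it explicitly rather than relying on other headers
#include <cstring>  // std::strcmp: same reasoning
#include <string>

// Forward declaration written without 'typedef'.
// (Hypothetical type; Adso's real structs are not shown in this thread.)
struct Entry;

struct Entry {
    std::string word;
    int frequency;
};

// Each parameter has its own identifier.
bool matches(const Entry& entry, const char* candidate) {
    return std::strcmp(entry.word.c_str(), candidate) == 0;
}

// Converts a decimal string to an int using std::atoi from <cstdlib>.
int parse_frequency(const char* text) {
    return std::atoi(text);
}
```

With the explicit includes in place, the code no longer depends on which standard headers happen to pull in `<cstring>` or `<cstdlib>` indirectly.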

I am using gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC).

Edited February 2, 2009 at 09:31 AM by m.ellison
more info

February 3, 2009 at 06:30 PM

Hmmm.... the code has compiled on pretty much every system I've tried save one (details below). I don't know if I've tried compiling it with g++ 4.3.2 (our server has 4.1.2 installed), but we've been compiling successfully with GCC since at least 2.95. Adso 5.058 is also running on at least two or three servers with different distributions at this point, so I'd lean towards a systems issue or perhaps a compiler bug, especially since the C++ compiler is complaining about typedef. There's very little we can do if the compiler itself has problems. If there is an issue with the software we can fix it though.

I'll check out the latest version of g++ and see how compilation goes at any rate. The one system known to have problems compiling from scratch has been a 64-bit Red Hat distro. The problem there was a unicode-related bug in one of the system libraries that choked when compiling/linking because of the UTF8 content in the source file polisher.cpp. Deleting all of the UTF8 content in that file solved the problem. It's not an ideal solution. Encodings are a recipe for disaster whatever you do though.

Specific details on your setup are welcome (which version you're trying to compile, mysql/internal/sqlite etc.). If you wanted to set up VPN access or create a sandboxed account with the software in it I could always try to SSH in and look at the problem myself. Send me an email if you wanted to try that. I'll follow up on the compiler to see if that's an issue here too.

February 6, 2009 at 03:01 PM

I've just confirmed that Adso compiles properly on 4.1.2 when downloaded directly from the server. After looking into your bug report I looked into the compiler versions and noticed that the 4.3.* branch is the development and non-stable branch. 4.2.4 is the latest stable release. Upgraded my Ubuntu distribution to 4.2.1 since that is their latest release and had the software compile without a problem as well.

There may be a way to get the software working under 4.3.2, but unless the problem shows up in a stable GCC release it is almost certainly an issue with the GNU tools rather than us (we're pretty clean C++, although some of the inheritance is a bit tricky). If you figure out a way to work around the compiler problem while keeping the code clean I'd be happy to apply the patch. In the meantime, I'd suggest grabbing the latest stable release if you need to compile.

February 7, 2009 at 06:01 AM

Trevelyan, the GCC page (http://gcc.gnu.org/) tells me that the current release is 4.3.3 and the development branch is 4.4.0.

I am using the latest version of Fedora, namely Fedora 10, updated using yum to the current patch level. I know that Fedora sometimes releases beta versions of software as production, but they seem not to have done so this time; Fedora's gcc and g++ are at 4.3.2.

The GCC pages report significant changes between 4.2 and 4.3, as set out in http://gcc.gnu.org/gcc-4.3/porting_to.html. Adso has been broken by the library changes (see "Header dependency cleanup" on that page) and also by some new error messages.

I am not sure if I have time, but I'll try to help with patches if I can.

February 7, 2009 at 11:34 AM

Oh, wonder where I was looking then. I'll try to download/compile 4.3.3 then.

February 8, 2009 at 12:42 AM

Compilation issues with GCC 4.3.3 are now fixed. Thanks m.e. I've also updated the database with the latest data.

http://popupchinese.com/downloads

One additional feature worth noting is the new --tone-sandhi option. It is still under development (runs of three or more tones are not changed), but it already handles the basics and may be suitable for text-to-speech purposes.
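
As a rough illustration of the basic rule (and not Adso's implementation), the sketch below applies third-tone sandhi to numbered pinyin: when exactly two third-tone syllables are adjacent, the first is read as a second tone. Longer runs are left untouched, mirroring the limitation mentioned above on the assumption that it refers to runs of third tones. The function names and the numbered-pinyin input format are my own assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if a numbered-pinyin syllable such as "hao3" carries the third tone.
bool is_third_tone(const std::string& syllable) {
    return !syllable.empty() && syllable[syllable.size() - 1] == '3';
}

// Rewrites "X3 Y3" as "X2 Y3" when the pair is an isolated run of exactly
// two third tones. Runs of one, or of three or more, are returned unchanged.
std::vector<std::string> apply_third_tone_sandhi(std::vector<std::string> syllables) {
    std::size_t i = 0;
    while (i < syllables.size()) {
        // Find the end of the run of third-tone syllables starting at i.
        std::size_t end = i;
        while (end < syllables.size() && is_third_tone(syllables[end])) {
            ++end;
        }
        if (end - i == 2) {
            std::string& first = syllables[i];
            first[first.size() - 1] = '2';
        }
        // Skip past the run, or advance by one if there was no run.
        i = (end == i) ? i + 1 : end;
    }
    return syllables;
}
```

Given the syllables "ni3" and "hao3", the function returns "ni2" and "hao3"; a three-syllable run such as "wo3 hen3 hao3" comes back unchanged.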

February 8, 2009 at 05:32 PM

Trevelyan, now some questions about the AdsoInterface class. I notice the member functions (eg pinyinize) take and return std::string objects. Are these UTF8 encoded? Or how? Strictly speaking std::string is only for ASCII (single byte) data. The translations and pinyin forms could be ASCII only, or are they in UTF8?

I've worked this out mainly now; the strings are UTF8-encoded.
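
In practice this means std::string acts as a byte container: size() reports bytes, not characters. A minimal sketch of counting characters in such a string follows; `utf8_length` is a helper name of my own, not part of AdsoInterface, and it assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does not match that pattern starts a new code point.
std::size_t utf8_length(const std::string& text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(text[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the two-character string 中国, written as the bytes "\xe4\xb8\xad\xe5\x9b\xbd", size() is 6 while utf8_length returns 2.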

Edited February 9, 2009 at 09:51 AM by m.ellison
more info

November 1, 2009 at 08:38 AM

Hi trevelyan

First off, thanks for creating such a great product! I've only just started to use it, and it looks really awesome.

Here's my feedback so far:

my@you:~/dev/ruby/dict/freq/adso/source$ ./adso --help
...
 -h, --help                        print this reference 
...

yet

sea@cal:~/dev/ruby/dict/freq/adso/source$ ./adso -h
Welcome to Adso. Enter Chinese text and it will be processed as per your command-line options. If you are unsure of what to do, type "quit" and then type "./adso --help" at the command prompt for instructions on using the software
>>

Further

sea@cal:~/dev/ruby/dict/freq/adso/source$ ./adso -i 我很喜欢吃中国菜  
sea@cal:~/dev/ruby/dict/freq/adso/source$ 
sea@cal:~/dev/ruby/dict/freq/adso/source$ ./adso -i 我很喜欢吃中国菜  -t
I very to like to eat Chinese food

IMHO, it would make sense to specify -t as default when none of -t,-y,-cn are specified. It is, after all, rather unlikely that you would want _no_ output at all.
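
A sketch of that suggestion, combined with treating -h the same as --help. The `OutputOptions` struct and `parse_output_options` function are hypothetical and do not reflect how Adso actually parses its arguments; only the flag names -h, --help, -t, -y and -cn come from this post.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of which flags were seen on the command line.
struct OutputOptions {
    bool help;
    bool t;   // -t
    bool y;   // -y
    bool cn;  // -cn
    OutputOptions() : help(false), t(false), y(false), cn(false) {}
};

OutputOptions parse_output_options(const std::vector<std::string>& args) {
    OutputOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-h" || arg == "--help") {
            opts.help = true;  // short and long forms behave identically
        } else if (arg == "-t") {
            opts.t = true;
        } else if (arg == "-y") {
            opts.y = true;
        } else if (arg == "-cn") {
            opts.cn = true;
        }
        // Anything else (for example -i and its text) is ignored here.
    }
    // Fall back to -t when no output flag was requested.
    if (!opts.help && !opts.t && !opts.y && !opts.cn) {
        opts.t = true;
    }
    return opts;
}
```

With this fallback, passing only -i and some text would behave as if -t had been given, while an explicit -y or -cn would leave -t off.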

Lastly

sea@cal:~/dev/ruby/dict/freq/adso/source$ ./adso -i 干 -t
to fuck

Is that really the most likely meaning of 干?

In case you need this info

>>sea@cal:~/dev/ruby/dict/freq/adso/source$ !:0 --version
./adso --version

Adso Chinese Text-Analysis System v5.068: (c) David Lancashire, 2009 

Chinese translation and text analysis engine. Inquiries welcome: david.lancashire@gmail.com

Thanks again, Sir Lancashire!

Regards

mke

November 16, 2009 at 07:47 AM

Thanks Mike. I'll keep some of these in mind.

Not sure what the default for 干 should be without much context. Disambiguation is tricky but specific suggestions are always welcome. "to do" may be better.

Edited November 16, 2009 at 07:57 AM by trevelyan

February 15, 2010 at 06:30 PM

Happy Chinese New Year, everyone. The latest version of Adso is up with a more expansive dictionary and some underlying improvements to the engine as well. Specific updates:

February 14, 2010

- updated internal compile version to add support for traditional character input in UTF8. Not previously supported.

February 1, 2010

- eliminated infinite loop in edge-case script detection, particularly for traditional characters missing from the database.

January 20,

- rejigged database production scripts to shorten time-to-generate and speed up the pace of database/dictionary development. Should see more frequent releases at this point.

- subsequent fixes to minor problems raised by the revision in database format, particularly with regard to numbers, etc.

January 3,

- added -n flag for better integration with other Unix applications.

- fixed bug that would result in traditional entries added to an initial word being reduplicated in certain circumstances.

October 1,

- better auto recognition of verb+complement phrases with 得 and adjective/adverb complements.

September 3,

- added --deconstruct-phrases command-line option. This parses text using phrase-level data, but then breaks down the phrases into their constituent parts using the translation information available to identify the best part of speech of sub-units. This is different from the --no-phrases option, which ignores phrase-level data in the database.

Enjoy.

February 19, 2010 at 07:01 AM

Hi All,

Does anyone have any starting points on how to incorporate adsotrans into a C++ application under Windows?

I found the main.cpp example, and it looks simple enough to use adsoInterface, but my problem is getting the LIB file built...

Cheers

Sample

February 24, 2010 at 07:13 AM

Trevelyan: do you have a repository (git/svn/...) for the code? 100MB is a lot to download every time.

Samplehead: I have written an application using adsotrans called zwdisplay, but it uses wxwidgets rather than the MS API. Details at http://zwdisplay.sourceforge.net/.

February 25, 2010 at 07:05 AM

Hey Martin,

The bulk of the data is the database being stored in various formats (mysql, sqlite, and the internally-compiled version). What we should do is keep the latest.

We don't have a GIT repository yet, but I can look at creating one. Should also be easy to allow people to navigate the latest distribution and download specific files rather than the whole bundle.

Best,

--david

September 16, 2010 at 06:15 AM

David,

Did you get anywhere with the git repository?

Martin
