
Beta Test a New Site - NewsinChinese.com


roddy

After weeks of blood, sweat and tears . . .

I'm happy to present, in very close cooperation with Adsotrans.com, the all-new NewsinChinese.com.

On this site you can:

1) Read Chinese news articles from Xinhua.com with pop-up help including HUGE characters, pinyin AND English.

2) Add to and edit the Adsotrans.com database - and then see your additions used in subsequent articles.

3) Post questions on things you don't understand and hopefully get the help you need.

And what's the catch?

This is still very new, and while it seems to be working very well, we'd like to throw it open to forum members to beta test before it's officially launched. We're hoping to get plenty of feedback on how usable and helpful you find the site, and how we can improve it.

To get you started . . .

Front Page - the latest news stories.

Headlines - at-a-glance page giving you the last 10 active discussions and the latest 10 headlines from each category we currently cover.

Feedback is VERY welcome, either here or via the Feedback or Bug Report forms on site.

Many thanks

Roddy

wow, over 130,000 entries in the database ~~ how long has this database been going for? I noticed that there are no conjugations, etc. So I can imagine this would be good for an intermediate Chinese learner who knows the grammar but has a very limited vocabulary.

Let me be the first to say NICE WORK!! :clap

I noticed all the code is in C++. Have you thought about making a standalone client? It would resolve the speed problems with translations over the net... uh.

PS. Thanks for making it open source!

Very useful site!! :D:D

My only comments:

1) Would there be mainly headlines with single paragraphs linked to source articles?

or

Would there be full articles in which people could use the pop-up software?

2) When I click on "discussion" it doesn't seem to go through

Not sure what your problem with the 'discussion' link is - however, if you do get there, you'll find a 'full' link which you can click to have the full article annotated and made available.

I really can't think of any problem that would stop you using the discussion link (which now I think about it should probably have a different name).

Roddy

wow, over 130,000 entries in the database ~~ how long has this database been going for?

The database has entries from CEDICT (~25,000) and the LDC (~75,000) along with our home-grown and edited list. LDC entries make up a huge chunk of the database, but many were questionable and much has been removed. The source for all entries is noted in the database.

I personally started adding words about three or four years ago -- mostly in the telecom field, to quickly scan high tech news items for words of interest like "Wave-Division Multiplexing" (波分复用). We've been lucky to have a few bulk contributions from people like Mark Swofford (www.pinyin.info). Mark contributed a lengthy list of 2,000 Chinese place names, which has really helped with the parsing and handling of domestic news items.

I noticed that there are no conjugations, etc.

There was experimental support in an earlier version, but it hasn't been reimplemented yet. A problem is finding a reliable and free database of conjugated English verbs. Another is getting the grammar parser good enough that the software knows which word to conjugate. Improving the grammar parser is basically the highest priority right now, although if there are volunteers.... ;)

I noticed all the code is in C++. Have you thought about making a standalone client? It would resolve the speed problems with translations over the net.

This is a manpower question. We could release a Windows binary if someone wanted to develop tools to let us port the database to something like SQLite, and give me an example of how to submit and receive queries from it using C++. The code interacting with the database is contained in a single class while the rest of the code is standard C++ and portable enough.
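
For anyone weighing the SQLite route, here is a minimal sketch of the submit-and-receive round trip, with all database access kept in one class as described above. It is written in Python with the standard sqlite3 module purely to keep it short. The class name, table name, column names, and the "user" source label are assumptions for illustration, not the actual Adso schema. The SQLite C API, which is what C++ code would call, follows the same prepare, bind, step pattern.

```python
import sqlite3


class DictionaryStore:
    """Hypothetical single class that owns all database access."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Assumed schema: one row per entry, with its source recorded.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "chinese TEXT, pinyin TEXT, english TEXT, source TEXT)"
        )

    def add(self, chinese, pinyin, english, source):
        # Placeholders (?) let SQLite handle quoting of the submitted text.
        self.conn.execute(
            "INSERT INTO entries VALUES (?, ?, ?, ?)",
            (chinese, pinyin, english, source),
        )
        self.conn.commit()

    def lookup(self, chinese):
        # Returns a list of (pinyin, english, source) tuples, empty if no match.
        cur = self.conn.execute(
            "SELECT pinyin, english, source FROM entries WHERE chinese = ?",
            (chinese,),
        )
        return cur.fetchall()


store = DictionaryStore()
store.add("波分复用", "bo1 fen1 fu4 yong4", "Wave-Division Multiplexing", "user")
rows = store.lookup("波分复用")
```

Because the rest of the program only ever talks to the one class, swapping MySQL for SQLite would in principle mean rewriting that class and nothing else.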

It should be possible to get the software working on any platform that supports MySQL, though. I used to have it hooked up to Apache on both Windows and Linux and surf the web that way. The software was about 10 times slower on Windows than on Linux, perhaps because Windows versions of MySQL are not compiled from source.

On a side note, the version available for download is a bit outdated. Send a private message if you want to download the latest edition and I'll update the version available for download. There have been significant changes in database and software structure over the last month and a half.

Just updated. Interested parties can download from the usual site.

Why Xinhua? It wouldn't have been my first choice for balanced information. Following on from that, why not do the same as popjisyo and let readers paste in text from their own sources? I can see that the newsinchinese system allows discussion of the articles, but will there really be enough people talking about each article to make it worthwhile? News articles do tend to be here today, gone tomorrow.

Let us know what sites you'd like to see and perhaps in the near future we can take a look at those. If they have RSS feeds it'll be a lot easier, but I didn't find any providing any greater quality of content than Xinhua.

Roddy

The adsotrans server is in Beijing, so some foreign sites are firewalled and there may be lag on Adso's requests for foreign content too. Depending on where you are, trans-Pacific lag might also be significant (Roddy's server is in the US, so access times should be faster). If you can access Xinhua, though, you should be able to get through to Adso.

On the timeout front... try submitting text before processing webpages. Processing time scales with the amount of text submitted. I'm in Beijing and the processing time for the entire Xinhua frontpage is about two minutes. Tremendous amount of text to process there.

When the software is run from the command line there are ways to speed things up (or slow them down) by changing the amount of time the software spends in grammar analysis, etc. These controls aren't online yet.

You may also want to check that the webpage you are submitting is GB2312 -- Unicode sites need the encoding specified on the "advanced" page.
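
To illustrate why the encoding matters, here is a small Python sketch (not part of Adso). The same characters produce different bytes in GB2312 and UTF-8, so a page decoded with the wrong codec either fails outright or comes out as garbage. The helper name and the fallback order are assumptions made for this example.

```python
def decode_page(raw, declared=None):
    """Decode page bytes, trying the declared encoding first.

    Hypothetical helper: falls back to GB2312, then UTF-8.
    Returns a (text, encoding_name) pair.
    """
    candidates = ([declared] if declared else []) + ["gb2312", "utf-8"]
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode page with any candidate encoding")


text = "新华"
gb_bytes = text.encode("gb2312")    # 2 bytes per character
utf8_bytes = text.encode("utf-8")   # 3 bytes per character

gb_result = decode_page(gb_bytes)
utf8_result = decode_page(utf8_bytes, declared="utf-8")
```

Passing the declared encoding plays the same role as setting it by hand on the "advanced" page: it is tried before the GB2312 default.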

Finally, if there are any pages that SHOULD load but don't, please pass them along or submit them below. That information is tremendously useful for debugging. Occasionally there is an issue in the source code, or a problem with the database not containing obscure characters, that can cause strange output. These problems are easy to fix once they're noticed.

I like the newsinchinese website. It is definitely a more interesting way of building up vocab than working through word lists. Thanks for all the work put in on this.

There is one modification I would like to suggest. A lot of the entries are built up from several characters. Would it be possible to have a way of also getting the definitions of the individual characters, to help understand and remember the more complex definitions?

I agree with john here -- it's also important to have knowledge of individual character meanings, not just words alone! Maybe newsinchinese.com can include links to zhongwen.com's character pages as well. I found zhongwen.com the best resource for character meanings.

These are excellent ideas. That being said, I'm not sure they're feasible at present. Doing this would require an enormously sophisticated understanding of how Chinese words are built up from characters, and it may read a systematic structure into the language that is not generally present.

We have space in the database for details on word usage and construction. The best place to add information on word construction and meaning is there. We don't have much content there now, but if users add it we can of course change the annotation system to include it by default, or on request. Since this isn't a concern of most of the developers, though, it isn't really getting much attention.

I don't mean this to be dismissive, because I agree with both parent posts -- this post is more to draw out the challenges of doing something like word AND character annotation. I think we're making important steps forward in machine annotation. The software and database driving the site are also open source, so if someone is interested in helping us push even more functionality into the database or software, please drop a line.
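
As a rough illustration of what word AND character annotation could look like (this is not how Adso works), the Python sketch below segments text by greedy longest match against a tiny made-up dictionary, then attaches a gloss for each character inside each word. Every entry, gloss, and function name here is hypothetical.

```python
# Tiny hypothetical dictionaries; a real database would be far larger.
WORDS = {"新闻": "news", "中文": "Chinese (language)"}
CHARS = {"新": "new", "闻": "to hear", "中": "middle", "文": "writing"}


def segment(text, words=WORDS, max_len=4):
    """Greedy longest-match segmentation, falling back to single characters."""
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                out.append(piece)
                i += size
                break
    return out


def annotate(text):
    """Return (word, word gloss, [(char, char gloss), ...]) for each segment."""
    result = []
    for word in segment(text):
        parts = [(c, CHARS.get(c, "?")) for c in word]
        result.append((word, WORDS.get(word, "?"), parts))
    return result


annotated = annotate("中文新闻")
```

Even in this toy case the difficulty shows: "new" plus "to hear" only loosely suggests "news", and many words are far less transparent than that.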
