Chinese-Forums

Learning Chinese with the Word Sketch Engine


smithsgj


Hello, students of Chinese! Would you like to use a software program to strengthen your Chinese vocabulary?

Introducing the Word Sketch Engine, a software tool developed by UK academics, guaranteed to improve your Chinese word power tenfold in a matter of weeks!!!

No, seriously, this isn't spam. I'm involved with the evaluation of the Sketch Engine's Chinese version at Ming Chuan University in Taiwan. We’d like to invite keen Chinese learning forum members (native speakers of any non-Chinese language) to use the Sketch Engine (SkE for short) in their studies, to help with reading and vocabulary learning.

For further details, please read on. Or just go straight to http://mcu.edu.tw/~ssmith/walkthrough if you prefer! When asked to log in to Sketch Engine, use the name mcu03, and the password forums.

What's in it for me?

You would potentially get a great boost with vocabulary acquisition, and definitely a lot of exposure to authentic newspaper Chinese (from either China or Taiwan or both, as you please). You would get to read real sentences containing the vocabulary you're interested in – a far cry from the made-up revolutionary sentences in certain dictionaries!!

What is the Sketch Engine?

It's called a "corpus query tool". It's a computer program, with a decent web interface, which reads in a vast corpus of Taiwan and mainland newswires to generate a short Word Sketch: a one-page summary of the most common contexts in which a given word may be found. SkE was actually designed to help in compiling dictionaries, and has already been used in Longman and OUP publications, but now we want to see how it performs as a tool for helping non-native speakers to learn Chinese.

There are versions of SkE for English and other languages too, but this research is interested in the Chinese version.

What features does it have?

There's something called a Word Sketch, which shows you how a word patterns. Suppose you know contribution is 贡献, but you don't know whether to say 进行贡献 or 做出贡献. Well, you would just fire off a word sketch for 贡献 (or one of the verbs), and that would tell you which collocate is the most significant.
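To make "most significant collocate" a bit more concrete: one association measure often used for ranking collocates is logDice, defined as 14 + log2(2·f(x,y) / (f(x) + f(y))). The post doesn't say which statistic the Word Sketch uses for its ranking, so treat this as a rough sketch of the general idea only. The frequencies are invented toy numbers, not counts from the corpus, and `log_dice` and `rank_collocates` are hypothetical helper names.

```python
import math


def log_dice(pair_freq, freq_x, freq_y):
    """logDice association score: 14 + log2(2*f(x,y) / (f(x) + f(y)))."""
    return 14 + math.log2(2 * pair_freq / (freq_x + freq_y))


def rank_collocates(noun_freq, candidates):
    """Return (verb, score) pairs, strongest association first."""
    scored = [
        (verb, log_dice(c["pair_freq"], noun_freq, c["verb_freq"]))
        for verb, c in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy counts for illustration only (NOT real corpus figures).
noun_freq = 1000  # how often 贡献 occurs
candidates = {
    "做出": {"verb_freq": 5000, "pair_freq": 400},
    "进行": {"verb_freq": 50000, "pair_freq": 100},
}

ranking = rank_collocates(noun_freq, candidates)
for verb, score in ranking:
    print(verb, round(score, 2))
```

With these made-up numbers, the verb that co-occurs with 贡献 more often relative to its own overall frequency comes out on top, which is the kind of ordering a word sketch presents.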

Another thing it does is Sketch Differences. Imagine you wanted to clear up once and for all the difference between 高兴 and 快乐. You could of course also run a word sketch to help with this problem, but with Differences you get a summary of the two. You are shown contexts where only 高兴 is possible, contexts where only 快乐 is possible, and contexts where both are OK. There's a colour coding system, so you can see quickly which is which.
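The three-way split described above (only the first word, only the second, or both) is essentially a comparison of two collocate lists. Here is a minimal sketch of that idea using plain sets. The collocates are hand-picked toy examples rather than Sketch Engine output, `sketch_difference` is a hypothetical name, and the real tool presumably also weighs how strong each collocation is, which this ignores.

```python
def sketch_difference(collocates_a, collocates_b):
    """Split two collocate collections into only-A, shared, and only-B."""
    a, b = set(collocates_a), set(collocates_b)
    return {
        "only_a": sorted(a - b),
        "both": sorted(a & b),
        "only_b": sorted(b - a),
    }


# Toy collocate lists, invented for illustration (NOT real corpus output).
gaoxing = ["很", "非常", "认识"]  # collocates assumed for 高兴
kuaile = ["很", "非常", "生日"]   # collocates assumed for 快乐

result = sketch_difference(gaoxing, kuaile)
print("only 高兴:", result["only_a"])
print("both:", result["both"])
print("only 快乐:", result["only_b"])
```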

What do you want me to do?

Use it in your studies! Every time you would normally reach for the dictionary, stop. You know you're supposed to figure out the meaning from context by yourself... the Sketch Engine can help you (we hope) to do just that. We think it's potentially a great learning tool, but we want to see scientific evidence of that.

What's the next step?

You need to go to http://mcu.edu.tw/~ssmith/walkthrough. Here, you will be invited to take a short questionnaire (we’re calling it a pre-test, because we plan to ask you to do a post-test after a few weeks of using SkE in your studies). There are some questions on your background, including contact details, and on your current knowledge of collocations – we hope you will find the latter interesting.

Next, you’ll be taken on a walk through the Sketch Engine software (this will include logging in: use the name mcu03, and the password forums). You might want to set aside an hour to work through everything (you can always do the pre-test and then come back to the walkthrough later, of course).

Of course, you are under no obligation to participate, and having agreed to participate you are still free to withdraw at any time. And of course we wouldn't use your personal data for any purpose other than our research: if you wanted to be anonymous that would be fine by us. Please offer feedback, too, via PM or this thread, as you please.

Wishing you success in your studies!


It looks great. My first impression is that it is easy to use, very interesting, and powerful. I just wish it were free. I've been wishing for a tool like this for a long time. I've seen similar tools that include English corpora, but not Chinese.

The prices are reasonable (52.5 euros/year), but that's the academic license price. If I wanted to use it for translating or work-related stuff, I guess that would require a commercial license.

Actually, I wish I could get it on DVD or CD, and, oh yeah, PDA. I'd pay good money for that. Well, technically it looks possible. The Chinese corpus is 1.5 GB, small enough to fit on a memory card. After it is tokenized and indexed, I guess it would be a lot bigger, but still in the realm of possibility.

And the pre-test was fun, but when do I get to see the answers?


Why do you say tenfold? Has the company done any testing on this, or is the number just pulled out of thin air? If it's true, that's pretty impressive, but the survey seems designed to figure out whether it helps.

Anyway, I'd check out the tool, but am turned off by being asked to fill out a really lengthy survey and commit to repeatedly using a system that I don't know will be useful before looking at a new tool.


PLEASE NOTE: this is a research project of an academic sort. Although Sketch Engine is a commercially available product, my interest in it is purely as a language teacher and researcher: to what extent can it help people to learn Chinese?

Why do you say tenfold?

Sorry, that wasn't intended seriously!

am turned off by being asked to fill out a really lengthy survey

I know what you mean. We've got National Science Council 國科會 funding for this research, but we don't have money for candy or other inducements, so we've had limited success in getting university students on board. It's difficult to know how to assess the tool scientifically without using a questionnaire, so we really hope some of you will find the time.

Plus a lot of people do find it a useful tool!

It looks great. My first impression is that it is easy to use, very interesting, and powerful. I just wish it were free. I've been wishing for a tool like this for a long time. I've seen similar tools that include English corpora, but not Chinese.

It's kind of you to say so. There are some other corpus tools around: Academia Sinica offers simple concordancing, and Lancaster Uni has a more powerful tool, and these are balanced corpora (not just one theme, like this one: agency newswires).

But the real difference is corpus size. With a billion Chinese characters, it's many times the size of the corpora behind those other tools. That means the patterns you're looking for are that much more likely to come up.

And the pre-test was fun, but when do I get to see the answers?

The survey should have picked up your email address. We'll get back to you (let me know if that doesn't happen).

What kind of encoding do you use? I just tried to do the test in simplified characters, but it doesn't display properly on my screen. I don't usually have a problem seeing Chinese.

Sorry about that. It's probably a problem with my3q... I thought it was Unicode. Can you see the output from Sketch Engine itself? Maybe try the different View settings in your browser. I can see the simplified OK: can anyone else not? We just converted from trad in MS Word (embarrassed as I am to admit that :oops: )

Chenpv: we're interested in Chinese for our research. But if you look on the Sketch Engine front page, you'll see a list of corpora of various languages that you can access.


The survey should have picked up your email address. We'll get back to you (let me know if that doesn't happen)

I haven't received anything, but it's possible I typed my email wrong.

Anyway, I'd check out the tool, but am turned off by being asked to fill out a really lengthy survey and commit to repeatedly using a system that I don't know will be useful before looking at a new tool.

I must have skimmed over that part or not taken it seriously. I would like to use the system, but I don't do enough studying to be a valid data point in a study requiring frequent use. If frequent use is a requirement, then I have to disqualify myself.


Originally Posted by anon

What kind of encoding do you use? I just tried to do the test in simplified characters, but it doesn't display properly on my screen. I don't usually have a problem seeing Chinese.

Funnily enough, I can only see the simplified characters if I choose traditional encoding :shock:
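Nothing in this thread pins down what actually went wrong with the questionnaire page, so the following is only a sketch of the general mechanism: the same bytes read back as sensible text under one encoding and as garbage under another. The choice of GB2312, Big5 and UTF-8 here is an assumption made for the demonstration, not a diagnosis of my3q.

```python
text = "简体中文"  # "Simplified Chinese", written in simplified characters

# Assume, for illustration, that the page stored the text as GB2312 bytes.
raw = text.encode("gb2312")

# Decode the same bytes under three different encoding guesses.
as_gb2312 = raw.decode("gb2312")
as_big5 = raw.decode("big5", errors="replace")
as_utf8 = raw.decode("utf-8", errors="replace")

for label, decoded in [("gb2312", as_gb2312), ("big5", as_big5), ("utf-8", as_utf8)]:
    status = "matches" if decoded == text else "garbled"
    print(label, status, repr(decoded))
```

Only the decoder that matches the encoder recovers the original characters, which is why switching the browser's encoding setting can make a page suddenly readable.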


  • 2 weeks later...

"There are some other corpus tools around: Academia Sinica offers simple concordancing, and Lancaster Uni has a more powerful tool, and these are balanced corpora (not just one theme, like this one: agency newswires)."

Can you post the link to the Academia Sinica concordancing tool? I'd be interested to compare the two.

As I've said elsewhere, I'm not sold on all the functions of this tool...seems as though someone's trying to shoehorn Chinese into some grammatical molds it might not fit properly. But I'd be interested to learn more about how it might work.


Ooo-er, I wasn't aware the results of the quick test would be published for all to see. If I'd known that, I wouldn't have rushed through it...

Any chance of removing the personal email addresses from the site? I have enough trouble with spam as it is, and I don't like my email just floating around in the ether like that.


Sorry Ironlady and Gerald, we had been working on it. :oops: The free questionnaire software forces publication of participants' answers :(

I was going to wait until we had our own secure interface up and running, but in view of your comments I've disabled the questionnaire link.

We'll try and get the new interface up later today, so people can continue to take part.


Sorry about the way the answers were made public. That issue is now resolved: you can now go ahead and take the pre-test, by going to http://mcu.edu.tw/~ssmith/walkthrough. Your answers will not be available anywhere on the web!

Ironlady, check http://www.sinica.edu.tw/ftms-bin/kiwi1/mkiwi.sh

http://bowland-files.lancs.ac.uk/corplang/lcmc/

I'll get back on your other points: but please bear in mind that Sketch Engine is a corpus query tool, not a corpus! It can be applied to any suitably marked-up corpus, but works best with a truly huge corpus like Chinese Gigaword.

The grammar rules for SkE were indeed written for English, this is true. However:

a) they are being developed to take account of (e.g.) the 把 / 將 construction (the error you spotted was to do with verb subject/object, wasn't it?)

b) SkE uses grammatical patterns to find collocates. It is not intended to be a grammar teaching tool at all! I can see how people could be misled into thinking it is, and I'll think about that issue more. Thank you for raising it.
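As a rough illustration of what "using grammatical patterns to find collocates" can mean, here is a toy sketch: given sentences that are already segmented and part-of-speech tagged, it looks for a verb followed within a few words by a noun and counts the resulting pairs. This is not SkE's actual grammar. The tag names, the `max_gap` window, the function `verb_object_pairs` and the three example sentences are all assumptions made up for the example.

```python
from collections import Counter

# Hand-made, pre-segmented and pre-tagged toy sentences (not corpus data).
sentences = [
    [("他", "PRON"), ("做出", "VERB"), ("了", "PART"), ("重要", "ADJ"), ("贡献", "NOUN")],
    [("我们", "PRON"), ("做出", "VERB"), ("贡献", "NOUN")],
    [("她", "PRON"), ("进行", "VERB"), ("研究", "NOUN")],
]


def verb_object_pairs(sentence, max_gap=3):
    """Pair each verb with the first noun found within max_gap following words."""
    pairs = []
    for i, (word, tag) in enumerate(sentence):
        if tag != "VERB":
            continue
        for later_word, later_tag in sentence[i + 1 : i + 1 + max_gap]:
            if later_tag == "NOUN":
                pairs.append((word, later_word))
                break
            if later_tag == "VERB":
                break  # another verb intervenes, so stop looking
    return pairs


counts = Counter(pair for s in sentences for pair in verb_object_pairs(s))
for (verb, noun), n in counts.most_common():
    print(verb, noun, n)
```

The pattern only serves to pick out candidate word pairs for counting, which matches the point above: it is a way of finding collocates, not a statement about Chinese grammar.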

I'll get back on other points fairly soon.


No worries, we're just embarrassed that we didn't score 100% :oops:

Thanks for the link to the other corpus...for my purposes (which are quite specific) it's probably more appropriate as I need to be able to determine with certainty what is a spoken form (or can be) and what is not. I'd lost the link some time ago and I'm certainly happy to have it back.

I don't think the sentence I saw had ba or jiang in it, but it will be difficult to deal with topic-comment construction anyway. I'm not quite sure how one could go about providing more specific information using a corpus database, but then again, that's what a dictionary of collocations is for. Each tool has its use, I guess. And I love corpora just to rummage around in if nothing else! :mrgreen:


  • 2 weeks later...

I'm the owner of the Sketch Engine and worked with Simon on the Chinese. There are a couple of questions from in_lab that I can quickly answer.

The prices are reasonable (52.5 euros/year), but that's the academic license price. If I wanted to use it for translating or work-related stuff, I guess that would require a commercial license.

- Yes, you would: individual commercial licenses, available only to individuals (e.g. not if you are an employee using the tool as part of your employment), are 173.25 euros/year.

Actually, I wish I could get it on DVD or CD, and, oh yeah, PDA. I'd pay good money for that. Well technically it looks possible. The Chinese corpus is 1.5gb, small enough to fit on a memory card. After it is tokenized and indexed, I guess it would be a lot bigger, but still in the realm of possibility.

Not on our agenda, I'm afraid: the whole corpus including indexes is ca. 25 GB, and it would raise a slate of issues about different versions, download speeds, upgrades etc. The route we are interested in is that PDAs and similar devices can use it if web-connected, as everything will be before long!


A comment on the cost issue.

As an individual no longer in school, but still working with Chinese and seeking to improve my language skills, not to mention happy to use a corpus tool for sheer self-amusement and browsing interest -- the price you quote for an individual membership is completely out of the question. I'm not trying to be mean, just trying to let you know how some people might view the pricing issue.

Put yourself in my position. I've got a fairly good level of Chinese already. The tool does not really supply 100% reliable information about Chinese grammatical categories (assuming we can get some group to agree on what those really are! :mrgreen: ). It is limited only to news agency wires. Yes, it's a big corpus, but how much of my work or study as an intermediate/advanced student is going to be limited strictly to this type of document? To see how words are used, why would I not use a freely available corpus which might be a bit smaller but which is balanced (as has been pointed out earlier in this thread) and for which I can select categories (such as the Academia Sinica tool, which allows me to select various categories of text and even quite specific subject areas)?

And as a translator, I can get sufficient contextual information about how an unknown phrase is used by just Googling it. I'm not quite clear on how the additional grammatical information this tool seeks to give would be helpful to me as a translator. (Maybe it's not intended for translators specifically, but someone mentioned it being used in work.) The only time I use a corpus tool as a translator is if I'm preparing an interpreting job and looking for information about how a specific term is used in formal oral discourse in Chinese, and I can't see where this tool's current features are adding US$200 worth of annual value over what's available now to do that job -- which I don't even need to do on a frequent basis.

For nearly US$200 a year, a learner is going to expect something that will provide instruction, or nearly so, I should think. Particularly for beginners, which is the bulk of the market, there are many competing tools already on the market. I would strongly urge you to reconsider the pricing options if your aim is to get a large number of users to frequently log in and use the tool with an eye toward collecting feedback and improving the tool long-term. Perhaps it would be worth providing the license at a greatly reduced rate (compared to the current individual pricing scheme) to collect meaningful user feedback through Web-based surveys or other instruments sent out periodically. No response, username gets cut off. I'm not talking about specific language improvement pre- and post-test setups, which would be very difficult given the disparate nature of users; I'm thinking more about useful feedback from learners of Chinese about how they are using the tool, whether they are aware of other ways of using it, and what they wish the tool would do.

Just some thoughts. I am very disappointed that a university-backed project is attempting to go so very commercial so quickly with a product that is perhaps not quite ready for prime time in terms of handling the specific quirks of Chinese. Hopefully some reconsideration of the pricing policy can be had for the benefit of the tool on a long-term basis. But again this is just my NT$0.66 (approx. US$0.02), your mileage may vary.

