Chinese-forums.com

Mass extraction of sample sentences for Chinese words


chinesemadrush

Hi everyone,

 

I am currently studying Chinese with Anki, but memorising words without understanding how they are used is not very helpful. I therefore want to include sample sentences in my flashcards. Manually extracting them from websites like YellowBridge would be far too time-consuming. Does anyone have good suggestions?

 

Thank you!


fabiothebest

There is a program called Chinese Text Analyser by @imron that is very good for segmenting text, keeping track of known and unknown words, and creating word lists. You are specifically interested in sentences, though. Another option is to use subs2srs to create flashcards from movie subtitles. If you want to extract sentences from websites, I'm afraid you need to do it manually; I don't think any program is made specifically for that. There are general-purpose scrapers, but you need to know how to customise and use them. I'm also interested in suggestions from others. We could make a list of websites that offer example sentences and then see whether any program can extract them. If nothing exists, I might consider writing a script to extract the sentences. It would have to take into account the different layout of every website, which is why a list of websites would be useful. I don't call myself a professional programmer, but it's something I could do, and there are other programmers on this forum, so it could be done if it doesn't exist yet.
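As a rough illustration of what such a script might look like, here is a minimal Python sketch. It does not scrape any website; it assumes you already have Chinese text saved locally (an article, a subtitle file, and so on). The function names `split_sentences`, `examples_for` and `to_anki_tsv` are made up for this example, and matching is a plain substring test, so a word can occasionally match across a word boundary.

```python
import re

# Sentence-ending punctuation; this set is an assumption, extend it as needed.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def examples_for(words, text, per_word=2):
    """Map each word to at most `per_word` sentences that contain it."""
    sentences = split_sentences(text)
    return {w: [s for s in sentences if w in s][:per_word] for w in words}


def to_anki_tsv(examples):
    """Build tab-separated lines: word <TAB> sentences.

    Sentences are joined with <br>, which assumes HTML in fields is
    allowed when the file is imported into Anki.
    """
    return "\n".join(w + "\t" + "<br>".join(s) for w, s in examples.items())


if __name__ == "__main__":
    sample = "我喜欢学习中文。你学习什么？他去北京了！"
    print(to_anki_tsv(examples_for(["学习", "北京"], sample)))
```

The resulting text can be saved as a `.txt` file and imported into Anki as a tab-separated file, with the word in the first field and the sentences in the second.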

realmayo

Consider downloading and playing with the popular Anki deck called Chinese Sentences and audio, spoon fed

https://ankiweb.net/shared/info/2003820603

 

Also ask yourself if it really is time-consuming to add them manually: after all, you're only really having to paste the word into a website and then paste an example sentence or two into Anki. The rest of the time is spent reading the example sentences that the website returns and picking which examples you want to select. That can be viewed as part of the process of learning and understanding more about how a given word is used.

fabiothebest

I was aware of an Anki deck with sentences, although I haven't tried it yet. I don't know whether it contains mistakes. I'll try it, and anyone who has used it is welcome to give feedback. Hmm, I'm not actually sure that learning many sentences this way with Anki is beneficial. Maybe it's better to just look up the usage of the words you are studying and need at the moment. Otherwise there is also Glossika, which is more of a listen-and-repeat approach: you just play the audio file and don't need to switch cards. The quality of the sentences matters, though, so since I haven't tried the Anki deck yet I can't really judge.

 

I think self-made materials are the best because they are personal and based on your needs. Materials made by others may be less interesting or useful for you, but they may also contain things you wouldn't think of because you haven't been exposed to them, so it's worth trying something like that anyway. Everyone has their own learning method, and there are many ways to learn Chinese. What matters is to set some goals and stick to them.

imron
fabiothebest wrote: "There is a program called Chinese Text Analyser by @imron that is very good for segmenting text, keeping track of known and unknown words and create wordlists. You are especially interested in sentences though"

Chinese Text Analyser can extract sentences too, with optional cloze deletion of the word in question. Just choose the 'Sentence' or 'Cloze Sentence' field from the export word list dialog box.

 

So for example, you could export cloze deleted sentences for the top 20 most frequent unknown words in a given document.

 

Even better, with the new Lua scripting support you can get CTA to process all files in a directory (and all sub-directories) and create Anki-compatible cloze-deleted sentences for all 'mostly known sentences' found (where 'mostly known' means that more than a certain percentage of all words in the sentence are known).

 

In fact, one of the example scripts provided (anki-cloze.lua) does exactly that, and is explained step by step in the Lua example documentation.
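For readers who want to see the 'mostly known sentence' idea in isolation, here is a minimal Python sketch. It is not CTA's Lua API and not the anki-cloze.lua script; it only illustrates the logic. It assumes sentences are already segmented into lists of tokens and that `known` is a set of words you know. The name `cloze_cards`, the punctuation set and the default threshold are all assumptions for this example. The `{{c1::...}}` markup is Anki's cloze deletion syntax.

```python
# Punctuation tokens are ignored when computing the known ratio (assumed set).
PUNCT = set("，。！？、；：,.!?;:")


def cloze_cards(sentences, known, threshold=0.7):
    """Return cloze-deleted strings for mostly known sentences.

    A sentence qualifies when more than `threshold` of its words are in
    `known` and at least one word is unknown. One card is produced per
    distinct unknown word.
    """
    cards = []
    for tokens in sentences:
        words = [t for t in tokens if t not in PUNCT]
        if not words:
            continue
        unknown = [w for w in words if w not in known]
        known_ratio = (len(words) - len(unknown)) / len(words)
        if not unknown or known_ratio <= threshold:
            continue
        for target in dict.fromkeys(unknown):  # de-duplicate, keep order
            cards.append("".join(
                "{{c1::" + t + "}}" if t == target else t for t in tokens
            ))
    return cards


if __name__ == "__main__":
    known = {"我", "喜欢", "吃", "很", "好"}
    sentences = [
        ["我", "很", "喜欢", "吃", "饺子", "。"],  # 4 of 5 words known
        ["饺子", "和", "包子", "。"],              # nothing known, skipped
        ["我", "很", "好", "。"],                  # nothing to cloze, skipped
    ]
    for card in cloze_cards(sentences, known):
        print(card)
```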

Yadang

 

fabiothebest wrote: "Another thing that you could do is using sub2srs for creating flashcards based on movie subtitles"

 

We also have a post dedicated to Anki decks that have been made using Subs2SRS, which cuts a movie into little fragments of text with the corresponding audio. You can then use the decks with Imron's Chinese Text Analyser to tag all of the sentences containing words you don't know, and to provide definitions and pinyin.

chinesemadrush

Thanks, everyone, for the input. I will explore the various options mentioned.

 

@fabiothebest, I previously tried using scraping tools on websites, but the sites often have an anti-bot system that asks you to prove you are not a robot after a few words or so. For example, they ask you to type certain on-screen words into a box. Do you have any suggestions in mind?

 

Thanks once again.

fabiothebest

@chinesemadrush If I have time I'll try and if I come up with something usable, I'll post it here.

Flickserve

Are you making cards with just the word on the front and then the sentences on the back?

I have made a lot of Anki cards. I am afraid there is no easy way to select sentences when you create your own individual cards, because the sentences you select are personalised to your own knowledge. For instance, I ignore long sentences and sentences with a lot of unknown vocabulary. At my low-intermediate stage, it doesn't help to include such sentences.
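That kind of personal filter can be approximated in code, although the final choice is still best made by eye. The sketch below is only an illustration: it assumes each candidate sentence is already segmented into a list of words with punctuation stripped, and the name `pick_sentences` and the limits `max_words` and `max_unknown` are hypothetical defaults to tune for your own level.

```python
def pick_sentences(target, candidates, known, max_words=8, max_unknown=1, limit=3):
    """Pick short candidate sentences with few unknown words.

    `candidates` is a list of token lists that all contain `target`.
    The target word itself is not counted as unknown. Results are
    ordered by number of unknown words, then by length.
    """
    scored = []
    for tokens in candidates:
        unknown = [t for t in tokens if t != target and t not in known]
        if len(tokens) <= max_words and len(unknown) <= max_unknown:
            scored.append((len(unknown), len(tokens), "".join(tokens)))
    scored.sort()
    return [sentence for _, _, sentence in scored[:limit]]


if __name__ == "__main__":
    known = {"我", "想", "买", "一", "本", "他", "在", "看", "是"}
    candidates = [
        ["我", "想", "买", "一", "本", "词典"],
        ["他", "在", "看", "词典"],
        ["这", "部", "词典", "收录", "了", "大量", "古代", "汉语", "词汇"],  # too long
        ["词典", "是", "工具"],  # one unknown word besides the target
    ]
    for sentence in pick_sentences("词典", candidates, known):
        print(sentence)
```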
