Chinese-forums.com
imron

Lua Script and Chinese Text Analyser


dougwar

If I'm not asking too much, here is my workflow. Maybe you know a way to automate it:

1. First I get a text file and put one sentence per line.

2. I run the highlight-unknown.lua script.

3. In Excel I sort the rows by the number of known words.

4. I use some formulas to produce these columns, then I export them to Anki.

For example:

Sentence with blank      |  Unknown word  |  Complete sentence

那个学生昨天 __ 了书      |  丢            |  那个学生昨天 _丢_ 了书。
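Steps 1 and 4 are mechanical enough to script. Here is a minimal sketch in plain Python (not the Chinese Text Analyser Lua API). It assumes sentences end with the full-width marks 。！？, and the function names `split_sentences` and `make_row` are hypothetical. It only blanks and marks the first occurrence of the word:

```python
import re
from typing import List

# Assumption: sentences end with the full-width marks 。！？
_SENTENCE_END = re.compile(r"(?<=[。！？])")


def split_sentences(text: str) -> List[str]:
    """Step 1: one sentence per line, splitting after each end mark."""
    parts = _SENTENCE_END.split(text)
    return [p.strip() for p in parts if p.strip()]


def make_row(sentence: str, word: str) -> str:
    """Step 4: 'blanked sentence<TAB>word<TAB>marked sentence' for Anki."""
    blanked = sentence.replace(word, "__", 1)
    marked = sentence.replace(word, f"_{word}_", 1)
    return "\t".join([blanked, word, marked])


text = "那个学生昨天丢了书。他很着急！"
for line in split_sentences(text):
    print(line)
print(make_row("那个学生昨天丢了书。", "丢"))
```

Joining the columns with tabs means the output can be imported into Anki as a tab-separated text file without going through Excel.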
imron

Yep, that can be completely automated.

 

The script could also be made to export sentences in Anki's 'cloze' format if you'd like.
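For reference, Anki's cloze syntax wraps the hidden text as `{{c1::word}}`, and each distinct cloze number on a note produces its own card. One note per sentence can therefore stand in for repeating the sentence once per unknown word. A minimal sketch in plain Python, assuming the unknown words for the sentence are already known (the function name `to_cloze` is hypothetical):

```python
import re
from typing import List


def to_cloze(sentence: str, unknown_words: List[str]) -> str:
    """Wrap each unknown word in Anki cloze syntax: {{c1::...}}, {{c2::...}}, ..."""
    if not unknown_words:
        return sentence
    numbers = {}
    for word in unknown_words:
        numbers.setdefault(word, len(numbers) + 1)
    # Longest words first, so 学生 is matched before 学 at the same position.
    pattern = re.compile("|".join(
        re.escape(w) for w in sorted(numbers, key=len, reverse=True)))
    done = set()

    def wrap(match):
        word = match.group(0)
        if word in done:  # only the first occurrence becomes a deletion
            return word
        done.add(word)
        return f"{{{{c{numbers[word]}::{word}}}}}"

    return pattern.sub(wrap, sentence)


print(to_cloze("那个学生昨天丢了书。", ["丢", "书"]))
```

Matching all the words in a single pass, rather than calling `str.replace` once per word, avoids accidentally matching a short word inside a cloze that has already been inserted.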

RogerGe

Imron, is it possible to change the script so that it calculates the percentage of new characters based on your current word list instead of HSK 6? Thanks.

imron

Yes, it's very easy to do. On line 11 of the file, change

 

for word in cta.hskLevel( lower, upper ):words() do

to

for word in cta.knownWords():words() do

 

This will then build the list of characters from your known vocabulary rather than the HSK 1-6 vocabulary.

 

Attached is a copy of the script with that modification (plus a few cosmetic changes that remove references to HSK from variable names and output).

 

char-coverage-known.lua
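To illustrate the idea behind the change (building the character set from a word list and measuring how much of a text it covers), here is a minimal sketch in plain Python. It is not the attached Lua script. It assumes the word list is just a list of strings, that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Chinese, and that repeated characters are counted each time they occur. The function names are hypothetical:

```python
from typing import Iterable, Set


def chars_from_words(words: Iterable[str]) -> Set[str]:
    """Collect every character that appears in any word of the list."""
    chars: Set[str] = set()
    for word in words:
        chars.update(word)
    return chars


def is_hanzi(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block counts.
    return "\u4e00" <= ch <= "\u9fff"


def coverage(text: str, known_chars: Set[str]) -> float:
    """Percentage of Chinese characters in text (repeats counted) found in known_chars."""
    hanzi = [ch for ch in text if is_hanzi(ch)]
    if not hanzi:
        return 0.0
    known = sum(1 for ch in hanzi if ch in known_chars)
    return 100.0 * known / len(hanzi)


known_chars = chars_from_words(["学生", "书"])
print(f"{coverage('学生丢了书。', known_chars):.1f}%")
```

Switching from HSK vocabulary to known vocabulary only changes which word list is passed to `chars_from_words`. The coverage calculation itself stays the same.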

 

imron

@dougwar please find attached a script that should do what you want.

 

It finds all the sentences in a given document that contain unknown words.

Then it sorts those sentences by the number of unknown words, with sentences containing the fewest unknown words appearing first.

Then, for each unknown word in each sentence, it prints:

     The total number of unknown words in the sentence

     The sentence with the current unknown word replaced with __

     The unknown word

     The sentence with the word surrounded by underscores, e.g. _生词_

 

This means that each unknown word in the sentence will have its own line in the output, so if the sentence has 5 unknown words, that sentence will appear 5 times in the output with a different word replaced each time.

 

You should then be able to save this file and import it directly into Anki.

 

Let me know if this does what you want, or if you need any adjustments.

 

 

unknown-sentences.lua
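The sorting and row-building steps described above can be sketched in plain Python as follows. This is not the attached Lua script. It assumes each sentence has already been segmented into a list of words and that the known vocabulary is a set of strings. It treats tokens without any CJK ideograph (punctuation, numbers) as never unknown, and it counts each distinct unknown word once per sentence. The names `unknown_rows` and `is_chinese` are hypothetical:

```python
from typing import List, Set


def is_chinese(word: str) -> bool:
    # Tokens with no CJK ideograph (punctuation, numbers) are never "unknown".
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


def unknown_rows(sentences: List[List[str]], known: Set[str]) -> List[str]:
    """Tab-separated rows: count, blanked sentence, word, marked sentence.

    Each sentence is assumed to be pre-segmented into a list of words.
    """
    candidates = []
    for words in sentences:
        # dict.fromkeys removes repeats while keeping first-seen order.
        unknown = list(dict.fromkeys(
            w for w in words if is_chinese(w) and w not in known))
        if unknown:
            candidates.append((len(unknown), words, unknown))
    # Stable sort: fewest unknown words first, original order otherwise.
    candidates.sort(key=lambda item: item[0])

    rows = []
    for count, words, unknown in candidates:
        for target in unknown:
            blanked = "".join("__" if w == target else w for w in words)
            marked = "".join(f"_{w}_" if w == target else w for w in words)
            rows.append("\t".join([str(count), blanked, target, marked]))
    return rows


known = {"那个", "学生", "昨天", "了"}
sentences = [
    ["那个", "学生", "昨天", "丢", "了", "书", "。"],
    ["学生", "丢", "了", "。"],
    ["那个", "学生", "。"],
]
for row in unknown_rows(sentences, known):
    print(row)
```

The first sentence has two unknown words, so it produces two rows, one with 丢 blanked and one with 书 blanked. The last sentence has no unknown words and is skipped.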

imron

Here is a script that extracts the marked word from the first field of a tab-separated file (e.g. cards exported from Anki).

 

extract-marked-words.lua

 

See here for context.

 

 

See here for instructions on how to run the script.
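The extraction step can be sketched in plain Python as follows. This is not the attached Lua script. It assumes the word is marked with single underscores, as in _生词_, and that the first field is plain text (no HTML or surrounding quotes from the export). The name `extract_marked_words` is hypothetical:

```python
import re
from typing import List

# Assumption: the word is marked with single underscores, as in _生词_.
_MARKED = re.compile(r"_([^_\t]+)_")


def extract_marked_words(lines: List[str]) -> List[str]:
    """Return the first _marked_ word found in the first tab-separated field."""
    words = []
    for line in lines:
        first_field = line.rstrip("\n").split("\t", 1)[0]
        match = _MARKED.search(first_field)
        if match:
            words.append(match.group(1))
    return words


rows = ["那个学生昨天_丢_了书。\t丢", "没有标记的句子\tx"]
print(extract_marked_words(rows))
```

A blanked sentence such as 那个学生昨天__了书 yields nothing, because the pattern requires at least one non-underscore character between the two markers.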
