
Word lists from Character lists


roddy


Not sure if this will be of use to anyone, but I set it up for my own use and it seems functional enough to make it public for anyone who might find it helpful. So say thanks, both of you :wink:

Generate word lists from characters (link removed, it doesn't exist any more)

Basically, this takes in a list of characters - one per line - and outputs all words in the HSK lists that can be written using only those characters. E.g., if you input

you get out

文字

语气

气体

语文
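For anyone curious how this kind of lookup might work, here is a minimal sketch in Python. It is not the code behind the tool: `HSK_WORDS` is a tiny made-up stand-in for the real HSK database, `words_from_characters` is a hypothetical name, and the five input characters are only a guess at an input that would produce the four words above. It follows the approach described in the limitations below, trying every two-character combination and keeping the ones that are real words.

```python
from itertools import product

# Hypothetical stand-in for the HSK vocabulary; the real tool queries a database.
HSK_WORDS = {"文字", "语气", "气体", "语文", "文化", "身体"}


def words_from_characters(raw_input, vocabulary=HSK_WORDS):
    """Return two-character words that can be written with the given characters.

    raw_input is expected to hold exactly one character per line.
    """
    chars = [line for line in raw_input.splitlines() if line]
    found = []
    # Try every ordered pair of input characters (including a character
    # paired with itself) and keep the pairs that are in the vocabulary.
    for first, second in product(chars, repeat=2):
        candidate = first + second
        if candidate in vocabulary and candidate not in found:
            found.append(candidate)
    return found


print(words_from_characters("文\n字\n语\n气\n体"))
```

Note that the sketch does no cleaning of the input, so a line with a stray space never matches anything, which is roughly the strictness described below.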

Purpose: If you are working through a 'learn to write characters' type of book / course, you can input the characters you've learned so far and see what actual words you can now write.

Limitations:

I just knocked this up because I needed it for something. It is not robust, and if you don't input strictly one character per line, with nothing else in the way, it'll probably break. It only outputs two-character words - I almost broke the server testing all two-character combinations, and I'm not sure I want to get into any higher orders of magnitude. It only outputs the characters, with no pinyin or HSK level info at the moment. Input is limited to lists of 500 characters.
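To put rough numbers on the "orders of magnitude" point, here is a small sketch. `candidate_count` and `words_by_scanning` are hypothetical names, and the second function is a different technique from the one described above: instead of generating every combination, it scans the word list once and keeps words made only of known characters, so word length stops mattering. Whether that would suit the actual database is an assumption.

```python
def candidate_count(num_chars, word_length):
    """Number of ordered character sequences of the given length."""
    return num_chars ** word_length


# With the 500-character input limit:
for length in (2, 3, 4):
    print(length, candidate_count(500, length))


def words_by_scanning(chars, vocabulary):
    """Alternative: one pass over the vocabulary instead of all combinations."""
    allowed = set(chars)
    # Keep a word only if every character in it is one the learner knows.
    return [word for word in vocabulary if set(word) <= allowed]
```

At 500 characters that is 250,000 candidates for two-character words and 125,000,000 for three-character words, while the scan does one check per word in the list regardless of length.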

I think that's all. Probably the easiest way to use it is to export a character flashcard list from Pleco, ZDT, or whatever, knock it into the required format with some finding and replacing, then generate a wordlist for use wherever.
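If finding and replacing gets tedious, the reformatting step can be scripted. A minimal sketch, assuming a tab-separated export with one card per line; the sample text is made up and is not the real Pleco or ZDT export format, and `to_one_per_line` is a hypothetical helper. It pulls out every character in the basic CJK Unified Ideographs range, drops duplicates, and caps the list at the 500-character limit.

```python
import re

# Basic CJK Unified Ideographs block; rarer characters outside it are ignored.
CJK = re.compile(r"[\u4e00-\u9fff]")


def to_one_per_line(exported_text, limit=500):
    """Reduce an exported flashcard list to unique characters, one per line."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_chars = list(dict.fromkeys(CJK.findall(exported_text)))
    return "\n".join(unique_chars[:limit])


# Made-up sample export: character, pinyin, gloss separated by tabs.
sample = "文\twen2\tculture\n字\tzi4\tcharacter\n文\twen2\tduplicate card"
print(to_one_per_line(sample))
```

One caveat: if the definitions in the export contain hanzi themselves, those get picked up too, so it is worth exporting the character column only if the program allows it.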

Hope someone manages to use it for something. If you want extra features, say so - if it's easy enough to do I'll add it on.

This feeds off the HSK Vocab Database, which is very nifty at generating wordlists based on phrase length, HSK list, tone patterns, etc.

Roddy


Interesting idea. I have been working on something similar, but yep, you guessed it, pinyin-based. :mrgreen:

Can't get it to work at the moment though. But I know what you mean.

Alternatively, you can dump it into Access and run mass queries, but a public tool is always great.


"Interesting idea. I have been working on something similar, but yep you guessed it, pinyin based"

What are you looking to do, specifically? It's quite possible that one of the databases out there can already do it, or could be made to quite easily.


Then we really gotta talk.

Cut and paste from an email in which I discuss the project:

What I did with myself and what I have tried to do with students is to add linkages between pinyin words such that a cognitive link is established and disambiguation occurs. Results have been good, but I'm greedy, and I'm looking for more and better ways of doing things.

I have been experimenting with a number of tools and models, trying to build a more dynamic and interactive way of doing this than my current paper-based system. (And of course hopefully scalable and modular as well.)

I have come up with something that has potential. However, it is going to need a lot of work, and perhaps even some funds, down the road. I view this model as a temporary step towards what I really want to get to: a dynamic pinyin-only system utilizing dynamic links in a surfable interface, that combines ranked words from a kou3 yu3 lexicon with tiered levels based on previously acquired knowledge, as ranked and enforced by a Leitner-based system with spaced repetition. In other words, how can we be lazy and learn more in a shorter time (for the average Joe)? I see that as a long way off, but this current thing could be an intermediate step, at least allowing for linkages, surfability and disambiguation. I just like to dream big :-)


Well, the part that is similar is this: you start with the word (in my model it is pinyin, of course) and it is linked to other words from a lexicon (read: preprogrammed database) that share the same, ahem, "character", even if you never actually see the character.

Example:

ke3yi3 de ke3 is linked to ke3 neng2 de ke3 is linked to . . .
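A rough sketch of how linkage like this could be stored, purely as an illustration and not the poster's actual model: the four-word `LEXICON` and the names `build_links` and `linked_words` are made up. Each pinyin syllable is keyed by the hidden character behind it, so the ke3 of ke3yi3 links to the ke3 of ke3neng2 but not to the ke3 of kou3ke3, and the learner only ever sees pinyin.

```python
from collections import defaultdict

# Made-up mini lexicon: (pinyin syllables, hidden characters).
LEXICON = [
    (("ke3", "yi3"), "可以"),
    (("ke3", "neng2"), "可能"),
    (("ke3", "shi4"), "可是"),
    (("kou3", "ke3"), "口渴"),
]


def build_links(lexicon):
    """Index pinyin words by (syllable, hidden character)."""
    links = defaultdict(list)
    for syllables, hanzi in lexicon:
        for syllable, char in zip(syllables, hanzi):
            links[(syllable, char)].append("".join(syllables))
    return links


def linked_words(word, syllable, lexicon=LEXICON):
    """Pinyin words sharing the same underlying character as `syllable` in `word`."""
    links = build_links(lexicon)
    for syllables, hanzi in lexicon:
        if "".join(syllables) == word and syllable in syllables:
            char = hanzi[syllables.index(syllable)]
            return [w for w in links[(syllable, char)] if w != word]
    return []


print(linked_words("ke3yi3", "ke3"))   # ['ke3neng2', 'ke3shi4']
print(linked_words("kou3ke3", "ke3"))  # []
```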

Maybe I will PM you. I don't wanna get too specific with it yet: number one, because I want to use it in my intensive Chinese course in Beijing for maybe a year before releasing it to the public, and number two, I don't wanna get anyone's hopes up with a broken model. :mrgreen:

This all depends on my finding an appropriate ready-made technology (I've been scouring for this for over a year) and the level of programming it would require.

The idea is all mapped out and people say it's very doable, but we've just gotta see, because I'm really not a techie and I don't know if I'll be able to get it done any time soon. :cry:


Sounds a little like Unfinished Project #576b - The Clickable Browsable HSK Database Dictionary Green Harmonious Platform. All characters are clickable. If you get stuck in a dead end click index and start again. From what you've said it sounds similar, although yours would have pinyin only and no characters (tape a bit of paper to the screen over the characters, same thing in the end)

I was looking at that again while doing the wordlist thingy today, and I've realised that the stuff that was stopping me moving on with it (besides the fact that I forgot about it) isn't actually that important. May well move it on a stage or two (it currently only covers the first HSK list) in the near future.


  • 2 months later...
  • 11 months later...

I know this thread is a year old, but we're supposed to search first, right? :wink:

Anyway, it has a lot of great links on page 1, so it probably deserves being bumped. I've bookmarked them and some of them have given me some new motivation for studying.

But roddy, your link doesn't work anymore! I mean the link works, but the ... whatever it's called, the word/phrase generator doesn't work! :( I was sad.

THANK YOU to everyone for all the great links!


Ack! I got it! Sorry, my fault, I didn't type in carefully enough. :oops:

I now see that you warned us they need to be one per line, and not even a space before or after a character or it won't use that character.

Hey, so THIS is what "haste makes waste" means! Now I get it. I should have read better the first time.

Ok, so now here's your thanks:

THANKS! :D

I'm the type of learner who likes to thoroughly learn a hanzi when I learn it; learn several ways to use it, try to add it to my life, etc, so maybe your generator-thingy could help me find new ways to use them.
