Free graded reader resource

IronMandarin, July 7, 2017 at 03:35 AM

Hello everyone,

Inspired by the Inkstone app, which I found very useful (and free), I have dedicated some of my time here in China to creating a graded reader website, IronMandarin.

What you can do with this website:

read a text
choose a text by category or by HSK level
see how closely a text really matches a given HSK level
save words in your 'known' list, to give you an idea of where YOU stand relative to a given text (this is your personal profile)
save words in a 'to learn' list; I plan to add an SRS (Spaced Repetition Software) feature, or maybe just allow export in a format convenient for Anki import
see the most frequent unknown words in a text (so once you add 的，一，个... to your personal list they don't appear in the frequency listing anymore)
switch the character set from simplified to traditional
analyze your own text:
- without logging in, you can analyze a text, for example an email or an article; this should serve you better than Google Translate
- after logging in, you can publish a text so that it is saved on the website. It can be public or private (so you don't have to share, say, personal emails)
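To make the HSK-match and "most frequent unknown words" items above concrete, here is a minimal sketch of how such numbers could be computed. It is not the site's actual code: the function names, the sample tokens, and the tiny word lists are all made up for illustration, and the text is assumed to be already segmented with punctuation removed.

```python
from collections import Counter

def coverage(tokens, known_words):
    """Return the fraction of tokens (counting repeats) found in known_words."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known_words) / len(tokens)

def top_unknown(tokens, known_words, n=10):
    """Return the n most frequent tokens that are not in known_words."""
    counts = Counter(t for t in tokens if t not in known_words)
    return counts.most_common(n)

# Hypothetical pre-segmented text and word lists (not real HSK data).
tokens = ["我", "的", "朋友", "喜欢", "喝", "茶", "我", "也", "喜欢", "喝", "咖啡"]
hsk1_words = {"我", "的", "朋友", "喜欢", "喝", "茶"}
my_known_words = {"我", "的"}

print(f"HSK 1 coverage: {coverage(tokens, hsk1_words):.0%}")
print("Most frequent unknown words:", top_unknown(tokens, my_known_words, n=2))
```

The same `coverage` function works for both purposes: pass an HSK word list to see how well a text matches a level, or pass the personal 'known' list to see where a given reader stands.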

It is not fully developed, as I also study and work, but more features are planned, such as frequency lists across a category or a set of articles, so you can focus on specific vocabulary, which I find especially useful for HSK 6.

I publish texts from different sources, and a few Chinese teachers I know here in Chengdu help me. You can also contribute by publishing texts.

The segmentation is automated, based on Jieba, but it makes a lot of mistakes, and I currently spend quite some time reviewing the output; maybe I'll have a look at the code to patch a few common mistakes (for example, it doesn't split numbers, or a number and its measure word).
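One way to handle the number-plus-measure-word case without touching the segmenter itself would be a post-processing pass over its output. The sketch below is only an illustration under assumptions: the function name is hypothetical, and the numeral and measure-word lists are deliberately tiny and would need to be much longer in practice.

```python
import re

# Hypothetical, deliberately small lists for illustration.
NUMERALS = "零一二两三四五六七八九十百千万0-9"
MEASURE_WORDS = ["个", "本", "只", "张", "条", "次", "位"]

# Matches a whole token made of one or more numerals followed by a measure word.
_NUM_MW = re.compile("([" + NUMERALS + "]+)(" + "|".join(MEASURE_WORDS) + ")")

def split_number_measure(tokens):
    """Split tokens such as '三本' into ['三', '本']; leave other tokens untouched."""
    result = []
    for token in tokens:
        match = _NUM_MW.fullmatch(token)
        if match:
            result.extend(match.groups())
        else:
            result.append(token)
    return result

print(split_number_measure(["我", "有", "三本", "书"]))
```

A rule like this is easy to write but easy to get wrong at the edges (words that merely start with a numeral, for instance), so the output would still need reviewing.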

I have had some questions about monetization before. I dedicate quite some time to this project, and I will need to pay the writers more if they spend more time on it.

I will set up a Patreon page pretty soon; hopefully it will be enough to make this project sustainable.

The website could also offer tailored analysis and advice to help a reader progress, based on their current word list. This and some premium content could make up a premium membership in the future if Patreon is not enough.

The website is not free just to gain traction before being priced higher than a real newspaper. The core features are intended to remain free, on a donation or freemium model.

Hopefully this will help all of us Chinese learners.

Let me know if you have any suggestions!

imron, July 7, 2017 at 08:45 AM

5 hours ago, IronMandarin said:

maybe I'll have a look at the code to patch a few common mistakes

Jieba is a statistics-based segmenter. It's not so much the code you need to patch but rather the probabilities used in the statistical model. You could probably hard-code a bunch of different exceptions, but the whole point of using a statistical model is to avoid the need to hard-code exceptions in the first place.
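To illustrate the point about probabilities versus code, here is a toy segmenter that picks the split with the highest product of word probabilities, using dynamic programming. It is a sketch in the general spirit of a dictionary-plus-frequency model, not Jieba's actual implementation, and the word counts are invented. The code stays the same in both calls; only one frequency changes.

```python
import math

def segment(text, freq):
    """Segment text by maximizing the sum of log word probabilities.

    freq maps words to counts. A single character missing from freq
    falls back to a count of 1 so that any text can be segmented.
    """
    total = sum(freq.values())
    max_len = max(len(word) for word in freq)
    n = len(text)
    # best[i] = (score of the best split of text[i:], end index of its first word)
    best = [(0.0, n)] * (n + 1)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            if word in freq or j == i + 1:
                score = math.log(freq.get(word, 1) / total) + best[j][0]
                candidates.append((score, j))
        best[i] = max(candidates)
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words

# Invented counts, for illustration only.
toy_freq = {"研究": 500, "研究生": 300, "生命": 400, "生": 100, "命": 50}
print(segment("研究生命", toy_freq))    # ['研究', '生命']

# Same code, one count changed: the preferred split flips.
tuned_freq = {**toy_freq, "研究生": 30000}
print(segment("研究生命", tuned_freq))  # ['研究生', '命']
```

Jieba itself provides `jieba.add_word` and `jieba.suggest_freq` for adjusting word frequencies at runtime, which may be a lighter-weight route than editing its data files directly.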

For what it's worth, I'm currently working on a statistical segmenter for Chinese Text Analyser that uses the Jieba data files for probabilities (the current version of CTA uses a first-longest-match algorithm, which is fast but even less accurate than Jieba).
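For comparison, a greedy longest-match segmenter can be sketched in a few lines. This is one common reading of "first longest match" (scan left to right, always take the longest dictionary word), not CTA's actual code, and the dictionary here is invented.

```python
def longest_match(text, dictionary):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# Invented dictionary, for illustration only.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(longest_match("研究生命起源", toy_dictionary))  # ['研究生', '命', '起源']
```

The example shows the typical failure: the greedy choice of 研究生 leaves 命 stranded, where 研究 / 生命 / 起源 would be the better split. There are no probabilities to weigh the alternatives, which is why it is fast but error-prone.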

IronMandarin, July 7, 2017 at 03:05 PM

OK, thanks for the information, I'll look into that. It was not a priority for me, but it could be useful to dig a bit into the technique.

imron, July 7, 2017 at 04:45 PM

Let me know if you have any questions about it, or can't figure out why it does something in a given way. I've been going over it in detail recently, so I have a good idea of how most of it works.
