Chinese-Forums

Transcrobes: Free + Open Source language learning platform (for Mandarin)


AntonOfTheWoods


On 1/15/2022 at 12:58 AM, AntonOfTheWoods said:

That's weird. Almost all questions have 5 levels (Likert). Here is what I see on Chrome on Android.

 

Yes, I can see 5 levels in Chrome on my desktop, but I'm not that familiar with how Likert queries are set up. I just read the options and think to myself, I can't disagree or agree with this statement in the slightest (completely indifferent), so I end up torn between "slightly disagree" and "slightly agree" but can't really pick either.


On 1/14/2022 at 11:50 PM, alantin said:

This would be great, but the font should be configurable or big enough to see the diacritics. My personal problem is bad vision, and I can't make them out if the font is too small.

This is very important. Things like this should definitely never get in the way of learning. ALL your extra cognitive load should be dedicated to useful learning, not practical issues like font size!

 

In the movie player you can make the font any colour or size you want, and have the subs at the top, bottom or under.

 

In the book reader you can choose from 4-5 fonts for both the glosses and original Chinese text, and make them any size. Currently the glosses are 20% smaller than the Chinese (due to user feedback) but I want to make this configurable.

 

I need to be a little more conservative with the extension because it can take a week to get something approved. For changes to the main site I can revert something in production in minutes if there is an issue.


On 1/15/2022 at 7:06 AM, alantin said:

Likert queries

This is a very common system in research (and industry). It has drawbacks like the one you mention but has a lot of weight.

 

I will try and add an option for the next ones. I can always make an argument that it was necessary.

 

For this questionnaire, I am actually reproducing a study from some guys on (university) learners of English, so I need to keep the questions/answers as similar as possible. Basically I just substituted "Chinese" for "English".

 

So I guess this is a case of choosing the "least inaccurate" option :).

 

And thanks!

 


@AntonOfTheWoods, I read through a couple of Wikipedia articles, marking words as known as I went along, and I think this is pretty much exactly what I was looking for a while ago. It now shows me the pinyin after the words that I can't read and I can get the English definition by hovering over the word. It's perfect!

I've encountered some pretty big bugs too, though. I imported a CSV file of a little over 10,000 words and added a job for it to mark those words as known, but it has been contemplating taking up the task for hours now and nothing happens.

I also tried to import a txt file containing a chapter from a book and found out that only EPUBs are supported, so I found a website to convert the file and got the text imported and enriched, but when I click the "Read" link in the "Content" view, it only opens a blank page. No text.

 

Third issue I had was trying to find reports and lists of words I have in the system. Only way to find words is by searching for individual words with the search function and I haven't been able to find a report on the numbers of known words. Actually I haven't been able to find any reports. I also couldn't figure out how to create "Goals".

 

Fourth issue: If I use this, I'll need to be able to export words out of it so I can feed them back into CTA. I haven't found an export function.


On 1/14/2022 at 11:16 PM, Jan Finster said:

It may actually be even better to put the pinyin and/or translation above or below the text.

I would like to have extra modes like this but my general thinking (for the start) was the following - your reading is interrupted *already* by the fact that you don't know the word, and are going to need to guess. Remember that you should only see glosses for words that you don't know the meaning of already. This requires more research but when you see an unknown word, you typically stop and your eyes will look around to get more context to try and guess, so the flow is already broken.

 

I personally found this way best to start with but I want extra modes and if more users think it best to have two lines, then I will make that the default.


On 1/15/2022 at 1:23 AM, AntonOfTheWoods said:

This requires more research but when you see an unknown word, you typically stop and your eyes will look around to get more context to try and guess, so the flow is already broken.

 

This isn't quite my experience.

 

I can quite happily get into a flow of enjoying content without such interruptions, given that the only crutches are pinyin next to the occasional unreadable word (maybe I'll begin to make a distinction between "readable" and "known" words). What does disrupt reading is English text embedded between the Chinese text.

At least for me something like this is completely unreadable.

[Image: wiki.png]

 

Compared to this

[Image: Screenshot 2022-01-15 at 1.37.00.png]

 

 

This is what I wrote about my own view on why I absolutely do not want English embedded in there:

On 1/14/2022 at 12:07 PM, alantin said:

English is a crutch (just like pinyin is too) and the English translation is not the meaning of the word. It is the English translation of the meaning of the word. If you have the crutch there, it will draw your attention and give your brain the signal that the brain can just rely on the translation and the new unknown information is unnecessary. In most cases the context and the characters are more than enough to give the brain enough material to figure it out if you come across the word again and again and figuring out meaning while discarding all superfluous information is exactly what the brain is meant to do.


 


On 1/15/2022 at 3:11 AM, alantin said:

I got the account set up, the plugin working, and my CSV word list imported, but now I've been waiting for an hour or more for it to mark them known. Is there a way for me to know when it is going to begin this?

Another epic fail on this one for me - the import only gets the data into the system; it is when you create a list that you can tell the system you know a word. It is a two-step process for a few (good) reasons but I'm very sorry I didn't make that clear!


On 1/15/2022 at 7:33 AM, alantin said:

This isn't quite my experience.

Learners' needs are very dependent on their level. Maybe at your level you don't need anything to understand what you are reading. Most research (and the "accepted" values) suggests that comprehension starts to be seriously affected once the proportion of known words in a text drops below 95-98%.

 

You put in your sig that you are HSK5-ish but that sounds very surprising to me. Reading typical stuff intended for native speakers will typically mean that there are *lots* of words you don't know at that level, and many learners don't like it when they don't understand what they are reading. Maybe you are fine with that, and I definitely want to support learners like you! I really need to understand at least the main points of what I'm reading in order to get enjoyment, so I've found this best for me.

 

Because everyone is different, the system should be able to adapt to each individual's needs when those needs evolve and change. That is my goal.


On 1/15/2022 at 7:17 AM, alantin said:

I click the "Read" link in the "Content" view, it only opens a blank page. No text.

Sounds like a bug. I'm on it.

 

On 1/15/2022 at 7:17 AM, alantin said:

I also couldn't figure out how to create "Goals".

You create goals from lists, which is pretty simple. More ways will come later - your ideas are more than welcome!

 

On 1/15/2022 at 7:17 AM, alantin said:

Third issue I had was trying to find reports and lists of words I have in the system. Only way to find words is by searching for individual words with the search function and I haven't been able to find a report on the numbers of known words. Actually I haven't been able to find any reports.

 

Reporting and visualising lists is pretty basic/missing at the moment. The reporting is currently mainly on your progress over the last two months in terms of % vocab known (which is directly practical and meaningful for many learners). Lots, lots more is to come. If you can give me some ideas on what you are looking for, I can start that now.

 

I will work on an export that covers your needs this weekend, if we can work together to understand what they are. 

 

As a (dog ate it) excuse for why this isn't already developed... I started out my professional career with user interface and web development but moved to other stuff because I hated it so much (my true tech love is DevOps!). I made a lot of bad choices when I started out on this project just because I wanted to avoid doing JS/HTML UIs again. There are many, many thousands of hours of training and development that have already gone into what is there, including writing a really terrible version that I threw away entirely (thousands of hours in the rubbish!) before learning the proper technologies for it (TypeScript/React/async FastAPI/SQLAlchemy). Now that I am using the right technologies and understand them OK, things are moving quite fast, but I'll still need some time to be able to implement features!


On 1/15/2022 at 7:17 AM, alantin said:

when I click the "Read" link in the "Content" view, it only opens a blank page. No text.

When you import books a copy of the import file gets stored (temporarily at least) on the server. I imported the book (chapter) into my account and it seemed to go without a hitch.

 

It might be a refresh issue. Could you try doing a ctrl+r or F5 refresh in the browser?

 

I can post a screenshot if you are ok with that (I didn't read the material but you might not want me posting a screenshot of it...)


On 1/15/2022 at 12:23 AM, AntonOfTheWoods said:

I would like to have extra modes like this but my general thinking (for the start) was the following - your reading is interrupted *already* by the fact that you don't know the word, and are going to need to guess.

 

On 1/15/2022 at 12:33 AM, alantin said:

At least for me something like this is completely unreadable.

[Image: wiki.png]

 

Yeah, I agree with Alantin...


On 1/15/2022 at 3:59 PM, Jan Finster said:

Yeah, I agree with Alantin...

I have a couple of other users that think it's fine, particularly given the choice of this or having to click on 2-3 words per sentence. But again, there are lots of ways for the system to present, and I think users should have the choice. Your level might make this form make no sense. It might be perfect for others.

 

I think you might be surprised if you try and read a text like this where the words that are glossed are only the words *you* don't know, not the words that someone else (me!) doesn't know.

 

I will have a look at colours and trying to put the gloss above (or below?) later today or tomorrow.


On 1/15/2022 at 10:08 AM, AntonOfTheWoods said:

I have a couple of other users that think it's fine, particularly given the choice of this or having to click on 2-3 words per sentence. But again, there are lots of ways for the system to present, and I think users should have the choice. Your level might make this form make no sense. It might be perfect for others.

 

Yeah I think the way it is customizable for the user is definitely the best way to go!
I think the plugin combined with the back-end is already the best reading aid for web sites I have ever seen for Chinese (or Japanese. I'm comparing mainly to rikaichan, Cinese0tohero, and zhongzhong) and it fits pretty much perfectly into my own study routine once I have trained it.

Also you describe it as some kind of a "digital twin" of your vocabulary or language knowledge. I think this is something quite new and I see a lot of potential in it, but, like you said, the UI isn't your core interest area, and I think its shortcomings make it hard for the uninitiated to get on board. That being said, I think it's actually quite good, and some minor changes and more instructions in the different views would probably make a big difference. Is it possible for you to find someone studying UX and front-end design to do that as a part of their thesis or something?

 

Working in IT myself and having done software testing in software projects in the past, I can quite well imagine how much work must have gone into it already and frankly, at this early stage it works a lot better than some commercial software that I've seen...

 

Also, I think all the feedback here, though not without criticism, has been very constructive and it's great to see you don't seem to get disheartened because of it! That's a big reason I believe this has a good chance of becoming something truly awesome!

I saw your PM and other messages. I'll comment on those shortly and also test the plugin again with something!

Great job!

 


On 1/15/2022 at 2:10 AM, AntonOfTheWoods said:

Learners' needs are very dependent on their level. Maybe at your level you don't need anything to understand what you are reading. Most research (and the "accepted" values) suggests that comprehension starts to be seriously affected once the proportion of known words in a text drops below 95-98%.

 

You put in your sig that you are HSK5-ish but that sounds very surprising to me. Reading typical stuff intended for native speakers will typically mean that there are *lots* of words you don't know at that level, and many learners don't like it when they don't understand what they are reading. Maybe you are fine with that, and I definitely want to support learners like you! I really need to understand at least the main points of what I'm reading in order to get enjoyment, so I've found this best for me.

 

I'm not so sure HSK is very good at conveying a person's level. I don't follow HSK material anymore but I did pass HSK4 almost two years ago. Lately I've been thinking I'm probably somewhere between HSK5 and HSK6 in terms of speaking ability and general comprehension, but my reading speed with native content is still less than 90cpm. HSK5 seems to require at least 200cpm for you to actually be able to read everything in the reading section in the given time. Also CTA shows that HSK1-HSK6 vocabulary covers only about 60% of the 7 828 total words in the document that I'm currently reading. So almost every second word is not in the HSK lists!

 

HSK material was useful for me up to about HSK5 as a motivational crutch, but my goal up to that point was to get to where I could start reading actual material meant for natives, since I believe that's where real progress begins to happen. I read a couple of graded readers before beginning 流浪地球 because I liked the movie, but gave up about 1/3 of the way in since it was just too difficult, and labored through the first Harry Potter book instead. After that I gathered myself for a while, listened to the whole 三体 trilogy in English and then picked up the first 三体 book in Chinese and was able to finish it. After that I read one of the books from the middle of "The Wheel of Time" and found it quite approachable, since I already know the story. Now I have made it my goal to read through the whole 14-book series in Chinese before moving to Chinese authors.

 

I haven't kept records of my reading performance before, but I began doing so when I started reading the first Wheel of Time book about three months ago. My average readability rate (the proportion of the words in the whole text that I can read aloud: initial, final, and tone) over the last 10 chapters is about 93.5%. When I began 3 months ago it was 90.6% for the first 10 chapters and the absolute low point was 88%. I can say that reading these books now is a LOT easier than reading the Potter book two years ago!

 

 

This leads me to reports.

On 1/15/2022 at 2:33 AM, AntonOfTheWoods said:

Reporting and visualising lists is pretty basic/missing at the moment. The reporting is currently mainly on your progress over the last two months in terms of % vocab known (which is directly practical and meaningful for many learners). Lots, lots more is to come. If you can give me some ideas on what you are looking for, I can start that now.

 

I was looking for the number of words the system believes I know. Something like "Known words: 11 346" with a link to a list of those with individual stats. Other than that I was expecting something similar to the stats Anki shows you. Also I think it would be great if the Chrome plugin could show you some stats about the page you are viewing. The percentage of known words and characters etc.
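
A minimal sketch of the page-level stats asked for above, assuming the page text has already been segmented into words and the learner's known words are available as a set. The function name `coverage_stats` and the sample data are hypothetical and are not part of Transcrobes.

```python
from collections import Counter


def coverage_stats(tokens, known_words):
    """Summarise how much of an already-segmented text the learner knows.

    tokens: list of words in reading order (segmentation is assumed done).
    known_words: set of words the learner has marked as known.
    """
    counts = Counter(tokens)
    total = sum(counts.values())   # running word count
    unique = len(counts)           # distinct word count
    known_total = sum(n for word, n in counts.items() if word in known_words)
    known_unique = sum(1 for word in counts if word in known_words)
    return {
        "total_words": total,
        "unique_words": unique,
        "known_total_pct": 100 * known_total / total if total else 0.0,
        "known_unique_pct": 100 * known_unique / unique if unique else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical sample: a six-word text and a three-word known list.
    page = ["我", "喜欢", "读", "书", "我", "读"]
    print(coverage_stats(page, {"我", "读", "书"}))
```

The two percentages answer different questions: the first is weighted by how often each word occurs on the page, the second counts each distinct word once.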

 

 

 

As for ideas that might be useful: I don't know if any of these are even possible to implement, but from my reading material I personally record for each chapter:

- The total word count and the unique word count,

- the counts of known total and unique words before I begin to read it and after I finish,

- the number of minutes I spent on reading it,

- total and unique character counts,

- *a difficulty score for the text.

- I also calculate the reading speed after finishing (and some other things) and record in how many sessions I read the whole thing.

 

* The score is the average frequency (as per Jun Da 笪骏 frequency list) of all unique characters in the text weighted by the number of their occurrences in the text.
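
One way to read the footnote above as a formula, assuming "frequency" means a character's rank in the frequency list (1 = most common), so that a higher score means rarer characters on average. The rank table in the example is a toy stand-in rather than real Jun Da data, and skipping characters that are missing from the list is also an assumption.

```python
from collections import Counter


def char_average_frequency(text, freq_rank):
    """Occurrence-weighted average frequency rank of the characters in text.

    freq_rank: dict mapping a character to its rank in a frequency list.
    Characters not in freq_rank (punctuation, Latin letters, rare
    characters) are ignored, which is an assumption of this sketch.
    """
    counts = Counter(ch for ch in text if ch in freq_rank)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no characters from the frequency list found in text")
    return sum(freq_rank[ch] * n for ch, n in counts.items()) / total


if __name__ == "__main__":
    # Toy ranks for illustration only.
    toy_ranks = {"的": 1, "是": 3, "书": 100}
    print(char_average_frequency("的的是书。", toy_ranks))
```

Weighting by occurrences means a rare character that appears once barely moves the score, while a rare character repeated throughout the chapter moves it a lot.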

 

I then mainly follow the progress of my reading speed and the readability figures, but I'm also interested in the correlation between these variables. Mainly I want to know what predicts higher reading speed though. I find that the best one is, unsurprisingly, the percentage of known words in the material, but the difficulty score also has a moderate correlation with the reading speed. Below are some screenshots. I call the difficulty score "Character Average Frequency" in my Excel sheets though.

That one outlier in the reading speed data is actually the Mandarin Companion graded reader "Journey to the Center of the Earth" which I read to gauge my reading speed with easier material.
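
The reading-speed and correlation figures described above can be reproduced outside a spreadsheet with a few lines. This is a sketch under assumptions: speed is taken as characters per minute, the correlation is plain Pearson's r, and the per-chapter numbers in the example are made up for illustration, not taken from the screenshots.

```python
from math import sqrt


def reading_speed_cpm(char_count, minutes):
    """Characters read per minute for one chapter."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return char_count / minutes


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-chapter records: (characters, minutes, % known words).
    chapters = [(9000, 110, 90.6), (8500, 98, 92.0), (9400, 100, 93.5)]
    speeds = [reading_speed_cpm(chars, mins) for chars, mins, _ in chapters]
    known_pct = [pct for _, _, pct in chapters]
    print(pearson(known_pct, speeds))
```

With only a handful of chapters a single outlier, such as an easy graded reader, can dominate the coefficient, so it may be worth computing it with and without that data point.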

 

 

 

 

Screenshot 2022-01-15 at 15.57.08.png

Screenshot 2022-01-15 at 15.55.29.png

Screenshot 2022-01-15 at 15.57.35.png


On 1/15/2022 at 2:10 AM, AntonOfTheWoods said:

Reading typical stuff intended for native speakers will typically mean that there are *lots* of words you don't know at that level, and many learners don't like it when they don't understand what they are reading.

 

I may have a higher than average tolerance for ambiguity due to my own language learning background. I don't know. Chinese isn't the first second language for me and I have quite a clear idea of what works for me and what doesn't. I also believe this intolerance for ambiguity is the single most important thing holding people back while learning a language. It is obviously always easier to pick up vocabulary the more you understand, but you certainly don't need a 95-98% comprehension rate to learn a language by consuming it. Children start from zero with their native languages, and adult brains, with all their life experience, are only better poised for learning a language than infants, so they certainly can do it well before 95% comprehension, given that they can make sense of the context.

 

We study English in primary school here and we don't have dubs, so I had some foundation in English already when I was reading Wheel of Time as a 12 year old kid, but I ran out of the Finnish translations. I loved the story so I got the original English version of the next book and read it all the time with a dictionary. If you're a kid who wants to know what happens next, armed with a real paper dictionary you won't be checking every word you don't know, but you will home in on the important ones. Those books are about 300 000 words long each in English and I didn't need the dictionary after the third book. After that my English skills were always way ahead of my classmates' too, though it evens out here at the University level at the latest.

Since then I haven't encountered anything that would convince me I had to check the meaning of every word. Quite the opposite actually. We study a few years of Swedish in school a couple of hours a week, which is nowhere near enough to reach any proficiency in it, but I began listening to the audio book versions of The Wheel of Time while studying for my bachelor's degree. I was probably around A1 level or so at the time, but English helps a lot with it and I got to conversational level in Swedish in a year from listening to those audio books while commuting and taking walks, without ever touching a dictionary. Throughout the whole process I was enjoying the story, understanding what was going on, and piecing the language together in my head while doing it.

I began studying Japanese about 18 years ago and speak it fluently, but my reading is pretty bad (I can write and read emails, even professional stuff, and chat with people online, but reading a book takes ages), probably because at the time I didn't have a very good clue about how to approach a language with such an alien writing system. I tried to ignore it for the first few years and then concentrated insane amounts of time on the characters and drilling individual words with SRS, but I never really got around to reading properly, because it always seemed too difficult. Nowadays I mostly use Japanese to speak with my wife, her family, and our Japanese friends and listen to audio books sometimes.

 

With Chinese, I have pretty much put everything I know about learning languages (and especially a language using the Chinese characters) into learning it as effectively and as "balancedly" as I can. Three years in and beginning my fourth, I think I'm doing pretty well. I'm not displeased with the level I'm currently at.


On 1/15/2022 at 3:05 PM, alantin said:

My average readability rate (the proportion of the words in the whole text that I can read aloud: initial, final, and tone) over the last 10 chapters is about 93.5%. When I began 3 months ago it was 90.6% for the first 10 chapters and the absolute low point was 88%.

 

Wow, when I try to read such material, I typically admit defeat after a few chapters as it is too laborious. It feels like I have to look up words all the time and there is no reading flow. Yesterday, I tried to start the Chess Master (棋王). CTA said I should be 95% good to go, but I stopped since it still feels too much like intensive and less like extensive reading. I am glad it works for you.


On 1/16/2022 at 1:57 AM, Jan Finster said:

Yesterday, I tried to start the Chess Master (棋王), CTA said I should be 95% good to go, but I stopped since it still feels too much like intensive and less like extensive reading.

 

Yea, Chess Master is tougher than it might seem. If it's considered a "classic" of any sort, I usually move it up one notch in difficulty because the language is going to be more sophisticated, even if the words are basic. Even kids' stories that are classics suffer from this.

 

Potboilers / popular fiction set in modern-ish settings tend to be the easiest for a given level of vocab, because they tend to be repetitive or formulaic. The exact things a critic hates are what a language learner wants.


This looks pretty cool and seems like it will have a lot of potential. I used it for a bit and it's pretty nice on my laptop for reading. I mostly use CTA now. I'm wondering how I can integrate it with Anki, like being able to highlight a sentence or part of a sentence and export it to Anki with all the definitions of unknown and/or known words under the sentence, or something like that. I have some other tools that do this kind of thing, but it would be nice to have one tool to do it.
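
Until an export exists, the workflow described above can be approximated by hand: Anki can import plain text files with tab-separated fields, so a sentence plus its glosses only needs to be written out as one row per card. The sketch below assumes a two-field note (sentence on the front, glosses on the back); the function name `to_anki_tsv` and the sample card are hypothetical and do not reflect any existing Transcrobes feature.

```python
import csv
import io


def to_anki_tsv(cards):
    """Render cards as tab-separated text suitable for Anki's text import.

    cards: iterable of (sentence, glosses), where glosses is a dict
    mapping each word in the sentence to its definition.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for sentence, glosses in cards:
        # Put every "word: definition" pair on the back of the card.
        back = "; ".join(f"{word}: {definition}" for word, definition in glosses.items())
        writer.writerow([sentence, back])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample card.
    sample = [("我喜欢读书", {"喜欢": "to like", "读书": "to read"})]
    with open("anki_import.tsv", "w", encoding="utf-8", newline="") as handle:
        handle.write(to_anki_tsv(sample))
```

Using the `csv` module rather than joining strings by hand means a definition that itself contains a tab or a quote character is escaped instead of silently breaking the row.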

 

I'll keep checking on the project and use it here and there to see what's new. I always think it's better to have more tools and options than fewer. Looking forward to updates.


On 1/16/2022 at 5:19 AM, thelearninglearner said:

I'm wondering how I can integrate it with Anki, like being able to highlight a sentence or part of a sentence and export it to Anki with all the definitions of unknown and/or known words under the sentence, or something like that. I have some other tools that do this kind of thing, but it would be nice to have one tool to do it.

 

This is exactly what LingQ does. And it does more. Here is an old post of mine that shows examples of sentences or parts of sentences you can import. For potential review I currently highlight almost exclusively parts of sentences, not individual words. So rather than 国际 and 进口 and 博览会 I extract 第四届 国际 进口 博览会 (The 4th International Import Expo):

 

https://www.chinese-forums.com/forums/topic/60209-suggestions-for-better-learning-intermediate-to-advanced/?do=findComment&comment=470312

 

They are currently releasing a new version (5.0) with new features.

 

(A lifetime Chinese language subscription was $199 last time I checked)


Just a quick update: I found and fixed a couple of nasty bugs that made things pretty painful when you have lots of words.

 

@Jan Finster, I also now give a lot more freedom with the glosses - you have four choices of where to put them (after, above, below and even before!) and you can choose the size and the colour. I will admit I was a bit sceptical to begin with but after seeing it in action, I realise giving learners those options is a really good idea! Thanks for the suggestion! I might even be a convert myself (especially for video subtitles) :-).

 

I am working on an interface for copy/pasting text, which I also had a request for from a teacher at my uni, and then I will get to work on the exports and stats suggested above.

 

Please let me know what you think about the new glossing and let me know if you have any issues. A couple of users have had issues with the system "forgetting" the known words. If you have that, the simplest option might be to kill the DB and start again. The known words are safe on the server (with cross-continent backups) and just need resyncing! You can do this on the System page, "Reinstall DB".

