Chinese-Forums

Transcrobes: Free + Open Source language learning platform (for Mandarin)


AntonOfTheWoods


Ok so a little bit of annoying news.  The code that was deployed to the Chrome store has a couple of issues:

 

- the username, password and URL (so https://am.transcrob.es/) need to be re-entered. Don't worry, it does *not* need to reinstall the database!

- there is a bug that stops it from applying several of the config options on the first time you save them after the update.

 

Apart from this, basic functionality all still works; you just can't choose the position/colour of the glosses (or turn off mouseover) unless you close the options, go back and then update them again. An update has been submitted and will hopefully get published in the next 24 hours!


Not the case for me, the extension indeed got updated, so I re-entered the credentials, saved (confirmed as "Options saved."), activated the extension on a wiki page, got the error message saying I need an account on transcrobes, so I checked the extension again, my credentials were gone, I re-entered again, saved again, gone again. Rinse, repeat.


And you weren't able to get them entered at all? Are you using Chrome or Edge (or something else)?

 

For updates on both Chrome and Edge on Linux, and new installs on Linux and Windows (tested on Chrome), everything installed fine (bar having to save twice) and works fine also (still a bit laggy to start but I'm working on that too!). 

 

Could you post or PM me a screenshot of your config so I can try and reproduce?


On 1/15/2022 at 10:05 PM, alantin said:

I was looking for the number of words the system believes I know. Something like "Known words: 11 346" with a link to a list of those with individual stats.

@alantin The initial stats screen should have the first part. That at least gives you the raw numbers, with an idea of how that has changed over the last 6 months (or since you started). I would like to get your feedback on some ideas I had for presenting lists in a generic way (see below), but the list is definitely a feature I will add in the next couple of weeks (at least a simple version).

 

On 1/15/2022 at 10:05 PM, alantin said:

Other than that I was expecting something similar to the stats Anki shows you.

I actually started off this project by customising and then completely reimplementing a backend server for Anki, before later realising I had wasted about 6 months of my life on that. One of the key differences between what I am developing and systems like Anki is that they look at all notes/cards as "bits of knowledge", so learning a word is like learning a capital city, or the names of the bones in your hand. It is certainly possible to do that but I don't think it is the best way to think about vocabulary (and grammatical) knowledge. A key advantage of my system is that I know not only when you have studied a word in the spaced repetition but also every time you have seen/read that word in context. I have a basic system that prioritises new words based on that, and will hopefully start tweaking the spaced repetition algorithm with that in the not-too-distant future. There are heaps of stats that I could show based on those, but it is significantly more complicated (and, I believe, more useful!) than what you get in Anki. An example would be: today my stats say you need to revise a word in 3 days' time, but in the intervening period you see that word 27 times in context, so I push out the date to 7 days rather than 3.
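
To make the "push out the date" example concrete, here is a minimal Python sketch. It is not the Transcrobes algorithm: the `Card` class, the `days_per_sighting` weight and the `max_multiplier` cap are all hypothetical, and the weight of 0.15 is chosen only so that the numbers from the example above (3 days, 27 sightings, 7 days) come out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    word: str
    last_review: date
    interval_days: int  # interval proposed by the plain SRS algorithm


def adjusted_interval(card: Card, sightings: int,
                      days_per_sighting: float = 0.15,
                      max_multiplier: float = 3.0) -> int:
    """Lengthen the interval for a word seen in context since its last review.

    Both tuning parameters are assumptions made for this illustration.
    """
    bonus = round(sightings * days_per_sighting)
    # Never stretch the interval beyond max_multiplier times the original.
    capped = min(card.interval_days + bonus,
                 int(card.interval_days * max_multiplier))
    # Sightings can only delay a review, never bring it forward.
    return max(card.interval_days, capped)


def due_date(card: Card, sightings: int) -> date:
    return card.last_review + timedelta(days=adjusted_interval(card, sightings))


card = Card(word="图书馆", last_review=date(2022, 2, 21), interval_days=3)
print(adjusted_interval(card, sightings=27))  # 7
print(due_date(card, sightings=27))           # 2022-02-28
```

The cap is there so that a very common word cannot push its own review out indefinitely; whether a cap is wanted at all is a design choice this sketch does not settle.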

I can do some stuff in the meantime though - are there any specific data points you would like? Is it more around a prediction of how many revisions (and the time taken) you will need to do in the coming days?

 

On 1/15/2022 at 10:05 PM, alantin said:

- The total word count and the unique word count,

- the counts of known total and unique words before I begin to read it and after I finish,

- the number of minutes I spent on reading it,

- total and unique character counts,

- *a difficulty score for the text.

- I also calculate the reading speed after finishing (and some other things) and record in how many sessions I read the whole thing.

All this and more will be possible with the stats collection system I recently deployed, but unfortunately it is going to take a while for me to start extracting this sort of info. In terms of the difficulty score, I have had a number of arguments about this with my supervisor... Just as an example, I know quite a few tech-related words, so a text that might be quite hard for my native-speaker (piano teacher) wife can be easier for me: thanks to domain knowledge, I will actually understand more of some texts than she does. Such texts would likely score very high on any generic rating system. I would likely fail pretty hard on a child's fairy tale though... One of the things I am confident I will be able to publish LOTS of papers on is deeply personalised difficulty scores that take into account domain knowledge (based on previous content read), but that is going to take a bit of research first!

 

One thing I will have to do immediately is automatically extract stats from chapters, rather than just taking books as a whole. I initially thought people might import by chapter but, as I don't really offer an easy way to do that, I need to do it automatically. You currently get stats (totals for words, chars and known words and chars) for whole books. I will try to get per-chapter stats done very quickly (it should be pretty easy). The number of minutes spent reading will need to wait a bit until I better understand the new collection system.
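
As a rough illustration of what per-chapter stats could look like, here is a small Python sketch. It assumes the text has already been segmented into word tokens (Chinese needs a segmenter, which is out of scope here); `TextStats`, `text_stats` and `book_stats` are hypothetical names, not part of Transcrobes.

```python
from dataclasses import dataclass


@dataclass
class TextStats:
    total_words: int
    unique_words: int
    known_words: int         # tokens whose word the learner knows
    unique_known_words: int
    total_chars: int
    unique_chars: int


def text_stats(tokens: list[str], known: set[str]) -> TextStats:
    """Count words and characters for one already-segmented text."""
    chars = [c for token in tokens for c in token]
    unique = set(tokens)
    return TextStats(
        total_words=len(tokens),
        unique_words=len(unique),
        known_words=sum(1 for t in tokens if t in known),
        unique_known_words=len(unique & known),
        total_chars=len(chars),
        unique_chars=len(set(chars)),
    )


def book_stats(chapters: dict[str, list[str]],
               known: set[str]) -> dict[str, TextStats]:
    """Stats per chapter, plus one entry for the book as a whole."""
    stats = {title: text_stats(tokens, known)
             for title, tokens in chapters.items()}
    all_tokens = [t for tokens in chapters.values() for t in tokens]
    stats["(whole book)"] = text_stats(all_tokens, known)
    return stats


chapters = {
    "1": ["我", "喜欢", "图书馆"],
    "2": ["我", "喜欢", "学习", "中文"],
}
for title, stats in book_stats(chapters, known={"我", "喜欢"}).items():
    print(title, stats)
```

Computing the whole-book entry from the concatenated tokens, rather than summing the chapter entries, keeps the unique counts correct when a word appears in more than one chapter.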


@AntonOfTheWoods, I didn't mean to say that I need the same stats as Anki has, but rather that I see the complete lack of graphs as a big deficiency. I don't know how many people find them fascinating, but I draw a lot of motivation from seeing them, and digging into them is very interesting.

 

All of those ideas seem really good! I'm looking forward to seeing them in action!

 

About the book and chapter analysis, maybe you could also prioritize words based on the content the user is about to read. The user could import content and make a reading plan, to which the SRS could respond by re-prioritizing new vocabulary in the order it is going to show up in the material.


On 2/21/2022 at 1:29 AM, alantin said:

About the book and chapter analysis, maybe you could also prioritize words based on the content the user is about to read. The user could import content and make a reading plan, to which the SRS could respond by re-prioritizing new vocabulary in the order it is going to show up in the material.

YES!!! This is one of the key features that pushed me to start this project. After working in digital advertising for over a decade and seeing how much awesome computing power we put into predicting purchase intentions, why not evolve "X language for specific purposes" (e.g., English for Academic Purposes, Chinese for Business, etc.) into "X language for MY purposes"? That comes from goals ("I want to pass HSK 5 in 3 months", "I want to read the 3 Body novels by Christmas", etc.) and previous learning performance. There are also more efficient orders for learning words, so that will also need to be taken into account. Should you learn individual character components before learning compound words? How durable is that (you might forget words learnt one way quicker, etc.)? All this can be studied when there is enough data to analyse over lots of learners, and the results can then be brought back to help learners make informed choices based on both other learners and their own patterns.
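
The simplest version of the reading-plan idea quoted above (new vocabulary ordered by when it will first show up) can be sketched in a few lines of Python. This is only an illustration under the assumption that the planned texts are already segmented into words; `prioritise_new_words` is a hypothetical helper and it ignores goals, word frequency and the learning-order questions raised above.

```python
def prioritise_new_words(reading_plan: list[list[str]],
                         known: set[str]) -> list[str]:
    """Return unknown words in order of first appearance across the plan.

    reading_plan is a list of texts in the order the learner intends to
    read them; each text is a list of word tokens.
    """
    seen: set[str] = set()
    ordered: list[str] = []
    for tokens in reading_plan:
        for token in tokens:
            if token not in known and token not in seen:
                seen.add(token)
                ordered.append(token)
    return ordered


plan = [
    ["我", "喜欢", "图书馆"],
    ["图书馆", "很", "安静"],
]
print(prioritise_new_words(plan, known={"我", "很"}))
# ['喜欢', '图书馆', '安静']
```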

 

On 2/21/2022 at 1:29 AM, alantin said:

I don't know how many people find them fascinating, but I draw a lot of motivation from seeing them and digging into them is very interesting.

This is also a key aspect of the project. Many learners aren't that into numbers but many are (including you and me). I see this project as being part of the "quantified self" movement, where learners (users, citizens...) are given the tools to better understand how they learn (and behave more generally), so they can better plan, decide, and generally have more control over their own futures. Multinational corporations are developing detailed models to try and extract profit; we can also leverage open source to build models and tools to learn and improve our lives without necessarily having a profit motive guiding us. That's what I'm trying to do anyway! The project will eventually have an entire analytics suite for self-reflection and analysis, but creating tools that allow normal learners to get great insights is a major, major challenge.


Hi all. I have just deployed an update that has basic export functionality. You can export the info the system has on a per word and per day basis, and all the info the system has on your Repetrobes cards, including any personalisations you may have made to the "front" of the meaning card.

 

@alantin does this cover most of the essentials for you? What else is an absolute must at this stage for you?

 

@thelearninglearner, if you can describe what you were talking about in a little more detail then I'll see whether there is some functionality I can add that would cover it.

 

While having the freedom to export all data that the system has in a useful and easy-to-transform format will always be part of the DNA of Transcrobes, I hope that it will evolve to cover most of the needs that most learners have, so I'm very keen to add functionality that real learners actually use in other apps. That way, no one needs to export to achieve their learning goals. All learners are different though, so it might take a while to accommodate everything!

 

Please let me know about any bugs you see or features you think are missing, and in the meantime I will get back to improving the (hopeless!) documentation!


On 2/18/2022 at 9:55 PM, AntonOfTheWoods said:

And you weren't able to get them entered at all? Are you using Chrome or Edge (or something else)?

Figured out what the issue was. When I clicked on Save in the Options, it said "Options saved." right away so I closed the tab with a shortcut. Despite displaying that message it did NOT save anything yet. 3-4 seconds later (yepp, I have that much of a lag to your server, guess what country I am in) it would have displayed "Settings Update Complete!".

 

  • Please change "Options saved." to "Please wait! Do not close the tab yet" or something similar.
  • I do not have a vertical monitor, so the saving progress indicator at the top of the Options page was not visible once I had scrolled down to the "Save" button. If possible, please move the progress indicator to the top of the (currently displayed) screen, instead of the top of the page. I sometimes activate the extension while in the middle of a long page and have to wait for quite a while to see whether anything is happening or the extension just died (happened before, but then again even the google translate extension sometimes dies until restart).

Also, please double check what is going on with 3rd tones in the attached screenshot (of a cn.nytimes.com article). If it is caused by using the same font as the context (inherit font), please consider replacing it with a font selected by you that definitely contains those characters with the caron (also known as háček, haček, hachek, wedge, check, kvačica, strešica, mäkčeň, paukščiukas, inverted circumflex, inverted hat, flying bird, inverted chevron).

 

2022-02-21--19-49-08.png


On 2/21/2022 at 8:12 PM, yaokong said:

Also, please double check what is going on with 3rd tones in the attached screenshot (of a cn.nytimes.com article).

Sorry @yaokong. It's Windows... I really need to check much more with Windows, given they still haven't worked out how to do fonts right, even after all these decades :-(. On the main site it downloads quite a few high quality fonts so even for users on Windows the experience is good. I guess I didn't think enough about all the places it could be broken. I will have a think about how to at least get a backup font into the extension!

 

On 2/21/2022 at 7:40 PM, yaokong said:

Figured out what the issue was. When I clicked on Save in the Options, it said "Options saved." right away so I closed the tab with a shortcut. Despite displaying that message it did NOT save anything yet. 3-4 seconds later (yepp, I have that much of a lag to your server, guess what country I am in) it would have displayed "Settings Update Complete!".

Imagine what I was like with user interfaces before! I didn't want to make the progress wheel too intrusive, but it doesn't serve its purpose if it's not intrusive, so I'll try and fix it to the top of the viewscreen.

 

Sorry again and thanks for the heads-up. So you can confirm that you have some sort of access from the mainland (at least from the capital)? I was thinking that I might have to have dedicated servers there but was holding off because you need a licence and it is *literally* an order of magnitude more expensive (so $50 in the US/Europe is $500 on the Mainland... because there are no super-budget providers, sniff...).


On 2/21/2022 at 9:26 PM, AntonOfTheWoods said:

I really need to check much more with Windows, given they still haven't worked out how to do fonts right, even after all these decades :-(. On the main site it downloads quite a few high quality fonts so even for users on Windows the experience is good. I guess I didn't think enough about all the places it could be broken. I will have a think about how to at least get a backup font into the extension!

@yaokong I should probably not diss MS so much - actually the font is broken because Windows respects the NY Times font :-D. Obviously they are not taking us seriously yet!

 

I have just pushed an update to the store that allows overriding the gloss font with a couple of basic fonts. I can confirm that it fixes the issue, at least on that NYT page. It should be a general purpose solution though. I also changed the saving message (which was true at one stage but then I added extra stuff) and made the progress wheel always appear at the top of the screen, no matter where the scroll is.

 

Please let me know if this is clearer and allows you to read as you want!

 

 


Thanks, those issues are fixed, the font looks fine after setting a sans-serif font.

 

Just came across a minor bug on cn.nytimes.com (I tried 3 different articles, happens on all of them, not on the main page though): the popup cannot be closed. If I click on another word, the popup follows along, showing info on that word, but still cannot be closed (i.e. by clicking into empty space). This has never happened before and does not happen on 3-4 other sites that I tested.


ScreenShot2022-02-19at09_02_06.png
ScreenShot2022-02-22at20_41_43.png

 

You have some interesting translations there... It's difficult to take this program seriously with such egregious errors in the data. Is there any way to drop that word list that you're using and import one that is correct?


On 2/23/2022 at 10:44 AM, Glyn said:

Is there any way to drop that word list that you're using and import one that is correct?

The meanings for the meaning questions are editable (you can click and then edit, and you get definitions from multiple sources, including CEDICT), and I'm going to work on making the rest editable too soon. You also get more definitions than that by either hovering over it or with a long-press on mobile.

 

One thing to keep in mind is that Transcrobes was never intended to be software for English speakers to learn Chinese but an open platform that works for native speakers of any of the major languages learning any of the other major languages. The amount and quality of free resources is very variable, even for the major languages. That means someone may have to do tweaking for individual cases. I am pretty sure that you have found 2 of about 30 problem cases, out of a total of about 60000 cases. Manually checking large data sources is something that can be crowd-sourced when you have either a really great product whose individual users can help to make it better, or massive amounts of money. I don't have either (yet)!

 

Remember this is supposed to be a totally open platform (not 100% there yet but very close), usable by researchers, language schools or even individuals. I am doing absolutely EVERYTHING (from theory through frontend, backend, devops, documentation, and I also have to do marking and tutorials...). I want it to be available to all, and that means it can't require potentially very expensive data/analytics or hosting services, which means I am also doing a lot of devops engineering, making sure all the latest/best data and monitoring services are available. A normal startup would have *at least* 5 full-time engineers to do all the jobs I am doing. So I have to prioritise certain things...

 

It is all open source, open education and open research (I won't publish in paywalled journals). If you have some time and engineering expertise, it would be wonderful to get some help!

 

ps. CEDICT is fine but the project maintainers have a philosophical viewpoint that makes using their data much harder. They are open about the fact it is NOT intended for automated use. They want it to be a basic digital version of a paper dictionary. What do I mean? They explicitly rejected having part-of-speech information integrated, which makes their dictionary a LOT less useful for this kind of thing. 


On 2/22/2022 at 10:53 PM, yaokong said:

Just came across a minor bug on cn.nytimes.com (I tried 3 different articles, happens on all of them, not on the main page though): the popup cannot be closed. If I click on another word, the popup follows along, showing info on that word, but still cannot be closed (i.e. by clicking into empty space). This has never happened before and does not happen on 3-4 other sites that I tested.

It looks like the NYT site is capturing the click outside so I don't get to see it (probably for managing ads). I will add a close button to the popup, thanks for pointing this out @yaokong! (the update is now waiting to be approved on the extension store)


I have just deployed an update and it is now possible to change the default definitions for the spaced repetition system (actually it changes the priority) in the Repetrobes configuration. That allows users to mitigate the issue that @Glyn points out above for vocabulary revision.


  • 2 weeks later...

I have just pushed a new update that allows for user-provided dictionaries. There is not a lot of documentation yet but there should be enough to get started for anyone who is interested. Let me know if anything is not clear.

 

I am not sure why but it may be necessary to close the browser completely after it updates. It also might seem very laggy but after closing the browser (and waiting for a minute or so to make sure the browser process has actually stopped) and reopening it should return to normal speed.

 

The various settings interfaces (where it is relevant) now also contain a selection panel where you can add/remove/prioritise the dictionaries you want. If your highest priority dictionary doesn't contain part-of-speech information and you want to force your definitions, you will need to select the "Strict Provider Ordering" option. Then it will only fall through to the second dictionary (and further) if the word doesn't appear in your dictionary *at all* (so even if it thinks it should be selecting a noun but your dictionary only has a translation for a verb, your entry will still get chosen).
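
The fall-through rule can be sketched in Python as below. This is an illustration, not the actual Transcrobes code: the data layout and `pick_definition` are hypothetical, and the non-strict behaviour (prefer the first provider that has an entry with the expected part of speech) is an assumption inferred from the description above.

```python
from typing import Optional

# Hypothetical layout: word -> list of (part_of_speech or None, definition).
Dictionary = dict[str, list[tuple[Optional[str], str]]]


def pick_definition(word: str, pos: str, providers: list[Dictionary],
                    strict: bool) -> Optional[str]:
    """Choose a definition from dictionaries listed in priority order."""
    if not strict:
        # Assumed non-strict behaviour: the first provider with an entry
        # for the expected part of speech wins, whatever its priority.
        for provider in providers:
            for entry_pos, definition in provider.get(word, []):
                if entry_pos == pos:
                    return definition
    # Strict ordering (or no part-of-speech match anywhere): take the
    # highest-priority provider that contains the word at all.
    for provider in providers:
        entries = provider.get(word)
        if entries:
            return entries[0][1]
    return None


user_dict: Dictionary = {"工作": [(None, "to work (my note)")]}
default_dict: Dictionary = {
    "工作": [("NOUN", "job"), ("VERB", "to work")],
    "图书馆": [("NOUN", "library")],
}
providers = [user_dict, default_dict]

print(pick_definition("工作", "NOUN", providers, strict=False))   # job
print(pick_definition("工作", "NOUN", providers, strict=True))    # to work (my note)
print(pick_definition("图书馆", "NOUN", providers, strict=True))  # library
```

With strict ordering the user's entry wins even though it carries no part-of-speech information, and the second dictionary is only consulted for words the first one lacks entirely.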

 

I also integrated a lot of extra information about characters and radicals (from https://www.skishore.me/makemeahanzi/), and integrated the "notrobes" tool a lot more tightly everywhere. In most places where you see characters now you can either click to get a popup (like before) or it will open up notrobes in a new tab with the particular word preselected. I also added a "related" option, which finds all the words that either contain the word (so have the word as a sub-string), have the same pronunciation, or have the same radicals. In order to be fast this feature has to preload quite a bit of data, so unfortunately it takes around 30-40 seconds to load the first time you click on it in a session (and occasionally afterwards).
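
A toy Python version of the "related" lookup shows why preloading helps: the pronunciation and radical indexes are built once, after which each lookup is cheap. Everything here is hypothetical (the `Entry` and `RelatedIndex` names, the pinyin notation, the tiny sample data), and "same radicals" is interpreted as "shares at least one radical", which is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    pinyin: str               # numbered pinyin, e.g. "shi4"
    radicals: frozenset[str]  # radicals of the characters in the word


class RelatedIndex:
    """Indexes built once up front (the slow part), then queried quickly."""

    def __init__(self, entries: list[Entry]) -> None:
        self.entries = {e.word: e for e in entries}
        self.by_pinyin: dict[str, set[str]] = defaultdict(set)
        self.by_radical: dict[str, set[str]] = defaultdict(set)
        for e in entries:
            self.by_pinyin[e.pinyin].add(e.word)
            for radical in e.radicals:
                self.by_radical[radical].add(e.word)

    def related(self, word: str) -> dict[str, set[str]]:
        entry = self.entries[word]
        # Words that have the query as a sub-string.
        containing = {w for w in self.entries if word in w and w != word}
        same_sound = self.by_pinyin[entry.pinyin] - {word}
        shared: set[str] = set()
        for radical in entry.radicals:
            shared |= self.by_radical[radical]
        shared.discard(word)
        return {
            "containing": containing,
            "same_pronunciation": same_sound,
            "shared_radical": shared,
        }


# Illustrative sample data only.
index = RelatedIndex([
    Entry("是", "shi4", frozenset({"日"})),
    Entry("事", "shi4", frozenset({"亅"})),
    Entry("但是", "dan4 shi4", frozenset({"人", "日"})),
    Entry("时", "shi2", frozenset({"日"})),
])
print(index.related("是"))
```

The sub-string search here is a linear scan over all words, which is fine for a toy example; a real dictionary of tens of thousands of entries would probably want an index for that too.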

 

Any feedback much appreciated!


Hello again, Anton,

 

The documentation gives no information about how long an import takes, but I've been waiting quite a while for a 500-word vocabulary list import that I need to create a list, which is then needed to create a goal. Any thoughts or information that may help me figure out what went wrong?

 

Attached is the file I'm trying to import.

hsk_v3_01_vocab.csv

