Jump to content
Chinese-Forums
  • Sign Up

Lazybug: free, open source app for smart subtitles (OCR)


martindbp

Recommended Posts

On 3/10/2023 at 10:54 PM, martindbp said:

So the main issue is that you have to move your mouse away to read the character, right? You could just hide the cursor after clicking (and show it again after moving the mouse away), would that help?

 

Thanks for the quick response.  It is an awesome app you've made, please don't take my feedback as negative criticism of your overall efforts or idea, which is great!

 

The problem I'm describing is this:

 

threebodyunpeek.thumb.jpg.e736a1d89fc6ae277baf7f1ea9f6da53.jpg

 

I've just clicked on a word to peek, 眼睛, but I can barely see it. There's a pin icon with glaze over it, plus the mouse cursor.  Plus the green 3, the icon bar on top, and the tool tip "Look up in dictionary" (the tooltip doesn't always appear, but it did the one time I took this screenshot).  The word itself is in light gray font, since I only clicked once. 

 

(I've also just clicked on the other 2 words after it, 里 and 看到, and those are also in light gray so even those are not that easy to see, after I've moved the cursor away.  Comparatively, the AI-suggested subtitle 倒计时 at the far right is very clear.)

 

Expanded picture of the clutter. 

 

threebodyunpeekexpanded.thumb.jpg.68161bb7095f8575ff7d8dc6e7e78d21.jpg

 

This makes it very hard to click word by word to search for the missing word I didn't grasp.  I don't know if other users like to do it, but for me, it's a fun game to uncover the least number of words in order to understand a sentence.  Like the game show -- Wheel of Fortune.

 

Currently, if you move off the word then some of the UI elements disappear.  So to do what I want to do, I have to move off a word, check if that's the word I didn't grasp, and then move back onto the word to cover it, if it's not the right one.  So that's my use case for you to consider.

Link to comment
Share on other sites

Another possibility is to change the behavior of the mouse-over.

 

threebodyscrollover.thumb.jpg.43603620bd8900793c2c09d373240d15.jpg

 

Here I haven't clicked on anything, I just moused over a hidden word, which turned it into an eye icon.  I don't think the eye icon adds anything to the user experience. 

 

So one possibility is just to uncover the word when you scroll over it, while clicking brings up the menus.  I don't know if that fits in with the other ways you're handling mouse-over vs clicks, so I'm just throwing it out there.

Link to comment
Share on other sites

On 3/7/2023 at 6:16 AM, martindbp said:

Btw, is there any particular font you guys prefer? I'm not sure if the current one is considered "easily-legible" but it seems to be the browser default.

 

That font looks good to me! 

Link to comment
Share on other sites

On 3/10/2023 at 4:34 PM, phills said:

Thanks for the quick response.  It is an awesome app you've made, please don't take my feedback as negative criticism of your overall efforts or idea, which is great!

 

No worries, constructive feedback is just a sign people care!

 

On 3/10/2023 at 4:50 PM, phills said:

So one possibility is just to uncover the word when you scroll over it, while clicking brings up the menus.  I don't know if that fits in with the other ways you're handling mouse-over vs clicks, so I'm just throwing it out there.

I'll have to think about this one a bit. I agree it would be more convenient, but ideally I'd want it to require a deliberate action since eventually I'd like to incorporate the peek events into a SRS. The actions you take in the UI give a lot of indication how well you know the material, which should inform the SRS. The problem with hover-peeking is that it creates a lot of false positives where you unintentionally peeked but didn't need to. On the other hand, just peeking the hanzi quickly is quite different from also peeking the pinyin/translation.

 

Btw, about hiding the non-hanzi rows, is this mainly because they take up too much space (too prominent?), or do you really not want to see the pinyin/translation of new words (or after peeking)? Since a few people didn't like about how much space it takes up I reduced the font size of pinyin/translation and spacing to make it overall more compact, see a comparison below:

compact_caption.thumb.png.3bce228e6b0b81d03c7a936da6ab6b5f.png


Some other changes live now:

  • Use lighter gray color for peeked words
  • When you hide words (manually or automatically) they just disappear, no more transition
  • Moved the word stats badge (which btw shows the word frequency in this video) to the popup context menu
  • Now when you peek, there is no hide icon, context menu or mouse cursor blocking the hanzi until you move the mouse

Haven't gotten to adding the options for removing individual rows though

Link to comment
Share on other sites

On 3/12/2023 at 12:22 AM, martindbp said:

Btw, about hiding the non-hanzi rows, is this mainly because they take up too much space (too prominent?), or do you really not want to see the pinyin/translation of new words (or after peeking)?

 

It's both.  And for me at least, once I set up myself in "Chinese" character mode, seeing pingyin jostles me out of being able to read the Chinese character at a glance, as my eye is also drawn to the pingyin at the same time. 

 

When it's a character I don't know, and I have to look it up, then of course, the pingyin is helpful.  But if I know the character, I'd rather not see it.  And I've gotten to the level, where I know almost all the characters, so I'd rather learn to rely on my own ability to read them. 

 

It's a crutch that's hard to ignore and so fights for my attention when I'd rather it didn't ?

 

On 3/12/2023 at 12:22 AM, martindbp said:

I'll have to think about this one a bit. I agree it would be more convenient, but ideally I'd want it to require a deliberate action since eventually I'd like to incorporate the peek events into a SRS.

 

Yea the mouse-over is a bigger change, so I understand.  I imagine one of your ultimate goals is to have the AI predict which words you want to see.  I'm curious how you're approaching that.  Is it purely machine learning using your own past behavior, or do you have some theory of learning about what types of characters/words people should want to see in order to learn better? 

 

Actually on a basic level, when the AI is showing a word, is that 1. a word that it thinks you should know (meaning on the edge of your knowledge, so it's meant to reinforce your weak words), or 2. a word that it thinks you don't know (meaning beyond your knowledge, so it's meant to help you understand the movie better by making the hard parts easier)?  

 

Both are useful ways of engaging with the material.  I think I'm more in camp 2, but some people might prefer 1.

 

Link to comment
Share on other sites

On 3/11/2023 at 6:20 PM, phills said:

And I've gotten to the level, where I know almost all the characters, so I'd rather learn to rely on my own ability to read them. 


Got it, but I guess at least sometimes you need to check the definition of a word though right? I'm wondering if hiding all other rows completely is the right design, or if there are other solutions, like an option for hiding the pinyin/word translation until you hover over the word, or the video is paused? I'm not opposed to removing rows completely, but just thinking if there is something here that other learners might benefit from

 

On 3/11/2023 at 6:20 PM, phills said:

I imagine one of your ultimate goals is to have the AI predict which words you want to see.  I'm curious how you're approaching that. 


That was my initial thought, but I think it's quite difficult to exactly predict which words a learner has been exposed to, so I'm not sure it would be a good idea to guess and make assumptions. There are exceptions, like if a learner knows a lot of HSK5 words, they'll definitely know <=4, but then you could just ask which level they've "completed". I did some experiments with active learning and logistic regression to get a probability of knowing a word given a list of known ones. My conclusion is to keep ML out of the subtitles, at least for now... One area of improvement though is for compound words, where if you know the constituents, you probably understand the compound, like 足+球=足球. Right now there are some simple rules in place, but it fails a lot. I'd like to implement something smarter there by matching the dictionary definitions in a fuzzy way.

 

On 3/11/2023 at 6:20 PM, phills said:

Actually on a basic level, when the AI is showing a word, is that 1. a word that it thinks you should know (meaning on the edge of your knowledge, so it's meant to reinforce your weak words), or 2. a word that it thinks you don't know (meaning beyond your knowledge, so it's meant to help you understand the movie better by making the hard parts easier)?  


Exactly, I think a core proposition of these subtitles is to help you understand, but help you minimally. Especially lower-intermediate to intermediate learners have to pause and look up words constantly which for me at least made me exhausted and just stop after 10 minutes.

I think one area where AI can help in the subtitles is to identify words that are ripe for you to focus on. If you have enough bandwidth to learn a few new words in a session, you don't want to add everything you come across, or words that only show up once and then you don't see them again for weeks. I've also noticed that some words are just so much easier to learn because they have enough memory "support", like you know the individual characters well, have a feel for how they are used, so it might be better to focus on those rather than trying to memorize a word that just doesn't stick no matter how many repetitions you do.


I've already experimented a bit building a simple RNN (Recurrent Neural Network) on my own Anki data using memory half-life regression, similar to a paper that Duolingo released years ago. What I'd like to do is take something like that, but also add more information, like how many different unique sentences with this word have you seen, how many words do you know for the constituent characters, which exercises did you do (simple hanzi->translation or cloze?). And of course other impressions you've got while watching shows. It doesn't have to be too complicated. I think there are diminishing returns, especially since you can't control for impressions outside the system (IRL or other material/tools), but scheduling should be quite a bit better than a simple flashcard system. It should be a pretty small, simple model, which also happens to be good for running in the browser :)

Link to comment
Share on other sites

On 3/12/2023 at 3:42 AM, martindbp said:

I'm wondering if hiding all other rows completely is the right design, or if there are other solutions, like an option for hiding the pinyin/word translation until you hover over the word, or the video is paused? I'm not opposed to removing rows completely, but just thinking if there is something here that other learners might benefit from

 

I'd like the option to X out the rows.  I prefer not to have pop-ups on hover, or automatically on pause.  Clicking on a word for the dictionary is relatively easy.  

 

I think learners at different levels may have different preferences.  For me, I want to rely solely on Hanzi as much as possible.

 

Your SRS ideas reminds me a bit of a thread from last year, about another open source systems someone was building.  I don't know if you've seen it:

 

https://www.chinese-forums.com/forums/topic/61866-transcrobes-free-open-source-language-learning-platform-for-mandarin

 

I think your system so far has much lower on-boarding friction. 

Link to comment
Share on other sites

On 3/12/2023 at 7:22 AM, phills said:

I'd like the option to X out the rows.  I prefer not to have pop-ups on hover, or automatically on pause.  Clicking on a word for the dictionary is relatively easy.  

 

I'll add options for it :) just need to fix an issue with database syncing logic first

 

On 3/12/2023 at 7:22 AM, phills said:

Your SRS ideas reminds me a bit of a thread from last year, about another open source systems someone was building.  I don't know if you've seen it:

 

I remember checking it out last year, but yeah I think the focus is a bit different. In an ideal world it would be great if all (Chinese) learning tools could share their data, so I'm open to integrating with other services (data format is open), or collaborate on code/ML. It's a pet peeve of mine that your data is fragmented across services so neither of them have the complete picture.

  • Like 1
Link to comment
Share on other sites

On 3/14/2023 at 11:24 PM, shawky.nasr said:

An error occurred during background message: {"type":"getCaptions","data":{"captionId":"youtube-5eQerxgZKk8"}}

Right, some shows have episodes missing, I think I had some troubles downloading these episodes but I'll try again

 

On 3/15/2023 at 2:30 AM, rongminshan said:

Is it possible to add links to other dramas, or just stick with the list so far?

Sure thing, do you have any requests? The way it works is that I process the show manually and upload the final file and show meta data. Others will be able to do this as well, just need to figure out a process. This is necessary even if the show has soft-subs, since I do a bunch of other NLP processing as well.

 

On 3/12/2023 at 7:22 AM, phills said:

'd like the option to X out the rows.  I prefer not to have pop-ups on hover, or automatically on pause.  Clicking on a word for the dictionary is relatively easy.  

Try it now, there are checkboxes in the options for the rows. I added them to the options instead of directly in the subtitle since I'm not sure how the UI would work to add them back. Maybe later I'll add buttons there too

hiderows.thumb.png.13243ac4691a840a6f7643d8ee4615d2.png

 

 

On an unrelated topic, I'm thinking about integrating ChatGPT at some point. I imagine pretty much every learning app will integrate this soon, as we've seen now with Duolingo and Khan Academy, will be interesting to see how this plays out. Basically, you can create your own account on openai.com, create an API key and supply it the app. It can be stored encrypted in the browser to make sure it can't be leaked. Then the subtitles you watch, or words you've starred etc can become context for a conversation with ChatGPT. It costs $2 for 700k tokens, so I imagine you can get quite a lot of usage for not a whole lot of money this way.


Interested to hear if anyone have any good ideas for what you could do with something like that

  • Like 1
Link to comment
Share on other sites

On 3/15/2023 at 8:28 PM, martindbp said:

On an unrelated topic, I'm thinking about integrating ChatGPT at some point. I imagine pretty much every learning app will integrate this soon, as we've seen now with Duolingo and Khan Academy, will be interesting to see how this plays out. Basically, you can create your own account on openai.com, create an API key and supply it the app. It can be stored encrypted in the browser to make sure it can't be leaked. Then the subtitles you watch, or words you've starred etc can become context for a conversation with ChatGPT. It costs $2 for 700k tokens, so I imagine you can get quite a lot of usage for not a whole lot of money this way.

 

Yea I think that's a grand idea, although having to enter your own API key is going to be a huge barrier to entry.

 

Since your app is so clearly educational, you might want to get in touch with them and see if they'll give you a small amount of free tokens monthly.  It might be interesting to them to see ChatGPT used more clearly for educational purposes (rather than just the media saying GPT is only useful for CHEATING!!) , and you're not competitive with them at all.  Just have ChatGPT phrase the request and make the educational point in a more diplomatic way ?

Link to comment
Share on other sites

On 3/16/2023 at 4:23 PM, phills said:

Since your app is so clearly educational, you might want to get in touch with them and see if they'll give you a small amount of free tokens monthly. 


Good point, it's worth a try, so I sent an email. But it seems that while you can apply as a non-profit, the application process is "competitive", which usually means you need the right contacts, be an established organization etc ?  Also agree the barrier to entry will likely keep many people out, but the alternative is to set up payments and which kind of goes against the whole goal (and barely more convenient), even if it just passes on the bill from OpenAI. We'll see, I'll probably experiment a bit, but there's also plenty of other features to build.

Would be interesting to combine it with SRS modeling. LLMs are not necessarily good at keeping track of forgetting curves, but they're great at generating varied content and exercises to optimally target words and concepts. Content generated by it could also be cached and reused I guess, such as example sentences.

Link to comment
Share on other sites

  • 1 month later...

Interesting tool, thanks for making it open-source!

 

Is there any way for importing known words? I have thousands of words marked as known in Chinese Text Analyser, I am too much of a lazy bug to mark them as known one-by-one.

Link to comment
Share on other sites

On 4/26/2023 at 1:45 PM, yaokong said:

Is there any way for importing known words? I have thousands of words marked as known in Chinese Text Analyser, I am too much of a lazy bug to mark them as known one-by-one.

 

Right now there's no way to do it, but it's a feature I've been wanting as well, so it's been high on the list. I haven't used Chinese Text Analyzer myself, but it is easy to export a text file with those words you know? I'm thinking having the ability to upload/paste any text file, including sentences, and extract words from there using naive segmentation, because I have lots of Anki cards with clozes that I'd like to import.

Just to be sure, did you select your HSK level? I assume so, even with it there can be quite a bit of clicking required for words that are not in any HSK level. Perhaps a "know all" keyboard shortcut could also be useful?

Link to comment
Share on other sites

  • 3 months later...

@yaokong You can now add a text-file with newline-delimited words in Hanzi in Options/Subtitles/"Upload words list"

Spring was a busy time so I didn't progress as fast as I'd have liked, but another feature released now is word exercises for starred words. You now get asked to type in the pinyin and translation for starred words as they appear while watching (on by default but can be turned off of course). After getting it right a couple of times the word is hidden from then on. In the Review tab you can review exercises in past videos in reverse chronological order.

 

This feature really comes out of my own falling out with Anki/SRS in general, which always results in a backlog explosion and lack of motivation (due to kids, work and side projects of course...). Ideally I'd like to reinforce and learn as I go, staying within the same context as much as possible.

Finding it pretty useful for me so far, curious if anyone would like to test it and give some feedback? Realizing of course that most people here are rather invested in outside SRS tools already :)

 

  • Thanks 1
  • Helpful 1
Link to comment
Share on other sites

Join the conversation

You can post now and select your username and password later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Click here to reply. Select text to quote.

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...