
zimu.ai: browser extension for subtitles from OCR. What would you like to watch?


martindbp


Hey everyone!
 

I'm a software/ML/computer vision engineer by trade, and I've spent some time building an OCR subtitle extraction algorithm for videos and making the resulting subtitles accessible through a browser extension. My goal is to make pretty much any online video of interest available, whether on Youtube, Netflix or Chinese sites like Bilibili. As of now I'm working only with Youtube, though. You can download the extension here, and find a short installation/user guide here. For now it's what I would consider "beta" software: it's Chrome-only and requires manual installation. The current list of processed shows can be found at browse.zimu.ai. The list is pretty short right now, but I'm processing new videos every day.
 

As you probably know, there are quite a few similar extensions for soft subs (which are naturally also supported), but I'm trying out a slightly different concept/philosophy for the subtitles. The idea is to display the minimal yet sufficient information for a learner to understand the content in a reasonable time frame. From the start, the pinyin, hanzi and word translations are visible for all words. Gradually you can hide the information you already know, while new, unknown words stay visible by default, hopefully keeping you in flow. If you keep learning until all the subtitles are completely hidden, voilà, you're fluent! At least that's the idea. Naturally, everyone is free to use it however suits them best; I've tried to keep enough settings to make it flexible.

 

The extension comes with the standard Anki CSV file export. You can export the usual basic or cloze notes, but I've also added the ability to export the JSON of the whole containing sentence, along with dictionary info, so that you can build very advanced cards in Anki if you wish (example cards are provided in the guide). That said, (deep) knowledge tracing has been a research interest of mine for quite a while and I do see a big potential in minimizing the amount of time we spend in SRS by helping us encode memories more efficiently, and use inter-card dependencies to improve the scheduling. Therefore at some point I'll probably take a stab at an embedded SRS.
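To illustrate the kind of export described above, here is a minimal sketch of writing notes to CSV with the containing sentence attached as a JSON field. This is not the extension's actual code: the function name `export_anki_csv`, the column order, and the note fields (`hanzi`, `pinyin`, `translation`, `sentence`) are all assumptions made up for the example.

```python
import csv
import io
import json


def export_anki_csv(notes):
    """Serialize notes to CSV text with three columns:
    front (hanzi), back (pinyin and translation), sentence as JSON.

    The field names and column layout are hypothetical.
    """
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for note in notes:
        writer.writerow([
            note["hanzi"],
            f'{note["pinyin"]}: {note["translation"]}',
            # Keep hanzi readable in the file instead of \u escapes.
            json.dumps(note["sentence"], ensure_ascii=False),
        ])
    return buf.getvalue()


# Hypothetical sample note; the sentence payload shape is an assumption.
sample_notes = [
    {
        "hanzi": "羊",
        "pinyin": "yáng",
        "translation": "sheep",
        "sentence": {"text": "三头羊也是轰", "start": 12.5, "end": 14.0},
    }
]
csv_text = export_anki_csv(sample_notes)
print(csv_text)
```

Keeping the whole sentence as one JSON field means a card template can parse it with JavaScript and decide what to show, which is presumably what makes the "very advanced cards" possible.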

 

As for funding, I'm making this browser extension available for free. I'm putting as much functionality as I can client-side (in the browser) and optimizing for low cost, so that each additional user has a very low marginal cost. For full disclosure, my philosophy here is to provide something useful to as many people as possible, and to find ways to support it financially other than a subscription or locking important features behind a paywall. That might be Patreon donations, selling the OCR as a SaaS, or even VPN/affiliate ads on the browsing site (not in the extension).

 

So, are there any cool Youtube videos or channels with hard subs (or soft) you've been wanting to watch?

 

Any and all feedback is warmly welcome! Hope you find it useful!


家有兒女 would be a good candidate, should be available on youtube.

 

So you process the videos offline and provide the soft subs using the extension? What's the accuracy of the OCR engine?

 

Personally I have my SRS setup in Pleco, so I would prefer just a list of words, similar to what I provide for some shows, based on soft-subs.


On 5/18/2022 at 1:39 PM, wibr said:

So you process the videos offline and provide the soft subs using the extension? What's the accuracy of the OCR engine?

 

Yes, that's right, I process them offline at the moment. At one point I checked the accuracy on a few different videos that had both soft and hard captions, and the result was about 1 character error per 200 to 1000 characters, depending on difficulty. The difficulty depends on a lot of things: resolution, text blending into the background without a clear border, fade-in/fade-out, rare fonts, etc. For example, I checked the show you suggested, and it's a bit on the challenging side due to the low resolution. Here's an excerpt of the first dialog from the first episode:
 

Quote

你等会儿我还没说完呢
你瞧咱俩结婚刚两个月
这俩孩子就好得跟亲兄弟似的
多好啊
好如果是再多个就更好了

什么意思啊你还想让我再生啊
不是我不是那意思
我是说啊干脆把小雪
从她爷爷家也接过来一块住
你想啊头羊也是赶
一头羊也是
三头羊也是轰


At the end you can see a duplicate line where there was high uncertainty. In cases like this though, I can generate more specific training data to hopefully improve the model :)
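For readers curious how a figure like "1 character error in 200 to 1000" can be measured, here is a minimal sketch of a character error rate (CER) check that compares OCR output against a reference such as soft captions. It uses plain Levenshtein edit distance; whether the author measured it exactly this way is an assumption, and the function names and the example strings are illustrative only, not verified ground truth for the show.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # character missing from hyp
                cur[j - 1] + 1,          # extra character in hyp
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative strings only: a reference line and an OCR line missing one character.
reference = "一头羊也是赶"
ocr_output = "头羊也是赶"
print(char_error_rate(reference, ocr_output))
```

A rate of 1 error in 200 to 1000 characters corresponds to a CER between 0.005 and 0.001 on this scale.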

 

On 5/18/2022 at 1:39 PM, wibr said:

Personally I have my SRS setup in Pleco, so I would prefer just a list of words, similar to what I provide for some shows, based on soft-subs.


That sounds like a simple export function I could add for Pleco users. In the extension you can "star" words that you encounter and export them from the dashboard page. Would that work?

By the way, I realized from my post I should probably lead with a screenshot of what to expect. The screenshot attached shows what it looks like to me. Personally I keep all pinyin and hanzi visible as I'm not yet focusing on listening comprehension.

zimuai.png


Here's a first-week update

 

Added shows

New extension release:

  • Export starred words to Pleco format
  • Export subtitles to SRT format for those who use subs2srs or similar tools (can be found in the subtitle options menu, "Other" tab)
  • New auto-pause feature based on limiting the Words Per Second (WPS). When enabled, this only pauses subtitles which are above a set WPS threshold, and pauses for the remaining duration. I found this to be very useful when watching with my wife, letting me get a bit more time to process difficult subtitles while not having to manually control things constantly :)
  • Youtube video thumbnails now have an added icon to signify that the video has processed subtitles (see attached image). This makes it a bit easier to tell which videos are supported, and to navigate back to the ones you were watching before.
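
The WPS-based auto-pause in the release notes above can be sketched roughly as follows. This is one possible reading, not the extension's actual logic: it assumes "the remaining duration" means the extra time a subtitle would need to be shown at exactly the threshold rate. The function name `auto_pause_seconds` and its parameters are hypothetical.

```python
def auto_pause_seconds(word_count, start, end, max_wps):
    """Return how long to pause after a subtitle, in seconds.

    Assumed interpretation: a subtitle faster than max_wps words per
    second is paused for the extra time it would need at exactly max_wps.
    Subtitles at or below the threshold are not paused.
    """
    duration = end - start
    if duration <= 0 or max_wps <= 0:
        raise ValueError("duration and max_wps must be positive")
    wps = word_count / duration
    if wps <= max_wps:
        return 0.0
    return word_count / max_wps - duration


# 6 words shown for 2 seconds is 3 WPS, above a threshold of 2 WPS.
fast_pause = auto_pause_seconds(word_count=6, start=0.0, end=2.0, max_wps=2.0)
# 3 words shown for 2 seconds is 1.5 WPS, below the threshold.
slow_pause = auto_pause_seconds(word_count=3, start=0.0, end=2.0, max_wps=2.0)
print(fast_pause, slow_pause)
```

Under this reading the pause scales with how far a subtitle exceeds the threshold, so only dense lines interrupt playback.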
     

Let me know if there are any features that are blockers for effectively using it, it'll help me prioritize!

thumbnail.png

