Chinese-forums.com
PandaEye

Instantly Extract Chinese Subtitles Physically Embedded from Videos to Text File

PandaEye

I'm interested in creating a tool that can instantly extract Chinese subtitles that are physically embedded (hard-coded) in any Chinese video and output them to a text file with timestamps. That would unlock an endless supply of the highest-quality learning resource: native content with audio, visuals, and transcriptions.

Transcriptions can be imported into Pleco's Screen Reader for immediate translation of the script, without having to look up word definitions manually, and can be followed along while watching the video. Other tools can also be used, such as Chrome's Zhongwen Popup Dictionary or Hanping's Chinese Popup, or any other method that makes for faster and richer learning.

To make this tool a reality, it would have to be sold as a service in order to compensate those who build and maintain it. The service would be a site/app where you provide a video or URL and pay an inexpensive price per video hour (or a subscription), and the hosted software would immediately deliver the transcription file.

The software that extracts the subtitles would need to be built on deep/machine learning principles (artificial intelligence). I've begun asking ML engineers for cost estimates to create this tool (Chinese engineers could potentially be leveraged too), and I intend to launch a Kickstarter or similar funding campaign to create the software and the website/app, and to maintain and improve the service, if the funding goal is met.

Link to inquiry of ML Engineers: https://www.reddit.com/r/deeplearning/comments/9iicq7/request_for_quotation_of_a_use_case_from_the_dl/

 

What do you think is the size of the market for this service, language learners and other use cases included? There would need to be enough people interested to meet the funding goal, otherwise it couldn't happen. What are your thoughts?

Screenshot of (320) China News Intro _ Opener _ Logo 2015 (2) Chinese News Channel - YouTube.jpg

大块头

Welcome to the forum!

 

I'm always glad to see people innovating in this space, so I'm sorry to pop your bubble...

 

I haven't used it personally, but it looks like this open-source Chrome extension does what you're describing?

 

https://chrome.google.com/webstore/detail/copyfish-🐟-free-ocr-soft/eenjdnjldapjajjofmldgmkjaienebbj?hl=en

 

 

大块头

It seems to work OK...

 

screenshot.png

Flickserve

I tried it out. It works better if the background is dark. 

 

If the background is light and the font has a shadow on it, it doesn't work at all well. 

XiaoXi

I tried it on two random sentences. With the first one it didn't get it at all; the second one came out as in the attached image. Not sure why, but not only did it fail to recognise the captured area, for some reason the last few characters were not part of the area I selected. The full sentence should have been 他肯定不会回来的 ("He definitely won't come back"). Maybe the background is not super dark but it's hardly super light either. It does seem to exist but it's bad going on awful. Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.

 

In fact to say I'd be interested in that is the understatement of the century.

屏幕快照 2018-09-25 下午1.47.15.png
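
As a rough illustration of the "OCR all the subtitles and produce a .srt file" idea, here is a minimal Python sketch of the last step only. It assumes some OCR engine has already been run on frames sampled at a fixed interval, giving a list of (timestamp, text) pairs; the names `Cue`, `merge_frames`, `srt_time`, `to_srt` and the `step` parameter are made up for this example and do not come from any existing tool. Consecutive frames showing the same text are merged into one cue, and the cues are written out in SRT layout.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def merge_frames(frames, step=0.5):
    """Collapse per-frame OCR results into subtitle cues.

    `frames` is a time-ordered list of (timestamp_seconds, text) pairs,
    one per sampled frame; `step` is the assumed sampling interval.
    An empty string means no subtitle was detected in that frame.
    """
    cues = []
    for ts, text in frames:
        text = text.strip()
        if not text:
            continue
        last = cues[-1] if cues else None
        # Same text in the very next sampled frame: extend the current cue.
        if last is not None and last.text == text and ts - last.end < step / 2:
            last.end = ts + step
        else:
            cues.append(Cue(start=ts, end=ts + step, text=text))
    return cues


def srt_time(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Render cues as SRT text: index, time range, text, blank line."""
    blocks = [
        f"{i}\n{srt_time(c.start)} --> {srt_time(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Calling `to_srt(merge_frames(frames))` on the OCR output and saving the result with a `.srt` extension gives a file that subtitle tools can open. The hard part, getting reliable text out of each frame, is deliberately left out here, and real OCR output would likely need fuzzy matching rather than the exact string comparison used above.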

Flickserve
56 minutes ago, XiaoXi said:

Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.

 

This. 

 

Not sure of the size of the market though. It is a lot of work. Most people end up just watching a video. 

XiaoXi
1 hour ago, Flickserve said:

Not sure of the size of the market though. It is a lot of work. Most people end up just watching a video. 

Probably not that big unfortunately, because most people wouldn't know what to do with an srt file to get the most out of it. But personally I know that an srt file is the holy grail of language learning.

yaokong

XiaoXi, could you please explain in 2-3 sentences how you use it? 

XiaoXi
2 hours ago, yaokong said:

XiaoXi, could you please explain in 2-3 sentences how you use it? 

I'd be interested to know how Flickserve uses it too. Well let me ask you, when you watch a Chinese tv series with hard coded subs and come across a word you don't know - what do you do?

Flickserve

Me?

 

I am interested in hard subtitles. And then, you can make a srt file

 

A srt file lets you generate anki cards with the sentences, audio (and pictures) as an automated process. 
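
To give a feel for the text half of that process, here is a minimal Python sketch that reads SRT text and writes a tab-separated file of the kind Anki can import as notes. It is not how subs2srs or any other tool actually works; the function names `parse_srt` and `to_anki_tsv` and the two-field layout (sentence, time range) are assumptions for illustration, and cutting the matching audio clips and screenshots would be a separate step.

```python
import csv
import io
import re

# Matches the "HH:MM:SS,mmm --> HH:MM:SS,mmm" line of an SRT cue.
TIME_LINE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples from SRT-formatted text."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIME_LINE.search(line)
            if match:
                # Everything after the time line is subtitle text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if text:
                    cues.append((match.group(1), match.group(2), text))
                break
    return cues


def to_anki_tsv(cues):
    """Build tab-separated text: one note per line, sentence then time range."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for start, end, text in cues:
        writer.writerow([text, f"{start} - {end}"])
    return out.getvalue()
```

Saving the output of `to_anki_tsv(parse_srt(...))` as a UTF-8 `.txt` file and importing it into Anki with tab as the field separator should give one card per subtitle line. The time range field is what a tool would use to cut the audio for each card.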

XiaoXi
12 hours ago, Flickserve said:

I am interested in hard subtitles. And then, you can make a srt file

Yes that was my suggestion, we were interested in the same thing so I wondered also how you used an srt file.

 

12 hours ago, Flickserve said:

A srt file lets you generate anki cards with the sentences, audio (and pictures) as an automated process. 

Oh ok, yes it's useful for that. The srt file is really so much more useful than hardcoded subs. So many possibilities. Is there software for that now? I remember there was software for Japanese a long time ago but no other languages.

Flickserve

oh definitely. I turned some films into anki cards. We have a thread for that in the forum.

 

I used subs2srs. The whole process was documented by TysonD on the forum.

 

 

 

 

艾墨本
On 9/25/2018 at 1:52 PM, XiaoXi said:

Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.

 

This. I want to read the subtitles like a book and practice performing them as a way to learn language that fits specific situations and the tone of voice that goes along with it. I want to imitate the actors and do some 配音 (voice dubbing).

XiaoXi
23 hours ago, Flickserve said:

I used subs2srs. The whole process was documented by TysonD on the forum.

Oh right, that's the exact same software I was referring to. The one made originally for Japanese.

 

18 hours ago, 艾墨本 said:

This. I want to read the subtitles like a book and practice performing them as a way to learn language that fits specific situations and the tone of voice that goes along with it. I want to imitate the actors and do some 配音 (voice dubbing).

Yes hopefully the OP hasn't gone to a better place and maybe he can make this software since there seems to be more demand for it. Not to mention that what he is proposing seems to already exist, even though it doesn't work that well. If you read the SRT file as a book how will you be able to actually hear the voices to imitate them?

 

Btw you can already do this with movies since SRT subtitles are normally available for the more popular Chinese movies.

Flickserve

There is plenty of video content out there. I just wonder how accurate it can be. Greater than 95%? One incorrect word out of twenty?
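
One way to put a number on that question is character-level accuracy: compare the OCR output against a hand-checked transcript and count the edits (substitutions, insertions, deletions) needed to turn one into the other. The Python sketch below does this with a standard Levenshtein edit distance; the function names are made up for this example, and the misrecognised character in the usage note is hypothetical rather than a real OCR result.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Fraction of the reference that OCR got right, floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))
```

For example, `char_accuracy("他肯定不会回来的", "他肯定不会回未的")` compares the eight-character sentence from earlier in the thread with a version that has one wrong character and returns 0.875. One wrong character in twenty would come out at about 0.95, which matches the "one out of twenty" framing.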

艾墨本
9 hours ago, XiaoXi said:

Yes hopefully the OP hasn't gone to a better place and maybe he can make this software since there seems to be more demand for it. Not to mention that what he is proposing seems to already exist, even though it doesn't work that well. If you read the SRT file as a book how will you be able to actually hear the voices to imitate them?

 

Btw you can already do this with movies since SRT subtitles are normally available for the more popular Chinese movies.

 

Pretty simply, actually. I'd read them slowly, looking up words and practicing at my own pace away from my computer (an important part of good studying for me) and then go back to my computer and watch the show.

 

As for finding SRT files for popular movies, I challenge you to try finding some SRT files for 人民的名义, or how about the dubbed version of Avatar: The Last Airbender? Both would be great for studying!

yaokong
Quote

I challenge you to try finding some SRT files for 人民的名义

here you go, as found on http://subhd.com/ar0/378232

 

Cannot find the latter even after a rigid search, having illuminated the darkest corners of the interwebs.

XiaoXi
On 10/3/2018 at 9:01 PM, 艾墨本 said:

Pretty simply, actually. I'd read them slowly, looking up words and practicing at my own pace away from my computer (an important part of good studying for me) and then go back to my computer and watch the show.

How do you look up the words?

 

On 10/3/2018 at 9:01 PM, 艾墨本 said:

As for finding SRT files for popular movies, I challenge you to try finding some SRT files for 人民的名义, or how about the dubbed version of Avatar: The Last Airbender? Both would be great for studying!

The subtitles for a foreign-language movie are normally made by someone completely independent from the people who do the dubbing, so they are unlikely to match. It depends on the movie. Sometimes they match quite well, sometimes they don't match at all. With some movies, like 猛龙过江, the subtitles do indeed appear to be based off of the Mandarin dubbing, but it's not the norm unfortunately.

 

I found with French that fans make transcripts of tv shows dubbed in French that match perfectly since they're transcripts of the dub itself. Might be worth looking to see if Chinese fans ever make transcripts like this. 

Flickserve
6 minutes ago, XiaoXi said:

I found with French that fans make transcripts of tv shows dubbed in French that match perfectly since they're transcripts of the dub itself. Might be worth looking to see if Chinese fans ever make transcripts like this. 

 

Isn't that what viki.com does? I haven't investigated it fully.

XiaoXi
2 hours ago, Flickserve said:

Isn't that what viki.com does? I haven't investigated it fully.

Maybe, I wasn't looking for that myself so don't know, just making a suggestion to 艾墨本.
