
Instantly Extract Chinese Subtitles Physically Embedded from Videos to Text File


PandaEye


I'm interested in creating a tool that can instantly extract physically embedded (hard-coded) Chinese subtitles from any Chinese video and output them to a text file with timestamps, unlocking an endless supply of the highest-quality learning resource (native content with audio, visuals, and transcriptions).

Transcriptions can be imported into Pleco's Screen Reader for immediate translations of the script without having to look up word definitions manually, and can be followed along while watching the video. Other tools, such as Chrome's Zhongwen Popup Dictionary or Hanping's Chinese Popup, can also be used for faster and richer learning.

To make this tool a reality, it would have to be sold as a service in order to compensate those who build and maintain it. The service would be a site/app where you provide a video or URL and pay an inexpensive price per video hour or a subscription, and the hosted software would immediately deliver the transcription file.

The software that extracts the subtitles would need to be built on deep/machine learning principles (artificial intelligence). I've begun asking ML engineers for cost estimates to create this tool (Chinese engineers could potentially be leveraged too), and I intend to launch a Kickstarter/funding campaign with the goal of creating the software and the website/app, and of maintaining and improving the service if funding is met.

Link to inquiry of ML Engineers: https://www.reddit.com/r/deeplearning/comments/9iicq7/request_for_quotation_of_a_use_case_from_the_dl/

 

What do you think is the size of the market for this service, with language learners and other use cases included? Enough people would need to be interested to meet the funding goal; otherwise it couldn't happen. What are your thoughts?


Welcome to the forum!

 

I'm always glad to see people innovating in this space, so I'm sorry to pop your bubble...

 

I haven't used it personally, but it looks like this open-source Chrome extension does what you're describing?

 

https://chrome.google.com/webstore/detail/copyfish-?-free-ocr-soft/eenjdnjldapjajjofmldgmkjaienebbj?hl=en

 

 


I tried it on two random sentences. With the first one it didn't get it at all; the second one came out like in the attached image. Not sure why, but not only did it fail to recognise the text in the captured area, for some reason the last few characters were not even part of the area I selected. The full sentence should have been 他肯定不会回来的 ("He definitely won't come back"). Maybe the background is not super dark, but it's hardly super light either. So the tool does seem to exist, but it's bad going on awful. Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.

 

In fact to say I'd be interested in that is the understatement of the century.
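For anyone wondering what the last step of such a tool might look like, here is a minimal Python sketch. It assumes the OCR stage has already produced one text string per sampled video frame (the hard part, not shown here); the names `Cue`, `frames_to_cues`, and `cues_to_srt` are made up for illustration and do not come from any existing tool. It merges consecutive frames showing the same text into one cue and writes the cues in .srt layout.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def frames_to_cues(frames, frame_interval):
    """Merge consecutive sampled frames with identical text into cues.

    `frames` is a list of (time_in_seconds, ocr_text) pairs sampled every
    `frame_interval` seconds; an empty string means no subtitle was found.
    """
    cues = []
    for time, text in frames:
        text = text.strip()
        if not text:
            continue
        last = cues[-1] if cues else None
        # Extend the previous cue if the same text continues without a gap.
        if last and last.text == text and time - last.end < frame_interval / 2:
            last.end = time + frame_interval
        else:
            cues.append(Cue(time, time + frame_interval, text))
    return cues


def cues_to_srt(cues):
    """Render cues as SRT: index, timing line, text, blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical OCR output, sampled every half second.
    sample = [(0.0, ""), (0.5, "他肯定不会回来的"), (1.0, "他肯定不会回来的"), (1.5, "")]
    print(cues_to_srt(frames_to_cues(sample, 0.5)))
```

Real OCR output would be noisier than this (the same line often comes back with one or two characters different from frame to frame), so an exact string comparison is only a starting point.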

[Attached image: screenshot, 2018-09-25, 1:47 PM]


56 minutes ago, XiaoXi said:

Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.

 

This. 

 

Not sure of the size of the market though. It is a lot of work. Most people end up just watching a video. 


1 hour ago, Flickserve said:

Not sure of the size of the market though. It is a lot of work. Most people end up just watching a video. 

Probably not that big unfortunately, because most people wouldn't know what to do with an srt file to get the most out of it. But personally I know that an srt file is the holy grail of language learning.


2 hours ago, yaokong said:

XiaoXi, could you please explain in 2-3 sentences how you use it? 

I'd be interested to know how Flickserve uses it too. Well let me ask you, when you watch a Chinese tv series with hard coded subs and come across a word you don't know - what do you do?


Me?

 

I am interested in hard subtitles. And then, you can make a srt file

 

A srt file lets you generate anki cards with the sentences, audio (and pictures) as an automated process. 
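As a rough illustration of the text half of that process, here is a small Python sketch that reads .srt content and writes one tab-separated line per subtitle, which Anki can import as a text file. It is only a sketch under my own assumptions: `parse_srt` and `cues_to_tsv` are hypothetical helper names, the two-field layout (sentence, timing) is arbitrary, and cutting the matching audio and screenshots is not covered.

```python
import csv
import io
import re

# Matches the SRT timing line, e.g. "00:00:00,500 --> 00:00:01,500".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples from SRT-formatted text."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is subtitle text.
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                if text:
                    cues.append((match.group(1), match.group(2), text))
                break
    return cues


def cues_to_tsv(cues):
    """Render cues as tab-separated lines: sentence, then its timing."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for start, end, text in cues:
        writer.writerow([text, f"{start} - {end}"])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "1\n00:00:00,500 --> 00:00:01,500\n他肯定不会回来的\n"
    print(cues_to_tsv(parse_srt(sample)), end="")
```

The start and end times kept in the second field are what a fuller tool would use to cut the audio clip and grab a frame for each card.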


12 hours ago, Flickserve said:

I am interested in hard subtitles. And then, you can make a srt file

Yes that was my suggestion, we were interested in the same thing so I wondered also how you used an srt file.

 

12 hours ago, Flickserve said:

A srt file lets you generate anki cards with the sentences, audio (and pictures) as an automated process. 

Oh ok, yes it's useful for that. The srt file is really so much more useful than hardcoded subs. So many possibilities. Is there software for that now? I remember there was software for Japanese a long time ago but not for other languages.


On 9/25/2018 at 1:52 PM, XiaoXi said:

Personally I'd be more interested in software that could analyse a downloaded video file and OCR all the subtitles and produce a .srt file.

 

This. I want to read the subtitles like a book and practice performing them as a way to learn language that fits specific situations and the tone of voice that goes along with it. I want to imitate the actors and do some 配音 (dubbing).


23 hours ago, Flickserve said:

I used subs2srs. The whole process was documented by TysonD on the forum.

Oh right, that's the exact same software I was referring to. The one made originally for Japanese.

 

18 hours ago, 艾墨本 said:

This. I want to read the subtitles like a book and practice performing them as a way to learn language that fits specific situations and the tone of voice that goes along with it. I want to imitate the actors and do some 配音 (dubbing).

Yes hopefully the OP hasn't gone to a better place and maybe he can make this software since there seems to be more demand for it. Not to mention that what he is proposing seems to already exist, even though it doesn't work that well. If you read the SRT file as a book how will you be able to actually hear the voices to imitate them?

 

Btw you can already do this with movies since SRT subtitles are normally available for the more popular Chinese movies.


9 hours ago, XiaoXi said:

Yes hopefully the OP hasn't gone to a better place and maybe he can make this software since there seems to be more demand for it. Not to mention that what he is proposing seems to already exist, even though it doesn't work that well. If you read the SRT file as a book how will you be able to actually hear the voices to imitate them?

 

Btw you can already do this with movies since SRT subtitles are normally available for the more popular Chinese movies.

 

Pretty simply, actually. I'd read them slowly, looking up words and practicing at my own pace away from my computer (an important part of good studying for me) and then go back to my computer and watch the show.

 

As for finding SRT files for popular movies: I challenge you to try finding some for 人民的名义 (In the Name of the People), or how about the dubbed version of Avatar: The Last Airbender? Both would be great for studying!


On 10/3/2018 at 9:01 PM, 艾墨本 said:

Pretty simply, actually. I'd read them slowly, looking up words and practicing at my own pace away from my computer (an important part of good studying for me) and then go back to my computer and watch the show.

How do you look up the words?

 

On 10/3/2018 at 9:01 PM, 艾墨本 said:

As for finding SRT files for popular movies: I challenge you to try finding some for 人民的名义 (In the Name of the People), or how about the dubbed version of Avatar: The Last Airbender? Both would be great for studying!

The subtitles for a foreign-language movie are normally made by someone completely independent of the people who do the dubbing, so they are unlikely to match. It depends on the movie. Sometimes they match quite well, sometimes they don't match at all. With some movies the subtitles do indeed appear to be based on the Mandarin dubbing, like 猛龙过江 (The Way of the Dragon), but it's not the norm unfortunately.

 

I found with French that fans make transcripts of tv shows dubbed in French that match perfectly since they're transcripts of the dub itself. Might be worth looking to see if Chinese fans ever make transcripts like this. 


6 minutes ago, XiaoXi said:

I found with French that fans make transcripts of tv shows dubbed in French that match perfectly since they're transcripts of the dub itself. Might be worth looking to see if Chinese fans ever make transcripts like this. 

 

Isn't that what viki.com does? I haven't investigated it fully.
