
Extracting hardcoded subtitles from a Chinese video clip



Having read those threads, it's clear that extracting hardcoded subtitles from a Chinese movie file is a real thorn in the side.


VideoSubFinder is a free program that scans a video frame by frame, detects hardcoded subtitles with text-detection algorithms, and extracts them as a series of image grabs for a later OCR pass. Follow the steps below closely.

Download and install VideoSubFinder from here: https://sourceforge.net/projects/videosubfinder/
The program requires the "Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019" to be installed on the PC, so install that first if it is missing.

Run "VideoSubFinderWXW.exe", click on "File" and select "Open Video (OpenCV)" to import the video file that contains the hardsubs.

After importing the video file, drag the slider along the progress bar to find a frame where a subtitle (in this case Chinese) is on screen. Then frame the area where the subtitles appear as precisely as you can, so that the rest of the picture is left out of the search.

Press the "Run Search" button to autodetect the hardsubs.

When the search is finished, switch to the OCR tab and click on "Create Cleared TXT Images". Once that is done, you will find a large number of cleared images containing the subtitle text in the TXTImages folder of the VideoSubFinder root directory.

Do NOT close the program. Leave it open and go on to the next step.

At this point we need some image-to-text recognition software for the OCR step.

Use the free command-line OCR engine tesseract. Be sure to add its install folder to the PATH environment variable on your PC.

Run the command below in Command Prompt.

for %i in (C:\your_dir\Release_x64\TXTImages\*.jpeg) do tesseract -l chi_sim --oem 2 --tessdata-dir C:\your_dir\AppData\Local\Programs\Tesseract-OCR\tessdata --psm 6 %i "%~dpni"

If you put the command in a .bat file, use %% instead of % (that is, %%i and "%%~dpni").
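
For anyone who would rather use a script than the one-liner, the same loop can be sketched in Python. This is only a sketch under assumptions: it assumes tesseract is already on PATH, it reuses the options from the command above unchanged, and the helper names build_command and ocr_folder are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_command(image, tessdata_dir, lang="chi_sim"):
    """Build the tesseract call for one image (hypothetical helper).

    Mirrors the one-liner above: the output base is the image path
    without its extension, so tesseract writes <name>.txt next to it.
    """
    image = Path(image)
    return [
        "tesseract",
        "-l", lang,
        "--oem", "2",
        "--tessdata-dir", str(tessdata_dir),
        "--psm", "6",
        str(image),
        str(image.with_suffix("")),
    ]


def ocr_folder(image_dir, tessdata_dir):
    """Run tesseract on every .jpeg in image_dir, one image at a time."""
    for image in sorted(Path(image_dir).glob("*.jpeg")):
        # check=True stops the loop if tesseract reports an error
        subprocess.run(build_command(image, tessdata_dir), check=True)


# Example call, using the same placeholder paths as the command above:
# ocr_folder(r"C:\your_dir\Release_x64\TXTImages",
#            r"C:\your_dir\AppData\Local\Programs\Tesseract-OCR\tessdata")
```

Passing the arguments as a list means paths with spaces in them need no extra quoting.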

The conversion takes a while depending on the length of the video, so go and sip a cup of coffee.

The converted .txt files now appear in the same folder as the images. All of them must be moved to C:\your_dir\Release_x64\TXTResults
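
Moving the text files by hand gets tedious when there are hundreds of them, so here is a rough Python sketch of that step. The function name move_txt_results is hypothetical, and the paths in the example call are the placeholder ones from this post.

```python
import shutil
from pathlib import Path


def move_txt_results(images_dir, results_dir):
    """Move every .txt file from images_dir into results_dir.

    Returns the names of the files that were moved (hypothetical helper).
    Image files are left where they are.
    """
    images_dir, results_dir = Path(images_dir), Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for txt in sorted(images_dir.glob("*.txt")):
        shutil.move(str(txt), str(results_dir / txt.name))
        moved.append(txt.name)
    return moved


# Example call with the placeholder paths used in this post:
# move_txt_results(r"C:\your_dir\Release_x64\TXTImages",
#                  r"C:\your_dir\Release_x64\TXTResults")
```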

Now go back to VideoSubFinder and hit the "Create Sub From TXT Results" button to generate a subtitle file (.ass is recommended). Rename the subtitle file and save it.

There will be some garbled text in the file, so edit it as needed. This method is far from perfect, but it is usable for relatively short videos.

As an aside, the next time you use this app, make sure no files are left in the three folders ILAImages, ISAImages and RGBImages, otherwise it will pick up the ones from the previous run.
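
That cleanup can also be scripted. Below is a minimal Python sketch, assuming the three folders sit directly under the VideoSubFinder root (the Release_x64 folder in the paths above); the function name clear_stale_images is made up for illustration. It deletes files permanently, so check the path before running it.

```python
from pathlib import Path

# The three folders named in this post
STALE_FOLDERS = ("ILAImages", "ISAImages", "RGBImages")


def clear_stale_images(root):
    """Delete files left in the three image folders under root.

    Returns how many files were removed (hypothetical helper).
    The folders themselves are kept, and missing folders are skipped.
    """
    removed = 0
    for name in STALE_FOLDERS:
        folder = Path(root) / name
        if not folder.is_dir():
            continue
        for item in folder.iterdir():
            if item.is_file():
                item.unlink()
                removed += 1
    return removed


# Example call with the placeholder root used in this post:
# clear_stale_images(r"C:\your_dir\Release_x64")
```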

You can then use the subtitle file in another app such as subs2srs to study.

That’s it.






Interesting but Windoze only.  Anyone know of the Mac equivalent method?

I noticed you show a clip from 开端 Reset drama 😎.

Also wanted to point out that there's no need to go to such trouble for this specific drama, you can just download the full transcript with timings from YouTube.



  • 2 months later...

For people who aren’t comfortable with the command-line interface, the free Subtitle Edit software has an OCR (optical character recognition) feature that can turn the images into text. It also uses the open-source tesseract OCR engine.



