eslang

[Tool] Autosub

6 posts in this topic

I came across the Autosub tool from the d-addicts forum.

 

Autosub is a utility for automatic speech recognition and subtitle generation.
https://github.com/agermanidis/autosub

Install AutoSub Step to Step in Windows with Translate subtitle
https://github.com/agermanidis/autosub/issues/31

 

So I tried the Autosub tool on the talk show 锵锵三人行, for which realmayo had posted a link:

Transcripts for recent 锵锵三人行 episodes

 

and over at the "any good TV series recently?" topic thread:

  

 
    The holy grail for me I think is to find good shows which also have "soft" subtitles (in Chinese) available,
that is, a downloadable text file of subtitles.
  

   
@realmayo, would you mind sharing with us which gems you have found (the combo mp4/srt files)
and where we can download them? 锵锵三人行 doesn't appear to have subtitles at all.

 


Please refer to the attached picture to take a look at the "soft" subtitles.

 

Overall, compared with the YouTube captions feature, I find that the Autosub tool managed to capture some lines correctly.

post-49976-0-52726800-1481188890_thumb.jpg


@wibr - Thanks for the additional information; that's certainly good to know and keep in mind. I'm curious which show you tried it on and what you think of the quality of the "soft subtitles".

So far, I have tried it on 1 English documentary, 2 Chinese documentaries and 3 Japanese programs (a documentary, a trailer and a drama). Most likely I have not hit the 60-minutes-per-month limit yet.

[Edit]

This software tool was installed two days ago, and the following programs were tested:

 

1 English - Documentary (4mins 4sec)
2 Chinese - Documentary (5mins 1sec) 锵锵三人行
2 Chinese - Documentary (4mins 58sec) 文明之旅
3 Japanese - Documentary (6mins 11sec)
3 Japanese - Trailer (1min 39sec)
3 Japanese - Drama (51mins 20sec)
Total Time : 73mins 13sec
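
The total above can be double-checked with a few lines of Python. This is only a sketch: the durations are copied from the list, the 60-minute constant is the free-tier figure mentioned in this thread, and the names `clips`, `total_seconds` and `format_duration` are made up for illustration. Whether every one of these minutes actually counts against that quota is not something this thread establishes.

```python
# Durations copied from the list above, as (minutes, seconds).
clips = {
    "English documentary": (4, 4),
    "锵锵三人行": (5, 1),
    "文明之旅": (4, 58),
    "Japanese documentary": (6, 11),
    "Japanese trailer": (1, 39),
    "Japanese drama": (51, 20),
}

# The 60-minute monthly figure quoted in this thread (an assumption here,
# not checked against any billing documentation).
FREE_QUOTA_SECONDS = 60 * 60


def total_seconds(durations):
    """Sum (minutes, seconds) pairs into a single number of seconds."""
    return sum(m * 60 + s for m, s in durations)


def format_duration(seconds):
    """Format seconds in the same 'Xmins Ysec' style used in the post."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}mins {secs}sec"


total = total_seconds(clips.values())
over = max(total - FREE_QUOTA_SECONDS, 0)

print("Total:", format_duration(total))
print("Above the 60-minute figure by:", format_duration(over))
```

Run as written, this reports a total of 73mins 13sec, which is 13mins 13sec above the 60-minute figure.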

 

The Japanese documentary has around 56% correct phrases, 30% incorrect phrases, and 14% grey-area phrases.

The Japanese drama has around 34% correct phrases, 40% incorrect phrases, and 26% grey-area phrases.

 

Correct phrases - 正字

Incorrect phrases - 誤字・脱字

Grey-area phrases - 字余り


I am not sure why wibr has the API key problem. In any case, I managed to run the software tool smoothly on 4 episodes (about 50 minutes per episode) of a Japanese drama yesterday.

 

I have edited the auto-generated subtitles for the talk show 锵锵三人行 (about the first 5 minutes), using Aegisub to fine-tune the subtitle timing and then copying and pasting the relevant transcript text into each subtitle line.
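
For anyone who prefers a script to a GUI for the simplest kind of timing fix, here is a minimal sketch that shifts every cue in an srt file by a fixed offset. It is not what Aegisub does internally and not how the attached file was made. The function name `shift_srt`, the sample cues and the 500 ms offset are all made up for illustration, and it assumes the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines.

```python
import re

# Matches one SRT timestamp such as 00:01:02,500
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift(match, offset_ms):
    """Return the matched timestamp moved by offset_ms, clamped at zero."""
    h, m, s, ms = map(int, match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)
    s_total, ms = divmod(total, 1000)
    m_total, s = divmod(s_total, 60)
    h, m = divmod(m_total, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_ms):
    """Shift only the timing lines (those containing '-->'); leave text alone."""
    out = []
    for line in text.splitlines():
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: _shift(m, offset_ms), line)
        out.append(line)
    return "\n".join(out)


# Hypothetical sample cues, not taken from the attached file.
sample = """1
00:00:01,000 --> 00:00:03,500
(first subtitle line)

2
00:00:59,800 --> 00:01:02,000
(second subtitle line)"""

shifted = shift_srt(sample, 500)  # delay every cue by half a second
print(shifted)
```

A uniform shift only helps when the whole file is early or late by the same amount. Cue-by-cue adjustments like the ones described above still need an editor such as Aegisub.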

 

The transcript can be found at the link below.

中国拍不出好电影 这事能怪小鲜肉吗?_凤凰卫视
http://phtv.ifeng.com/a/20161111/44491382_0.shtml

The soft-subtitle file (srt) is attached.

 

 

中国拍不出好电影 这事能怪小鲜肉吗?_凤凰卫视_1.srt


I haven't actually tried the software; I just checked the GitHub page and found that the speech API key is hardcoded in the source files. I am not really familiar with the Google APIs and how they are billed, so maybe I am missing something here, but according to the website only the first 60 minutes are free. So if you go above 60 minutes, as far as I understand it, the owner of the API key will have to pay for that.


@wibr - Most likely the software developer, or whoever owns the API key and engaged the developer, will have to pay the amount billed. The software is still relatively unknown, so it is unlikely that end users will pay for a beta version or prototype. Metaphorically speaking, potential customers are also unlikely to pay for gasoline when they visit a showroom to test-drive new car models.

If you (or others) happen to test the software later, it would be great to hear about the quality of the auto-generated subtitles. :)
