Chinese-forums.com
Learn Chinese in China

I made a Windows app for processing language mp3 files


Mijin

Not sure if this is allowed, and whether this would be better in the resources forum?

 

Basically, I had many mp3 files in the "exam format", where each dialog is spoken at natural speed and followed by a long silence. I found such files were not very useful for general listening practice, yet almost every Chinese language book I've bought includes them. So I made this app to batch process these files into a more digestible format.

 

It's a Windows app. It takes a set of mp3 files as input and outputs a set of mp3 files. The operations it performs are user-configurable (via a wizard) and include things like stripping out the silences, repeating dialogs and slowing down dialogs.

 

Let me know if you find it useful. If so, I may port it to Android and/or make it a real-time (instead of file-based) tool.

 

Download page (github)


mungouk

I've been thinking of doing something very similar (although I don't know what your output sounds like) using SoX tools on the command line. 

There's so much useful content out there but I guess it's all Hanban copyright... if only they had decided to make it CC instead.

 

I'm a Mac user but will check out your github later when I have time. Cheers!

Flickserve

If I have a whole bunch of MP3s of single sentences at 100% speed, can it batch process them to 80% speed?

Mijin
36 minutes ago, Flickserve said:

If I have a whole bunch of MP3s of single sentences at 100% speed, can it batch process them to 80% speed?

 

Yes, absolutely; the app is designed for just this kind of thing. And you needn't just change the whole thing to 80% speed: you could, say, play sentence 1 at 80%, then sentence 1 at 100%, then sentence 2 at 80%, then sentence 2 at 100%, and so on.
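
As a rough sketch of that ordering (illustrative only, not the app's actual code; the name `build_playlist` and the tuple layout are made up for this example):

```python
def build_playlist(num_sentences, tempos=(0.8, 1.0)):
    """Return (sentence_number, tempo) pairs: each sentence at every tempo in turn."""
    playlist = []
    for sentence in range(1, num_sentences + 1):
        for tempo in tempos:
            playlist.append((sentence, tempo))
    return playlist


print(build_playlist(2))
# [(1, 0.8), (1, 1.0), (2, 0.8), (2, 1.0)]
```

Each pair would then be rendered to audio in order, so the slow and natural-speed versions of a sentence sit next to each other.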

 

Oh, and regarding sound quality: it changes the tempo, not just the speed, i.e. the voice pitch is not changed.
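
The difference between the two can be shown with a little arithmetic. This is only a toy model with hypothetical function names, not a real time-stretching algorithm: plain resampling scales both duration and pitch, while a tempo change scales duration only.

```python
def naive_speed_change(duration_s, pitch_hz, factor):
    """Plain resampling: duration and pitch both scale with the factor."""
    return duration_s / factor, pitch_hz * factor


def tempo_change(duration_s, pitch_hz, factor):
    """Time-stretching: duration scales, pitch is left alone."""
    return duration_s / factor, pitch_hz


# A 4-second sentence spoken with a 200 Hz voice, slowed to 80%.
print(naive_speed_change(4.0, 200.0, 0.8))  # about 5 s long, voice drops to about 160 Hz
print(tempo_change(4.0, 200.0, 0.8))        # about 5 s long, voice stays at 200 Hz
```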

 

BUT it is an early version of the app, and that's why I put it here and not in Resources because I'm expecting feedback. I've processed hundreds of files with it already but it's possible that there are bugs. Also NB the app will not overwrite files -- it insists that you output to a different folder than the original files, to protect against any possibility of losing data.

alantin

Sounds useful!

My most recent use case is recording a session with a tutor, then separating the two speakers into two different audio files and stripping the silences from those: my own voice so I can later check whether I can discern any improvement, and the tutor's for listening practice.

 

I wonder if anyone knows anything that would do this for me?

It probably would require some kind of AI tool to separate the two.
Currently I'm doing this by hand in Audacity.

Mijin

Well my app can auto strip the silences. 
And it can cut the audio sections into separate files. For example, imagine you have a file with 10 sentences padded with silence in between. It can strip the silences and save the result as one file, or it can strip the silences then save each sentence to a separate mp3 file. 
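
A minimal sketch of the two output modes, working on a plain list of sample values. This is an assumption about how such a step could be written, not the app's actual code; the function names and the `threshold` / `min_silence` parameters are hypothetical.

```python
def find_segments(samples, threshold, min_silence):
    """Return (start, end) index pairs of non-silent stretches.

    A gap only separates two segments if it contains at least
    `min_silence` consecutive samples whose absolute value is
    at or below `threshold`.
    """
    segments = []
    start = None      # start index of the segment being built
    last_loud = None  # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if start is None:
                start = i
            elif i - last_loud - 1 >= min_silence:
                # The quiet gap was long enough: close the previous segment.
                segments.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments


def strip_silences(samples, threshold, min_silence):
    """Concatenate the non-silent segments into one list (one output file)."""
    out = []
    for a, b in find_segments(samples, threshold, min_silence):
        out.extend(samples[a:b])
    return out


def split_sentences(samples, threshold, min_silence):
    """Return each non-silent segment as its own list (one file per sentence)."""
    return [samples[a:b] for a, b in find_segments(samples, threshold, min_silence)]


audio = [0, 0, 5, 6, 0, 7, 0, 0, 0, 8, 9, 0]
print(split_sentences(audio, threshold=1, min_silence=3))  # [[5, 6, 0, 7], [8, 9]]
print(strip_silences(audio, threshold=1, min_silence=3))   # [5, 6, 0, 7, 8, 9]
```

Note that the short one-sample pause inside the first segment is kept; only the gaps long enough to count as deliberate silence are dropped.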

 

But no, currently there is no processing of different voices, so it cannot create, say, separate speakerA and speakerB files.

alantin

I did some googling, and it seems this is called the "cocktail party problem". It's so tricky that until two or three years ago it wasn't possible to separate multiple voices talking over each other. Google (probably among others) has, however, been researching it with pretty impressive results for use in video-call noise cancelling and transcription, but I couldn't yet find any service that would do what I need. I think it should be available for anyone to use in a few years.

markhavemann

Very nice. Looks professional too. I'm not sure I have a use for this right now, but it's definitely a nice tool to have available. Thanks for sharing!

Flickserve
11 hours ago, alantin said:

but I couldn't yet find any service that would do what I need. I think it should be available for anyone to use in a few years.

 

Many Chinese programs have background music. If you are listening to sentences and trying to mimic them, reducing the background music would be great.

Flickserve
11 hours ago, alantin said:

 

I wonder if anyone knows anything that would do this for me?

 

I use an app called Evaer. It can record just the teacher's voice on a single channel from a Skype call. But when you speak, there will be no sound on the recording.

 

 

I am not bothered about recording my own voice during the lesson. In lessons, I don't concentrate on pronunciation very much and prefer to fine-tune it later.

alantin
2 hours ago, Flickserve said:

Many Chinese programs have background music. If you are listening to sentences and trying to mimic them, reducing the background music would be great.

 

I know a tool for removing background music!
 

https://vocalremover.org/

 

 

2 hours ago, Flickserve said:

I use an app called Evaer. It can record just the teacher's voice on a single channel from a Skype call. But when you speak, there will be no sound on the recording.

I am not bothered about recording my own voice during the lesson. In lessons, I don't concentrate on pronunciation very much and prefer to fine-tune it later.

 

Perfect! Somehow I failed to realize that you could tap into the Skype call data directly!

I agree. Listening to your own voice too, albeit grueling for me, does help with improving pronunciation.

 

I tried recording a whole lesson for the first time last week using the Skype recording function, and I've listened to the recording multiple times now.

This is a new technique for me and I find it very helpful. I can understand enough to keep the conversation going, but I miss a lot of small things. Listening to it later allows me to pick up the missing pieces: grammar points the tutor used that I didn't notice before (like 才 at a specific point in one sentence), the kinds of filler words and ways to correct a sentence when the train of thought changes, and a lot of vocabulary that I missed the first time too. This gets a lot more efficient when you can isolate just the tutor's voice and then strip the silences.

Jan Finster
On 2/14/2021 at 4:22 PM, Mijin said:

Let me know if you find it useful. If so, I may port it to android and/or make it a real-time (instead of file-based) tool.

 

I tried it with an audio file of TheChairMansBao and it did not work at all. I wanted it to split the text sentence by sentence according to silence and always ended up with one file, which was identical to the original. Splitting it at fixed intervals is not useful as sentence length obviously varies.

 

Incidentally, I found this recommendation from Steve Kaufmann (LingQ). The program he uses is WavePad. It does exactly the job you are trying to do: https://www.nch.com.au/splitter/index.html

Mijin

Oh dear. Thanks for trying my app anyway. Could I get a copy of that audio file so I can find out what went wrong?

Thanks

 

I would guess that maybe the silences aren't entirely silent. Of course there is a tolerance for a certain amount of noise, but it's just an arbitrary cutoff. 

I guess WavePad must do some actual signal processing to tell the difference between content and hiss.
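
To illustrate why a fixed cutoff is arbitrary, here is a minimal sketch of a level check on a window of 16-bit samples. It is an assumption about how such a check could look, not the app's or WavePad's actual code; the function names and the -50 dB cutoff are made up for the example.

```python
import math


def rms_dbfs(window, full_scale=32768):
    """RMS level of a window of 16-bit samples, in dB relative to full scale."""
    if not window:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / full_scale)


def is_silent(window, cutoff_db=-50.0):
    """Treat the window as silence if its level is at or below the cutoff."""
    return rms_dbfs(window) <= cutoff_db


digital_silence = [0] * 100
faint_hiss = [40, -40] * 50      # low-level noise, roughly -58 dBFS
louder_hiss = [400, -400] * 50   # roughly -38 dBFS

print(is_silent(digital_silence))  # True
print(is_silent(faint_hiss))       # True
print(is_silent(louder_hiss))      # False: hiss above the cutoff looks like content
```

A recording whose "silent" gaps carry hiss above the cutoff would never register as silence, so nothing would be stripped or split.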

 

Thanks for pointing out WavePad too. I'll take a look at it and figure out if what I was trying to do is completely redundant :D

 

 

Jan Finster
11 minutes ago, Mijin said:

Oh dear. Thanks for trying my app anyway. Could I get a copy of that audio file so I can find out what went wrong?

Thanks

 

I would guess that maybe the silences aren't entirely silent. Of course there is a tolerance for a certain amount of noise, but it's just an arbitrary cutoff. 

I guess WavePad must do some actual signal processing to tell the difference between content and hiss.

 

Thanks for pointing out WavePad too. I'll take a look at it and figure out if what I was trying to do is completely redundant

 

You can just check it with some of the free lessons: https://www.thechairmansbao.com/

 

Yeah, I was wondering if you tried to reinvent the wheel 😉 From a personal programming challenge point of view, I can totally see the value of what you are doing. But if you want to have a competitive product, make it better than WavePad and free 😉

Mijin

Ok, I just tested it out with the first lesson "Shanghai couples tie the knot".

 

This lesson is just a single dialogue though. Any pauses between sentences are typically less than 2 seconds; that's shorter than the threshold of my app for cutting sentences. The silence detection assumes that there is deliberate silence of over 2 seconds spacing out separate dialogs or questions. It's not designed for finding short pauses within one contiguous dialog.
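
A small sketch of that effect, using made-up pause timings and a hypothetical `split_points` helper (not the app's actual code): with a 2-second minimum no pause qualifies, so the output is one unsplit file, while a shorter minimum starts producing cuts.

```python
def split_points(pauses, min_silence_s):
    """Return the start times of pauses long enough to count as a break.

    `pauses` is a list of (start_time_s, duration_s) pairs.
    """
    return [start for start, duration in pauses if duration >= min_silence_s]


# Hypothetical pauses in a single continuous dialogue: all under 2 seconds.
pauses = [(3.1, 0.4), (7.8, 1.2), (12.5, 0.6), (18.0, 1.5)]

print(split_points(pauses, min_silence_s=2.0))  # [] so the output is one unsplit file
print(split_points(pauses, min_silence_s=1.0))  # [7.8, 18.0]
```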

 

Have you tried this with WavePad too? Because it does seem to me that if we're cutting sentences with shorter pauses it starts to get arbitrary what a sentence is. For example, people often pause for a second or more before or after a conjunction ("because", "otherwise", "in reality"), so those might get parsed into two or more sentences.

But I guess breaking on a conjunction may actually be desirable for language-learning anyway.

I'll make a version where the silence duration for detection is adjustable.

 

Regarding "competitive product": this was just something I made for my own use. I had no intention of even sharing it; it was only when a friend told me she never bothers to download the free audio files that come with her textbooks that I thought my tool might be useful for others.

Jan Finster
8 hours ago, Mijin said:

Have you tried this with WavePad too? Because it does seem to me that if we're cutting sentences with shorter pauses it starts to get arbitrary what a sentence is. For example, people often pause for a second or more before or after a conjunction ("because", "otherwise", "in reality"), so those might get parsed into two or more sentences.

But I guess breaking on a conjunction may actually be desirable for language-learning anyway.

Yes, I tried it with WavePad, but so far only with one file. It does seem to make meaningful splits: not strictly sentence by sentence but, as you say, possibly also at sub-clauses. This is fine for me.

 

alantin

Hi @Mijin

 

I just tried out your application.

 

I first recorded a session using Evaer as per @Flickserve's suggestion and then ran the audio through your application, leaving only one-second gaps between sections. It is beautiful! The time I needed after the lesson to end up with an audio file with only the tutor's speech was reduced from over an hour to just a few minutes! Worked like a charm!
