Common Voice open-source transcribed audio dataset


大块头

thelearninglearner

This is actually pretty cool. I'll try to use some of this in a MorphMan deck. I just need to figure out how to translate the sentences properly and make sure the ones I add are spoken accurately. Do they have a separate section of the ones spoken well? I'll have to take a look.


53 minutes ago, thelearninglearner said:

do they have a separate section of the ones spoken well?

 

After a sentence is recorded, it goes through a validation process in which other volunteers listen to the clip and confirm that the speaker read the sentence correctly. The number of up and down votes each recording received is included in the dataset.
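
In case it helps, here is a rough sketch of how the vote counts could be used to keep only the well-read clips. It assumes the metadata comes as a tab-separated file with `path`, `sentence`, `up_votes` and `down_votes` columns; check the header of the file you actually download, since I am going from memory on the column names. The sample rows, the `well_spoken` helper and its thresholds are made up for illustration.

```python
import csv
import io

# Made-up sample rows standing in for the dataset's metadata file.
# Assumption: tab-separated, with up_votes / down_votes columns.
SAMPLE_TSV = (
    "path\tsentence\tup_votes\tdown_votes\n"
    "clip_001.mp3\t我今天很忙\t3\t0\n"
    "clip_002.mp3\t他喜歡喝茶\t2\t1\n"
    "clip_003.mp3\t明天會下雨\t1\t2\n"
)


def well_spoken(tsv_file, min_up=2, max_down=0):
    """Return the rows with at least min_up up votes and at most max_down down votes."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row
        for row in reader
        if int(row["up_votes"]) >= min_up and int(row["down_votes"]) <= max_down
    ]


# With a real download you would pass open("validated.tsv", encoding="utf-8") instead.
clips = well_spoken(io.StringIO(SAMPLE_TSV))
for row in clips:
    print(row["path"], row["sentence"])
```

The thresholds are arbitrary; for a flashcard deck it probably makes sense to be strict and drop anything with even one down vote.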


  • 5 weeks later...

Sounds awesome. Thanks for tagging me in. I'm getting confused navigating the site though, how does one access the dataset?

 

EDIT: Never mind, I'm an idiot, but I'll blame it on listening to 王菲 too loud and disorienting myself.

 

One thing I noticed about the Taiwan set (I haven't checked the China one yet) is that the speakers are overwhelmingly male, which I see as a good thing for me, since so many learning materials are female-spoken.
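
For anyone who wants to check the speaker balance themselves, a quick tally along these lines should do it. This is only a sketch: it assumes the metadata file is tab-separated and has a `gender` column that speakers may leave blank, and the exact label values can differ between releases. The sample rows and the `gender_counts` helper are invented for the example.

```python
import csv
import io
from collections import Counter

# Invented sample rows; assumption: a tab-separated file with a "gender" column.
SAMPLE_TSV = (
    "path\tgender\n"
    "clip_001.mp3\tmale\n"
    "clip_002.mp3\tmale\n"
    "clip_003.mp3\tfemale\n"
    "clip_004.mp3\t\n"
)


def gender_counts(tsv_file):
    """Count clips per gender label, grouping blank values as 'unspecified'."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["gender"] or "unspecified" for row in reader)


counts = gender_counts(io.StringIO(SAMPLE_TSV))
for label, n in counts.most_common():
    print(label, n)
```

Note that this counts clips, not speakers, so a few very active contributors can skew the numbers.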
