Chinese-forums.com
Learn Chinese in China

大块头

Common Voice open-source transcribed audio dataset

thelearninglearner

This is actually pretty cool. I'll try to use some of this in a MorphMan deck. I just need to figure out how to translate the sentences properly and make sure the ones I add are spoken accurately. Do they have a separate section of the ones spoken well? I'll have to take a look.

大块头
53 minutes ago, thelearninglearner said:

do they have a separate section of the ones spoken well?

 

After a sentence is recorded, it goes through a validation process in which other volunteers listen to it and confirm that the speaker read the sentence correctly. The number of up and down votes each recording received is part of the dataset.
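
If you only want the well-reviewed clips, here is a rough sketch of filtering on those vote counts. It assumes the download includes a tab-separated file with `path`, `sentence`, `up_votes`, and `down_votes` columns; check the header row of your copy, since the names may differ. The function name, the thresholds, and the sample rows are made up for illustration.

```python
import csv
import io


def well_reviewed(tsv_file, min_up=2, max_down=0):
    """Yield (path, sentence) for clips that meet the vote thresholds.

    Assumes columns named path, sentence, up_votes, down_votes
    (an assumption; verify against the header of your download).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Treat an empty vote cell as zero votes.
        up = int(row["up_votes"] or 0)
        down = int(row["down_votes"] or 0)
        if up >= min_up and down <= max_down:
            yield row["path"], row["sentence"]


# Tiny made-up sample standing in for a real file.
sample = io.StringIO(
    "path\tsentence\tup_votes\tdown_votes\n"
    "clip_a.mp3\t你好\t2\t0\n"
    "clip_b.mp3\t谢谢\t2\t1\n"
    "clip_c.mp3\t再见\t1\t0\n"
)
good_clips = list(well_reviewed(sample))
print(good_clips)
```

With a real file you would pass something like `open(tsv_path, encoding="utf-8", newline="")` instead of the sample. How strict to make the thresholds is a judgment call; for a flashcard deck it is probably worth being picky.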

mungouk

Interesting dataset, especially since it's public domain.

 

Any idea how the "accent" field is encoded? Where present, it's just large numbers like 370000.

 

 

mungouk

Oh, they're postal codes?

 

NinKenDo

Sounds awesome. Thanks for tagging me in. I'm getting confused navigating the site, though. How does one access the dataset?

 

EDIT: Never mind, I'm an idiot, but I'll blame it on listening to 王菲 too loudly and disorienting myself.

 

One thing I noticed about the Taiwan set (haven't checked China yet) is that the dataset is overwhelmingly male, which I see as a good thing for me. So many learning materials are spoken by women.
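
For anyone who wants to check the split themselves, here is a minimal sketch of tallying one column of the metadata. It assumes a tab-separated file with a `gender` column; that column name, the helper name `tally_column`, and the sample rows are assumptions for illustration, so check them against your own download.

```python
import csv
import io
from collections import Counter


def tally_column(tsv_file, column):
    """Count how often each value appears in the given column.

    Empty cells are grouped under "(blank)".
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[column] or "(blank)" for row in reader)


# Made-up rows standing in for the real metadata file.
sample = io.StringIO(
    "path\tgender\n"
    "clip_a.mp3\tmale\n"
    "clip_b.mp3\tmale\n"
    "clip_c.mp3\tfemale\n"
    "clip_d.mp3\t\n"
)
gender_counts = tally_column(sample, "gender")
for value, count in gender_counts.most_common():
    print(value, count)
```

The same helper should work for other columns, such as the accent field mentioned above, as long as the column name matches the file's header.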
