Crowdsourcing a Multi-lingual Speech Corpus: Recording, Transcription and Annotation of the CrowdIS Corpora.
Andrew Caines, Christian Bentz, Calbert Graham, Tim Polzehl, Paula Buttery
Published in: LREC (2016)
Keyphrases
- multi lingual
- speech corpus
- information access
- language independent
- speech synthesis
- automatic speech recognition
- text corpus
- cross lingual
- hand crafted
- natural language processing
- text corpora
- language identification
- metadata
- information retrieval
- active learning
- speech recognition
- domain independent
- multimedia
- parallel corpora
- spoken document retrieval
- data mining
- bayesian networks
- feature selection