Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview.
Alena Butryna, Shan-Hui Cathy Chu, Isin Demirsahin, Alexander Gutkin, Linne Ha, Fei He, Martin Jansche, Cibu Johny, Anna Katanova, Oddur Kjartansson, Chenfang Li, Tatiana Merkulova, Yin May Oo, Knot Pipatsrisawat, Clara Rivera, Supheakmungkol Sarin, Pasindu De Silva, Keshan Sodimana, Richard Sproat, Theeraphol Wattanavekin, Jaka Aris Eko Wibawa
Published in: CoRR (2020)
Keyphrases
- open source
- linguistic resources
- resource allocation
- resource management
- parallel corpora
- multilingual
- web resources
- resource constraints
- resource consumption
- spoken language
- resource selection
- resource requirements
- speech recognition
- resource usage
- open source software
- information resources
- english text
- case study
- website
- resource sharing
- statistical machine translation
- language resources
- information retrieval
- resource availability
- resource discovery
- source code
- natural language processing
- web pages
- language identification
- cross language information retrieval
- audio visual
- low quality
- comparable corpora
- web services
- coordination mechanism
- search engine