NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages.
Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung
Published in: IJCNLP (1) (2023)
Keyphrases
- high quality
- linguistic resources
- statistical machine translation
- low quality
- expressive power
- parallel corpora
- resource allocation
- resource management
- comparable corpora
- databases
- web resources
- cross lingual
- super resolution
- natural language processing
- ground truth
- first order logic
- higher quality
- resource constraints
- knowledge base
- multi lingual
- machine learning
- neural network