NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.
Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder
Published in: CoRR (2022)
Keyphrases
- language resources
- cross lingual
- language independent
- machine translation
- multi lingual
- sentiment classification
- multilingual documents
- multilingual information retrieval
- sentiment analysis
- cross language information retrieval
- language specific
- parallel processing
- cross language
- database
- cross lingual information retrieval
- databases
- comparable corpora
- broadcast news
- opinion mining
- digital libraries
- sentence level
- language identification
- benchmark datasets
- metadata
- query translation
- parallel implementation
- polarity classification
- indian languages
- parallel corpus
- parallel corpora
- parallel computing
- feature set