IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP.
Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Published in: COLING (2020)
Keyphrases
- benchmark datasets
- language model
- pre-trained
- machine translation
- natural language processing
- language modeling
- training data
- document retrieval
- probabilistic model
- information retrieval
- n-gram
- query expansion
- training examples
- speech recognition
- information extraction
- natural language
- retrieval model
- question answering
- statistical machine translation
- control signals
- test collection
- smoothing methods
- ad hoc information retrieval
- relevance model
- cross-language information retrieval
- feature vectors
- pairwise
- semi-supervised
- cross-lingual
- machine learning