IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP.
Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
Published in: CoRR (2020)
Keyphrases
- benchmark datasets
- language model
- pre-trained
- machine translation
- natural language processing
- language modeling
- training data
- n-gram
- document retrieval
- training examples
- retrieval model
- probabilistic model
- information extraction
- question answering
- natural language
- mixture model
- speech recognition
- test collection
- information retrieval
- query expansion
- control signals
- smoothing methods
- ad hoc information retrieval
- cross-lingual
- query terms
- text mining
- translation model
- statistical machine translation
- training set