SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling.
Huyen Nguyen
Published in: CoRR (2019)
Keyphrases
- language modeling
- word level
- n gram
- language model
- word segmentation
- language independent
- cross lingual
- english vocabulary
- chinese text retrieval
- retrieval model
- information retrieval
- probabilistic model
- query expansion
- text classification
- machine translation
- word recognition
- document images
- relevance model
- language learning
- sentence level
- information retrieval systems
- translation model
- document level
- machine learning
- high dimensional
- cross language
- character recognition
- text documents