Character-Based Machine Learning vs. Language Modeling for Diacritics Restoration.
Jurgita Kapociute-Dzikiene, Andrius Davidsonas, Ausra Vidugiriene. Published in: Inf. Technol. Control. (2017)
Keyphrases
- language modeling
- machine learning
- language model
- text classification
- retrieval model
- information retrieval
- query expansion
- n gram
- probabilistic model
- cross lingual
- document retrieval
- relevance model
- feature selection
- word segmentation
- trec collections
- model selection
- data mining
- information extraction
- knowledge discovery
- document length
- statistical language modeling
- comparable corpora
- translation model
- statistical language models
- test collection
- active learning
- decision trees
- pseudo relevance feedback
- term frequency
- data analysis
- dirichlet prior
- learning algorithm