Language chunking, data sparseness, and the value of a long marker list: explorations with word n-grams and authorial attribution.
Alexis Antonia, Hugh Craig, Jack Elliott
Published in: Lit. Linguistic Comput. (2014)
Keyphrases
- n gram
- data sparseness
- language modeling
- word segmentation
- language model
- language specific
- character n grams
- linguistic knowledge
- language independent
- cross lingual
- bag of words
- natural language
- text classification
- variable length
- part of speech
- probabilistic model
- out of vocabulary
- document retrieval
- word sense disambiguation
- retrieval model
- information retrieval
- active learning
- test collection
- information retrieval systems
- natural language processing
- knn
- artificial intelligence