USZEGED: Correction Type-sensitive Normalization of English Tweets Using Efficiently Indexed n-gram Statistics.
Gábor Berend, Ervin Tasnádi
Published in: NUT@IJCNLP (2015)
Keyphrases
- n gram
- language model
- language specific
- character n grams
- word level
- bag of words
- text classification
- word segmentation
- language modeling
- social media
- language independent
- language modelling
- variable length
- part of speech
- natural language
- web documents
- information extraction
- query translation
- Viterbi algorithm
- artificial intelligence
- machine translation
- named entities
- information retrieval
- probabilistic model
- text categorization
- query expansion
- knowledge discovery