USZEGED: Correction Type-sensitive Normalization of English Tweets Using Efficiently Indexed n-gram Statistics.

Gábor Berend, Ervin Tasnádi

Published in: NUT@IJCNLP (2015)

Keyphrases

n-gram
language model
language specific
character n-grams
word level
bag of words
text classification
word segmentation
language modeling
social media
language independent
language modelling
variable length
part of speech
natural language
web documents
information extraction
query translation
Viterbi algorithm
artificial intelligence
machine translation
named entities
information retrieval
probabilistic model
text categorization
query expansion
knowledge discovery