Out of Vocabulary Words Decrease, Running Texts Prevail and Hashtags Coalesce: Twitter as an Evolving Sociolinguistic System.
Suman Kalyan Maity, Bhadreswar Ghuku, Abhishek Upmanyu, Animesh Mukherjee
Published in: HICSS (2016)
Keyphrases
- out of vocabulary
- n gram
- language model
- microblog posts
- word segmentation
- named entity recognition
- spoken document retrieval
- broadcast news
- cross language information retrieval
- social media
- hand crafted
- parallel corpora
- term frequency
- query words
- language modeling
- text documents
- query terms
- named entities
- cross lingual
- topic models
- natural language
- machine translation
- query translation
- linguistic features
- information extraction
- information retrieval
- language independent
- wordnet
- text classification
- co occurrence
- text mining
- hidden markov models
- keywords