Out of vocabulary words decrease, running texts prevail and hashtags coalesce: Twitter as an evolving sociolinguistic system.
Suman Kalyan Maity, Bhadreswar Ghuku, Abhishek Upmanyu, Animesh Mukherjee
Published in: CoRR (2015)
Keyphrases
- out-of-vocabulary
- language model
- word segmentation
- n-gram
- microblog posts
- cross-language information retrieval
- spoken document retrieval
- broadcast news
- named entity recognition
- query words
- parallel corpora
- social media
- named entities
- hand-crafted
- language modeling
- cross-lingual
- text documents
- query terms
- information extraction
- probabilistic model
- machine translation
- natural language
- topic models
- term frequency
- keywords
- word recognition
- word level
- language-independent
- natural language processing
- information retrieval
- maximum entropy
- speech recognition