Reddit Temporal N-gram Corpus and its Applications on Paraphrase and Semantic Similarity in Social Media using a Topic-based Latent Semantic Analysis.
Anh Dang, Abidalrahman Moh'd, Aminul Islam, Rosane Minghim, Michael Smit, Evangelos E. Milios
Published in: COLING (2016)
Keyphrases
- n-gram
- semantic similarity
- latent semantic analysis
- co-occurrence
- word pairs
- topic modeling
- topic models
- language model
- text classification
- latent dirichlet allocation
- wordnet
- vector space model
- language independent
- bag of words
- similarity measure
- text corpora
- language modeling
- visual words
- information retrieval
- parallel corpus
- semantic relations
- semantic features
- tf-idf
- latent semantic indexing
- word segmentation
- part of speech
- probabilistic model
- text data
- named entities
- semantic information
- web documents
- databases