Corpus Specificity in LSA and Word2vec: The Role of Out-of-Domain Documents.
Edgar Altszyler, Mariano Sigman, Diego Fernández Slezak
Published in: Rep4NLP@ACL (2018)
Keyphrases
- word frequencies
- latent semantic analysis
- text corpus
- word pairs
- multiword
- natural language text
- co-occurrence
- sentence level
- parallel corpus
- linguistic information
- word frequency
- training corpus
- text corpora
- word co-occurrence
- word spotting
- information retrieval
- information retrieval systems
- related words
- document space
- document collections
- newspaper articles
- stop words
- keywords
- document level
- noun phrases
- latent topics
- document clustering
- document retrieval
- statistical machine translation
- text collections
- term frequency
- text data
- English words
- text documents
- retrieval systems
- word segmentation
- parallel corpora
- n-gram