Corpus specificity in LSA and Word2vec: the role of out-of-domain documents.
Edgar Altszyler, Mariano Sigman, Diego Fernández Slezak
Published in: CoRR (2017)
Keyphrases
- word frequencies
- latent semantic analysis
- text corpus
- multiword
- word pairs
- co-occurrence
- sentence level
- natural language text
- training corpus
- linguistic information
- word co-occurrence
- word frequency
- document level
- information retrieval
- parallel corpus
- document space
- word spotting
- noun phrases
- word sense
- newspaper articles
- text documents
- information retrieval systems
- keywords
- stop words
- text corpora
- web documents
- related words
- automatic summarization
- statistical machine translation
- document collections
- unknown words
- tf-idf
- document retrieval
- sentiment analysis
- semantic relations
- English words
- text data
- writing style
- document representation
- term frequency
- part of speech
- retrieval systems