Scaling Word2Vec on Big Corpus.
Bofang Li, Aleksandr Drozd, Yuhe Guo, Tao Liu, Satoshi Matsuoka, Xiaoyong Du
Published in: Data Sci. Eng. (2019)
Keyphrases
- word frequencies
- english words
- text corpus
- multiword
- unknown words
- training corpus
- word pairs
- word sense
- sentence level
- noun phrases
- linguistic information
- natural language text
- lexical features
- spontaneous speech
- parallel corpus
- statistical machine translation
- co-occurrence
- ambiguous words
- writing style
- manually annotated
- word sense disambiguation
- n-gram
- word co-occurrence
- stop words
- document level
- word recognition
- text corpora
- test set
- keywords
- spoken document retrieval
- word segmentation
- relation extraction
- part of speech
- semantic similarity
- chinese word segmentation
- recognizing textual entailment
- big data
- conversational speech
- wordnet
- sentence pairs