Inferring Multilingual Domain-Specific Word Embeddings From Large Document Corpora.
Luca Cagliero, Moreno La Quatra
Published in: IEEE Access (2021)
Keyphrases
- domain specific
- text corpus
- parallel corpus
- word frequency
- text corpora
- news corpus
- general purpose
- latent topics
- language independent
- multilingual information retrieval
- spoken document retrieval
- related words
- word level
- training corpus
- cross language
- comparable corpora
- printed documents
- keywords
- specific domains
- short list
- news articles
- document clustering
- cross language information retrieval
- language specific
- document images
- word sense disambiguation
- natural language processing
- multilingual documents
- document collections
- automatic summarization
- web documents
- named entities
- document corpus
- vector space
- noun phrases
- Chinese-English
- cross lingual
- compound words
- digital libraries
- text documents
- document retrieval
- term frequency
- source language
- co-occurrence
- topic models
- lexical knowledge
- text collections
- relation extraction
- word sense
- document analysis
- retrieval systems
- parallel corpora
- information retrieval
- multiword
- statistical machine translation