Balancing the composition of word embeddings across heterogenous data sets.
Stephanie Brandl, David Lassner, Maximilian Alber
Published in: CoRR (2020)
Keyphrases
- data sets
- data sources
- co-occurrence
- high dimensional data
- synthetic data
- benchmark data sets
- n-gram
- training set
- vector space
- real world data sets
- varying degrees
- keywords
- real world
- databases
- training data
- artificial intelligence
- database
- word sense disambiguation
- Euclidean space
- word segmentation
- word recognition
- word meaning