Balancing the composition of word embeddings across heterogenous data sets.

Stephanie Brandl David Lassner Maximilian Alber

Published in: CoRR (2020)

Keyphrases

data sets
data sources
co occurrence
high dimensional data
synthetic data
benchmark data sets
n gram
training set
vector space
real world data sets
varying degrees
keywords
real world
databases
training data
artificial intelligence
database
word sense disambiguation
euclidean space
word segmentation
word recognition
word meaning