SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings.
Masoud Jalili Sabet, Philipp Dufter, Hinrich Schütze. Published in: CoRR (2020)