SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings.
Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze
Published in: EMNLP (Findings) (2020)
Keyphrases
- training data
- high quality
- training set
- decision trees
- test data
- parallel processing
- data sets
- higher quality
- low quality
- supervised learning
- test set
- prior knowledge
- training process
- manifold learning
- co-occurrence
- learning algorithm
- vector space
- training corpus
- n-gram
- keywords
- word recognition
- low dimensional
- support vector machine
- classification accuracy
- high resolution
- sequence alignment
- parallel computing
- learned from training data
- shared memory
- classification models
- noisy data
- training examples
- pairwise
- machine learning