Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language.
Kasun Wickramasinghe, Nisansa de Silva
Published in: PACLIC (2023)
Keyphrases
- english text
- parallel corpus
- word level
- word alignment
- language-specific
- target language
- machine translation system
- english language
- machine translation
- sentence pairs
- source language
- language learning
- natural language
- bilingual dictionaries
- lexical information
- indian languages
- language-independent
- word order
- cross-lingual
- statistical machine translation
- linguistic knowledge
- english words
- character n-grams
- word meanings
- n-gram
- language identification
- cross-language
- parallel corpora
- document analysis
- word forms
- word pairs
- co-occurrence
- unknown words
- resource allocation
- document images
- out-of-vocabulary
- spoken language
- comparable corpora
- semantic roles
- language resources
- sentence level
- language processing
- query translation
- native language
- dynamic time warping