Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language.
Kasun Wickramasinghe, Nisansa de Silva
Published in: CoRR (2023)
Keyphrases
- english text
- parallel corpus
- word level
- language specific
- target language
- word alignment
- source language
- english language
- sentence pairs
- machine translation
- machine translation system
- language learning
- lexical information
- natural language
- bilingual dictionaries
- statistical machine translation
- indian languages
- word order
- language independent
- character n grams
- linguistic knowledge
- n gram
- cross lingual
- native language
- document images
- language processing
- vector space
- resource allocation
- unknown words
- semantic roles
- language identification
- word meanings
- cross language information retrieval
- word segmentation
- programming language
- word forms
- stop words
- language proficiency
- comparable corpora
- multiword
- translation model
- word recognition
- document analysis
- co occurrence
- out of vocabulary
- answer questions
- language skills
- word meaning
- foreign language
- language model