Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation.
Jungseob Lee, Hyeonseok Moon, Seungjun Lee, Chanjun Park, Sugyeong Eo, Hyunwoong Ko, Jaehyung Seo, Seungyoon Lee, Heuiseok Lim
Published in: ACL (Findings) (2024)
Keyphrases
- machine translation
- machine translation system
- language independent
- natural language processing
- cross lingual
- target language
- natural language generation
- image segmentation
- language resources
- cross language information retrieval
- language processing
- parallel corpora
- information extraction
- chinese english
- word order
- word segmentation
- query translation
- word sense disambiguation
- statistical machine translation
- pairwise
- word alignment
- natural language
- word level
- parallel corpus
- multilingual documents
- machine learning
- markov random field