Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts.
Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, Sadao Kurohashi
Published in: CoRR (2023)
Keyphrases
- multistage
- machine translation
- fine tuning
- statistical machine translation
- chinese english
- parallel corpora
- parallel corpus
- machine translation system
- cross lingual
- cross language information retrieval
- word alignment
- dynamic programming
- natural language processing
- information extraction
- language resources
- language independent
- pos tagging
- target language
- comparable corpora
- spontaneous speech
- machine readable dictionaries
- query translation
- word sense disambiguation
- text mining
- english chinese
- bilingual lexicon
- statistical translation models
- source language
- natural language
- data mining
- word level
- co occurrence
- probabilistic model
- reinforcement learning
- machine learning