Identifying Bilingual Topics in Wikipedia for Efficient Parallel Corpus Extraction and Building Domain-Specific Glossaries for the Japanese-English Language Pair.
Bartholomäus Wloka
Published in: BUCC@LREC (2018)
Keyphrases
- parallel corpus
- English language
- domain specific
- cross lingual
- cross language information retrieval
- machine translation
- word alignment
- information extraction
- query translation
- machine translation system
- information retrieval
- sentence pairs
- language independent
- statistical machine translation
- parallel texts
- text documents
- topic modeling
- named entities
- parallel corpora
- bilingual dictionaries
- language model