Humanistic Buddhism Corpus: A Challenging Domain-Specific Dataset of English Translations for Classical and Modern Chinese.
Youheng W. Wong, Natalie Parde, Erdem Koyuncu
Published in: LREC/COLING (2024)
Keyphrases
- domain specific
- machine translation
- monolingual
- english chinese
- chinese english
- cross lingual
- query translation
- machine translation system
- web corpora
- statistical machine translation
- parallel corpus
- unknown words
- event extraction
- link grammar
- parallel corpora
- target language
- person names
- parallel texts
- open domain
- word alignment
- general purpose
- chinese language
- english text
- foreign language
- domain independent
- cross language information retrieval
- question answering
- english language
- training corpus
- dependency parser
- tree bank
- bilingual dictionaries
- street view
- chinese characters
- wide coverage
- english words
- natural language
- natural language processing
- information extraction
- specific domains
- chinese web
- language learning
- source language
- cross language
- comparable corpora
- feature set
- noun phrases
- chinese text
- multiword
- linguistic features
- semantic roles
- broad coverage
- answer questions
- word segmentation
- text summarization
- penn treebank
- language independent
- million images
- news articles
- keyword extraction