EMPAC: an English-Spanish Corpus of Institutional Subtitles.
Iris Serrat Roozen, José Manuel Martínez Martínez
Published in: LREC (2020)
Keyphrases
- machine translation system
- mono lingual
- link grammar
- spanish language
- language identification
- pronominal anaphora
- statistical machine translation
- parallel corpus
- cross lingual
- person names
- open domain
- question answering
- machine translation
- sentence pairs
- broad coverage
- qa clef
- english words
- wide coverage
- multiword
- training corpus
- semantic roles
- natural language
- higher education
- english language
- parallel corpora
- word sense
- linguistic features
- internal and external
- cross language
- penn treebank
- chinese english
- target language
- manually annotated
- source language
- pos tagging
- comparable corpora
- cross language information retrieval
- unknown words
- english text
- tv broadcast
- speaker identification
- tv series
- answer questions
- information extraction