ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus
Nizar Habash, David Palfreyman
Published in: LREC (2022)
Keyphrases
- parallel corpus
- sentence pairs
- statistical machine translation
- machine translation
- parallel corpora
- multiword
- cross lingual
- manually annotated
- chinese english
- comparable corpora
- cross language information retrieval
- machine translation system
- word alignment
- unknown words
- annotated corpus
- word forms
- parallel texts
- writer identification
- arabic language
- query translation
- cross language
- handwritten words
- source language
- english chinese
- word sense
- link grammar
- language identification
- language independent
- target language
- training corpus
- writer independent
- word pairs
- morphological analysis
- cross language retrieval
- named entity recognition
- open domain
- wordnet
- natural language processing
- relation extraction
- manually constructed
- english words
- bilingual lexicon
- machine readable dictionaries
- language resources
- language model
- text retrieval
- language modeling
- text corpora
- named entities
- proper names
- natural language
- word recognition
- bilingual dictionaries
- hand crafted
- information extraction