Evaluating Utility of Data Sources in a Large Parallel Czech-English Corpus CzEng 0.9.
Ondrej Bojar, Adam Liska, Zdenek Zabokrtský
Published in: LREC (2010)
Keyphrases
- data sources
- cross language
- link grammar
- parallel corpus
- language independent
- open domain
- person names
- data integration
- wide coverage
- statistical machine translation
- broad coverage
- cross lingual
- natural language
- english words
- cross language information retrieval
- mono lingual
- language learning
- cl sr
- data model
- utility function
- parallel corpora
- training corpus
- english language
- semantic roles
- databases
- machine translation system
- text retrieval
- linguistic features
- heterogeneous data sources
- chinese english
- multiword
- text classification
- comparable corpora
- answer questions
- data warehouse
- spontaneous speech
- information extraction
- manually generated
- integrating heterogeneous
- automatically generated
- sentence pairs
- geospatial data
- parallel processing