Recycling Annotated Parallel Corpora for Bilingual Document Composition.
Arantza Casillas, Joseba Abaitua, Raquel Martínez-Unanue
Published in: AMTA (2000)
Keyphrases
- parallel corpora
- machine translation
- cross language information retrieval
- comparable corpora
- english chinese
- language independent
- cross lingual
- wikipedia articles
- cross language
- query translation
- machine translation system
- word pairs
- parallel texts
- labor intensive
- language resources
- sentence level
- document retrieval
- statistical machine translation
- bilingual dictionaries
- information retrieval
- chinese english
- document collections
- sentence pairs
- document clustering
- information retrieval systems
- out of vocabulary
- source language
- document images
- web documents
- retrieval systems
- manually annotated
- tf idf
- text documents
- text classification
- keywords
- parallel corpus
- translation model
- error prone
- machine learning
- fully automated