New approach for collecting high quality parallel corpora from multilingual websites.
Cong-Phap Huynh
Published in: iiWAS (2011)
Keyphrases
- parallel corpora
- cross language information retrieval
- cross lingual
- comparable corpora
- language independent
- language resources
- cross lingual information retrieval
- cross language
- machine translation
- bilingual dictionaries
- website
- machine translation system
- chinese english
- query translation
- labor intensive
- parallel corpus
- language modeling
- statistical machine translation
- text retrieval
- word pairs
- sentence level
- document retrieval
- document collections
- natural language processing
- text classification
- question answering
- web pages
- information retrieval