Co-clustering of bilingual datasets as a mean for assisting the construction of thematic bilingual comparable corpora.
Guiyao Ke, Pierre-François Marteau
Published in: LREC (2014)
Keyphrases
- comparable corpora
- bilingual lexicon
- parallel corpora
- cross language information retrieval
- news articles
- machine translation
- terminology extraction
- cross lingual
- language modeling
- cross language
- text documents
- text corpora
- bilingual dictionaries
- word pairs
- language independent
- semi automatically
- text mining
- statistical machine translation
- translation model
- query translation
- machine translation system
- parallel corpus
- text categorization
- clustering algorithm
- bi directional
- labor intensive