Word Clustering Approach to Bilingual Document Alignment (WMT 2016 Shared Task).
Vadim Shchukin, Dmitry Khristich, Irina Galinskaya
Published in: WMT (2016)
Keyphrases
- word alignment
- word level
- document clustering
- sentence pairs
- document classification
- machine translation
- parallel corpus
- clustering method
- multiword
- cross-lingual
- k-means
- document images
- source language
- language independent
- web documents
- keywords
- statistical machine translation
- clustering algorithm
- text clustering
- text documents
- noun phrases
- document collections
- n-gram
- machine translation system
- document space
- information retrieval systems
- compound words
- cross-language
- document analysis
- coreference resolution
- sentence level
- semi-supervised
- word pairs
- semantic dependencies
- co-occurrence
- English-Chinese
- bilingual lexicon
- document representation
- parallel corpora
- topic models
- text corpus
- translation model
- word recognition
- latent topics
- tf-idf
- cross-language information retrieval