A Massive Collection of Cross-Lingual Web-Document Pairs.
Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzmán, Philipp Koehn
Published in: CoRR (2019)
Keyphrases
- cross lingual
- web documents
- machine translation
- cross language
- cross lingual information retrieval
- information extraction
- language modeling
- language independent
- web pages
- prefetching
- event extraction
- text classification
- web search engines
- parallel corpus
- web logs
- document collections
- pairwise
- news articles
- vector space model
- query translation
- keywords
- document clustering
- mono lingual
- related documents
- query expansion
- knowledge discovery
- data analysis