MultiUN: A Multilingual Corpus from United Nation Documents.
Andreas Eisele, Yu Chen
Published in: LREC (2010)
Keyphrases
- parallel corpus
- newspaper articles
- multilingual documents
- word frequencies
- person names
- information retrieval
- document level
- text corpora
- document collections
- text data
- comparable corpora
- information retrieval systems
- relevant documents
- multiword
- cross lingual
- natural language text
- document classification
- parallel corpora
- similar documents
- multilingual search
- text documents
- text collections
- document retrieval
- training corpus
- text corpus
- heterogeneous collections
- web documents
- metadata
- multilingual information retrieval
- xml documents
- language independent
- topic segmentation
- word frequency
- keywords
- document clustering
- cross language
- cross language information retrieval
- multi document summarization
- document corpus
- bilingual dictionaries
- linguistic information
- free text
- semantic information
- web pages