Belgisch Staatsblad Corpus: Retrieving French-Dutch Sentences from Official Documents.
Tom Vanallemeersch
Published in: LREC (2010)
Keyphrases
- text corpus
- document level
- sentence level
- information retrieval
- training corpus
- word frequency
- multiword
- multi document summarization
- word frequencies
- newspaper articles
- text corpora
- text documents
- plain text
- document collections
- extractive summarization
- person names
- document summarization
- sentiment analysis
- noun phrases
- keyphrases
- language model
- document set
- lexical features
- information retrieval systems
- text classification
- document retrieval
- linguistic features
- query expansion
- mutual reinforcement
- text collections
- sentence extraction
- sentence similarity
- word sense
- semantic roles
- word pairs
- text summarization
- sentiment classification
- named entities
- natural language
- parallel corpora
- natural language text
- semantic relations
- relevant documents
- retrieval systems
- keywords