MIND - Mainstream and Independent News Documents Corpus.
Danielle Caled, Paula Carvalho, Mário J. Silva
Published in: CoRR (2021)
Keyphrases
- person names
- news corpus
- news articles
- newspaper articles
- keywords
- word frequencies
- news stories
- text documents
- topic detection and tracking
- document level
- named entities
- text corpora
- topic tracking
- scientific papers
- text corpus
- web documents
- artificial intelligence
- information retrieval
- document collections
- similar documents
- XML documents
- Wikipedia articles
- keyphrases
- relevant documents
- information retrieval systems
- news items
- document clustering
- training documents
- word pairs
- co-occurrence
- multiword
- text collections
- topic segmentation
- document retrieval
- online news
- parallel corpora
- text data
- metadata
- parallel corpus
- topic detection
- automatic summarization
- text categorization
- textual content
- sentence level
- natural language text
- digital libraries
- part of speech
- document corpus
- vector space model
- search engine
- free text