A Corpus of Wikipedia Discussions: Over the Years, with Topic, Power and Gender Labels.
Vinodkumar Prabhakaran, Owen Rambow
Published in: LREC (2016)
Keyphrases
- wikipedia articles
- document corpus
- topic tracking
- contextual features
- power consumption
- named entity disambiguation
- world knowledge
- topic models
- topic detection and tracking
- document level
- topic segmentation
- named entities
- short texts
- semantic features
- concept space
- knowledge base
- link structure
- training data
- conversational speech
- multi label
- writing style
- natural language text
- information retrieval
- pairwise
- semantic relations
- text data
- news stories
- text corpus
- search queries
- document clustering
- semantic relatedness
- wordnet
- user generated content
- individual differences
- multi document summarization
- topic modeling