Balanced Word Clusters for Interpretable Document Representation.
Marco Wrzalik, Dirk Krechel
Published in: WML@ICDAR (2019)
Keyphrases
- document representation
- document clustering
- index terms
- bag of words
- related documents
- document collections
- clustering algorithm
- n-gram
- language model
- vector space model
- text documents
- data fusion
- document content
- vector space
- text mining
- web documents
- information retrieval
- semantic information
- background knowledge
- image classification
- text classification
- cluster analysis
- language modeling
- clustering method
- keywords
- knowledge discovery
- domain knowledge
- computer vision
- document retrieval
- data points
- k-means
- search engine