Dealing with Sparse Document and Topic Representations: Lab Report for CHiC 2012
Philipp Schaer, Daniel Hienert, Frank Sawitzki, Andias Wira-Alam, Thomas Lüke
Published in: CoRR (2012)
Keyphrases
- document content
- document set
- topic discovery
- word clouds
- topic models
- information retrieval systems
- document images
- high dimensional
- topic hierarchy
- text documents
- information retrieval
- related documents
- latent topics
- document collections
- document clustering
- structured documents
- vector representation
- document retrieval
- document classification
- retrieval systems
- automatic summarization
- compressive sensing
- document level
- focused crawler
- sparse representation
- scientific papers
- text classification
- document corpus
- sparse data
- latent dirichlet allocation
- news articles
- short list
- document summaries
- cross document
- statistical topic models
- keyword extraction
- focused crawling
- topic specific
- term frequency
- relevant documents
- user queries
- text mining
- clustering algorithm