Learning to classify short and sparse text & web with hidden topics from large-scale data collections.
Xuan Hieu Phan, Minh Le Nguyen, Susumu Horiguchi
Published in: WWW (2008)
Keyphrases
- data collections
- learning to classify
- web documents
- semi structured
- web scale
- newspaper articles
- textual data
- text information
- keywords
- text classification
- data collection
- information retrieval
- website
- text documents
- semantic markup
- key concepts
- information retrieval and extraction
- text data
- web applications
- web pages
- topic models
- web images
- text mining
- data sets
- database
- information extraction
- ranked retrieval
- topic detection
- document collections
- digital data
- text collections
- xml data
- natural language processing
- topic modeling
- structured data
- web mining
- latent dirichlet allocation
- user interests
- anchor text
- natural language
- data mining
- real world