Clustering of short commercial documents for the web.
Moreno Carullo, Elisabetta Binaghi, Ignazio Gallo, Nicola Lamberti
Published in: ICPR (2008)
Keyphrases
- web documents
- content similarity
- document clustering
- multilingual documents
- web data
- tag information
- tolerance rough set
- website
- web information
- clustering algorithm
- web applications
- text clustering
- clustering method
- web snippets
- metasearch engine
- information retrieval systems
- web pages
- information retrieval
- web people search
- textual data
- document classification
- web content
- k-means
- digital documents
- newspaper articles
- hierarchical clustering
- text documents
- document repositories
- document collections
- web search
- information extraction
- topic specific
- text information
- web queries
- returned by a search engine
- open directory project
- document retrieval
- linked data
- data objects
- web mining
- current web search engines
- google scholar
- keywords
- search engine
- database
- semi-structured
- structured information
- co-occurrence
- user-generated content
- vector space model
- answering questions
- web crawler
- electronic documents
- data interchange
- cosine similarity
- semantic web
- similarity measure
- link analysis
- web users
- query terms