Near real-time thematic clustering of web documents and other internet contents.
Adrian Pusztay, Janos Szuley, Sándor Laki
Published in: CogInfoCom (2013)
Keyphrases
- web documents
- content similarity
- real time
- web content
- semi-structured
- clustering algorithm
- web pages
- information extraction
- keywords
- web search engines
- document classification
- prefetching
- document clustering
- clustering method
- html documents
- vector space model
- document representation
- textual information
- structured documents
- databases
- web logs
- web directories
- relational databases
- unstructured documents