Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus.
Deepavali Bhagwat, Kave Eshghi, Pankaj Mehra
Published in: KDD (2007)
Keyphrases
- document corpus
- inverted index
- similarity search in metric spaces
- information retrieval
- inverted lists
- index terms
- space partitioning
- database
- image retrieval
- document images
- document level
- index structure
- text corpus
- routing problem
- multimedia
- similar documents
- text collections
- document collections
- document retrieval
- vector space model
- document clustering
- scientific papers
- relevant documents
- information retrieval systems
- tf idf
- ad hoc networks
- web scale
- routing algorithm
- text documents
- retrieval strategies
- language model
- document space
- keywords
- retrieval systems
- word sense
- indexing scheme
- relevance feedback
- document representation
- indexing techniques