Parallel and Distributed Document Overlap Detection on the Web.
Krisztián Monostori, Arkady B. Zaslavsky, Heinz W. Schmidt
Published in: PARA (2000)
Keyphrases
- web documents
- website
- distributed processing
- distributed systems
- detection algorithm
- parallel processing
- web applications
- web content
- detection method
- web pages
- digital documents
- map reduce
- load balance
- distributed environment
- document collections
- information sources
- document images
- retrieval systems
- database
- content similarity
- master slave
- web crawler
- trec web
- user generated content
- heterogeneous environments
- web technologies
- parallel execution
- document classification
- keywords
- multi agent
- end users
- object detection
- relevant content
- information retrieval systems
- peer to peer
- semantic web
- text documents
- parallel computing
- shared memory
- document representation
- link analysis
- web users
- web mining
- false positives
- text categorization
- anomaly detection
- web search
- information retrieval