ParaCA: A Speculative Parallel Crawling Approach on Apache Spark.
Yuxiang Li, Zhiyong Zhang, Danmei Niu, Junchang Jing
Published in: ICA3PP (1) (2020)
Keyphrases
- open source
- open source software
- search engine
- parallel processing
- map reduce
- parallel implementation
- distributed systems
- parallel hardware
- web server
- web pages
- information systems
- shared memory
- real time
- web crawlers
- web crawling
- focused crawling
- data sets
- resource discovery
- distributed memory
- data mining
- information extraction
- artificial intelligence
- database
- knowledge base
- web search
- general purpose