Handling data skew in joins based on cluster cost partitioning for MapReduce.
Yang Wang, Yong Zhong, Qingshan Ma, Guanci Yang
Published in: Multiagent Grid Syst. (2018)
Keyphrases
- data skew
- parallel processing
- data distribution
- load balancing
- join algorithms
- join operations
- skewed data
- sort merge
- hash join
- data points
- map reduce
- feature selection
- database
- query evaluation
- cost sensitive
- main memory
- query optimization
- principal component analysis
- data model
- query processing
- digital libraries
- data sets