Intermediate Data Placement Strategy for Different Data Skew Levels Based on Random Sampling in Spark.
Xueqian Gong
Chunlin Li
Youlong Luo
Published in:
ICBDC (2019)
Keyphrases
random sampling
data sets
data distribution
original data
data sources
database
data analysis
active learning
random samples
decision trees
training data
training set
nearest neighbor
distributed systems
input data
sample size