A parallel clustering algorithm for logs data based on Hadoop platform.
Jiuyuan Huo, Jian Weng, Hong Qu
Published in: HP3C (2019)
Keyphrases
- data sets
- clustering algorithm
- prior knowledge
- raw data
- big data
- probability distribution
- data processing
- statistical analysis
- training data
- high quality
- data analysis
- databases
- knowledge discovery
- data distribution
- missing data
- synthetic data
- data collection
- log data
- data reduction
- input data
- original data
- computer systems
- database
- data sources
- high dimensional data
- data mining techniques
- open source
- end users
- data objects
- data structure
- database systems
- data mining
- clustering analysis
- massive scale