Unsupervised blocking and probabilistic parallelisation for record matching of distributed big data.
Chenxiao Dou, Yi Cui, Daniel Sun, Raymond K. Wong, Muhammad Atif, Guoqiang Li, Rajiv Ranjan
Published in: J. Supercomput. (2019)
Keyphrases
- big data
- data intensive
- big data analytics
- cloud computing
- high volume
- data analysis
- commodity hardware
- unstructured data
- data management
- distributed environment
- distributed systems
- business intelligence
- data warehousing
- vast amounts of data
- massive data
- data processing
- semi supervised
- knowledge discovery
- data science
- social media
- health informatics
- data analytics
- information systems
- data driven decision making
- information processing
- database
- decision support
- peer to peer
- case study
- decision making
- data mining
- databases