Overcoming Limited Data Challenges: Training Large-Scale Deduplication Models through Distributed and Non-Distributed Methods.
Shraddha Surana, Chinmay Dhawan, Digvijay Gunjal, Rajesh Tamhane
Published in: MCSoC (2023)
Keyphrases
- distributed data
- data mining techniques
- statistical methods
- predictive model
- data sets
- easily interpretable
- heterogeneous data
- data analysis
- learned models
- database
- prior knowledge
- data sources
- distributed systems
- data mining methods
- high dimensional data
- huge data sets
- data representations
- experimental data
- historical data
- real world
- data points
- data mining tools
- incomplete data
- training samples
- machine learning methods
- missing values
- decision trees
- data quality
- data collection
- training examples
- data processing
- statistical models
- data distribution
- data transfer
- peer to peer
- multi agent
- computational approaches
- data structure
- data intensive
- bayesian methods
- learning models
- distributed environment
- model selection