Using Apache Spark for Ensuring Data Quality in Modern Data Lake Pipeline Architectures.
Martina Sestak, Timi Vovk
Published in: SQAMIA (2023)
Keyphrases
- data quality
- data sets
- data cleaning
- quality assessment
- data privacy
- quality management
- data mining
- information loss
- data warehouse
- original data
- privacy preservation
- data processing
- poor quality
- data sources
- data analysis
- real world
- spatial data
- data cleansing
- distributed data
- open source
- knowledge discovery
- learning algorithm
- databases