Audacity of huge: overcoming challenges of data scarcity and data quality for machine learning in computational materials discovery.
Aditya Nandy, Chenru Duan, Heather J. Kulik
Published in: CoRR (2021)
Keyphrases
- data quality
- machine learning
- data analysis
- data sets
- information loss
- knowledge discovery
- database
- training data
- data cleansing
- quality management
- data transformation
- data collection
- high dimensional data
- data processing
- data warehouse
- data cleaning
- data sources
- real world
- data reduction
- data privacy
- data preparation
- feature selection
- natural resources