A Probabilistic Method to Predict Classifier Accuracy on Larger Datasets given Small Pilot Data.
Ethan Harvey, Wansu Chen, David M. Kent, Michael C. Hughes
Published in: ML4H@NeurIPS (2023)
Keyphrases
- test data
- synthetic data
- high accuracy
- training samples
- data sets
- input data
- computational cost
- database
- classification accuracy
- error rate
- data analysis
- small number
- prior knowledge
- statistical methods
- missing values
- probabilistic model
- data points
- support vector machine
- pairwise
- support vector
- noisy data
- classification method
- input vectors
- receiver operating characteristic curves
- high dimensional data
- feature set
- training data
- missing data
- classification algorithm
- training examples
- support vector machine (SVM)
- data mining techniques
- raw data
- dimensionality reduction
- uncertain data
- ROC curve
- training dataset
- dimensionality reduction methods
- unseen data
- similarity measure