Machine learning with the TCGA-HNSC dataset: improving usability by addressing inconsistency, sparsity, and high-dimensionality.
Michael C. Rendleman, John M. Buatti, Terry A. Braun, Brian J. Smith, Chibuzo Nwakama, Reinhard R. Beichel, Bartley Brown, Thomas L. Casavant
Published in: BMC Bioinform. (2019)
Keyphrases
- high dimensionality
- high dimensional
- machine learning
- feature selection
- high dimensional datasets
- dimensionality reduction
- feature space
- high dimensional data
- small sample size
- microarray
- dimension reduction
- microarray datasets
- gene expression data
- highly correlated
- feature ranking
- noisy data
- data analysis
- data dimensionality
- feature selection and classification
- data mining
- decision trees
- low dimensional
- machine learning methods
- high dimensional spaces
- data reduction
- feature set
- text classification
- class imbalance
- sparse representation
- learning algorithm
- real world
- principal component analysis
- supervised learning
- random projections
- data points
- pattern recognition
- similarity measure
- irrelevant features
- redundant features
- low dimensionality
- small number of samples