Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices.
Mark Gates, Jakub Kurzak, Piotr Luszczek, Yu Pei, Jack J. Dongarra
Published in: IPDPS Workshops (2017)
Keyphrases
- kronecker product
- singular value decomposition
- measurement matrix
- sparse matrix
- data matrix
- parallel implementation
- sparse matrices
- matrix completion
- random projections
- low rank
- general purpose
- matrix factorization
- tensor factorization
- singular values
- kernel matrix
- pairwise comparison
- factorization method
- low rank matrix
- multibody
- parallel computing
- pid controller
- original data
- missing data
- pairwise
- collaborative filtering
- layout design
- linear algebra
- gene expression data
- coefficient matrix
- least squares
- active learning