A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.

Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Stanimire Tomov, Jack J. Dongarra
Published in: IEEE Trans. Parallel Distributed Syst. (2018)
Keyphrases
  • real time
  • graphics processing units
  • case study
  • small number
  • small sized
  • data sets
  • neural network
  • general purpose
  • graphics hardware
  • search engine
  • pairwise
  • cost effective
  • parallel processing
  • test bed
  • kernel matrix