Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers.
Azzam Haidar, Stanimire Tomov, Jack J. Dongarra, Nicholas J. Higham
Published in: SC (2018)
Keyphrases
- iterative refinement
- high precision
- higher order
- real time
- parallel implementation
- precision and recall
- parallel computing
- dimensionality reduction
- high order
- sat solvers
- parallel architectures
- tensor space
- numerically stable
- gpu accelerated
- gpu implementation
- multi core processors
- diffusion tensor
- floating point
- search space