XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server.
Thierry Gautier, João V. F. Lima
Published in: PDP (2020)
Keyphrases
- graphics processing units
- scientific computing
- highly optimized
- graphics cards
- real time
- highly parallel
- parallel implementation
- cluster of workstations
- low overhead
- client server
- web server
- general purpose
- feature space
- computation intensive
- kernel function
- graphics processors
- neural network
- hardware implementation
- efficient implementation
- parallel computation
- low latency
- high reliability
- image processing