Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers.
Harini Muthukrishnan, David W. Nellans, Daniel Lustig, Jeffrey A. Fessler, Thomas F. Wenisch
Published in: ISCA (2021)
Keyphrases
- fine grained
- coarse grained
- shared memory
- parallel computing
- parallel architectures
- parallel algorithm
- parallel computation
- massively parallel
- message passing
- parallel programming
- multithreading
- distributed memory
- compute unified device architecture
- graphics processing unit
- low overhead
- access control
- multi processor
- high performance computing
- parallel machines
- efficient implementation
- parallel processing
- real time
- parallel implementation
- parallel architecture
- parallel execution
- cloud computing
- web services