Improving GPU Throughput through Parallel Execution Using Tensor Cores and CUDA Cores.
Khoa Ho, Hui Zhao, Adwait Jog, Saraju P. Mohanty
Published in: ISVLSI (2022)
Keyphrases
- parallel execution
- parallel computing
- parallel programming
- parallel architectures
- address space
- transactional memory
- message passing interface
- parallel implementation
- parallel processing
- shared memory
- massively parallel
- parallel computation
- parallel algorithm
- gpu implementation
- graphics processing units
- graphics hardware
- general purpose computing
- real time
- cost model
- higher order
- data partitioning
- compute unified device architecture
- computing systems
- cloud computing
- level parallelism
- response time