Improving GPU Throughput through Parallel Execution Using Tensor Cores and CUDA Cores.

Khoa Ho, Hui Zhao, Adwait Jog, Saraju P. Mohanty

Published in: ISVLSI (2022)

Keyphrases

parallel execution
parallel computing
parallel programming
parallel architectures
address space
transactional memory
message passing interface
parallel implementation
parallel processing
shared memory
massively parallel
parallel computation
parallel algorithm
gpu implementation
graphics processing units
graphics hardware
general purpose computing
real time
cost model
higher order
data partitioning
compute unified device architecture
computing systems
cloud computing
level parallelism
response time