CUDA-NP: realizing nested thread-level parallelism in GPGPU applications.

Yi Yang, Huiyang Zhou

Published in: PPoPP (2014)

Keyphrases

level parallelism
compute unified device architecture
parallel algorithm
graphics processing units
shared memory
parallel processing
multi core processors
instruction set
computational complexity
parallel computing
parallel implementation
general purpose
parallel programming
graphics hardware
gpu implementation
parallel computation
database systems
general purpose computing