A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library.

Ganesh Bikshandi, Jay Shah

Published in: CoRR (2023)

Keyphrases

parallel implementation
cross platform
gpu implementation
general purpose
graphics processors
management system
real time
kernel methods
efficient implementation
graphics hardware
fusion method
parallel computing
gaussian processes
test bed
information fusion
times faster
data fusion
similarity function
image fusion
multi sensor
hardware implementation
software architecture
infrared
kernel function
support vector
feature extraction
web services