A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library.
Ganesh Bikshandi, Jay Shah. Published in: CoRR (2023)
Keyphrases
- parallel implementation
- cross platform
- gpu implementation
- general purpose
- graphics processors
- management system
- real time
- kernel methods
- efficient implementation
- graphics hardware
- fusion method
- parallel computing
- gaussian processes
- test bed
- information fusion
- times faster
- data fusion
- similarity function
- image fusion
- multi sensor
- hardware implementation
- software architecture
- infrared
- kernel function
- support vector
- feature extraction
- web services