Countering Load-to-Use Stalls in the NVIDIA Turing GPU.
Ram Rangan, Naman Turakhia, Alexandre Joly
Published in: IEEE Micro (2020)
Keyphrases
- graphics processing units
- graphics processors
- parallel implementation
- graphics hardware
- gpu implementation
- general purpose
- load balancing
- cpu implementation
- parallel computing
- compute unified device architecture
- parallel computation
- real time
- machine intelligence
- efficient implementation
- parallel processing
- parallel algorithm
- case study
- parallel programming
- turing machine
- search algorithm
- gpu accelerated
- artificial intelligence
- neural network
- massively parallel
- times faster
- distributed systems