Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs.
Marcin Knap, Pawel Czarnul
Published in: J. Supercomput. (2019)
Keyphrases
- prefetching
- graphics processing units
- compute unified device architecture
- parallel implementation
- graphics processors
- cpu implementation
- gpu implementation
- parallel programming
- general purpose
- response time
- graphics hardware
- parallel computing
- access patterns
- cache misses
- parallel processing
- hit rate
- access latency
- memory bandwidth
- web prefetching
- parallel algorithm
- parallel computation
- web caching
- general purpose computing
- scientific computing
- shared memory
- web documents
- massively parallel
- caching scheme
- multiprocessor systems
- user perceived latency
- computing systems
- computational power
- web page prediction
- memory requirements
- hit ratio
- processing units
- parallel architectures
- high performance computing
- floating point
- web logs
- efficient implementation
- distributed memory
- multithreading
- machine learning
- main memory
- web objects
- multi core processors
- memory access
- memory management
- parallel machines
- single instruction multiple data
- scheduling algorithm
- web pages