Cache-efficient implementation and batching of tridiagonalization on manycore CPUs.
Shuhei Kudo, Toshiyuki Imamura
Published in: HPC Asia (2019)
Keyphrases
- efficient implementation
- graphics processing units
- parallel architectures
- memory access
- highly parallel
- parallel computation
- scheduling problem
- shared memory multiprocessor
- prefetching
- parallel programming
- single machine
- active set
- query processing
- efficient processing
- processing units
- hardware implementation
- data access
- main memory
- parallel processing
- commodity hardware
- high end
- general purpose
- cache misses
- batch processing