Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations.
Jing Gong, Stefano Markidis, Erwin Laure, Matthew Otten, Paul F. Fischer, Misun Min
Published in: J. Supercomput. (2016)
Keyphrases
- graphics processors
- parallel computation
- gpu implementation
- parallel programming
- general purpose
- graphics processing units
- graphics hardware
- compute unified device architecture
- parallel algorithm
- parallel implementations
- parallel processing
- efficient implementation
- parallel implementation
- cpu implementation
- parallel computing
- scientific computing
- high end
- information retrieval
- machine learning
- processing units
- operating system
- clustering algorithm
- real time