cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs.
Antti-Pekka Hynninen, Dmitry I. Lyakh
Published in: CoRR (2017)
Keyphrases
- graphics processing units
- gpu implementation
- general purpose
- compute unified device architecture
- graphics hardware
- graphics processors
- parallel implementation
- parallel programming
- parallel computing
- high order
- scientific computing
- cpu implementation
- highly parallel
- parallel computation
- real time
- general purpose computing
- parallel processing
- massively parallel
- higher order
- tensor decomposition
- parallel algorithm
- computing systems
- processing units
- digital libraries
- diffusion tensor
- structure tensor
- floating point
- times faster
- tensor factorization
- high performance computing
- feature tracking
- distributed memory
- computational power
- shared memory
- fine grained
- dimensionality reduction
- image sequences