SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors.
Christopher Rodrigues, Amarin Phaosawasdi, Peng Wu. Published in: WPMVP@PPoPP (2018)
Keyphrases
- parallel algorithm
- parallel processing
- processor array
- wide range
- small number
- highly parallel
- single instruction multiple data
- real time
- mesh connected
- higher order
- parallel implementation
- multiple kernel learning
- dimensionality reduction
- high performance computing
- parallel processors
- gaussian processes
- high order