SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors.

Christopher Rodrigues, Amarin Phaosawasdi, Peng Wu

Published in: WPMVP@PPoPP (2018)

Keyphrases

parallel algorithm
parallel processing
processor array
wide range
small number
highly parallel
single instruction multiple data
real time
mesh connected
higher order
parallel implementation
multiple kernel learning
dimensionality reduction
high performance computing
parallel processors
gaussian processes
high order