High performance training of deep neural networks using pipelined hardware acceleration and distributed memory.
Raghav Mehta, Yuyang Huang, Mingxi Cheng, Shrey Bagga, Nishant Mathur, Ji Li, Jeffrey Draper, Shahin Nazarian
Published in: ISQED (2018)
Keyphrases
- distributed memory
- parallel architecture
- neural network
- scientific computing
- IBM SP
- shared memory
- fine grain
- training process
- parallel implementation
- multithreading
- multiprocessor systems
- parallel computers
- data parallelism
- matrix multiplication
- artificial neural networks
- multi processor
- computer architecture
- high performance computing
- parallel machines
- parallel architectures
- parallel processing
- genetic algorithm
- data flow
- hardware implementation
- message passing
- dynamic programming
- lower bound
- computer science