Neural gradients are lognormally distributed: understanding sparse and quantized training.
Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry
Published in: CoRR (2020)
Keyphrases
- distributed environment
- cooperative
- avoid overfitting
- distributed systems
- recurrent networks
- sparse data
- training set
- neural network
- lightweight
- training examples
- fault-tolerant
- network architecture
- communication cost
- elastic net
- feed-forward neural networks
- distributed data
- training phase
- training algorithm
- training process
- mobile agents
- test set
- bio-inspired
- high-dimensional
- training samples
- neural model
- peer-to-peer
- hidden Markov models
- artificial neural networks