DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification.
Daegun Yoon, Sangyoon Oh
Published in: ICPP (2023)
Keyphrases
- probabilistic model
- computational model
- formal model
- experimental data
- mathematical model
- data sets
- multi layer
- statistical model
- theoretical framework
- theoretical analysis
- objective function
- decision trees
- maximum likelihood
- input data
- least squares
- probability distribution
- reinforcement learning
- case study
- bayesian framework
- database