Does compressing activations help model parallel training?
Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman
Published in: CoRR (2023)
Keyphrases
- computational model
- probabilistic model
- objective function
- theoretical framework
- statistical model
- management system
- real time
- parameter estimation
- theoretical analysis
- training algorithm
- data compression
- conceptual model
- test set
- training examples
- markov random field
- training set
- multiscale
- bayesian networks
- high level
- decision trees