Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching.
Sahil Tyagi, Prateek Sharma
Published in: ACSOS (2020)
Keyphrases
- scheduling problem
- distributed systems
- dynamic environments
- multi agent
- training set
- dynamic resource allocation
- fault tolerant
- distributed environment
- maximum likelihood
- cooperative
- test set
- digital libraries
- single machine
- communication cost
- training process
- data transfer
- resource sharing
- heterogeneous environments
- web services