ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling.

Jun Zhou, Ke Zhang, Feng Zhu, Qitao Shi, Wenjing Fang, Lin Wang, Yi Wang
Published in: WSDM (2023)
Keyphrases
  • fault tolerance
  • deep learning
  • fault tolerant
  • load balancing
  • distributed systems
  • database replication
  • distributed computing
  • data sets
  • pairwise
  • response time
  • machine learning
  • dimensionality reduction
  • mobile agents