Elvis Rojas
ORCID
Publication Activity (10 Years)
Years Active: 2019-2024
Publications (10 Years): 11
Top Topics
Learning Models
Floating Point
Restricted Boltzmann Machine
Deep Learning
Top Venues
CLUSTER
SBAC-PAD
Rev. Colomb. de Computación
J. Parallel Distributed Comput.
Publications
Elvis Rojas, Diego Pérez, Esteban Meneses
A characterization of soft-error sensitivity in data-parallel and model-parallel distributed deep learning.
J. Parallel Distributed Comput.
190 (2024)
Hairol Romero-Sandí, Gabriel Núñez, Elvis Rojas
A snapshot of parallelism in distributed deep learning training.
Rev. Colomb. de Computación
25 (1) (2024)
Gabriel Núñez, Hairol Romero-Sandí, Elvis Rojas, Esteban Meneses
A study of pipeline parallelism in deep neural networks.
Rev. Colomb. de Computación
25 (1) (2024)
Elvis Rojas, Diego Pérez, Esteban Meneses
Exploring the Effects of Silent Data Corruption in Distributed Deep Learning Training.
SBAC-PAD
(2022)
Elvis Rojas, Michael Knobloch, Nour Daoud, Esteban Meneses, Bernd Mohr
Early Experiences of Noise-Sensitivity Performance Analysis of a Distributed Deep Learning Framework.
CLUSTER
(2022)
Elvis Rojas, Esteban Meneses, Terry Jones, Don Maxwell
Understanding failures through the lifetime of a top-level supercomputer.
J. Parallel Distributed Comput.
154 (2021)
Elvis Rojas, Diego Pérez, Jon C. Calhoun, Leonardo Bautista-Gomez, Terry Jones, Esteban Meneses
Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration.
CLUSTER
(2021)
Elvis Rojas, Fabricio Quirós-Corella, Terry Jones, Esteban Meneses
Large-Scale Distributed Deep Learning: A Study of Mechanisms and Trade-Offs with PyTorch.
CARLA
(2021)
Elvis Rojas, Esteban Meneses, Terry Jones, Don Maxwell
Towards a Model to Estimate the Reliability of Large-Scale Hybrid Supercomputers.
Euro-Par
(2020)
Elvis Rojas, Albert Njoroge Kahira, Esteban Meneses, Leonardo Bautista-Gomez, Rosa M. Badia
A Study of Checkpointing in Large Scale Training of Deep Neural Networks.
CoRR
(2020)
Elvis Rojas, Esteban Meneses, Terry Jones, Don Maxwell
Analyzing a Five-Year Failure Record of a Leadership-Class Supercomputer.
SBAC-PAD
(2019)