Adrián Castelló
Publication Activity (10 Years)
Years Active: 2022-2024
Publications (10 Years): 23
Top Topics
Deep Learning
Instruction Set
Matrix Multiplication
Efficient Inference
Top Venues
J. Supercomput.
CoRR
J. Syst. Archit.
ISC Workshops
Publications
Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez
Tackling the Matrix Multiplication Micro-Kernel Generation with Exo.
CGO (2024)
Cristián Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí
Parallel GEMM-based convolution for deep learning on multicore RISC-V processors.
J. Supercomput. 80 (9) (2024)
Rafael Rodríguez-Sánchez, Adrián Castelló, Sandra Catalán, Francisco D. Igual, Enrique S. Quintana-Ortí
Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.
Int. J. High Perform. Comput. Appl. 38 (2) (2024)
Guillermo Alaejos, Adrián Castelló, Pedro Alonso-Jordá, Francisco D. Igual, Héctor Martínez, Enrique S. Quintana-Ortí
Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
ACM Trans. Math. Softw. 50 (1) (2024)
Cristián Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí
Performance Analysis of Matrix Multiplication for Deep Learning on the Edge.
CoRR (2024)
Guillermo Alaejos, Héctor Martínez, Adrián Castelló, Manuel F. Dolz, Francisco D. Igual, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí
Automatic generation of ARM NEON micro-kernels for matrix multiplication.
J. Supercomput. 80 (10) (2024)
Héctor Martínez, Sandra Catalán, Adrián Castelló, Enrique S. Quintana-Ortí
Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures.
J. Syst. Archit. 153 (2024)
Guillermo Alaejos, Adrián Castelló, Héctor Martínez, Pedro Alonso-Jordá, Francisco D. Igual, Enrique S. Quintana-Ortí
Micro-kernels for portable and efficient matrix multiplication in deep learning.
J. Supercomput. 79 (7) (2023)
Manuel F. Dolz, Sergio Barrachina, Héctor Martínez, Adrián Castelló, Antonio-Manuel Vidal-Maciá, Germán Fabregat, Andrés E. Tomás
Performance-energy trade-offs of deep learning convolution algorithms on ARM processors.
J. Supercomput. 79 (9) (2023)
Manuel F. Dolz, Héctor Martínez, Adrián Castelló, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí
Efficient and portable Winograd convolutions for multi-core processors.
J. Supercomput. 79 (10) (2023)
Adrián Castelló, Mar Catalán, Manuel F. Dolz, Enrique S. Quintana-Ortí, José Duato
Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.
Computing 105 (5) (2023)
Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez
Tackling the Matrix Multiplication Micro-kernel Generation with Exo.
CoRR (2023)
Guillermo Alaejos, Adrián Castelló, Pedro Alonso-Jordá, Francisco D. Igual, Héctor Martínez, Enrique S. Quintana-Ortí
Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
CoRR (2023)
Francisco D. Igual, Luis Piñuel, Sandra Catalán, Héctor Martínez, Adrián Castelló, Enrique S. Quintana-Ortí
Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors.
SC Workshops (2023)
Sergio Barrachina, Adrián Castelló, Manuel F. Dolz, Tze Meng Low, Héctor Martínez, Enrique S. Quintana-Ortí, Upasana Sridhar, Andrés E. Tomás
Reformulating the direct convolution for high-performance deep learning inference on ARM processors.
J. Syst. Archit. 135 (2023)
Sergio Barrachina, Adrián Castelló, Mar Catalán, Manuel F. Dolz, José I. Mestre
Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs.
Computing 105 (5) (2023)
Adrián Castelló, Enrique S. Quintana-Ortí, Francisco D. Igual
Anatomy of the BLIS Family of Algorithms for Matrix Multiplication.
PDP (2022)
Cristián Ramírez, Adrián Castelló, Enrique S. Quintana-Ortí
A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor.
J. Supercomput. 78 (16) (2022)
Adrián Castelló, Sergio Barrachina, Manuel F. Dolz, Enrique S. Quintana-Ortí, Pau San Juan, Andrés E. Tomás
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS.
J. Syst. Archit. 125 (2022)
Adrián Castelló, Sandra Catalán, Francisco D. Igual, Enrique S. Quintana-Ortí, Rafael Rodríguez-Sánchez
QR Factorization Using Malleable BLAS on Multicore Processors.
ISC Workshops (2022)
Cristián Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí
Performance Analysis of Matrix Multiplication for Deep Learning on the Edge.
ISC Workshops (2022)
Manuel F. Dolz, Adrián Castelló, Enrique S. Quintana-Ortí
Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP.
PDP (2022)
Sergio Barrachina, Adrián Castelló, Manuel F. Dolz, Andrés E. Tomás
BestOf: an online implementation selector for the training and inference of deep neural networks.
J. Supercomput. 78 (16) (2022)