Daichi Mukunoki
Publication Activity (10 Years)
Years Active: 2010-2023
Publications (10 Years): 19
Top Topics
Sparse Matrix
High Performance Computing
Error Estimation
Block Size
Top Venues
MCSoC
PPAM (1)
CoRR
PARCO
Publications
Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura
Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor.
MCSoC
(2023)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors.
PPAM (1)
(2022)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme.
ICPP
(2021)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Roman Iakymchuk
Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme.
HPC Asia
(2021)
Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi
A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow.
ICCSA (1)
(2021)
Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura
Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs.
MCSoC
(2021)
Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
IPDPS
(2021)
Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk
Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results?
VSTTE
(2020)
Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
CoRR
(2020)
Daichi Mukunoki, Takeshi Ogita
Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs.
J. Comput. Appl. Math.
372 (2020)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions.
ISC
(2020)
Roman Iakymchuk, Daichi Mukunoki, Artur Podobas, Fabienne Jézéquel, Toshiyuki Imamura, Norihisa Fujita, Jens Huthmann, Shuhei Kudo, Yiyu Tan, Jens Domke, Kai Torben Ohlhus, Takeshi Fukaya, Takeo Hoshi, Yuki Murakami, Maho Nakata, Takeshi Ogita, Kentaro Sano, Taisuke Boku
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.
CoRR
(2020)
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki
Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures.
PPAM (1)
(2019)
Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki
Design of an FPGA-Based Matrix Multiplier with Task Parallelism.
PARCO
(2019)
Daichi Mukunoki, Toshiyuki Imamura
Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster.
ICCS (3)
(2018)
Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, Masahiko Machida
Design Towards Modern High Performance Numerical LA Library Enabling Heterogeneity and Flexible Data Formats.
PARCO
(2017)
Daichi Mukunoki, Toshiyuki Imamura
Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer.
PPAM (1)
(2017)
Daichi Mukunoki, Toshiyuki Imamura
Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation.
CLUSTER
(2016)
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi
Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs.
MCSoC
(2016)
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi
Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs.
PDP
(2015)
Daichi Mukunoki, Daisuke Takahashi
Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs.
ICCSA (5)
(2013)
Daichi Mukunoki, Daisuke Takahashi
Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs.
PPAM (1)
(2013)
Daichi Mukunoki, Daisuke Takahashi
Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs.
IPDPS Workshops
(2012)
Daichi Mukunoki, Daisuke Takahashi
Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs.
PARA (1)
(2010)