Naoki Makishima
Publication Activity (10 Years)
Years Active: 2019-2023
Publications (10 Years): 46
Top Topics
Customer Satisfaction
Voice Activity Detection
Autoregressive
Speech Recognition
Top Venues
CoRR
INTERSPEECH
ICASSP
Publications
Satoshi Suzuki, Taiga Yamane, Naoki Makishima, Keita Suzuki, Atsushi Ando, Ryo Masumura
OnDA-DETR: Online Domain Adaptation for Detection Transformers with Self-Training Framework.
ICIP
(2023)
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
End-to-End Joint Target and Non-Target Speakers ASR.
CoRR
(2023)
Naoki Makishima, Keita Suzuki, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction.
INTERSPEECH
(2023)
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
End-to-End Joint Target and Non-Target Speakers ASR.
INTERSPEECH
(2023)
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Text-to-Text Pre-Training with Paraphrasing for Improving Transformer-Based Image Captioning.
EUSIPCO
(2023)
Keita Suzuki, Satoshi Suzuki, Ryo Masumura, Atsushi Ando, Naoki Makishima
Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body.
MMAsia
(2023)
Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.
CoRR
(2023)
Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.
ICCV
(2023)
Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
SLT
(2022)
Satoshi Suzuki, Shoichiro Takeda, Naoki Makishima, Atsushi Ando, Ryo Masumura, Hayaru Shouno
Knowledge Transferred Fine-Tuning: Convolutional Neural Network Is Born Again With Anti-Aliasing Even in Data-Limited Situations.
IEEE Access
10 (2022)
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
CoRR
(2022)
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
INTERSPEECH
(2022)
Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
CoRR
(2022)
Atsushi Ando, Yumiko Murata, Ryo Masumura, Satoshi Suzuki, Naoki Makishima, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.
ICASSP
(2022)
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
CoRR
(2022)
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
INTERSPEECH
(2022)
Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations.
INTERSPEECH
(2022)
Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
Enrollment-Less Training for Personalized Voice Activity Detection.
Interspeech
(2021)
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
Hierarchical Knowledge Distillation for Dialogue Sequence Labeling.
ASRU
(2021)
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training.
CoRR
(2021)
Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.
ICASSP
(2021)
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.
Interspeech
(2021)
Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.
CoRR
(2021)
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens.
Interspeech
(2021)
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
Hierarchical Knowledge Distillation for Dialogue Sequence Labeling.
CoRR
(2021)
Naoki Makishima, Yoshiki Mitsui, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo
Independent deeply learned matrix analysis with automatic selection of stable microphone-wise update and fast sourcewise update of demixing matrix.
Signal Process.
178 (2021)
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Large-Context Conversational Representation Learning: Self-Supervised Learning For Conversational Documents.
SLT
(2021)
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Large-Context Conversational Representation Learning: Self-Supervised Learning for Conversational Documents.
CoRR
(2021)
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages.
MMAsia
(2021)
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
Utilizing Resource-Rich Language Datasets for End-to-End Scene Text Recognition in Resource-Poor Languages.
CoRR
(2021)
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
MAPGN: Masked Pointer-Generator Network for Sequence-to-Sequence Pre-Training.
ICASSP
(2021)
Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss.
CoRR
(2021)
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation.
ICASSP
(2021)
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition.
CoRR
(2021)
Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks using Switching Tokens.
CoRR
(2021)
Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation.
Interspeech
(2021)
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.
Interspeech
(2021)
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation.
CoRR
(2021)
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning.
CoRR
(2021)
Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
Enrollment-less training for personalized voice activity detection.
CoRR
(2021)
Akihiko Takashima, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura
Unsupervised Domain Adversarial Training in Angular Space for Facial Expression Recognition.
APSIPA
(2020)
Mana Ihori, Ryo Masumura, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi
Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model.
CoRR
(2020)
Mana Ihori, Ryo Masumura, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi
Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model.
INLG
(2020)
Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition.
INTERSPEECH
(2020)
Naoki Makishima, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo
Robust Demixing Filter Update Algorithm Based on Microphone-wise Coordinate Descent for Independent Deeply Learned Matrix Analysis.
APSIPA
(2019)
Naoki Makishima, Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hayato Sumino, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono
Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation.
IEEE ACM Trans. Audio Speech Lang. Process.
27 (10) (2019)