Atsushi Ando
Publication Activity (10 Years)
Years Active: 2015-2024
Publications (10 Years): 46
Top Topics
Scene Segmentation
Speech Recognition
Language Model
Autoregressive
Top Venues
INTERSPEECH
CoRR
ICASSP
APSIPA
Publications
Kenichi Fujita, Atsushi Ando, Yusuke Ijima
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis.
IEICE Trans. Inf. Syst.
107 (1) (2024)
Kenichi Fujita, Atsushi Ando, Yusuke Ijima
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis.
CoRR
(2024)
Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, Marc Delcroix
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling.
CoRR
(2024)
Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa
NTT Speaker Diarization System for CHiME-7: Multi-Domain, Multi-Microphone End-to-End and Vector Clustering Diarization.
ICASSP
(2024)
Atsushi Ando, Takafumi Moriya, Shota Horiguchi, Ryo Masumura
Factor-Conditioned Speaking-Style Captioning.
CoRR
(2024)
Satoshi Suzuki, Taiga Yamane, Naoki Makishima, Keita Suzuki, Atsushi Ando, Ryo Masumura
OnDA-DETR: Online Domain Adaptation for Detection Transformers with Self-Training Framework.
ICIP
(2023)
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
End-to-End Joint Target and Non-Target Speakers ASR.
CoRR
(2023)
Naoki Makishima, Keita Suzuki, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction.
INTERSPEECH
(2023)
Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
End-to-End Joint Target and Non-Target Speakers ASR.
INTERSPEECH
(2023)
Keita Suzuki, Satoshi Suzuki, Ryo Masumura, Atsushi Ando, Naoki Makishima
Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body.
MMAsia
(2023)
Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa
NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization.
CoRR
(2023)
Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.
CoRR
(2023)
Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff.
ICCV
(2023)
Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
SLT
(2022)
Satoshi Suzuki, Shoichiro Takeda, Naoki Makishima, Atsushi Ando, Ryo Masumura, Hayaru Shouno
Knowledge Transferred Fine-Tuning: Convolutional Neural Network Is Born Again With Anti-Aliasing Even in Data-Limited Situations.
IEEE Access
10 (2022)
Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration.
ICASSP
(2022)
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
INTERSPEECH
(2022)
Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis.
CoRR
(2022)
Atsushi Ando, Yumiko Murata, Ryo Masumura, Satoshi Suzuki, Naoki Makishima, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss.
ICASSP
(2022)
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
CoRR
(2022)
Akihiko Takashima, Ryo Masumura, Atsushi Ando, Yoshihiro Yamazaki, Mihiro Uchida, Shota Orihashi
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition.
INTERSPEECH
(2022)
Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training.
INTERSPEECH
(2022)
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix, Taichi Asami
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture.
INTERSPEECH
(2021)
Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition.
ICASSP
(2021)
Kenichi Fujita, Atsushi Ando, Yusuke Ijima
Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis.
INTERSPEECH
(2021)
Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda
Speech Emotion Recognition Based on Listener Adaptive Models.
ICASSP
(2021)
Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Atsushi Ando, Yusuke Shinohara
Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition.
ICASSP
(2020)
Yuki Kitagishi, Hosana Kamiyama, Atsushi Ando, Naohiro Tawara, Takeshi Mori, Satoshi Kobashikawa
Speaker Age Estimation Using Age-Dependent Insensitive Loss.
APSIPA
(2020)
Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model.
IEEE/ACM Trans. Audio Speech Lang. Process.
28 (2020)
Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model.
INTERSPEECH
(2019)
Hosana Kamiyama, Atsushi Ando, Ryo Masumura, Satoshi Kobashikawa, Yushi Aono
Likability Estimation of Call-center Agents by Suppressing Annotator Variability.
APSIPA
(2019)
Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Takanobu Oba, Ryuichiro Higashinaka
Improving Speech-Based End-of-Turn Detection Via Cross-Modal Representation Learning with Punctuated Text Data.
ASRU
(2019)
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hosana Kamiyama, Takanobu Oba, Satoshi Kobashikawa, Yushi Aono
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models.
INTERSPEECH
(2019)
Hosana Kamiyama, Atsushi Ando, Ryo Masumura, Satoshi Kobashikawa, Yushi Aono
Urgent Voicemail Detection Focused on Long-term Temporal Variation.
APSIPA
(2019)
Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa
Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise.
INTERSPEECH
(2019)
Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa
Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise -.
CoRR
(2019)
Ryo Masumura, Setsuo Yamada, Tomohiro Tanaka, Atsushi Ando, Hosana Kamiyama, Yushi Aono
Online Call Scene Segmentation of Contact Center Dialogues based on Role Aware Hierarchical LSTM-RNNs.
APSIPA
(2018)
Atsushi Ando, Satoshi Kobashikawa, Hosana Kamiyama, Ryo Masumura, Yusuke Ijima, Yushi Aono
Soft-Target Training with Ambiguous Emotional Utterances for DNN-Based Speech Emotion Classification.
ICASSP
(2018)
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono
Neural Dialogue Context Online End-of-Turn Detection.
SIGDIAL Conference
(2018)
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki, Yushi Aono
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder.
INTERSPEECH
(2018)
Atsushi Ando, Reine Asakawa, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training.
INTERSPEECH
(2018)
Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls.
INTERSPEECH
(2017)
Hosana Kamiyama, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono
Robust children and adults speech identification and confidence measure based on DNN posteriorgram.
APSIPA
(2017)
Ruo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono
Interaction and Transition Model for Speech Emotion Recognition in Dialogue.
INTERSPEECH
(2017)
Atsushi Ando, Taichi Asami, Yoshikazu Yamaguchi, Yushi Aono
Speaker recognition in duration-mismatched condition using bootstrapped i-vectors.
APSIPA
(2016)
Atsushi Ando, Taichi Asami, Manabu Okamoto, Hirokazu Masataki, Sumitaka Sakauchi
Agreement and disagreement utterance detection in conversational speech by extracting and integrating local features.
INTERSPEECH
(2015)