Brian Yan
Publication Activity (10 Years)
Years Active: 2021-2024
Publications (10 Years): 75
Top Topics
Bayes Risk
Speech Recognition
Spoken Language
Autoregressive
Top Venues
CoRR
ICASSP
INTERSPEECH
ASRU
Publications
Yui Sudo
,
Muhammad Shakeel
,
Yosuke Fukumoto
,
Brian Yan
,
Jiatong Shi
,
Yifan Peng
,
Shinji Watanabe
4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.
CoRR
(2024)
Yifan Peng
,
Jinchuan Tian
,
William Chen
,
Siddhant Arora
,
Brian Yan
,
Yui Sudo
,
Muhammad Shakeel
,
Kwanghee Choi
,
Jiatong Shi
,
Xuankai Chang
,
Jee-weon Jung
,
Shinji Watanabe
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
CoRR
(2024)
Amir Hussein
,
Dorsa Zeinali
,
Ondrej Klejch
,
Matthew Wiesner
,
Brian Yan
,
Shammur Absar Chowdhury
,
Ahmed Ali
,
Shinji Watanabe
,
Sanjeev Khudanpur
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora.
ICASSP
(2024)
Brian Yan
,
Xuankai Chang
,
Antonios Anastasopoulos
,
Yuya Fujita
,
Shinji Watanabe
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing.
ICASSP
(2024)
Xuankai Chang
,
Brian Yan
,
Kwanghee Choi
,
Jee-weon Jung
,
Yichen Lu
,
Soumi Maiti
,
Roshan S. Sharma
,
Jiatong Shi
,
Jinchuan Tian
,
Shinji Watanabe
,
Yuya Fujita
,
Takashi Maekaku
,
Pengcheng Guo
,
Yao-Fei Cheng
,
Pavel Denisov
,
Kohei Saijo
,
Hsiu-Hsuan Wang
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
ICASSP
(2024)
Amir Hussein
,
Brian Yan
,
Antonios Anastasopoulos
,
Shinji Watanabe
,
Sanjeev Khudanpur
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.
ICASSP
(2024)
Yosuke Kashiwagi
,
Siddhant Arora
,
Hayato Futami
,
Jessica Huynh
,
Shih-Lun Wu
,
Yifan Peng
,
Brian Yan
,
Emiru Tsunoo
,
Shinji Watanabe
Tensor decomposition for minimization of E2E SLU model toward on-device processing.
INTERSPEECH
(2023)
Puyuan Peng
,
Brian Yan
,
Shinji Watanabe
,
David Harwath
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization.
INTERSPEECH
(2023)
William Chen
,
Brian Yan
,
Jiatong Shi
,
Yifan Peng
,
Soumi Maiti
,
Shinji Watanabe
Improving Massively Multilingual ASR with Auxiliary CTC Objectives.
ICASSP
(2023)
Brian Yan
,
Matthew Wiesner
,
Ondrej Klejch
,
Preethi Jyothi
,
Shinji Watanabe
Towards Zero-Shot Code-Switched Speech Recognition.
ICASSP
(2023)
Yen-Ju Lu
,
Xuankai Chang
,
Chenda Li
,
Wangyou Zhang
,
Samuele Cornell
,
Zhaoheng Ni
,
Yoshiki Masuyama
,
Brian Yan
,
Robin Scheibler
,
Zhong-Qiu Wang
,
Yu Tsao
,
Yanmin Qian
,
Shinji Watanabe
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.
J. Open Source Softw.
8 (91) (2023)
Hayato Futami
,
Jessica Huynh
,
Siddhant Arora
,
Shih-Lun Wu
,
Yosuke Kashiwagi
,
Yifan Peng
,
Brian Yan
,
Emiru Tsunoo
,
Shinji Watanabe
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge.
CoRR
(2023)
Xuankai Chang
,
Brian Yan
,
Yuya Fujita
,
Takashi Maekaku
,
Shinji Watanabe
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning.
INTERSPEECH
(2023)
Brian Yan
,
Siddharth Dalmia
,
Yosuke Higuchi
,
Graham Neubig
,
Florian Metze
,
Alan W. Black
,
Shinji Watanabe
CTC Alignments Improve Autoregressive Translation.
EACL
(2023)
Siddhant Arora
,
Hayato Futami
,
Yosuke Kashiwagi
,
Emiru Tsunoo
,
Brian Yan
,
Shinji Watanabe
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding.
INTERSPEECH
(2023)
Yifan Peng
,
Jinchuan Tian
,
Brian Yan
,
Dan Berrebbi
,
Xuankai Chang
,
Xinjian Li
,
Jiatong Shi
,
Siddhant Arora
,
William Chen
,
Roshan S. Sharma
,
Wangyou Zhang
,
Yui Sudo
,
Muhammad Shakeel
,
Jee-weon Jung
,
Soumi Maiti
,
Shinji Watanabe
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data.
CoRR
(2023)
Jinchuan Tian
,
Jianwei Yu
,
Hangting Chen
,
Brian Yan
,
Chao Weng
,
Dong Yu
,
Shinji Watanabe
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
INTERSPEECH
(2023)
Yifan Peng
,
Kwangyoun Kim
,
Felix Wu
,
Brian Yan
,
Siddhant Arora
,
William Chen
,
Jiyang Tang
,
Suwon Shon
,
Prashant Sridhar
,
Shinji Watanabe
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks.
CoRR
(2023)
William Chen
,
Jiatong Shi
,
Brian Yan
,
Dan Berrebbi
,
Wangyou Zhang
,
Yifan Peng
,
Xuankai Chang
,
Soumi Maiti
,
Shinji Watanabe
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning.
CoRR
(2023)
Siddhant Arora
,
Hayato Futami
,
Shih-Lun Wu
,
Jessica Huynh
,
Yifan Peng
,
Yosuke Kashiwagi
,
Emiru Tsunoo
,
Brian Yan
,
Shinji Watanabe
A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge.
CoRR
(2023)
Xuankai Chang
,
Brian Yan
,
Kwanghee Choi
,
Jee-weon Jung
,
Yichen Lu
,
Soumi Maiti
,
Roshan S. Sharma
,
Jiatong Shi
,
Jinchuan Tian
,
Shinji Watanabe
,
Yuya Fujita
,
Takashi Maekaku
,
Pengcheng Guo
,
Yao-Fei Cheng
,
Pavel Denisov
,
Kohei Saijo
,
Hsiu-Hsuan Wang
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
CoRR
(2023)
Hayato Futami
,
Jessica Huynh
,
Siddhant Arora
,
Shih-Lun Wu
,
Yosuke Kashiwagi
,
Yifan Peng
,
Brian Yan
,
Emiru Tsunoo
,
Shinji Watanabe
The Pipeline System of ASR and NLU with MLM-based Data Augmentation Toward STOP Low-Resource Challenge.
ICASSP
(2023)
Siddhant Arora
,
Hayato Futami
,
Emiru Tsunoo
,
Brian Yan
,
Shinji Watanabe
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History.
CoRR
(2023)
Siddhant Arora
,
Hayato Futami
,
Yosuke Kashiwagi
,
Emiru Tsunoo
,
Brian Yan
,
Shinji Watanabe
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding.
CoRR
(2023)
Puyuan Peng
,
Brian Yan
,
Shinji Watanabe
,
David Harwath
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization.
CoRR
(2023)
William Chen
,
Brian Yan
,
Jiatong Shi
,
Yifan Peng
,
Soumi Maiti
,
Shinji Watanabe
Improving Massively Multilingual ASR With Auxiliary CTC Objectives.
CoRR
(2023)
Dan Berrebbi
,
Brian Yan
,
Shinji Watanabe
Avoid Overthinking in Self-Supervised Models for Speech Recognition.
ICASSP
(2023)
Yosuke Kashiwagi
,
Siddhant Arora
,
Hayato Futami
,
Jessica Huynh
,
Shih-Lun Wu
,
Yifan Peng
,
Brian Yan
,
Emiru Tsunoo
,
Shinji Watanabe
E-Branchformer-Based E2E SLU Toward STOP On-Device Challenge.
ICASSP
(2023)
Yui Sudo
,
Muhammad Shakeel
,
Brian Yan
,
Jiatong Shi
,
Shinji Watanabe
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.
INTERSPEECH
(2023)
William Chen
,
Jiatong Shi
,
Brian Yan
,
Dan Berrebbi
,
Wangyou Zhang
,
Yifan Peng
,
Xuankai Chang
,
Soumi Maiti
,
Shinji Watanabe
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
ASRU
(2023)
Xuankai Chang
,
Brian Yan
,
Yuya Fujita
,
Takashi Maekaku
,
Shinji Watanabe
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning.
CoRR
(2023)
Siddhant Arora
,
Hayato Futami
,
Emiru Tsunoo
,
Brian Yan
,
Shinji Watanabe
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History.
ICASSP
(2023)
Yifan Peng
,
Kwangyoun Kim
,
Felix Wu
,
Brian Yan
,
Siddhant Arora
,
William Chen
,
Jiyang Tang
,
Suwon Shon
,
Prashant Sridhar
,
Shinji Watanabe
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks.
INTERSPEECH
(2023)
Brian Yan
,
Jiatong Shi
,
Soumi Maiti
,
William Chen
,
Xinjian Li
,
Yifan Peng
,
Siddhant Arora
,
Shinji Watanabe
CMU's IWSLT 2023 Simultaneous Speech Translation System.
IWSLT@ACL
(2023)
Amir Hussein
,
Brian Yan
,
Antonios Anastasopoulos
,
Shinji Watanabe
,
Sanjeev Khudanpur
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.
CoRR
(2023)
Amir Hussein
,
Dorsa Zeinali
,
Ondrej Klejch
,
Matthew Wiesner
,
Brian Yan
,
Shammur Absar Chowdhury
,
Ahmed M. Ali
,
Shinji Watanabe
,
Sanjeev Khudanpur
Speech collage: code-switched audio generation by collaging monolingual corpora.
CoRR
(2023)
Jinchuan Tian
,
Brian Yan
,
Jianwei Yu
,
Chao Weng
,
Dong Yu
,
Shinji Watanabe
Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks.
ICLR
(2023)
Jinchuan Tian
,
Jianwei Yu
,
Hangting Chen
,
Brian Yan
,
Chao Weng
,
Dong Yu
,
Shinji Watanabe
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.
CoRR
(2023)
Siddhant Arora
,
Hayato Futami
,
Shih-Lun Wu
,
Jessica Huynh
,
Yifan Peng
,
Yosuke Kashiwagi
,
Emiru Tsunoo
,
Brian Yan
,
Shinji Watanabe
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward STOP Quality Challenge.
ICASSP
(2023)
Brian Yan
,
Jiatong Shi
,
Yun Tang
,
Hirofumi Inaguma
,
Yifan Peng
,
Siddharth Dalmia
,
Peter Polák
,
Patrick Fernandes
,
Dan Berrebbi
,
Tomoki Hayashi
,
Xiaohui Zhang
,
Zhaoheng Ni
,
Moto Hira
,
Soumi Maiti
,
Juan Pino
,
Shinji Watanabe
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
ACL (demo)
(2023)
Motoi Omachi
,
Brian Yan
,
Siddharth Dalmia
,
Yuya Fujita
,
Shinji Watanabe
Align, Write, Re-Order: Explainable End-to-End Speech Translation via Operation Sequence Generation.
ICASSP
(2023)
Peter Polák
,
Brian Yan
,
Shinji Watanabe
,
Alex Waibel
,
Ondrej Bojar
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff.
INTERSPEECH
(2023)
Peter Polák
,
Brian Yan
,
Shinji Watanabe
,
Alex Waibel
,
Ondrej Bojar
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff.
CoRR
(2023)
Brian Yan
,
Jiatong Shi
,
Yun Tang
,
Hirofumi Inaguma
,
Yifan Peng
,
Siddharth Dalmia
,
Peter Polák
,
Patrick Fernandes
,
Dan Berrebbi
,
Tomoki Hayashi
,
Xiaohui Zhang
,
Zhaoheng Ni
,
Moto Hira
,
Soumi Maiti
,
Juan Pino
,
Shinji Watanabe
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.
CoRR
(2023)
Yifan Peng
,
Jinchuan Tian
,
Brian Yan
,
Dan Berrebbi
,
Xuankai Chang
,
Xinjian Li
,
Jiatong Shi
,
Siddhant Arora
,
William Chen
,
Roshan S. Sharma
,
Wangyou Zhang
,
Yui Sudo
,
Muhammad Shakeel
,
Jee-weon Jung
,
Soumi Maiti
,
Shinji Watanabe
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
ASRU
(2023)
Brian Yan
,
Xuankai Chang
,
Antonios Anastasopoulos
,
Yuya Fujita
,
Shinji Watanabe
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing.
CoRR
(2023)
Siddhant Arora
,
Siddharth Dalmia
,
Xuankai Chang
,
Brian Yan
,
Alan W. Black
,
Shinji Watanabe
Two-Pass Low Latency End-to-End Spoken Language Understanding.
INTERSPEECH
(2022)
Brian Yan
,
Patrick Fernandes
,
Siddharth Dalmia
,
Jiatong Shi
,
Yifan Peng
,
Dan Berrebbi
,
Xinyi Wang
,
Graham Neubig
,
Shinji Watanabe
CMU's IWSLT 2022 Dialect Speech Translation System.
IWSLT@ACL
(2022)
Yosuke Higuchi
,
Brian Yan
,
Siddhant Arora
,
Tetsuji Ogawa
,
Tetsunori Kobayashi
,
Shinji Watanabe
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model.
CoRR
(2022)
Motoi Omachi
,
Brian Yan
,
Siddharth Dalmia
,
Yuya Fujita
,
Shinji Watanabe
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation.
CoRR
(2022)
Yui Sudo
,
Muhammad Shakeel
,
Brian Yan
,
Jiatong Shi
,
Shinji Watanabe
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.
CoRR
(2022)
Brian Yan
,
Matthew Wiesner
,
Ondrej Klejch
,
Preethi Jyothi
,
Shinji Watanabe
Towards Zero-Shot Code-Switched Speech Recognition.
CoRR
(2022)
Yen-Ju Lu
,
Xuankai Chang
,
Chenda Li
,
Wangyou Zhang
,
Samuele Cornell
,
Zhaoheng Ni
,
Yoshiki Masuyama
,
Brian Yan
,
Robin Scheibler
,
Zhong-Qiu Wang
,
Yu Tsao
,
Yanmin Qian
,
Shinji Watanabe
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
CoRR
(2022)
Yen-Ju Lu
,
Xuankai Chang
,
Chenda Li
,
Wangyou Zhang
,
Samuele Cornell
,
Zhaoheng Ni
,
Yoshiki Masuyama
,
Brian Yan
,
Robin Scheibler
,
Zhong-Qiu Wang
,
Yu Tsao
,
Yanmin Qian
,
Shinji Watanabe
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
INTERSPEECH
(2022)
Siddhant Arora
,
Siddharth Dalmia
,
Pavel Denisov
,
Xuankai Chang
,
Yushi Ueda
,
Yifan Peng
,
Yuekai Zhang
,
Sujay Kumar
,
Karthik Ganesan
,
Brian Yan
,
Ngoc Thang Vu
,
Alan W. Black
,
Shinji Watanabe
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.
ICASSP
(2022)
Jinchuan Tian
,
Brian Yan
,
Jianwei Yu
,
Chao Weng
,
Dong Yu
,
Shinji Watanabe
Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks.
CoRR
(2022)
Dan Berrebbi
,
Jiatong Shi
,
Brian Yan
,
Osbel López-Francisco
,
Jonathan D. Amith
,
Shinji Watanabe
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.
INTERSPEECH
(2022)
Dan Berrebbi
,
Brian Yan
,
Shinji Watanabe
Avoid Overthinking in Self-Supervised Models for Speech Recognition.
CoRR
(2022)
Siddhant Arora
,
Siddharth Dalmia
,
Xuankai Chang
,
Brian Yan
,
Alan W. Black
,
Shinji Watanabe
Two-Pass Low Latency End-to-End Spoken Language Understanding.
CoRR
(2022)
Yosuke Higuchi
,
Brian Yan
,
Siddhant Arora
,
Tetsuji Ogawa
,
Tetsunori Kobayashi
,
Shinji Watanabe
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model.
EMNLP (Findings)
(2022)
Brian Yan
,
Siddharth Dalmia
,
Yosuke Higuchi
,
Graham Neubig
,
Florian Metze
,
Alan W. Black
,
Shinji Watanabe
CTC Alignments Improve Autoregressive Translation.
CoRR
(2022)
Dan Berrebbi
,
Jiatong Shi
,
Brian Yan
,
Osbel López-Francisco
,
Jonathan D. Amith
,
Shinji Watanabe
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.
CoRR
(2022)
Brian Yan
,
Chunlei Zhang
,
Meng Yu
,
Shi-Xiong Zhang
,
Siddharth Dalmia
,
Dan Berrebbi
,
Chao Weng
,
Shinji Watanabe
,
Dong Yu
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
ICASSP
(2022)
Siddhant Arora
,
Siddharth Dalmia
,
Brian Yan
,
Florian Metze
,
Alan W. Black
,
Shinji Watanabe
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models.
EMNLP (Findings)
(2022)
Siddhant Arora
,
Siddharth Dalmia
,
Brian Yan
,
Florian Metze
,
Alan W. Black
,
Shinji Watanabe
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models.
CoRR
(2022)
Brian Yan
,
Siddharth Dalmia
,
David R. Mortensen
,
Florian Metze
,
Shinji Watanabe
Differentiable Allophone Graphs for Language-Universal Speech Recognition.
INTERSPEECH
(2021)
Hirofumi Inaguma
,
Siddharth Dalmia
,
Brian Yan
,
Shinji Watanabe
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates.
ASRU
(2021)
Brian Yan
,
Chunlei Zhang
,
Meng Yu
,
Shi-Xiong Zhang
,
Siddharth Dalmia
,
Dan Berrebbi
,
Chao Weng
,
Shinji Watanabe
,
Dong Yu
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.
CoRR
(2021)
Hirofumi Inaguma
,
Brian Yan
,
Siddharth Dalmia
,
Pengcheng Guo
,
Jiatong Shi
,
Kevin Duh
,
Shinji Watanabe
ESPnet-ST IWSLT 2021 Offline Speech Translation System.
CoRR
(2021)
Siddharth Dalmia
,
Brian Yan
,
Vikas Raunak
,
Florian Metze
,
Shinji Watanabe
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks.
NAACL-HLT
(2021)
Hirofumi Inaguma
,
Siddharth Dalmia
,
Brian Yan
,
Shinji Watanabe
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates.
CoRR
(2021)
Siddharth Dalmia
,
Brian Yan
,
Vikas Raunak
,
Florian Metze
,
Shinji Watanabe
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks.
CoRR
(2021)
Hirofumi Inaguma
,
Brian Yan
,
Siddharth Dalmia
,
Pengcheng Guo
,
Jiatong Shi
,
Kevin Duh
,
Shinji Watanabe
ESPnet-ST IWSLT 2021 Offline Speech Translation System.
IWSLT
(2021)
Siddhant Arora
,
Siddharth Dalmia
,
Pavel Denisov
,
Xuankai Chang
,
Yushi Ueda
,
Yifan Peng
,
Yuekai Zhang
,
Sujay Kumar
,
Karthik Ganesan
,
Brian Yan
,
Ngoc Thang Vu
,
Alan W. Black
,
Shinji Watanabe
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet.
CoRR
(2021)
Brian Yan
,
Siddharth Dalmia
,
David R. Mortensen
,
Florian Metze
,
Shinji Watanabe
Differentiable Allophone Graphs for Language-Universal Speech Recognition.
CoRR
(2021)