Xiaofei Wang
Publication Activity (10 Years)
Years Active: 2021-2024
Publications (10 Years): 30
Top Topics
Speech Recognition
Speaker Diarization
Autoregressive
Noisy Environments
Top Venues
CoRR
ICASSP
INTERSPEECH
Publications
Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.
CoRR
(2024)
Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS.
CoRR
(2024)
Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS.
CoRR
(2024)
Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka
DiariST: Streaming Speech Translation with Speaker Diarization.
ICASSP
(2024)
Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.
CoRR
(2024)
Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech.
CoRR
(2024)
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
ICASSP
(2023)
Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen, Xiaofei Wang, Takuya Yoshioka
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach.
INTERSPEECH
(2023)
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
ICASSP
(2023)
Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP
(2023)
Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP
(2023)
Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka
DiariST: Streaming Speech Translation with Speaker Diarization.
CoRR
(2023)
Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda
VarArray: Array-Geometry-Agnostic Continuous Speech Separation.
ICASSP
(2022)
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Simulating realistic speech overlaps improves multi-talker ASR.
CoRR
(2022)
Takuya Yoshioka, Xiaofei Wang, Dongmei Wang
PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays.
ICASSP
(2022)
Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez
All-Neural Beamformer for Continuous Speech Separation.
ICASSP
(2022)
Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.
INTERSPEECH
(2022)
Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez
Speech separation with large-scale self-supervised learning.
CoRR
(2022)
Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
ICASSP
(2022)
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR.
ICASSP
(2022)
Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang
Personalized Speech Enhancement: New Models and Comprehensive Evaluation.
ICASSP
(2022)
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
INTERSPEECH
(2022)
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
CoRR
(2022)
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
INTERSPEECH
(2022)
Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation.
CoRR
(2022)
Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
CoRR
(2022)
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition.
CoRR
(2022)
Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement.
Interspeech
(2021)
Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone.
Interspeech
(2021)
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
End-to-End Speaker-Attributed ASR with Transformer.
Interspeech
(2021)