Jinyu Li
ORCID
Publication Activity (10 Years)
Years Active: 2000-2024
Publications (10 Years): 253
Top Topics
Speaker Adaptation
Speech Recognition
Language Model
Neural Network
Top Venues
CoRR
ICASSP
INTERSPEECH
ASRU
Publications
Shujie Hu
,
Long Zhou
,
Shujie Liu
,
Sanyuan Chen
,
Hongkun Hao
,
Jing Pan
,
Xunying Liu
,
Jinyu Li
,
Sunit Sivasankaran
,
Linquan Liu
,
Furu Wei
WavLLM: Towards Robust and Adaptive Speech Large Language Model.
CoRR
(2024)
Bing Han
,
Long Zhou
,
Shujie Liu
,
Sanyuan Chen
,
Lingwei Meng
,
Yanming Qian
,
Yanqing Liu
,
Sheng Zhao
,
Jinyu Li
,
Furu Wei
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment.
CoRR
(2024)
Leying Zhang
,
Yao Qian
,
Long Zhou
,
Shujie Liu
,
Dongmei Wang
,
Xiaofei Wang
,
Midia Yousefi
,
Yanmin Qian
,
Jinyu Li
,
Lei He
,
Sheng Zhao
,
Michael Zeng
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.
CoRR
(2024)
Xun Gong
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
,
Rui Zhao
,
Xie Chen
,
Yanmin Qian
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
CoRR
(2024)
Zeqian Ju
,
Yuancheng Wang
,
Kai Shen
,
Xu Tan
,
Detai Xin
,
Dongchao Yang
,
Yanqing Liu
,
Yichong Leng
,
Kaitao Song
,
Siliang Tang
,
Zhizheng Wu
,
Tao Qin
,
Xiang-Yang Li
,
Wei Ye
,
Shikun Zhang
,
Jiang Bian
,
Lei He
,
Jinyu Li
,
Sheng Zhao
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR
(2024)
Peidong Wang
,
Jian Xue
,
Jinyu Li
,
Junkun Chen
,
Aswin Shanmugam Subramanian
Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation.
CoRR
(2024)
Detai Xin
,
Xu Tan
,
Kai Shen
,
Zeqian Ju
,
Dongchao Yang
,
Yuancheng Wang
,
Shinnosuke Takamichi
,
Hiroshi Saruwatari
,
Shujie Liu
,
Jinyu Li
,
Sheng Zhao
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR
(2024)
Tianrui Wang
,
Long Zhou
,
Ziqiang Zhang
,
Yu Wu
,
Shujie Liu
,
Yashesh Gaur
,
Zhuo Chen
,
Jinyu Li
,
Furu Wei
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
IEEE ACM Trans. Audio Speech Lang. Process.
32 (2024)
Xiaofei Wang
,
Sefik Emre Eskimez
,
Manthan Thakker
,
Hemin Yang
,
Zirun Zhu
,
Min Tang
,
Yufei Xia
,
Jinzhu Li
,
Sheng Zhao
,
Jinyu Li
,
Naoyuki Kanda
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS.
CoRR
(2024)
Yiming Wang
,
Jinyu Li
ResidualTransformer: Residual Low-Rank Learning With Weight-Sharing For Transformer Layers.
ICASSP
(2024)
Qiushi Zhu
,
Long Zhou
,
Ziqiang Zhang
,
Shujie Liu
,
Binxing Jiao
,
Jie Zhang
,
Li-Rong Dai
,
Daxin Jiang
,
Jinyu Li
,
Furu Wei
VATLM: Visual-Audio-Text Pre-Training With Unified Masked Prediction for Speech Representation Learning.
IEEE Trans. Multim.
26 (2024)
Hongkun Hao
,
Long Zhou
,
Shujie Liu
,
Jinyu Li
,
Shujie Hu
,
Rui Wang
,
Furu Wei
Boosting Large Language Model for Speech Synthesis: An Empirical Study.
CoRR
(2024)
Lingwei Meng
,
Long Zhou
,
Shujie Liu
,
Sanyuan Chen
,
Bing Han
,
Shujie Hu
,
Yanqing Liu
,
Jinyu Li
,
Sheng Zhao
,
Xixin Wu
,
Helen Meng
,
Furu Wei
Autoregressive Speech Synthesis without Vector Quantization.
CoRR
(2024)
Mu Yang
,
Naoyuki Kanda
,
Xiaofei Wang
,
Junkun Chen
,
Peidong Wang
,
Jian Xue
,
Jinyu Li
,
Takuya Yoshioka
DiariST: Streaming Speech Translation with Speaker Diarization.
ICASSP
(2024)
Ziqiang Zhang
,
Sanyuan Chen
,
Long Zhou
,
Yu Wu
,
Shuo Ren
,
Shujie Liu
,
Zhuoyuan Yao
,
Xun Gong
,
Li-Rong Dai
,
Jinyu Li
,
Furu Wei
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
IEEE ACM Trans. Audio Speech Lang. Process.
32 (2024)
Xiaofei Wang
,
Manthan Thakker
,
Zhuo Chen
,
Naoyuki Kanda
,
Sefik Emre Eskimez
,
Sanyuan Chen
,
Min Tang
,
Shujie Liu
,
Jinyu Li
,
Takuya Yoshioka
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
IEEE ACM Trans. Audio Speech Lang. Process.
32 (2024)
Sanyuan Chen
,
Shujie Liu
,
Long Zhou
,
Yanqing Liu
,
Xu Tan
,
Jinyu Li
,
Sheng Zhao
,
Yao Qian
,
Furu Wei
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers.
CoRR
(2024)
Sara Papi
,
Peidong Wang
,
Junkun Chen
,
Jian Xue
,
Naoyuki Kanda
,
Jinyu Li
,
Yashesh Gaur
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
ICASSP
(2024)
Chenyang Le
,
Yao Qian
,
Dongmei Wang
,
Long Zhou
,
Shujie Liu
,
Xiaofei Wang
,
Midia Yousefi
,
Yanmin Qian
,
Jinyu Li
,
Sheng Zhao
,
Michael Zeng
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.
CoRR
(2024)
Xun Gong
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
,
Rui Zhao
,
Xie Chen
,
Yanmin Qian
Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
IEEE ACM Trans. Audio Speech Lang. Process.
32 (2024)
Haibin Wu
,
Xiaofei Wang
,
Sefik Emre Eskimez
,
Manthan Thakker
,
Daniel Tompkins
,
Chung-Hsien Tsai
,
Canrun Li
,
Zhen Xiao
,
Sheng Zhao
,
Jinyu Li
,
Naoyuki Kanda
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech.
CoRR
(2024)
Jian Wu
,
Naoyuki Kanda
,
Takuya Yoshioka
,
Rui Zhao
,
Zhuo Chen
,
Jinyu Li
t-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
ICASSP
(2024)
Xun Gong
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
,
Rui Zhao
,
Xie Chen
,
Yanmin Qian
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP
(2023)
Sara Papi
,
Peidong Wang
,
Junkun Chen
,
Jian Xue
,
Jinyu Li
,
Yashesh Gaur
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.
CoRR
(2023)
Peidong Wang
,
Eric Sun
,
Jian Xue
,
Yu Wu
,
Long Zhou
,
Yashesh Gaur
,
Shujie Liu
,
Jinyu Li
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.
INTERSPEECH
(2023)
Jian Xue
,
Peidong Wang
,
Jinyu Li
,
Eric Sun
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability.
ASRU
(2023)
Junkun Chen
,
Jian Xue
,
Peidong Wang
,
Jing Pan
,
Jinyu Li
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach.
ASRU
(2023)
Eric Sun
,
Jinyu Li
,
Yuxuan Hu
,
Yimeng Zhu
,
Long Zhou
,
Jian Xue
,
Peidong Wang
,
Linquan Liu
,
Shujie Liu
,
Edward Lin
,
Yifan Gong
Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training.
ASRU
(2023)
Tianrui Wang
,
Long Zhou
,
Ziqiang Zhang
,
Yu Wu
,
Shujie Liu
,
Yashesh Gaur
,
Zhuo Chen
,
Jinyu Li
,
Furu Wei
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.
CoRR
(2023)
Jing Pan
,
Jian Wu
,
Yashesh Gaur
,
Sunit Sivasankaran
,
Zhuo Chen
,
Shujie Liu
,
Jinyu Li
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.
CoRR
(2023)
Yuang Li
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.
ASRU
(2023)
Ruchao Fan
,
Yiming Wang
,
Yashesh Gaur
,
Jinyu Li
CTCBERT: Advancing Hidden-Unit BERT with CTC Objectives.
ICASSP
(2023)
Jian Wu
,
Yashesh Gaur
,
Zhuo Chen
,
Long Zhou
,
Yimeng Zhu
,
Tianrui Wang
,
Jinyu Li
,
Shujie Liu
,
Bo Ren
,
Linquan Liu
,
Yu Wu
On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
ASRU
(2023)
Jian Wu
,
Yashesh Gaur
,
Zhuo Chen
,
Long Zhou
,
Yimeng Zhu
,
Tianrui Wang
,
Jinyu Li
,
Shujie Liu
,
Bo Ren
,
Linquan Liu
,
Yu Wu
On decoder-only architecture for speech-to-text and large language model integration.
CoRR
(2023)
Yiming Wang
,
Jinyu Li
ResidualTransformer: Residual Low-rank Learning with Weight-sharing for Transformer Layers.
CoRR
(2023)
Rui Zhao
,
Jian Xue
,
Partha Parthasarathy
,
Veljko Miljanic
,
Jinyu Li
Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models.
ICASSP
(2023)
Yuang Li
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
Accelerating Transducers through Adjacent Token Merging.
INTERSPEECH
(2023)
Jian Wu
,
Zhuo Chen
,
Min Hu
,
Xiong Xiao
,
Jinyu Li
Speaker Change Detection For Transformer Transducer ASR.
ICASSP
(2023)
Junkun Chen
,
Jian Xue
,
Peidong Wang
,
Jing Pan
,
Jinyu Li
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach.
CoRR
(2023)
Kun Wei
,
Long Zhou
,
Ziqiang Zhang
,
Liping Chen
,
Shujie Liu
,
Lei He
,
Jinyu Li
,
Furu Wei
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
ICASSP
(2023)
Eric Sun
,
Jinyu Li
,
Yuxuan Hu
,
Yimeng Zhu
,
Long Zhou
,
Jian Xue
,
Peidong Wang
,
Linquan Liu
,
Shujie Liu
,
Edward Lin
,
Yifan Gong
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training.
CoRR
(2023)
Jian Wu
,
Naoyuki Kanda
,
Takuya Yoshioka
,
Rui Zhao
,
Zhuo Chen
,
Jinyu Li
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability.
CoRR
(2023)
Xiaoqiang Wang
,
Yanqing Liu
,
Jinyu Li
,
Sheng Zhao
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.
CoRR
(2023)
Yuang Li
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
Accelerating Transducers through Adjacent Token Merging.
CoRR
(2023)
Yuang Li
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.
CoRR
(2023)
Xiaoqiang Wang
,
Yanqing Liu
,
Jinyu Li
,
Sheng Zhao
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.
ICASSP
(2023)
Chengyi Wang
,
Sanyuan Chen
,
Yu Wu
,
Ziqiang Zhang
,
Long Zhou
,
Shujie Liu
,
Zhuo Chen
,
Yanqing Liu
,
Huaming Wang
,
Jinyu Li
,
Lei He
,
Sheng Zhao
,
Furu Wei
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.
CoRR
(2023)
Muqiao Yang
,
Naoyuki Kanda
,
Xiaofei Wang
,
Jian Wu
,
Sunit Sivasankaran
,
Zhuo Chen
,
Jinyu Li
,
Takuya Yoshioka
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
ICASSP
(2023)
Zhuo Chen
,
Naoyuki Kanda
,
Jian Wu
,
Yu Wu
,
Xiaofei Wang
,
Takuya Yoshioka
,
Jinyu Li
,
Sunit Sivasankaran
,
Sefik Emre Eskimez
Speech Separation with Large-Scale Self-Supervised Learning.
ICASSP
(2023)
Sara Papi
,
Peidong Wang
,
Junkun Chen
,
Jian Xue
,
Naoyuki Kanda
,
Jinyu Li
,
Yashesh Gaur
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
CoRR
(2023)
Xiaofei Wang
,
Manthan Thakker
,
Zhuo Chen
,
Naoyuki Kanda
,
Sefik Emre Eskimez
,
Sanyuan Chen
,
Min Tang
,
Shujie Liu
,
Jinyu Li
,
Takuya Yoshioka
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
CoRR
(2023)
Naoyuki Kanda
,
Jian Wu
,
Xiaofei Wang
,
Zhuo Chen
,
Jinyu Li
,
Takuya Yoshioka
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
ICASSP
(2023)
Sara Papi
,
Peidong Wang
,
Junkun Chen
,
Jian Xue
,
Jinyu Li
,
Yashesh Gaur
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.
ASRU
(2023)
Mu Yang
,
Naoyuki Kanda
,
Xiaofei Wang
,
Junkun Chen
,
Peidong Wang
,
Jian Xue
,
Jinyu Li
,
Takuya Yoshioka
DiariST: Streaming Speech Translation with Speaker Diarization.
CoRR
(2023)
Jian Xue
,
Peidong Wang
,
Jinyu Li
,
Matt Post
,
Yashesh Gaur
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.
CoRR
(2022)
Muqiao Yang
,
Naoyuki Kanda
,
Xiaofei Wang
,
Jian Wu
,
Sunit Sivasankaran
,
Zhuo Chen
,
Jinyu Li
,
Takuya Yoshioka
Simulating realistic speech overlaps improves multi-talker ASR.
CoRR
(2022)
Junyi Ao
,
Ziqiang Zhang
,
Long Zhou
,
Shujie Liu
,
Haizhou Li
,
Tom Ko
,
Lirong Dai
,
Jinyu Li
,
Yao Qian
,
Furu Wei
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
INTERSPEECH
(2022)
Jian Xue
,
Peidong Wang
,
Jinyu Li
,
Matt Post
,
Yashesh Gaur
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.
INTERSPEECH
(2022)
Desh Raj
,
Liang Lu
,
Zhuo Chen
,
Yashesh Gaur
,
Jinyu Li
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
ICASSP
(2022)
Peidong Wang
,
Eric Sun
,
Jian Xue
,
Yu Wu
,
Long Zhou
,
Yashesh Gaur
,
Shujie Liu
,
Jinyu Li
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers.
CoRR
(2022)
Zhuo Chen
,
Naoyuki Kanda
,
Jian Wu
,
Yu Wu
,
Xiaofei Wang
,
Takuya Yoshioka
,
Jinyu Li
,
Sunit Sivasankaran
,
Sefik Emre Eskimez
Speech separation with large-scale self-supervised learning.
CoRR
(2022)
Ziqiang Zhang
,
Long Zhou
,
Junyi Ao
,
Shujie Liu
,
Lirong Dai
,
Jinyu Li
,
Furu Wei
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.
CoRR
(2022)
Heming Wang
,
Yao Qian
,
Xiaofei Wang
,
Yiming Wang
,
Chengyi Wang
,
Shujie Liu
,
Takuya Yoshioka
,
Jinyu Li
,
DeLiang Wang
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
ICASSP
(2022)
Ruchao Fan
,
Guoli Ye
,
Yashesh Gaur
,
Jinyu Li
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding.
CoRR
(2022)
Yashesh Gaur
,
Nick Kibre
,
Jian Xue
,
Kangyuan Shu
,
Yuhui Wang
,
Issac Alphonso
,
Jinyu Li
,
Yifan Gong
Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition.
CoRR
(2022)
Zhong Meng
,
Yashesh Gaur
,
Naoyuki Kanda
,
Jinyu Li
,
Xie Chen
,
Yu Wu
,
Yifan Gong
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
INTERSPEECH
(2022)
Wangyou Zhang
,
Zhuo Chen
,
Naoyuki Kanda
,
Shujie Liu
,
Jinyu Li
,
Sefik Emre Eskimez
,
Takuya Yoshioka
,
Xiong Xiao
,
Zhong Meng
,
Yanmin Qian
,
Furu Wei
Separating Long-Form Speech with Group-wise Permutation Invariant Training.
INTERSPEECH
(2022)
Ruchao Fan
,
Yiming Wang
,
Yashesh Gaur
,
Jinyu Li
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives.
CoRR
(2022)
Guoli Ye
,
Vadim Mazalov
,
Jinyu Li
,
Yifan Gong
Have Best of Both Worlds: Two-Pass Hybrid and E2E Cascading Framework for Speech Recognition.
ICASSP
(2022)
Xiaoqiang Wang
,
Yanqing Liu
,
Jinyu Li
,
Veljko Miljanic
,
Sheng Zhao
,
Hosam Khalil
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems.
CoRR
(2022)
Junyi Ao
,
Ziqiang Zhang
,
Long Zhou
,
Shujie Liu
,
Haizhou Li
,
Tom Ko
,
Lirong Dai
,
Jinyu Li
,
Yao Qian
,
Furu Wei
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
CoRR
(2022)
Rui Zhao
,
Jian Xue
,
Partha Parthasarathy
,
Veljko Miljanic
,
Jinyu Li
Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models.
CoRR
(2022)
Junyi Ao
,
Rui Wang
,
Long Zhou
,
Chengyi Wang
,
Shuo Ren
,
Yu Wu
,
Shujie Liu
,
Tom Ko
,
Qing Li
,
Yu Zhang
,
Zhihua Wei
,
Yao Qian
,
Jinyu Li
,
Furu Wei
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
ACL (1)
(2022)
Sanyuan Chen
,
Yu Wu
,
Zhuo Chen
,
Jian Wu
,
Takuya Yoshioka
,
Shujie Liu
,
Jinyu Li
,
Xiangzhan Yu
Ultra Fast Speech Separation Model with Teacher Student Learning.
CoRR
(2022)
Sanyuan Chen
,
Yu Wu
,
Chengyi Wang
,
Shujie Liu
,
Zhuo Chen
,
Peidong Wang
,
Gang Liu
,
Jinyu Li
,
Jian Wu
,
Xiangzhan Yu
,
Furu Wei
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
CoRR
(2022)
Ziqiang Zhang
,
Junyi Ao
,
Long Zhou
,
Shujie Liu
,
Furu Wei
,
Jinyu Li
The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task.
CoRR
(2022)
Liang Lu
,
Jinyu Li
,
Yifan Gong
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.
ICASSP
(2022)
Xie Chen
,
Zhong Meng
,
Sarangarajan Parthasarathy
,
Jinyu Li
Factorized Neural Transducer for Efficient Language Model Adaptation.
ICASSP
(2022)
Naoyuki Kanda
,
Jian Wu
,
Yu Wu
,
Xiong Xiao
,
Zhong Meng
,
Xiaofei Wang
,
Yashesh Gaur
,
Zhuo Chen
,
Jinyu Li
,
Takuya Yoshioka
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
INTERSPEECH
(2022)
Chengyi Wang
,
Yiming Wang
,
Yu Wu
,
Sanyuan Chen
,
Jinyu Li
,
Shujie Liu
,
Furu Wei
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
CoRR
(2022)
Naoyuki Kanda
,
Jian Wu
,
Yu Wu
,
Xiong Xiao
,
Zhong Meng
,
Xiaofei Wang
,
Yashesh Gaur
,
Zhuo Chen
,
Jinyu Li
,
Takuya Yoshioka
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
CoRR
(2022)
Yixuan Zhang
,
Zhuo Chen
,
Jian Wu
,
Takuya Yoshioka
,
Peidong Wang
,
Zhong Meng
,
Jinyu Li
Continuous Speech Separation with Recurrent Selective Attention Network.
ICASSP
(2022)
Naoyuki Kanda
,
Jian Wu
,
Yu Wu
,
Xiong Xiao
,
Zhong Meng
,
Xiaofei Wang
,
Yashesh Gaur
,
Zhuo Chen
,
Jinyu Li
,
Takuya Yoshioka
Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
INTERSPEECH
(2022)
Liang Lu
,
Jinyu Li
,
Yifan Gong
Endpoint Detection for Streaming End-to-End Multi-talker ASR.
CoRR
(2022)
Ziqiang Zhang
,
Sanyuan Chen
,
Long Zhou
,
Yu Wu
,
Shuo Ren
,
Shujie Liu
,
Zhuoyuan Yao
,
Xun Gong
,
Lirong Dai
,
Jinyu Li
,
Furu Wei
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data.
CoRR
(2022)
Sanyuan Chen
,
Chengyi Wang
,
Zhengyang Chen
,
Yu Wu
,
Shujie Liu
,
Zhuo Chen
,
Jinyu Li
,
Naoyuki Kanda
,
Takuya Yoshioka
,
Xiong Xiao
,
Jian Wu
,
Long Zhou
,
Shuo Ren
,
Yanmin Qian
,
Yao Qian
,
Jian Wu
,
Michael Zeng
,
Xiangzhan Yu
,
Furu Wei
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process.
16 (6) (2022)
Ziqiang Zhang
,
Long Zhou
,
Junyi Ao
,
Shujie Liu
,
Lirong Dai
,
Jinyu Li
,
Furu Wei
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.
EMNLP
(2022)
Chengyi Wang
,
Yiming Wang
,
Yu Wu
,
Sanyuan Chen
,
Jinyu Li
,
Shujie Liu
,
Furu Wei
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
INTERSPEECH
(2022)
Chengyi Wang
,
Yu Wu
,
Sanyuan Chen
,
Shujie Liu
,
Jinyu Li
,
Yao Qian
,
Zhenglu Yang
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
ICASSP
(2022)
Xiaoqiang Wang
,
Yanqing Liu
,
Jinyu Li
,
Veljko Miljanic
,
Sheng Zhao
,
Hosam Khalil
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process.
30 (2022)
Naoyuki Kanda
,
Jian Wu
,
Xiaofei Wang
,
Zhuo Chen
,
Jinyu Li
,
Takuya Yoshioka
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
CoRR
(2022)
Kun Wei
,
Long Zhou
,
Ziqiang Zhang
,
Liping Chen
,
Shujie Liu
,
Lei He
,
Jinyu Li
,
Furu Wei
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
CoRR
(2022)
Sanyuan Chen
,
Yu Wu
,
Chengyi Wang
,
Zhengyang Chen
,
Zhuo Chen
,
Shujie Liu
,
Jian Wu
,
Yao Qian
,
Furu Wei
,
Jinyu Li
,
Xiangzhan Yu
UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-Training.
ICASSP
(2022)
Qiu-Shi Zhu
,
Long Zhou
,
Ziqiang Zhang
,
Shujie Liu
,
Binxing Jiao
,
Jie Zhang
,
Lirong Dai
,
Daxin Jiang
,
Jinyu Li
,
Furu Wei
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning.
CoRR
(2022)
Xun Gong
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
,
Rui Zhao
,
Xie Chen
,
Yanmin Qian
LongFNT: Long-form Speech Recognition with Factorized Neural Transducer.
CoRR
(2022)
Zili Huang
,
Zhuo Chen
,
Naoyuki Kanda
,
Jian Wu
,
Yiming Wang
,
Jinyu Li
,
Takuya Yoshioka
,
Xiaofei Wang
,
Peidong Wang
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition.
CoRR
(2022)
Yashesh Gaur
,
Nick Kibre
,
Jian Xue
,
Kangyuan Shu
,
Yuhui Wang
,
Issac Alphonso
,
Jinyu Li
,
Yifan Gong
Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition.
SLT
(2022)
Yiming Wang
,
Jinyu Li
,
Heming Wang
,
Yao Qian
,
Chengyi Wang
,
Yu Wu
Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.
ICASSP
(2022)
Sanyuan Chen
,
Yu Wu
,
Chengyi Wang
,
Shujie Liu
,
Zhuo Chen
,
Peidong Wang
,
Gang Liu
,
Jinyu Li
,
Jian Wu
,
Xiangzhan Yu
,
Furu Wei
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
INTERSPEECH
(2022)
Jian Xue
,
Peidong Wang
,
Jinyu Li
,
Eric Sun
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability.
CoRR
(2022)