Xie Chen
ORCID
Publication Activity (10 Years)
Years Active: 2022-2024
Publications (10 Years): 33
Top Topics
Prosodic Features
Language Modeling
Text To Speech
Speech Recognition
Top Venues
CoRR
ICASSP
ACM Multimedia
IEEE ACM Trans. Audio Speech Lang. Process.
Publications
Zhisheng Zheng
,
Puyuan Peng
,
Ziyang Ma
,
Xie Chen
,
Eunsol Choi
,
David Harwath
BAT: Learning to Reason about Spatial Sounds with Large Language Models.
CoRR
(2024)
Wenxi Chen
,
Yuzhe Liang
,
Ziyang Ma
,
Zhisheng Zheng
,
Xie Chen
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer.
CoRR
(2024)
Yakun Song
,
Zhuo Chen
,
Xiaofei Wang
,
Ziyang Ma
,
Xie Chen
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering.
CoRR
(2024)
Ziyang Ma
,
Guanrou Yang
,
Yifan Yang
,
Zhifu Gao
,
Jiaming Wang
,
Zhihao Du
,
Fan Yu
,
Qian Chen
,
Siqi Zheng
,
Shiliang Zhang
,
Xie Chen
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR
(2024)
Zheng Liang
,
Zheshu Song
,
Ziyang Ma
,
Chenpeng Du
,
Kai Yu
,
Xie Chen
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
CoRR
(2023)
Xun Gong
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
,
Rui Zhao
,
Xie Chen
,
Yanmin Qian
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
ICASSP
(2023)
Ziyang Ma
,
Wen Wu
,
Zhisheng Zheng
,
Yiwei Guo
,
Qian Chen
,
Shiliang Zhang
,
Xie Chen
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition.
CoRR
(2023)
Qi Chen
,
Ziyang Ma
,
Tao Liu
,
Xu Tan
,
Qu Lu
,
Xie Chen
,
Kai Yu
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.
CoRR
(2023)
Hanglei Zhang
,
Yiwei Guo
,
Sen Liu
,
Xie Chen
,
Kai Yu
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations.
CoRR
(2023)
Qi Chen
,
Ziyang Ma
,
Tao Liu
,
Xu Tan
,
Qu Lu
,
Kai Yu
,
Xie Chen
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.
ICASSP
(2023)
Xun Gong
,
Wei Wang
,
Hang Shao
,
Xie Chen
,
Yanmin Qian
Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR.
ICASSP
(2023)
Chenpeng Du
,
Yiwei Guo
,
Xie Chen
,
Kai Yu
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
IEEE ACM Trans. Audio Speech Lang. Process.
31 (2023)
Guanrou Yang
,
Ziyang Ma
,
Zhisheng Zheng
,
Yakun Song
,
Zhikang Niu
,
Xie Chen
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning.
CoRR
(2023)
Yiwei Guo
,
Chenpeng Du
,
Ziyang Ma
,
Xie Chen
,
Kai Yu
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching.
CoRR
(2023)
Sen Liu
,
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.
CoRR
(2023)
Yifan Yang
,
Feiyu Shen
,
Chenpeng Du
,
Ziyang Ma
,
Kai Yu
,
Daniel Povey
,
Xie Chen
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
CoRR
(2023)
Guanrou Yang
,
Ziyang Ma
,
Zhisheng Zheng
,
Yakun Song
,
Zhikang Niu
,
Xie Chen
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning.
ASRU
(2023)
Xie Chen
,
Ziyang Ma
,
Changli Tang
,
Yujin Wang
,
Zhisheng Zheng
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition.
CoRR
(2023)
Ziyang Ma
,
Zhisheng Zheng
,
Jiaxin Ye
,
Jinchao Li
,
Zhifu Gao
,
Shiliang Zhang
,
Xie Chen
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
CoRR
(2023)
Zhisheng Zheng
,
Ziyang Ma
,
Yu Wang
,
Xie Chen
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition.
CoRR
(2023)
Ziyang Ma
,
Zhisheng Zheng
,
Guanrou Yang
,
Yu Wang
,
Chao Zhang
,
Xie Chen
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation.
CoRR
(2023)
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
ICASSP
(2023)
Chenpeng Du
,
Qi Chen
,
Tianyu He
,
Xu Tan
,
Xie Chen
,
Kai Yu
,
Sheng Zhao
,
Jiang Bian
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
ACM Multimedia
(2023)
Chenpeng Du
,
Qi Chen
,
Tianyu He
,
Xu Tan
,
Xie Chen
,
Kai Yu
,
Sheng Zhao
,
Jiang Bian
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
CoRR
(2023)
Junzhe Liu
,
Jianwei Yu
,
Xie Chen
Improved Factorized Neural Transducer Model for Text-Only Domain Adaptation.
CoRR
(2023)
Xie Chen
,
Ziyang Ma
,
Changli Tang
,
Yujin Wang
,
Zhisheng Zheng
Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition.
ICASSP
(2023)
Mingyu Cui
,
Jiawen Kang
,
Jiajun Deng
,
Xi Yin
,
Yutao Xie
,
Xie Chen
,
Xunying Liu
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.
CoRR
(2023)
Feiyu Shen
,
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
Acoustic BPE for Speech Generation with Discrete Tokens.
CoRR
(2023)
Chenpeng Du
,
Yiwei Guo
,
Feiyu Shen
,
Zhijun Liu
,
Zheng Liang
,
Xie Chen
,
Shuai Wang
,
Hui Zhang
,
Kai Yu
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
CoRR
(2023)
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
CoRR
(2022)
Ziyang Ma
,
Zhisheng Zheng
,
Changli Tang
,
Yujin Wang
,
Xie Chen
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets.
CoRR
(2022)
Changli Tang
,
Yujin Wang
,
Xie Chen
,
Wei-Qiang Zhang
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models.
CoRR
(2022)
Xun Gong
,
Yu Wu
,
Jinyu Li
,
Shujie Liu
,
Rui Zhao
,
Xie Chen
,
Yanmin Qian
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
CoRR
(2022)