Dan Su
Publication Activity (10 Years)
Years Active: 2018-2024
Publications (10 Years): 143
Top Topics
Speaker Verification
Diffusion Models
Speech Synthesis
Neural Network
Top Venues
CoRR
ICASSP
INTERSPEECH
Publications
Duzhen Zhang
,
Yahan Yu
,
Jiahua Dong
,
Chenxing Li
,
Dan Su
,
Chenhui Chu
,
Dong Yu
MM-LLMs: Recent Advances in MultiModal Large Language Models.
ACL (Findings)
(2024)
Manjie Xu
,
Chenxing Li
,
Duzhen Zhang
,
Dan Su
,
Wei Liang
,
Dong Yu
Prompt-guided Precise Audio Editing with Diffusion Models.
CoRR
(2024)
Yu Gu
,
Qiushi Zhu
,
Guangzhi Lei
,
Chao Weng
,
Dan Su
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis.
ICASSP
(2024)
Yongxin Zhu
,
Dan Su
,
Liqiang He
,
Linli Xu
,
Dong Yu
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer.
ACL (1)
(2024)
Chong Peng
,
Liqiang He
,
Dan Su
Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder.
CoRR
(2024)
Jiaxu Zhu
,
Weinan Tong
,
Yaoxun Xu
,
Changhe Song
,
Zhiyong Wu
,
Zhao You
,
Dan Su
,
Dong Yu
,
Helen M. Meng
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
CoRR
(2023)
Wei Xiao
,
Wenzhe Liu
,
Meng Wang
,
Shan Yang
,
Yupeng Shi
,
Yuyong Kang
,
Dan Su
,
Shidong Shang
,
Dong Yu
Multi-mode Neural Speech Coding Based on Deep Generative Networks.
INTERSPEECH
(2023)
Yuping Yuan
,
Zhao You
,
Shulin Feng
,
Dan Su
,
Yanchun Liang
,
Xiaohu Shi
,
Dong Yu
Compressed MoE ASR Model Based on Knowledge Distillation and Quantization.
INTERSPEECH
(2023)
Yi Lei
,
Shan Yang
,
Xinsheng Wang
,
Qicong Xie
,
Jixun Yao
,
Lei Xie
,
Dan Su
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis.
AAAI
(2023)
Wenzhe Liu
,
Wei Xiao
,
Meng Wang
,
Shan Yang
,
Yupeng Shi
,
Yuyong Kang
,
Dan Su
,
Shidong Shang
,
Dong Yu
A High Fidelity and Low Complexity Neural Audio Coding.
CoRR
(2023)
Yu Gu
,
Yianrao Bian
,
Guangzhi Lei
,
Chao Weng
,
Dan Su
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis.
CoRR
(2023)
Jiaxu Zhu
,
Weinan Tong
,
Yaoxun Xu
,
Changhe Song
,
Zhiyong Wu
,
Zhao You
,
Dan Su
,
Dong Yu
,
Helen Meng
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
INTERSPEECH
(2023)
Kun Song
,
Heyang Xue
,
Xinsheng Wang
,
Jian Cong
,
Yongmao Zhang
,
Lei Xie
,
Bing Yang
,
Xiong Zhang
,
Dan Su
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation.
ISCSLP
(2022)
Xiaoyi Qin
,
Na Li
,
Chao Weng
,
Dan Su
,
Ming Li
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
CoRR
(2022)
Liumeng Xue
,
Shan Yang
,
Na Hu
,
Dan Su
,
Lei Xie
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.
INTERSPEECH
(2022)
Songxiang Liu
,
Shan Yang
,
Dan Su
,
Dong Yu
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis.
ICASSP
(2022)
Dongpeng Ma
,
Yiwen Wang
,
Liqiang He
,
Mingjie Jin
,
Dan Su
,
Dong Yu
DP-DWA: Dual-Path Dynamic Weight Attention Network With Streaming DFSMN-SAN For Automatic Speech Recognition.
ICASSP
(2022)
Yixuan Zhou
,
Changhe Song
,
Xiang Li
,
Luwen Zhang
,
Zhiyong Wu
,
Yanyao Bian
,
Dan Su
,
Helen Meng
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
INTERSPEECH
(2022)
Xiaoyi Qin
,
Na Li
,
Chao Weng
,
Dan Su
,
Ming Li
Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection.
ICASSP
(2022)
Disong Wang
,
Shan Yang
,
Dan Su
,
Xunying Liu
,
Dong Yu
,
Helen Meng
VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
ICASSP
(2022)
Zhao You
,
Shulin Feng
,
Dan Su
,
Dong Yu
SpeechMoE2: Mixture-of-Experts Model with Improved Routing.
ICASSP
(2022)
Naijun Zheng
,
Na Li
,
Xixin Wu
,
Lingwei Meng
,
Jiawen Kang
,
Haibin Wu
,
Chao Weng
,
Dan Su
,
Helen Meng
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
ICASSP
(2022)
Qicong Xie
,
Shan Yang
,
Yi Lei
,
Lei Xie
,
Dan Su
End-to-End Voice Conversion with Information Perturbation.
CoRR
(2022)
Disong Wang
,
Shan Yang
,
Dan Su
,
Xunying Liu
,
Dong Yu
,
Helen Meng
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion.
CoRR
(2022)
Qicong Xie
,
Shan Yang
,
Yi Lei
,
Lei Xie
,
Dan Su
End-to-End Voice Conversion with Information Perturbation.
ISCSLP
(2022)
Zhao You
,
Shulin Feng
,
Dan Su
,
Dong Yu
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition.
CoRR
(2022)
Yixuan Zhou
,
Changhe Song
,
Jingbei Li
,
Zhiyong Wu
,
Yanyao Bian
,
Dan Su
,
Helen Meng
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
INTERSPEECH
(2022)
Xiaoyi Qin
,
Na Li
,
Yuke Lin
,
Yiwei Ding
,
Chao Weng
,
Dan Su
,
Ming Li
The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022.
CoRR
(2022)
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.
ICLR
(2022)
Kun Song
,
Heyang Xue
,
Xinsheng Wang
,
Jian Cong
,
Yongmao Zhang
,
Lei Xie
,
Bing Yang
,
Xiong Zhang
,
Dan Su
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation.
CoRR
(2022)
Yixuan Zhou
,
Changhe Song
,
Xiang Li
,
Luwen Zhang
,
Zhiyong Wu
,
Yanyao Bian
,
Dan Su
,
Helen Meng
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
CoRR
(2022)
Yi Lei
,
Shan Yang
,
Jian Cong
,
Lei Xie
,
Dan Su
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.
CoRR
(2022)
Yi Lei
,
Shan Yang
,
Jian Cong
,
Lei Xie
,
Dan Su
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion.
INTERSPEECH
(2022)
Rongjie Huang
,
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
,
Yi Ren
,
Zhou Zhao
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
IJCAI
(2022)
Yi Lei
,
Shan Yang
,
Xinfa Zhu
,
Lei Xie
,
Dan Su
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.
IEEE Signal Process. Lett.
29 (2022)
Liumeng Xue
,
Shan Yang
,
Na Hu
,
Dan Su
,
Lei Xie
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers.
CoRR
(2022)
Rongjie Huang
,
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
,
Yi Ren
,
Zhou Zhao
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis.
CoRR
(2022)
Songxiang Liu
,
Dan Su
,
Dong Yu
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs.
CoRR
(2022)
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis.
CoRR
(2022)
Xiaoyi Qin
,
Na Li
,
Chao Weng
,
Dan Su
,
Ming Li
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings.
INTERSPEECH
(2022)
Jinchuan Tian
,
Jianwei Yu
,
Chao Weng
,
Shi-Xiong Zhang
,
Dan Su
,
Dong Yu
,
Yuexian Zou
Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI.
ICASSP
(2022)
Zhao You
,
Shulin Feng
,
Dan Su
,
Dong Yu
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition.
ISCSLP
(2022)
Naijun Zheng
,
Na Li
,
Jianwei Yu
,
Chao Weng
,
Dan Su
,
Xunying Liu
,
Helen Meng
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
ICASSP
(2022)
Jingbei Li
,
Yi Meng
,
Chenyi Li
,
Zhiyong Wu
,
Helen Meng
,
Chao Weng
,
Dan Su
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
ICASSP
(2022)
Yi Lei
,
Shan Yang
,
Xinsheng Wang
,
Qicong Xie
,
Jixun Yao
,
Lei Xie
,
Dan Su
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis.
CoRR
(2022)
Naijun Zheng
,
Na Li
,
Xixin Wu
,
Lingwei Meng
,
Jiawen Kang
,
Haibin Wu
,
Chao Weng
,
Dan Su
,
Helen Meng
The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge.
CoRR
(2022)
Helin Wang
,
Bo Wu
,
Lianwu Chen
,
Meng Yu
,
Jianwei Yu
,
Yong Xu
,
Shi-Xiong Zhang
,
Chao Weng
,
Dan Su
,
Dong Yu
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
Interspeech
(2021)
Liqiang He
,
Shulin Feng
,
Dan Su
,
Dong Yu
Latency-Controlled Neural Architecture Search for Streaming Speech Recognition.
ASRU
(2021)
Xingchen Song
,
Zhiyong Wu
,
Yiheng Huang
,
Chao Weng
,
Dan Su
,
Helen M. Meng
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.
ICASSP
(2021)
Guoguo Chen
,
Shuzhou Chai
,
Guan-Bo Wang
,
Jiayu Du
,
Wei-Qiang Zhang
,
Chao Weng
,
Dan Su
,
Daniel Povey
,
Jan Trmal
,
Junbo Zhang
,
Mingjie Jin
,
Sanjeev Khudanpur
,
Shinji Watanabe
,
Shuaijiang Zhao
,
Wei Zou
,
Xiangang Li
,
Xuchen Yao
,
Yongqing Wang
,
Zhao You
,
Zhiyong Yan
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.
Interspeech
(2021)
Jinchuan Tian
,
Jianwei Yu
,
Chao Weng
,
Shi-Xiong Zhang
,
Dan Su
,
Dong Yu
,
Yuexian Zou
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI.
CoRR
(2021)
Peng Liu
,
Yuewen Cao
,
Songxiang Liu
,
Na Hu
,
Guangzhi Li
,
Chao Weng
,
Dan Su
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention.
CoRR
(2021)
Naijun Zheng
,
Na Li
,
Bo Wu
,
Meng Yu
,
Jianwei Yu
,
Chao Weng
,
Dan Su
,
Xunying Liu
,
Helen Meng
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
ICASSP
(2021)
Liqiang He
,
Shulin Feng
,
Dan Su
,
Dong Yu
Latency-Controlled Neural Architecture Search for Streaming Speech Recognition.
CoRR
(2021)
Jun Wang
,
Max W. Y. Lam
,
Dan Su
,
Dong Yu
Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect.
AAAI
(2021)
Max W. Y. Lam
,
Jun Wang
,
Chao Weng
,
Dan Su
,
Dong Yu
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.
Interspeech
(2021)
Jian Cong
,
Shan Yang
,
Na Hu
,
Guangzhi Li
,
Lei Xie
,
Dan Su
Controllable Context-Aware Conversational Speech Synthesis.
Interspeech
(2021)
Jun Wang
,
Max W. Y. Lam
,
Dan Su
,
Dong Yu
Contrastive Separative Coding for Self-Supervised Representation Learning.
ICASSP
(2021)
Xu Li
,
Na Li
,
Chao Weng
,
Xunying Liu
,
Dan Su
,
Dong Yu
,
Helen Meng
Replay and Synthetic Speech Detection with Res2Net Architecture.
ICASSP
(2021)
Songxiang Liu
,
Yuewen Cao
,
Dan Su
,
Helen Meng
DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion.
CoRR
(2021)
Songxiang Liu
,
Shan Yang
,
Dan Su
,
Dong Yu
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis.
CoRR
(2021)
Jian Cong
,
Shan Yang
,
Na Hu
,
Guangzhi Li
,
Lei Xie
,
Dan Su
Controllable Context-aware Conversational Speech Synthesis.
CoRR
(2021)
Huirong Huang
,
Zhiyong Wu
,
Shiyin Kang
,
Dongyang Dai
,
Jia Jia
,
Tianxiao Fu
,
Deyi Tuo
,
Guangzhi Lei
,
Peng Liu
,
Dan Su
,
Dong Yu
,
Helen Meng
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
APSIPA ASC
(2021)
Jian Cong
,
Shan Yang
,
Lei Xie
,
Dan Su
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis.
CoRR
(2021)
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation.
CoRR
(2021)
Max W. Y. Lam
,
Jun Wang
,
Rongjie Huang
,
Dan Su
,
Dong Yu
Bilateral Denoising Diffusion Models.
CoRR
(2021)
Jun Wang
,
Max W. Y. Lam
,
Dan Su
,
Dong Yu
Contrastive Separative Coding for Self-supervised Representation Learning.
CoRR
(2021)
Yi Chen
,
Shan Yang
,
Na Hu
,
Lei Xie
,
Dan Su
TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN.
ICMI Companion
(2021)
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation.
ICASSP
(2021)
Songxiang Liu
,
Yuewen Cao
,
Dan Su
,
Helen Meng
DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion.
ASRU
(2021)
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks.
CoRR
(2021)
Jingbei Li
,
Yi Meng
,
Chenyi Li
,
Zhiyong Wu
,
Helen Meng
,
Chao Weng
,
Dan Su
Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.
CoRR
(2021)
Jun Wang
,
Max W. Y. Lam
,
Dan Su
,
Dong Yu
Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect.
CoRR
(2021)
Zhao You
,
Shulin Feng
,
Dan Su
,
Dong Yu
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts.
CoRR
(2021)
Songxiang Liu
,
Dan Su
,
Dong Yu
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning.
CoRR
(2021)
Max W. Y. Lam
,
Jun Wang
,
Dan Su
,
Dong Yu
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks.
SLT
(2021)
Helin Wang
,
Bo Wu
,
Lianwu Chen
,
Meng Yu
,
Jianwei Yu
,
Yong Xu
,
Shi-Xiong Zhang
,
Chao Weng
,
Dan Su
,
Dong Yu
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation.
CoRR
(2021)
Yuewen Cao
,
Songxiang Liu
,
Shiyin Kang
,
Na Hu
,
Peng Liu
,
Xunying Liu
,
Dan Su
,
Dong Yu
,
Helen Meng
Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.
ISCSLP
(2021)
Jian Cong
,
Shan Yang
,
Lei Xie
,
Dan Su
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis.
Interspeech
(2021)
Songxiang Liu
,
Yuewen Cao
,
Na Hu
,
Dan Su
,
Helen Meng
FastSVC: Fast Cross-Domain Singing Voice Conversion With Feature-Wise Linear Modulation.
ICME
(2021)
Liqiang He
,
Dan Su
,
Dong Yu
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.
ICASSP
(2021)
Max W. Y. Lam
,
Jun Wang
,
Chao Weng
,
Dan Su
,
Dong Yu
Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition.
CoRR
(2021)
Rongzhi Gu
,
Shi-Xiong Zhang
,
Lianwu Chen
,
Yong Xu
,
Meng Yu
,
Dan Su
,
Yuexian Zou
,
Dong Yu
Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning.
ICASSP
(2020)
Yuewen Cao
,
Songxiang Liu
,
Xixin Wu
,
Shiyin Kang
,
Peng Liu
,
Zhiyong Wu
,
Xunying Liu
,
Dan Su
,
Dong Yu
,
Helen Meng
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
ICASSP
(2020)
Meng Yu
,
Xuan Ji
,
Bo Wu
,
Dan Su
,
Dong Yu
End-to-End Multi-Look Keyword Spotting.
INTERSPEECH
(2020)
Jianwei Yu
,
Bo Wu
,
Rongzhi Gu
,
Shi-Xiong Zhang
,
Lianwu Chen
,
Yong Xu
,
Meng Yu
,
Dan Su
,
Dong Yu
,
Xunying Liu
,
Helen Meng
Audio-visual Multi-channel Recognition of Overlapped Speech.
CoRR
(2020)
Weiwei Lin
,
Man-Wai Mak
,
Na Li
,
Dan Su
,
Dong Yu
A Framework for Adapting DNN Speaker Embedding Across Languages.
IEEE ACM Trans. Audio Speech Lang. Process.
28 (2020)
Xuan Ji
,
Meng Yu
,
Chunlei Zhang
,
Dan Su
,
Tao Yu
,
Xiaoyu Liu
,
Dong Yu
Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction.
ICASSP
(2020)
Chengzhu Yu
,
Heng Lu
,
Na Hu
,
Meng Yu
,
Chao Weng
,
Kun Xu
,
Peng Liu
,
Deyi Tuo
,
Shiyin Kang
,
Guangzhi Lei
,
Dan Su
,
Dong Yu
DurIAN: Duration Informed Attention Network for Speech Synthesis.
INTERSPEECH
(2020)
Xingchen Song
,
Guangsen Wang
,
Yiheng Huang
,
Zhiyong Wu
,
Dan Su
,
Helen Meng
Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks.
INTERSPEECH
(2020)
Xingchen Song
,
Zhiyong Wu
,
Yiheng Huang
,
Chao Weng
,
Dan Su
,
Helen Meng
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.
CoRR
(2020)
Xu Li
,
Na Li
,
Jinghua Zhong
,
Xixin Wu
,
Xunying Liu
,
Dan Su
,
Dong Yu
,
Helen Meng
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
CoRR
(2020)
Rongzhi Gu
,
Shi-Xiong Zhang
,
Lianwu Chen
,
Yong Xu
,
Meng Yu
,
Dan Su
,
Yuexian Zou
,
Dong Yu
Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning.
CoRR
(2020)
Liqiang He
,
Dan Su
,
Dong Yu
Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition.
CoRR
(2020)
Xuan Ji
,
Meng Yu
,
Jie Chen
,
Jimeng Zheng
,
Dan Su
,
Dong Yu
Integration of Multi-Look Beamformers for Multi-Channel Keyword Spotting.
ICASSP
(2020)
Songxiang Liu
,
Disong Wang
,
Yuewen Cao
,
Lifa Sun
,
Xixin Wu
,
Shiyin Kang
,
Zhiyong Wu
,
Xunying Liu
,
Dan Su
,
Dong Yu
,
Helen Meng
End-To-End Accent Conversion Without Using Native Utterances.
ICASSP
(2020)
Xingchen Song
,
Zhiyong Wu
,
Yiheng Huang
,
Dan Su
,
Helen Meng
SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition.
INTERSPEECH
(2020)
Songxiang Liu
,
Yuewen Cao
,
Shiyin Kang
,
Na Hu
,
Xunying Liu
,
Dan Su
,
Dong Yu
,
Helen Meng
Transferring Source Style in Non-Parallel Voice Conversion.
INTERSPEECH
(2020)
Shan Yang
,
Heng Lu
,
Shiyin Kang
,
Liumeng Xue
,
Jinba Xiao
,
Dan Su
,
Lei Xie
,
Dong Yu
On the localness modeling for the self-attention based end-to-end speech synthesis.
Neural Networks
125 (2020)
Yiheng Huang
,
Jinchuan Tian
,
Lei Han
,
Guangsen Wang
,
Xingcheng Song
,
Dan Su
,
Dong Yu
A Random Gossip BMUF Process for Neural Language Modeling.
ICASSP
(2020)