Kaizhi Qian
Publication Activity (10 Years)
Years Active: 2017-2024
Publications (10 Years): 39
Top Topics
Language Model
Speech Synthesis
Information Bottleneck
Speech Recognition
Top Venues
CoRR
ICASSP
ICML
INTERSPEECH
Publications
Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo
Towards Unsupervised Speech Recognition Without Pronunciation Models.
CoRR
(2024)
Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text.
CoRR
(2024)
Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling.
CoRR
(2023)
Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Cheng Wan, Yonggan Fu, Yongan Zhang, Yingyan Celine Lin
Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning.
ICML
(2023)
Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.
CVPR
(2023)
Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos.
CoRR
(2023)
Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin
Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning.
CoRR
(2023)
Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models.
INTERSPEECH
(2022)
Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David D. Cox, James R. Glass
On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
ICASSP
(2022)
Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models.
CoRR
(2022)
Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David D. Cox, Mark Hasegawa-Johnson, Shiyu Chang
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers.
ICML
(2022)
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Jeff Lai, Celine Lin
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing.
NeurIPS
(2022)
Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David D. Cox, Mark Hasegawa-Johnson, Shiyu Chang
Improving Self-Supervised Speech Representations by Disentangling Speakers.
CoRR
(2022)
Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson
SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks.
ICASSP
(2022)
Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition.
INTERSPEECH
(2022)
Heting Gao, Junrui Ni, Yang Zhang, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson
Domain Generalization for Language-Independent Automatic Speech Recognition.
Frontiers Artif. Intell.
5 (2022)
Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson
SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks.
CoRR
(2022)
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing.
CoRR
(2022)
Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition.
CoRR
(2022)
Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David D. Cox, Jim Glass
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition.
NeurIPS
(2021)
Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David D. Cox, James R. Glass
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition.
CoRR
(2021)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David D. Cox, Mark Hasegawa-Johnson
Global Rhythm Style Transfer Without Text Transcriptions.
CoRR
(2021)
Mark R. Saddler, Andrew Francl, Jenelle Feather, Kaizhi Qian, Yang Zhang, Josh H. McDermott
Speech Denoising with Auditory Models.
INTERSPEECH
(2021)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson
Global Prosody Style Transfer Without Text Transcriptions.
ICML
(2021)
Heting Gao, Junrui Ni, Yang Zhang, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson
Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding.
INTERSPEECH
(2021)
Hui Shi, Yang Zhang, Hao Wu, Shiyu Chang, Kaizhi Qian, Mark Hasegawa-Johnson, Jishen Zhao
Continuous CNN for Nonuniform Time Series.
ICASSP
(2021)
Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James R. Glass
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis.
CoRR
(2021)
Kaizhi Qian, Yang Zhang, Shiyu Chang, David D. Cox, Mark Hasegawa-Johnson
Unsupervised Speech Decomposition via Triple Information Bottleneck.
CoRR
(2020)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson, David D. Cox
Unsupervised Speech Decomposition via Triple Information Bottleneck.
ICML
(2020)
Mark R. Saddler, Andrew Francl, Jenelle Feather, Kaizhi Qian, Yang Zhang, Josh H. McDermott
Deep Network Perceptual Losses for Speech Denoising.
CoRR
(2020)
Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore
F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder.
ICASSP
(2020)
Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore
F0-Consistent Many-to-Many Non-Parallel Voice Conversion via Conditional Autoencoder.
CoRR
(2020)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss.
ICML
(2019)
Yang Zhang, Shiyu Chang, Mo Yu, Kaizhi Qian
An Efficient and Margin-Approaching Zero-Confidence Adversarial Attack.
CoRR
(2019)
Feng Li, Kaizhi Qian, Mark Hasegawa-Johnson, Masato Akagi
Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking.
APSIPA
(2019)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson
Zero-Shot Voice Style Transfer with Only Autoencoder Loss.
CoRR
(2019)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei A. F. Florêncio, Mark Hasegawa-Johnson
Deep Learning Based Speech Beamforming.
ICASSP
(2018)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei A. F. Florêncio, Mark Hasegawa-Johnson
Deep Learning Based Speech Beamforming.
CoRR
(2018)
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson
Speech Enhancement Using Bayesian Wavenet.
INTERSPEECH
(2017)