Kai Yu
Publication Activity (10 Years)
Years Active: 2005-2024
Publications (10 Years): 252
Top Topics
Weakly Supervised
Event Detection
Language Understanding
Neural Network
Top Venues
CoRR
ICASSP
INTERSPEECH
IEEE ACM Trans. Audio Speech Lang. Process.
Publications
Xuenan Xu
,
Ziyang Ma
,
Mengyue Wu
,
Kai Yu
Towards Weakly Supervised Text-to-Audio Grounding.
CoRR
(2024)
Zihan Zhao
,
Da Ma
,
Lu Chen
,
Liangtai Sun
,
Zihao Li
,
Hongshen Xu
,
Zichen Zhu
,
Su Zhu
,
Shuai Fan
,
Guodong Shen
,
Xin Chen
,
Kai Yu
ChemDFM: Dialogue Foundation Model for Chemistry.
CoRR
(2024)
Zichen Zhu
,
Yang Xu
,
Lu Chen
,
Jingkai Yang
,
Yichuan Ma
,
Yiming Sun
,
Hailin Wen
,
Jiaqi Liu
,
Jinyu Cai
,
Yingzi Ma
,
Situo Zhang
,
Zihan Zhao
,
Liangtai Sun
,
Kai Yu
Multi: Multimodal Understanding Leaderboard with Text and Images.
CoRR
(2024)
Xuenan Xu
,
Zeyu Xie
,
Mengyue Wu
,
Kai Yu
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning.
IEEE ACM Trans. Audio Speech Lang. Process.
32 (2024)
Chenpeng Du
,
Yiwei Guo
,
Hankun Wang
,
Yifan Yang
,
Zhikang Niu
,
Shuai Wang
,
Hui Zhang
,
Xie Chen
,
Kai Yu
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
CoRR
(2024)
Zheng Liang
,
Zheshu Song
,
Ziyang Ma
,
Chenpeng Du
,
Kai Yu
,
Xie Chen
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation.
CoRR
(2023)
Hanchong Zhang
,
Jieyu Li
,
Lu Chen
,
Ruisheng Cao
,
Yunyan Zhang
,
Yu Huang
,
Yefeng Zheng
,
Kai Yu
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset.
CoRR
(2023)
Junjie Li
,
Yiwei Guo
,
Xie Chen
,
Kai Yu
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention.
CoRR
(2023)
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning.
ICASSP Workshops
(2023)
Qi Chen
,
Ziyang Ma
,
Tao Liu
,
Xu Tan
,
Qu Lu
,
Xie Chen
,
Kai Yu
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.
CoRR
(2023)
Sheng Jiang
,
Su Zhu
,
Ruisheng Cao
,
Qingliang Miao
,
Kai Yu
SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling.
ACL (industry)
(2023)
Ruisheng Cao
,
Hanchong Zhang
,
Hongshen Xu
,
Jieyu Li
,
Da Ma
,
Lu Chen
,
Kai Yu
ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL.
CoRR
(2023)
Hanxue Zhang
,
Zeyu Xie
,
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Improving Audio Caption Fluency with Automatic Error Correction.
CoRR
(2023)
Hanglei Zhang
,
Yiwei Guo
,
Sen Liu
,
Xie Chen
,
Kai Yu
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations.
CoRR
(2023)
Qi Chen
,
Ziyang Ma
,
Tao Liu
,
Xu Tan
,
Qu Lu
,
Kai Yu
,
Xie Chen
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation.
ICASSP
(2023)
Jieyu Li
,
Lu Chen
,
Ruisheng Cao
,
Su Zhu
,
Hongshen Xu
,
Zhi Chen
,
Hanchong Zhang
,
Kai Yu
On the Structural Generalization in Text-to-SQL.
CoRR
(2023)
Tao Liu
,
Chenpeng Du
,
Shuai Fan
,
Feilong Chen
,
Kai Yu
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder.
CoRR
(2023)
Zhijun Liu
,
Yiwei Guo
,
Kai Yu
DiffVoice: Text-to-Speech with Latent Diffusion.
ICASSP
(2023)
Danyang Zhang
,
Lu Chen
,
Situo Zhang
,
Hongshen Xu
,
Zihan Zhao
,
Kai Yu
Large Language Models Are Semi-Parametric Reinforcement Learning Agents.
NeurIPS
(2023)
Ruisheng Cao
,
Lu Chen
,
Jieyu Li
,
Hanchong Zhang
,
Hongshen Xu
,
Wangyou Zhang
,
Kai Yu
A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL.
IEEE Trans. Pattern Anal. Mach. Intell.
45 (11) (2023)
Chenpeng Du
,
Yiwei Guo
,
Xie Chen
,
Kai Yu
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature.
IEEE ACM Trans. Audio Speech Lang. Process.
31 (2023)
Guangwei Li
,
Xuenan Xu
,
Lingfeng Dai
,
Mengyue Wu
,
Kai Yu
Diverse and Vivid Sound Generation from Text Descriptions.
CoRR
(2023)
Yiwei Guo
,
Chenpeng Du
,
Ziyang Ma
,
Xie Chen
,
Kai Yu
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching.
CoRR
(2023)
Wenbin Jiang
,
Kai Yu
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking.
IEEE ACM Trans. Audio Speech Lang. Process.
31 (2023)
Hanchong Zhang
,
Ruisheng Cao
,
Lu Chen
,
Hongshen Xu
,
Kai Yu
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought.
CoRR
(2023)
Sen Liu
,
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech.
CoRR
(2023)
Yifan Yang
,
Feiyu Shen
,
Chenpeng Du
,
Ziyang Ma
,
Kai Yu
,
Daniel Povey
,
Xie Chen
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS.
CoRR
(2023)
Hanchong Zhang
,
Jieyu Li
,
Lu Chen
,
Ruisheng Cao
,
Yunyan Zhang
,
Yu Huang
,
Yefeng Zheng
,
Kai Yu
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset.
ACL (Findings)
(2023)
Chenpeng Du
,
Yiwei Guo
,
Feiyu Shen
,
Kai Yu
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.
CoRR
(2023)
Hanchong Zhang
,
Ruisheng Cao
,
Lu Chen
,
Hongshen Xu
,
Kai Yu
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought.
EMNLP (Findings)
(2023)
Guangwei Li
,
Xuenan Xu
,
Lingfeng Dai
,
Mengyue Wu
,
Kai Yu
Diverse and Vivid Sound Generation from Text Descriptions.
ICASSP
(2023)
Yiming Ai
,
Zhiwei He
,
Kai Yu
,
Rui Wang
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation.
CoRR
(2023)
Jieyu Li
,
Lu Chen
,
Ruisheng Cao
,
Su Zhu
,
Hongshen Xu
,
Zhi Chen
,
Hanchong Zhang
,
Kai Yu
Exploring Schema Generalizability of Text-to-SQL.
ACL (Findings)
(2023)
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
ICASSP
(2023)
Yiming Ai
,
Zhiwei He
,
Kai Yu
,
Rui Wang
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation.
ACL (2)
(2023)
Chenpeng Du
,
Qi Chen
,
Tianyu He
,
Xu Tan
,
Xie Chen
,
Kai Yu
,
Sheng Zhao
,
Jiang Bian
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
ACM Multimedia
(2023)
Liangtai Sun
,
Yang Han
,
Zihan Zhao
,
Da Ma
,
Zhennan Shen
,
Baocai Chen
,
Lu Chen
,
Kai Yu
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research.
CoRR
(2023)
Chenpeng Du
,
Yiwei Guo
,
Feiyu Shen
,
Kai Yu
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge.
ICASSP
(2023)
Chenpeng Du
,
Qi Chen
,
Tianyu He
,
Xu Tan
,
Xie Chen
,
Kai Yu
,
Sheng Zhao
,
Jiang Bian
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder.
CoRR
(2023)
Zhi Chen
,
Yuncong Liu
,
Lu Chen
,
Su Zhu
,
Mengyue Wu
,
Kai Yu
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue.
Trans. Assoc. Comput. Linguistics
11 (2023)
Danyang Zhang
,
Lu Chen
,
Situo Zhang
,
Hongshen Xu
,
Zihan Zhao
,
Kai Yu
Large Language Model Is Semi-Parametric Reinforcement Learning Agent.
CoRR
(2023)
Danyang Zhang
,
Lu Chen
,
Kai Yu
Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction.
CoRR
(2023)
Zeyu Xie
,
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Enhance Temporal Relations in Audio Captioning with Sound Event Detection.
CoRR
(2023)
Feiyu Shen
,
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
Acoustic BPE for Speech Generation with Discrete Tokens.
CoRR
(2023)
Chenpeng Du
,
Yiwei Guo
,
Feiyu Shen
,
Zhijun Liu
,
Zheng Liang
,
Xie Chen
,
Shuai Wang
,
Hui Zhang
,
Kai Yu
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.
CoRR
(2023)
Tao Liu
,
Zhengyang Chen
,
Yanmin Qian
,
Kai Yu
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge.
ICASSP
(2023)
Guangwei Li
,
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Navigating Audio-Visual Event Detection Across Mismatched Modalities.
ICASSP
(2022)
Yu Xi
,
Tian Tan
,
Wangyou Zhang
,
Baochen Yang
,
Kai Yu
Text Adaptive Detection for Customizable Keyword Spotting.
ICASSP
(2022)
Siyu Lou
,
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Audio-text Retrieval in Context.
CoRR
(2022)
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition.
ICASSP
(2022)
Zhi Chen
,
Yuncong Liu
,
Lu Chen
,
Su Zhu
,
Mengyue Wu
,
Kai Yu
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue.
CoRR
(2022)
Siyu Lou
,
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Audio-Text Retrieval in Context.
ICASSP
(2022)
Yiwei Guo
,
Chenpeng Du
,
Xie Chen
,
Kai Yu
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance.
CoRR
(2022)
Wen Wu
,
Mengyue Wu
,
Kai Yu
Climate and Weather: Inspecting Depression Detection via Emotion Recognition.
CoRR
(2022)
Zhi Chen
,
Lu Chen
,
Bei Chen
,
Libo Qin
,
Yuncong Liu
,
Su Zhu
,
Jian-Guang Lou
,
Kai Yu
UniDU: Towards A Unified Generative Dialogue Understanding Framework.
SIGDIAL
(2022)
Zihan Zhao
,
Lu Chen
,
Ruisheng Cao
,
Hongshen Xu
,
Xingyu Chen
,
Kai Yu
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages.
CoRR
(2022)
Chenpeng Du
,
Yiwei Guo
,
Xie Chen
,
Kai Yu
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.
INTERSPEECH
(2022)
Zhi Chen
,
Jijia Bao
,
Lu Chen
,
Yuncong Liu
,
Da Ma
,
Bei Chen
,
Mengyue Wu
,
Su Zhu
,
Jian-Guang Lou
,
Kai Yu
DialogZoo: Large-Scale Dialog-Oriented Task Learning.
CoRR
(2022)
Qinpei Zhu
,
Renshou Wu
,
Guangfeng Liu
,
Xinyu Zhu
,
Xingyu Chen
,
Yang Zhou
,
Qingliang Miao
,
Rui Wang
,
Kai Yu
The AISP-SJTU Simultaneous Translation System for IWSLT 2022.
IWSLT@ACL
(2022)
Bo Chen
,
Zhihang Xu
,
Kai Yu
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler.
Speech Commun.
136 (2022)
Tao Liu
,
Xu Xiang
,
Zhengyang Chen
,
Bing Han
,
Kai Yu
,
Yanmin Qian
The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022.
ISCSLP
(2022)
Bo Chen
,
Chenpeng Du
,
Kai Yu
Neural Fusion for Voice Cloning.
IEEE ACM Trans. Audio Speech Lang. Process.
30 (2022)
Zhi Chen
,
Bei Chen
,
Lu Chen
,
Kai Yu
,
Jian-Guang Lou
AdapterShare: Task Correlation Modeling with Adapter Differentiation.
EMNLP
(2022)
Binwei Yao
,
Chao Shi
,
Likai Zou
,
Lingfeng Dai
,
Mengyue Wu
,
Lu Chen
,
Zhen Wang
,
Kai Yu
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat.
CoRR
(2022)
Yiwei Guo
,
Chenpeng Du
,
Kai Yu
Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis.
ICASSP
(2022)
Chenpeng Du
,
Yiwei Guo
,
Xie Chen
,
Kai Yu
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature.
CoRR
(2022)
Guangfeng Liu
,
Qinpei Zhu
,
Xingyu Chen
,
Renjie Feng
,
Jianxin Ren
,
Renshou Wu
,
Qingliang Miao
,
Rui Wang
,
Kai Yu
The AISP-SJTU Translation System for WMT 2022.
WMT
(2022)
Tao Liu
,
Shuai Fan
,
Xu Xiang
,
Hongbo Song
,
Shaoxiong Lin
,
Jiaqi Sun
,
Tianyuan Han
,
Siyuan Chen
,
Binwei Yao
,
Sen Liu
,
Yifei Wu
,
Yanmin Qian
,
Kai Yu
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild.
INTERSPEECH
(2022)
Yiwei Guo
,
Chenpeng Du
,
Kai Yu
Unsupervised word-level prosody tagging for controllable speech synthesis.
CoRR
(2022)
Zhi Chen
,
Lu Chen
,
Bei Chen
,
Libo Qin
,
Yuncong Liu
,
Su Zhu
,
Jian-Guang Lou
,
Kai Yu
UniDU: Towards A Unified Generative Dialogue Understanding Framework.
CoRR
(2022)
Zihan Zhao
,
Lu Chen
,
Ruisheng Cao
,
Hongshen Xu
,
Xingyu Chen
,
Kai Yu
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages.
NAACL-HLT
(2022)
Chenpeng Du
,
Kai Yu
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process.
30 (2022)
Guangwei Li
,
Xuenan Xu
,
Heinrich Dinkel
,
Mengyue Wu
,
Kai Yu
Category-Adapted Sound Event Enhancement with Weakly Labeled Data.
ICASSP
(2022)
Wen Wu
,
Mengyue Wu
,
Kai Yu
Climate and Weather: Inspecting Depression Detection via Emotion Recognition.
ICASSP
(2022)
Liangtai Sun
,
Xingyu Chen
,
Lu Chen
,
Tianle Dai
,
Zichen Zhu
,
Kai Yu
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI.
CoRR
(2022)
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
A Comprehensive Survey of Automated Audio Captioning.
CoRR
(2022)
Su Zhu
,
Lu Chen
,
Ruisheng Cao
,
Zhi Chen
,
Qingliang Miao
,
Kai Yu
Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF.
NLPCC (1)
(2021)
Xuenan Xu
,
Heinrich Dinkel
,
Mengyue Wu
,
Kai Yu
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.
ICASSP
(2021)
Heinrich Dinkel
,
Shuai Wang
,
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Voice activity detection in the wild: A data-driven approach using teacher-student training.
CoRR
(2021)
Zhi Chen
,
Lu Chen
,
Yanbin Zhao
,
Ruisheng Cao
,
Zihan Xu
,
Su Zhu
,
Kai Yu
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser.
CoRR
(2021)
Lu Chen
,
Xingyu Chen
,
Zihan Zhao
,
Danyang Zhang
,
Jiabao Ji
,
Ao Luo
,
Yuxuan Xiong
,
Kai Yu
WebSRC: A Dataset for Web-Based Structural Reading Comprehension.
CoRR
(2021)
Boer Lyu
,
Lu Chen
,
Su Zhu
,
Kai Yu
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching.
AAAI
(2021)
Zhi Chen
,
Lu Chen
,
Hanqi Li
,
Ruisheng Cao
,
Da Ma
,
Mengyue Wu
,
Kai Yu
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL.
ACL/IJCNLP (Findings)
(2021)
Heinrich Dinkel
,
Mengyue Wu
,
Kai Yu
Towards Duration Robust Weakly Supervised Sound Event Detection.
IEEE ACM Trans. Audio Speech Lang. Process.
29 (2021)
Xuenan Xu
,
Heinrich Dinkel
,
Mengyue Wu
,
Zeyu Xie
,
Kai Yu
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.
ICASSP
(2021)
Ruisheng Cao
,
Lu Chen
,
Zhi Chen
,
Yanbin Zhao
,
Su Zhu
,
Kai Yu
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations.
CoRR
(2021)
Xuenan Xu
,
Heinrich Dinkel
,
Mengyue Wu
,
Kai Yu
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.
CoRR
(2021)
Xingyu Chen
,
Zihan Zhao
,
Lu Chen
,
Jiabao Ji
,
Danyang Zhang
,
Ao Luo
,
Yuxuan Xiong
,
Kai Yu
WebSRC: A Dataset for Web-Based Structural Reading Comprehension.
EMNLP (1)
(2021)
Xuenan Xu
,
Heinrich Dinkel
,
Mengyue Wu
,
Zeyu Xie
,
Kai Yu
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning.
CoRR
(2021)
Pingyue Zhang
,
Mengyue Wu
,
Heinrich Dinkel
,
Kai Yu
DEPA: Self-Supervised Audio Embedding for Depression Detection.
ACM Multimedia
(2021)
Heinrich Dinkel
,
Shuai Wang
,
Xuenan Xu
,
Mengyue Wu
,
Kai Yu
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training.
IEEE ACM Trans. Audio Speech Lang. Process.
29 (2021)
Heinrich Dinkel
,
Mengyue Wu
,
Kai Yu
Towards duration robust weakly supervised sound event detection.
CoRR
(2021)
Chenpeng Du
,
Kai Yu
Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling.
CoRR
(2021)
Shuai Wang
,
Yexin Yang
,
Yanmin Qian
,
Kai Yu
Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning.
ISCSLP
(2021)
Boer Lyu
,
Lu Chen
,
Su Zhu
,
Kai Yu
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching.
CoRR
(2021)
Chenpeng Du
,
Kai Yu
Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis.
CoRR
(2021)
Ruisheng Cao
,
Lu Chen
,
Zhi Chen
,
Yanbin Zhao
,
Su Zhu
,
Kai Yu
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations.
ACL/IJCNLP (1)
(2021)
Zhi Chen
,
Lu Chen
,
Yanbin Zhao
,
Ruisheng Cao
,
Zihan Xu
,
Su Zhu
,
Kai Yu
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser.
NAACL-HLT
(2021)
Boer Lyu
,
Lu Chen
,
Kai Yu
Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction.
EMNLP (Findings)
(2021)
Chenpeng Du
,
Kai Yu
Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network.
INTERSPEECH
(2021)