Sheng Li
ORCID
Publication Activity (10 Years)
Years Active: 2011-2024
Publications (10 Years): 86
Top Topics
Language Identification
Speech Recognition
Voice Activity Detection
Acoustic Models
Top Venues
ICASSP
INTERSPEECH
CoRR
Odyssey
Publications
Yi Zhao
,
Chunyu Qiang
,
Hao Li
,
Yulan Hu
,
Wangjin Zhou
,
Sheng Li
Enhancing Realism in 3D Facial Animation Using Conformer-Based Generation and Automated Post-Processing.
ICASSP
(2024)
Yankun Wu
,
Yuta Nakashima
,
Noa Garcia
,
Sheng Li
,
Zhaoyang Zeng
Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis.
ICMR
(2024)
Nan Li
,
Longbiao Wang
,
Meng Ge
,
Masashi Unoki
,
Sheng Li
,
Jianwu Dang
Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network.
Speech Commun.
157 (2024)
Lele Zheng
,
Yang Cao
,
Renhe Jiang
,
Kenjiro Taura
,
Yulong Shen
,
Sheng Li
,
Masatoshi Yoshikawa
Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks.
CoRR
(2024)
Wangjin Zhou
,
Zhengdong Yang
,
Chenhui Chu
,
Sheng Li
,
Raj Dabre
,
Yi Zhao
,
Tatsuya Kawahara
MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction.
ICASSP
(2024)
Sheng Li
,
Jiyi Li
,
Yang Cao
Phantom in the opera: adversarial music attack for robot dialogue system.
Frontiers Comput. Sci.
6 (2024)
Soky Kak
,
Sheng Li
,
Chenhui Chu
,
Tatsuya Kawahara
Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language.
ICASSP
(2023)
Longfei Yang
,
Jiyi Li
,
Sheng Li
,
Takahiro Shinozaki
Multi-Domain Dialogue State Tracking with Disentangled Domain-Slot Attention.
ACL (Findings)
(2023)
Shuichiro Shimizu
,
Chenhui Chu
,
Sheng Li
,
Sadao Kurohashi
Towards Speech Dialogue Translation Mediating Speakers of Different Languages.
ACL (Findings)
(2023)
Sheng Li
,
Jiyi Li
Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition.
ICANN (7)
(2023)
Zili Qi
,
Xinhui Hu
,
Wangjin Zhou
,
Sheng Li
,
Hao Wu
,
Jian Lu
,
Xinkang Xu
LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement.
ASRU
(2023)
Zhengdong Yang
,
Shuichiro Shimizu
,
Wangjin Zhou
,
Sheng Li
,
Chenhui Chu
The Kyoto Speech-to-Speech Translation System for IWSLT 2023.
IWSLT@ACL
(2023)
Chao Tan
,
Yang Cao
,
Sheng Li
,
Masatoshi Yoshikawa
General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition.
ICASSP
(2023)
Wenqing Wei
,
Zhengdong Yang
,
Yuan Gao
,
Jiyi Li
,
Chenhui Chu
,
Shogo Okada
,
Sheng Li
FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection.
ASRU
(2023)
Soky Kak
,
Sheng Li
,
Chenhui Chu
,
Tatsuya Kawahara
Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings.
Int. J. Asian Lang. Process.
33 (4) (2023)
Xiaojiao Chen
,
Sheng Li
,
Jiyi Li
,
Hao Huang
,
Yang Cao
,
Liang He
Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization.
MMAsia
(2023)
Qianying Liu
,
Zhuo Gong
,
Zhengdong Yang
,
Yuhang Yang
,
Sheng Li
,
Chenchen Ding
,
Nobuaki Minematsu
,
Hao Huang
,
Fei Cheng
,
Chenhui Chu
,
Sadao Kurohashi
Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition.
ICASSP
(2023)
Kai Wang
,
Yuhang Yang
,
Hao Huang
,
Ying Hu
,
Sheng Li
Speakeraugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation.
ICASSP
(2023)
Yuhang Yang
,
Haihua Xu
,
Hao Huang
,
Eng Siong Chng
,
Sheng Li
Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
ICASSP
(2023)
Yuqin Lin
,
Jianwu Dang
,
Longbiao Wang
,
Sheng Li
,
Chenchen Ding
Disordered speech recognition considering low resources and abnormal articulation.
Speech Commun.
155 (2023)
Shuichiro Shimizu
,
Chenhui Chu
,
Sheng Li
,
Sadao Kurohashi
Towards Speech Dialogue Translation Mediating Speakers of Different Languages.
CoRR
(2023)
Nan Li
,
Meng Ge
,
Longbiao Wang
,
Masashi Unoki
,
Sheng Li
,
Jianwu Dang
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network.
INTERSPEECH
(2022)
Xiaojiao Chen
,
Sheng Li
,
Hao Huang
GhostVec: Directly Extracting Speaker Embedding from End-to-End Speech Recognition Model Using Adversarial Examples.
ICONIP (6)
(2022)
Zhengdong Yang
,
Wangjin Zhou
,
Chenhui Chu
,
Sheng Li
,
Raj Dabre
,
Raphael Rubino
,
Yi Zhao
Fusion of Self-supervised Learned Models for MOS Prediction.
INTERSPEECH
(2022)
Kai Li
,
Xugang Lu
,
Masato Akagi
,
Jianwu Dang
,
Sheng Li
,
Masashi Unoki
Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network.
EUSIPCO
(2022)
Qianying Liu
,
Yuhang Yang
,
Zhuo Gong
,
Sheng Li
,
Chenchen Ding
,
Nobuaki Minematsu
,
Hao Huang
,
Fei Cheng
,
Sadao Kurohashi
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition.
CoRR
(2022)
Zhuo Gong
,
Daisuke Saito
,
Longfei Yang
,
Takahiro Shinozaki
,
Sheng Li
,
Hisashi Kawai
,
Nobuaki Minematsu
Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model.
Odyssey
(2022)
Soky Kak
,
Sheng Li
,
Masato Mimura
,
Chenhui Chu
,
Tatsuya Kawahara
Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism.
INTERSPEECH
(2022)
Kai Li
,
Sheng Li
,
Xugang Lu
,
Masato Akagi
,
Meng Liu
,
Lin Zhang
,
Chang Zeng
,
Longbiao Wang
,
Jianwu Dang
,
Masashi Unoki
Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection.
INTERSPEECH
(2022)
Kai Wang
,
Yizhou Peng
,
Hao Huang
,
Ying Hu
,
Sheng Li
Mining Hard Samples Locally And Globally For Improved Speech Separation.
ICASSP
(2022)
Yongjie Lv
,
Longbiao Wang
,
Meng Ge
,
Sheng Li
,
Chenchen Ding
,
Lixin Pan
,
Yuguang Wang
,
Jianwu Dang
,
Kiyoshi Honda
Compressing Transformer-Based ASR Model by Task-Driven Loss and Attention-Based Multi-Level Feature Distillation.
ICASSP
(2022)
Soky Kak
,
Zhuo Gong
,
Sheng Li
Nict-Tib1: A Public Speech Corpus Of Lhasa Dialect For Benchmarking Tibetan Language Speech Recognition Systems.
O-COCOSDA
(2022)
Longfei Yang
,
Wenqing Wei
,
Sheng Li
,
Jiyi Li
,
Takahiro Shinozaki
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection.
INTERSPEECH
(2022)
Hao Shi
,
Longbiao Wang
,
Sheng Li
,
Jianwu Dang
,
Tatsuya Kawahara
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction.
INTERSPEECH
(2022)
Yuhang Yang
,
Haihua Xu
,
Hao Huang
,
Eng Siong Chng
,
Sheng Li
Speech-text based multi-modal training with bidirectional attention for improved speech recognition.
CoRR
(2022)
Siqing Qin
,
Longbiao Wang
,
Sheng Li
,
Yuqin Lin
,
Jianwu Dang
Finer-grained Modeling units-based Meta-Learning for Low-resource Tibetan Speech Recognition.
INTERSPEECH
(2022)
Longfei Yang
,
Jiyi Li
,
Sheng Li
,
Takahiro Shinozaki
Multi-Domain Dialogue State Tracking with Top-K Slot Self Attention.
SIGDIAL
(2022)
Siqing Qin
,
Longbiao Wang
,
Sheng Li
,
Jianwu Dang
,
Lixin Pan
Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling.
EURASIP J. Audio Speech Music. Process.
2022 (1) (2022)
Sheng Li
,
Jiyi Li
,
Qianying Liu
,
Zhuo Gong
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection.
LREC
(2022)
Zhengdong Yang
,
Wangjin Zhou
,
Chenhui Chu
,
Sheng Li
,
Raj Dabre
,
Raphael Rubino
,
Yi Zhao
Fusion of Self-supervised Learned Models for MOS Prediction.
CoRR
(2022)
Shunfei Chen
,
Xinhui Hu
,
Sheng Li
,
Xinkang Xu
An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System.
ICASSP
(2021)
Ding Wang
,
Shuaishuai Ye
,
Xinhui Hu
,
Sheng Li
,
Xinkang Xu
An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model.
INTERSPEECH
(2021)
Haoran Yin
,
Hao Shi
,
Longbiao Wang
,
Luya Qiang
,
Sheng Li
,
Meng Ge
,
Gaoyan Zhang
,
Jianwu Dang
Simultaneous Progressive Filtering-Based Monaural Speech Enhancement.
ICONIP (5)
(2021)
Nan Li
,
Longbiao Wang
,
Masashi Unoki
,
Sheng Li
,
Rui Wang
,
Meng Ge
,
Jianwu Dang
Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network.
ICASSP
(2021)
Kai Wang
,
Hao Huang
,
Ying Hu
,
Zhihua Huang
,
Sheng Li
End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain.
INTERSPEECH
(2021)
Soky Kak
,
Masato Mimura
,
Tatsuya Kawahara
,
Chenhui Chu
,
Sheng Li
,
Chenchen Ding
,
Sethserey Sam
TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies.
Int. J. Asian Lang. Process.
31 (3&4) (2021)
Luya Qiang
,
Hao Shi
,
Meng Ge
,
Haoran Yin
,
Nan Li
,
Longbiao Wang
,
Sheng Li
,
Jianwu Dang
Speech Dereverberation Based on Scale-Aware Mean Square Error Loss.
ICONIP (5)
(2021)
Soky Kak
,
Sheng Li
,
Masato Mimura
,
Chenhui Chu
,
Tatsuya Kawahara
On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora.
APSIPA ASC
(2021)
Hao Shi
,
Longbiao Wang
,
Sheng Li
,
Cunhang Fan
,
Jianwu Dang
,
Tatsuya Kawahara
Spectrograms Fusion-based End-to-end Robust Automatic Speech Recognition.
APSIPA ASC
(2021)
Soky Kak
,
Masato Mimura
,
Tatsuya Kawahara
,
Sheng Li
,
Chenchen Ding
,
Chenhui Chu
,
Sethserey Sam
Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC).
O-COCOSDA
(2021)
Dawei Liu
,
Longbiao Wang
,
Sheng Li
,
Haoyu Li
,
Chenchen Ding
,
Ju Zhang
,
Jianwu Dang
Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS.
ICONIP (6)
(2021)
Hao Huang
,
Kai Wang
,
Ying Hu
,
Sheng Li
Encoder-Decoder Based Pitch Tracking and Joint Model Training for Mandarin Tone Classification.
ICASSP
(2021)
Yizhou Peng
,
Jicheng Zhang
,
Haobo Zhang
,
Haihua Xu
,
Hao Huang
,
Sheng Li
,
Eng Siong Chng
Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework.
APSIPA ASC
(2021)
Yuqin Lin
,
Longbiao Wang
,
Sheng Li
,
Jianwu Dang
,
Chenchen Ding
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription.
INTERSPEECH
(2020)
Yaowei Han
,
Sheng Li
,
Yang Cao
,
Qiang Ma
,
Masatoshi Yoshikawa
Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release.
CoRR
(2020)
Yuqin Lin
,
Longbiao Wang
,
Jianwu Dang
,
Sheng Li
,
Chenchen Ding
End-to-End Articulatory Modeling for Dysarthric Articulatory Attribute Detection.
ICASSP
(2020)
Peng Shen
,
Xugang Lu
,
Sheng Li
,
Hisashi Kawai
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process.
28 (2020)
Aye Thida
,
Nway Nway Han
,
Sheinn Thawtar Oo
,
Sheng Li
,
Chenchen Ding
VOIS: The First Speech Therapy App Specifically Designed for Myanmar Hearing-Impaired Children.
O-COCOSDA
(2020)
Hao Shi
,
Longbiao Wang
,
Meng Ge
,
Sheng Li
,
Jianwu Dang
Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation.
ICASSP
(2020)
Yaowei Han
,
Sheng Li
,
Yang Cao
,
Qiang Ma
,
Masatoshi Yoshikawa
Voice-Indistinguishability: Protecting Voiceprint In Privacy-Preserving Speech Data Release.
ICME
(2020)
Shaotong Guo
,
Longbiao Wang
,
Sheng Li
,
Ju Zhang
,
Cheng Gong
,
Yuguang Wang
,
Jianwu Dang
,
Kiyoshi Honda
Investigation of Effectively Synthesizing Code-Switched Speech Using Highly Imbalanced Mix-Lingual Data.
ICONIP (1)
(2020)
Hao Shi
,
Longbiao Wang
,
Sheng Li
,
Chenchen Ding
,
Meng Ge
,
Nan Li
,
Jianwu Dang
,
Hiroshi Seki
Singing Voice Extraction with Attention-Based Spectrograms Fusion.
INTERSPEECH
(2020)
Sheng Li
,
Xugang Lu
,
Raj Dabre
,
Peng Shen
,
Hisashi Kawai
Joint Training End-to-End Speech Recognition Systems with Speaker Attributes.
Odyssey
(2020)
Yaowei Han
,
Yang Cao
,
Sheng Li
,
Qiang Ma
,
Masatoshi Yoshikawa
Voice-Indistinguishability - Protecting Voiceprint with Differential Privacy under an Untrusted Server.
CCS
(2020)
Peng Shen
,
Xugang Lu
,
Komei Sugiura
,
Sheng Li
,
Hisashi Kawai
Compensation on x-vector for Short Utterance Spoken Language Identification.
Odyssey
(2020)
Lixin Pan
,
Sheng Li
,
Longbiao Wang
,
Jianwu Dang
Effective Training End-to-End ASR systems for Low-resource Lhasa Dialect of Tibetan Language.
APSIPA
(2019)
Xugang Lu
,
Peng Shen
,
Sheng Li
,
Yu Tsao
,
Hisashi Kawai
Deep progressive multi-scale attention for acoustic event classification.
CoRR
(2019)
Ryoichi Takashima
,
Sheng Li
,
Hisashi Kawai
Investigation of Sequence-level Knowledge Distillation Methods for CTC Acoustic Models.
ICASSP
(2019)
Soky Kak
,
Sheng Li
,
Tatsuya Kawahara
,
Sopheap Seng
Multi-lingual Transformer Training for Khmer Automatic Speech Recognition.
APSIPA
(2019)
Sheng Li
,
Xugang Lu
,
Chenchen Ding
,
Peng Shen
,
Tatsuya Kawahara
,
Hisashi Kawai
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.
INTERSPEECH
(2019)
Sheng Li
,
Raj Dabre
,
Xugang Lu
,
Peng Shen
,
Tatsuya Kawahara
,
Hisashi Kawai
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation.
INTERSPEECH
(2019)
Sheng Li
,
Chenchen Ding
,
Xugang Lu
,
Peng Shen
,
Tatsuya Kawahara
,
Hisashi Kawai
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition.
INTERSPEECH
(2019)
Peng Shen
,
Xugang Lu
,
Sheng Li
,
Hisashi Kawai
Interactive Learning of Teacher-student Model for Short Utterance Spoken Language Identification.
ICASSP
(2019)
Xugang Lu
,
Peng Shen
,
Sheng Li
,
Yu Tsao
,
Hisashi Kawai
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection.
INTERSPEECH
(2019)
Peng Shen
,
Xugang Lu
,
Sheng Li
,
Hisashi Kawai
Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification.
INTERSPEECH
(2018)
Ryoichi Takashima
,
Sheng Li
,
Hisashi Kawai
CTC Loss Function with a Unit-Level Ambiguity Penalty.
ICASSP
(2018)
Ryoichi Takashima
,
Sheng Li
,
Hisashi Kawai
An Investigation of a Knowledge Distillation Method for CTC Acoustic Models.
ICASSP
(2018)
Xugang Lu
,
Peng Shen
,
Sheng Li
,
Yu Tsao
,
Hisashi Kawai
Temporal Attentive Pooling for Acoustic Event Detection.
INTERSPEECH
(2018)
Sheng Li
,
Xugang Lu
,
Ryoichi Takashima
,
Peng Shen
,
Tatsuya Kawahara
,
Hisashi Kawai
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks.
INTERSPEECH
(2018)
Sheng Li
,
Xugang Lu
,
Ryoichi Takashima
,
Peng Shen
,
Tatsuya Kawahara
,
Hisashi Kawai
Improving Very Deep Time-Delay Neural Network With Vertical-Attention For Effectively Training CTC-Based ASR Systems.
SLT
(2018)
Peng Shen
,
Xugang Lu
,
Sheng Li
,
Hisashi Kawai
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification.
INTERSPEECH
(2017)
Sheng Li
,
Xugang Lu
,
Shinsuke Sakai
,
Masato Mimura
,
Tatsuya Kawahara
Semi-supervised ensemble DNN acoustic model training.
ICASSP
(2017)
Sheng Li
,
Xugang Lu
,
Peng Shen
,
Ryoichi Takashima
,
Tatsuya Kawahara
,
Hisashi Kawai
Incremental training and constructing the very deep convolutional residual network acoustic models.
ASRU
(2017)
Sheng Li
,
Yuya Akita
,
Tatsuya Kawahara
Data selection from multiple ASR systems' hypotheses for unsupervised acoustic model training.
ICASSP
(2016)
Sheng Li
,
Xugang Lu
,
Shinsuke Mori
,
Yuya Akita
,
Tatsuya Kawahara
Confidence estimation for speech recognition systems using conditional random fields trained with partially annotated data.
ISCSLP
(2016)
Sheng Li
,
Yuya Akita
,
Tatsuya Kawahara
Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses.
IEEE ACM Trans. Audio Speech Lang. Process.
24 (9) (2016)
Sheng Li
,
Yuya Akita
,
Tatsuya Kawahara
Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training.
IEICE Trans. Inf. Syst.
(8) (2015)
Sheng Li
,
Yuya Akita
,
Tatsuya Kawahara
Discriminative data selection for lightly supervised training of acoustic model using closed caption texts.
INTERSPEECH
(2015)
Sheng Li
,
Xugang Lu
,
Yuya Akita
,
Tatsuya Kawahara
Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation.
INTERSPEECH
(2015)
Sheng Li
,
Yuya Akita
,
Tatsuya Kawahara
Corpus and transcription system of Chinese Lecture Room.
ISCSLP
(2014)
Sheng Li
,
Lan Wang
Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data.
INTERSPEECH
(2012)
Lan Wang
,
Hui Chen
,
Sheng Li
,
Helen M. Meng
Phoneme-level articulatory animation in pronunciation training.
Speech Commun.
54 (7) (2012)
Sheng Li
,
Lan Wang
,
En Qi
The Phoneme-Level Articulator Dynamics for Pronunciation Animation.
IALP
(2011)