Zexu Pan
Publication Activity (10 Years)
Years Active: 2020-2024
Publications (10 Years): 41
Top Topics
Speech Recognition
Speaker Diarization
Diffusion Models
DoA Estimation
Top Venues
CoRR
ICASSP
INTERSPEECH
IEEE Signal Process. Lett.
Publications
Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux
NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization.
CoRR
(2024)
Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux
Generation or Replication: Auscultating Audio Latent Diffusion Models.
ICASSP
(2024)
Xinyuan Qian, Zexu Pan, Qiquan Zhang, Kainan Chen, Shoufeng Lin
GLMB 3D Speaker Tracking with Video-Assisted Multi-Channel Audio Optimization Functions.
ICASSP
(2024)
Shuo Zhang, Zexu Pan, Yichang Lv, Youfang Lin
Hierarchical Edge Refinement Network for Guided Depth Map Super-Resolution.
IEEE Trans. Computational Imaging
10 (2024)
Jiadong Wang, Zexu Pan, Malu Zhang, Robby T. Tan, Haizhou Li
Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition.
AAAI
(2024)
Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux
NeuroHeed+: Improving Neuro-Steered Speaker Extraction with Joint Auditory Attention Detection.
ICASSP
(2024)
Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li
LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
ICASSP
(2024)
Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-Talker Speech.
ICASSP
(2024)
Yoshiki Masuyama, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux
NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization.
ICASSP
(2024)
Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li
Target Active Speaker Detection with Audio-visual Cues.
INTERSPEECH
(2023)
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang
Rethinking the visual cues in audio-visual speaker extraction.
CoRR
(2023)
Zexu Pan, Wupeng Wang, Marvin Borsdorf, Haizhou Li
ImagineNet: Target Speaker Extraction with Intermittent Visual Cue Through Embedding Inpainting.
ICASSP
(2023)
Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li
LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
CoRR
(2023)
Ke Zhang, Marvin Borsdorf, Zexu Pan, Haizhou Li, Yangjie Wei, Yi Wang
Speaker Extraction with Detection of Presence and Absence of Target Speakers.
INTERSPEECH
(2023)
Zexu Pan, Gordon Wichern, François G. Germain, Sameer Khurana, Jonathan Le Roux
NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection.
CoRR
(2023)
Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li
Target Active Speaker Detection with Audio-visual Cues.
CoRR
(2023)
Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech.
CoRR
(2023)
Dimitrios Bralios, Gordon Wichern, François G. Germain, Zexu Pan, Sameer Khurana, Chiori Hori, Jonathan Le Roux
Generation or Replication: Auscultating Audio Latent Diffusion Models.
CoRR
(2023)
Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux
Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction.
CoRR
(2023)
Tingting Wang, Zexu Pan, Meng Ge, Zhen Yang, Haizhou Li
Time-Domain Speech Separation Networks With Graph Encoding Auxiliary.
IEEE Signal Process. Lett.
30 (2023)
Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang
Rethinking the Visual Cues in Audio-Visual Speaker Extraction.
INTERSPEECH
(2023)
Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux
Scenario-Aware Audio-Visual TF-Gridnet for Target Speech Extraction.
ASRU
(2023)
Zexu Pan, Meng Ge, Haizhou Li
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.
CoRR
(2022)
Zexu Pan, Wupeng Wang, Marvin Borsdorf, Haizhou Li
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting.
CoRR
(2022)
Zexu Pan, Meng Ge, Haizhou Li
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.
INTERSPEECH
(2022)
Zexu Pan, Meng Ge, Haizhou Li
USEV: Universal Speaker Extraction With Visual Cue.
IEEE ACM Trans. Audio Speech Lang. Process.
30 (2022)
Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li
Selective Listening by Synchronizing Speech With Lips.
IEEE ACM Trans. Audio Speech Lang. Process.
30 (2022)
Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
INTERSPEECH
(2022)
Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
CoRR
(2022)
Zexu Pan, Xinyuan Qian, Haizhou Li
Speaker Extraction With Co-Speech Gestures Cue.
IEEE Signal Process. Lett.
29 (2022)
Zexu Pan, Xinyuan Qian, Haizhou Li
Speaker Extraction with Co-Speech Gestures Cue.
CoRR
(2022)
Zexu Pan, Gordon Wichern, François G. Germain, Aswin Shanmugam Subramanian, Jonathan Le Roux
Towards End-to-end Speaker Diarization in the Wild.
CoRR
(2022)
Xinyuan Qian, Maulik C. Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li
Multi-Target DoA Estimation with an Audio-Visual Fusion Mechanism.
ICASSP
(2021)
Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li
Muse: Multi-Modal Target Speaker Extraction with Visual Cues.
ICASSP
(2021)
Zexu Pan, Meng Ge, Haizhou Li
USEV: Universal Speaker Extraction with Visual Cue.
CoRR
(2021)
Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
ACM Multimedia
(2021)
Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
CoRR
(2021)
Xinyuan Qian, Maulik C. Madhavi, Zexu Pan, Jiadong Wang, Haizhou Li
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism.
CoRR
(2021)
Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li
Multi-modal Attention for Speech Emotion Recognition.
CoRR
(2020)
Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li
Muse: Multi-modal target speaker extraction with visual cues.
CoRR
(2020)
Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li
Multi-Modal Attention for Speech Emotion Recognition.
INTERSPEECH
(2020)