Zexu Pan
Publication Activity (10 Years)
Years Active: 2020-2023
Publications (10 Years): 29
Top Topics
Visual Cues
Speaker Diarization
Diffusion Models
DoA Estimation
Top Venues
CoRR
ICASSP
INTERSPEECH
IEEE Signal Process. Lett.
Publications
Junjie Li
,
Meng Ge
,
Zexu Pan
,
Rui Cao
,
Longbiao Wang
,
Jianwu Dang
,
Shiliang Zhang
Rethinking the visual cues in audio-visual speaker extraction.
CoRR
(2023)
Zexu Pan
,
Wupeng Wang
,
Marvin Borsdorf
,
Haizhou Li
ImagineNet: Target Speaker Extraction with Intermittent Visual Cue Through Embedding Inpainting.
ICASSP
(2023)
Yu Chen
,
Xinyuan Qian
,
Zexu Pan
,
Kainan Chen
,
Haizhou Li
LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism.
CoRR
(2023)
Zexu Pan
,
Gordon Wichern
,
François G. Germain
,
Sameer Khurana
,
Jonathan Le Roux
NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection.
CoRR
(2023)
Yidi Jiang
,
Ruijie Tao
,
Zexu Pan
,
Haizhou Li
Target Active Speaker Detection with Audio-visual Cues.
CoRR
(2023)
Junjie Li
,
Ruijie Tao
,
Zexu Pan
,
Meng Ge
,
Shuai Wang
,
Haizhou Li
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech.
CoRR
(2023)
Dimitrios Bralios
,
Gordon Wichern
,
François G. Germain
,
Zexu Pan
,
Sameer Khurana
,
Chiori Hori
,
Jonathan Le Roux
Generation or Replication: Auscultating Audio Latent Diffusion Models.
CoRR
(2023)
Zexu Pan
,
Gordon Wichern
,
Yoshiki Masuyama
,
François G. Germain
,
Sameer Khurana
,
Chiori Hori
,
Jonathan Le Roux
Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction.
CoRR
(2023)
Tingting Wang
,
Zexu Pan
,
Meng Ge
,
Zhen Yang
,
Haizhou Li
Time-Domain Speech Separation Networks With Graph Encoding Auxiliary.
IEEE Signal Process. Lett.
30 (2023)
Zexu Pan
,
Gordon Wichern
,
Yoshiki Masuyama
,
François G. Germain
,
Sameer Khurana
,
Chiori Hori
,
Jonathan Le Roux
Scenario-Aware Audio-Visual TF-Gridnet for Target Speech Extraction.
ASRU
(2023)
Zexu Pan
,
Meng Ge
,
Haizhou Li
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.
CoRR
(2022)
Zexu Pan
,
Wupeng Wang
,
Marvin Borsdorf
,
Haizhou Li
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting.
CoRR
(2022)
Zexu Pan
,
Meng Ge
,
Haizhou Li
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction.
INTERSPEECH
(2022)
Zexu Pan
,
Meng Ge
,
Haizhou Li
USEV: Universal Speaker Extraction With Visual Cue.
IEEE ACM Trans. Audio Speech Lang. Process.
30 (2022)
Zexu Pan
,
Ruijie Tao
,
Chenglin Xu
,
Haizhou Li
Selective Listening by Synchronizing Speech With Lips.
IEEE ACM Trans. Audio Speech Lang. Process.
30 (2022)
Junjie Li
,
Meng Ge
,
Zexu Pan
,
Longbiao Wang
,
Jianwu Dang
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
INTERSPEECH
(2022)
Junjie Li
,
Meng Ge
,
Zexu Pan
,
Longbiao Wang
,
Jianwu Dang
VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
CoRR
(2022)
Zexu Pan
,
Xinyuan Qian
,
Haizhou Li
Speaker Extraction With Co-Speech Gestures Cue.
IEEE Signal Process. Lett.
29 (2022)
Zexu Pan
,
Xinyuan Qian
,
Haizhou Li
Speaker Extraction with Co-Speech Gestures Cue.
CoRR
(2022)
Zexu Pan
,
Gordon Wichern
,
François G. Germain
,
Aswin Shanmugam Subramanian
,
Jonathan Le Roux
Towards End-to-end Speaker Diarization in the Wild.
CoRR
(2022)
Xinyuan Qian
,
Maulik C. Madhavi
,
Zexu Pan
,
Jiadong Wang
,
Haizhou Li
Multi-Target DoA Estimation with an Audio-Visual Fusion Mechanism.
ICASSP
(2021)
Zexu Pan
,
Ruijie Tao
,
Chenglin Xu
,
Haizhou Li
Muse: Multi-Modal Target Speaker Extraction with Visual Cues.
ICASSP
(2021)
Zexu Pan
,
Meng Ge
,
Haizhou Li
USEV: Universal Speaker Extraction with Visual Cue.
CoRR
(2021)
Ruijie Tao
,
Zexu Pan
,
Rohan Kumar Das
,
Xinyuan Qian
,
Mike Zheng Shou
,
Haizhou Li
Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
ACM Multimedia
(2021)
Ruijie Tao
,
Zexu Pan
,
Rohan Kumar Das
,
Xinyuan Qian
,
Mike Zheng Shou
,
Haizhou Li
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
CoRR
(2021)
Xinyuan Qian
,
Maulik C. Madhavi
,
Zexu Pan
,
Jiadong Wang
,
Haizhou Li
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism.
CoRR
(2021)
Zexu Pan
,
Zhaojie Luo
,
Jichen Yang
,
Haizhou Li
Multi-modal Attention for Speech Emotion Recognition.
CoRR
(2020)
Zexu Pan
,
Ruijie Tao
,
Chenglin Xu
,
Haizhou Li
Muse: Multi-modal target speaker extraction with visual cues.
CoRR
(2020)
Zexu Pan
,
Zhaojie Luo
,
Jichen Yang
,
Haizhou Li
Multi-Modal Attention for Speech Emotion Recognition.
INTERSPEECH
(2020)