PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network.

Qinghua Liu, Meng Ge, Zhizheng Wu, Haizhou Li
Published in: INTERSPEECH (2023)
Keyphrases
  • audio visual
  • pose invariant
  • multi modal
  • visual information
  • multimedia
  • visual data
  • emotion recognition
  • data sets
  • spatio temporal
  • video sequences
  • image data
  • facial expression recognition