Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer.

Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li
Published in: CoRR (2024)
Keyphrases
  • audio visual
  • cross modal
  • multi modal
  • visual data
  • visual information
  • multimedia
  • video sequences
  • digital libraries
  • image classification
  • visual features
  • contextual information
  • high dimensional data