Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer.
Haoxu Wang
Ming Cheng
Qiang Fu
Ming Li
Published in:
ICASSP (2024)
Keyphrases
audio visual
cross modal
multi modal
visual data
visual information
multimedia
multimedia databases
image annotation
information retrieval
document images
video frames
high level
image sequences
feature extraction
high dimensional
contextual information