Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer.
Haoxu Wang
Ming Cheng
Qiang Fu
Ming Li
Published in:
CoRR (2024)
Keyphrases
audio visual
cross modal
multi modal
visual data
visual information
multimedia
video sequences
digital libraries
image classification
visual features
contextual information
high dimensional data