AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization.
Tanvir Mahmud
Diana Marculescu
Published in:
CoRR (2022)
Keyphrases
audio visual
temporal context
multi modal
visual information
visual data
multimedia
spatial and temporal
multi stream
spatio temporal
temporal information
person authentication
passage retrieval
audio visual speech recognition
event detection
space time
video summarization