AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization.
Tanvir Mahmud
Diana Marculescu
Published in:
WACV (2023)
Keyphrases
audio visual
temporal context
multi modal
visual information
person authentication
multi stream
multimedia
visual data
audio visual speech recognition
video summarization
event detection
spatio temporal
spatial and temporal
temporal information
e learning
passage retrieval
image data