Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection.
Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson
Published in: CoRR (2023)
Keyphrases
- soccer video
- visual information
- event detection
- thermal images
- visual features
- activity detection
- cross modal
- video analysis
- detection algorithm
- data fusion
- detection method
- false positives
- multimedia
- multi modal
- visual data
- reliable detection
- multi modal fusion
- news articles
- audio signal
- single modality
- visual analysis
- signal processing
- low level
- binary codes
- object detection
- high dimensional