Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline.
Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng
Published in: CVPR (2023)
Keyphrases
- audio visual
- sports video
- video summarization
- multi modal
- visual data
- event recognition
- audio features
- temporal context
- visual information
- human activities
- video clips
- event detection
- person authentication
- multimedia
- video analysis
- multimodal fusion
- surveillance videos
- multi stream
- audio visual speech recognition
- visual content
- video content
- image data
- video sequences