Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline.
Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng
Published in: CoRR (2023)
Keyphrases
- audio visual
- sports video
- video summarization
- multi modal
- visual data
- audio features
- visual information
- event recognition
- video clips
- temporal context
- human activities
- multi stream
- multimedia
- event detection
- video sequences
- person authentication
- audio visual speech recognition
- video data
- surveillance videos
- image data
- domain knowledge
- high dimensional