Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization.
Yuanyuan Jiang, Jianqin Yin, Yonghao Dang
Published in: CoRR (2022)
Keyphrases
- audio visual
- video event
- event detection
- temporal context
- sports video
- soccer video
- audio visual content
- multi modal
- video summarization
- visual data
- meeting room
- multimedia
- temporal information
- video scene
- video streams
- video content
- video analysis
- video sequences
- audio features
- audio visual speech recognition
- video retrieval
- semantic search
- data sets
- visual information
- video data
- image data
- domain knowledge
- natural language
- computer vision