Label-anticipated Event Disentanglement for Audio-Visual Video Parsing.
Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang
Published in: CoRR (2024)
Keyphrases
- audio visual
- video summarization
- visual data
- multimedia
- meeting room
- video scene
- multi modal
- audio visual content
- event detection
- audio features
- sports video
- visual information
- temporal context
- video data
- video content
- multi stream
- multimodal fusion
- video sequences
- video analysis
- video streams
- natural language processing
- person authentication
- audio visual speech recognition
- natural language
- multimedia data
- key frames
- video retrieval
- temporal information
- space time
- low level
- mobile devices