CM-PIE: Cross-Modal Perception for Interactive-Enhanced Audio-Visual Video Parsing.
Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang
Published in: ICASSP (2024)
Keyphrases
- audio visual
- cross modal
- visual data
- multi modal
- perceptual information
- video data
- visual information
- semantic concepts
- multimedia
- video sequences
- multimedia data
- image data
- contextual information
- visual features
- natural language
- human actions
- high dimensional
- video frames
- video analysis
- natural language processing
- video streams
- human motion
- multimedia retrieval
- video content
- visual concepts
- computer vision
- image sequences
- visual content
- data sets
- object recognition
- spatio temporal