CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective.
Junwen Xiong, Ganglai Wang, Peng Zhang, Wei Huang, Yufei Zha, Guangtao Zhai
Published in: CVPR (2023)
Keyphrases
- audio visual
- video summarization
- visual data
- meeting room
- multimedia
- multi modal
- audio features
- audio visual content
- visual saliency
- visual information
- temporal context
- video data
- multimodal fusion
- multi stream
- video content
- video sequences
- video streams
- audio visual speech recognition
- visual attention
- human visual system
- video frames
- space time
- multimedia data
- visual features
- low level
- spatio temporal
- video retrieval
- saliency map
- spatial and temporal
- contextual information
- high dimensional
- object recognition