CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing.
Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang
Published in: CoRR (2023)
Keyphrases
- audio visual
- cross modal
- visual data
- multi modal
- perceptual information
- visual information
- video data
- video sequences
- semantic concepts
- multimedia
- multimedia data
- high dimensional
- image sequences
- contextual information
- visual features
- high dimensional data
- natural language
- human motion
- object recognition
- low level
- video retrieval
- visual content
- image data
- human actions
- web pages
- high level
- text data
- key frames
- video frames
- data sets
- document collections