AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder.
Xingjian Diao, Ming Cheng, Shitong Cheng
Published in: ICTAI (2023)
Keyphrases
- audio visual
- video summarization
- visual data
- multimedia
- meeting room
- multi modal
- audio features
- audio visual content
- sports video
- visual information
- video data
- temporal context
- multi stream
- multimodal fusion
- person authentication
- video sequences
- image database
- data sets
- audio visual speech recognition
- high dimensional
- video content
- key frames
- video streams
- surveillance videos
- visual features
- video retrieval
- event detection
- multimedia data