Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning.
Yiwei Wei, Shaozu Yuan, Meng Chen, Longbiao Wang
Published in: ICASSP (2023)
Keyphrases
- video sequences
- multimedia
- video data
- video clips
- real time video
- video content
- multi modal
- video streams
- online video
- video database
- video segmentation
- video frames
- multimodal fusion
- story segmentation
- learning rate
- real time
- spatial and temporal
- dynamic scenes
- video analysis
- video retrieval
- key frames
- image alignment
- video images
- spatio temporal
- multimodal information
- combining information from multiple
- multimodal interaction
- event detection
- human computer interaction