Enhancing Multimodal Alignment with Momentum Augmentation for Dense Video Captioning.

Yiwei Wei, Shaozu Yuan, Meng Chen, Longbiao Wang

Published in: ICASSP (2023)

Keyphrases

video sequences
multimedia
video data
video clips
real time video
video content
multi modal
video streams
online video
video database
video segmentation
video frames
multimodal fusion
story segmentation
learning rate
real time
spatial and temporal
dynamic scenes
video analysis
video retrieval
key frames
image alignment
video images
spatio temporal
multimodal information
combining information from multiple
multimodal interaction
event detection
human computer interaction