CMGNet: Collaborative multi-modal graph network for video captioning.

Qi Rao, Xin Yu, Guang Li, Linchao Zhu

Published in: Comput. Vis. Image Underst. (2024)

Keyphrases

multi modal
semantic concepts
video search
multi modality
video streams
video data
cross modal
multiple modalities
high dimensional
video content
video shots
audio visual
complex networks
video frames
image annotation
video sequences
spatial and temporal
information theoretic
higher level
fusing multiple