CapFormer: A Space-Time Video Description Model using Joint-Attention Transformer.

Mahamat Moussa, Chern Hong Lim, KokSheik Wong

Published in: APSIPA ASC (2023)

Keyphrases

space time
video sequences
spatio temporal
spatial and temporal
video analysis
temporal domain
visual features
motion model
dynamic scenes
video annotation
video representation
input video