STVGBert: A Visual-linguistic Transformer based Framework for Spatio-temporal Video Grounding.

Rui Su, Qian Yu, Dong Xu

Published in: ICCV (2021)

Keyphrases

spatio temporal
video sequences
spatial temporal
video classification
temporal context
video content
space time
neural network
spatial and temporal
visual information
spatio temporally
visual cues
video database
multimedia data
video streams
visual features
fuzzy logic
probabilistic model
low level
moving objects
image sequences
multimedia