Sketch, Ground, and Refine: Top-Down Dense Video Captioning.

Chaorui Deng, Shizhe Chen, Da Chen, Yuan He, Qi Wu

Published in: CVPR (2021)

Keyphrases

video data
video content
video sequences
real time
video frames
multimedia
video surveillance
computational complexity
high level
video streams
video database
key frames
real time video
video processing
spatial and temporal
temporal information
video analysis
event recognition
video segmentation
online video
event detection
generative model
feature vectors
object recognition
multiscale
data sets