CLIP Meets Video Captioners: Attribute-Aware Representation Learning Promotes Accurate Captioning.

Bang Yang, Yuexian Zou

Published in: CoRR (2021)

Keyphrases

supervised learning
learning systems
learning process
real time
learning algorithm
motion estimation
video sequences
prior knowledge
dynamic bayesian networks
high accuracy
active learning
learning environment
multiscale
high quality
training data
learning objects
knowledge acquisition
computationally efficient
neural network
video surveillance
data sets