CLIP Meets Video Captioners: Attribute-Aware Representation Learning Promotes Accurate Captioning.
Bang Yang, Yuexian Zou. Published in: CoRR (2021)
Keyphrases
- supervised learning
- learning systems
- learning process
- real time
- learning algorithm
- motion estimation
- video sequences
- prior knowledge
- dynamic bayesian networks
- high accuracy
- active learning
- learning environment
- multiscale
- high quality
- training data
- learning objects
- knowledge acquisition
- computationally efficient
- neural network
- video surveillance
- data sets