Learning Video-Text Aligned Representations for Video Captioning.
Yaya Shi, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
Published in: ACM Trans. Multim. Comput. Commun. Appl. (2023)
Keyphrases
- video data
- video content
- video sequences
- video streams
- interactive video
- multimedia
- learning process
- real time
- prior knowledge
- learning algorithm
- video frames
- reinforcement learning
- video shots
- space time
- automatically discovering
- video analysis
- video clips
- multimedia documents
- learning tasks
- learning systems
- active learning
- metadata
- digital video
- neural network
- multimedia search
- machine learning
- semantic representations
- text detection
- multiple representations
- video segments
- information retrieval
- spatial and temporal
- video database
- supervised learning
- video retrieval
- image classification
- human activities