Global2Local: A Joint-Hierarchical Attention for Video Captioning.
Chengpeng Dai, Fuhai Chen, Xiaoshuai Sun, Rongrong Ji, Qixiang Ye, Yongjian Wu
Published in: CoRR (2022)
Keyphrases
- real time
- global image statistics
- video streams
- video sequences
- multimedia
- video data
- video clips
- video database
- real time video
- video surveillance
- video retrieval
- digital video
- human activities
- multimedia data
- visual saliency
- global information
- multi view
- space time
- body motions
- video indexing
- event recognition
- focus of attention
- video shots
- video content
- visual attention
- temporal information
- video frames
- hierarchical structure