Hierarchical Multimodal Attention Network Based on Semantically Textual Guidance for Video Captioning.
Caihua Liu, Xiaoyi Ma, Xinyu He, Tao Xu
Published in: ICONIP (3) (2022)
Keyphrases
- multimedia
- video sequences
- video data
- natural language
- video content
- multi modal
- video frames
- video streams
- real time
- multimodal information
- digital video
- real time video
- video clips
- keywords
- story segmentation
- visual attention
- spatial and temporal
- multiple modalities
- online video
- video surveillance
- human activities
- visual data
- video database
- user generated
- multimedia databases
- video processing
- space time
- video retrieval
- human actions
- coarse to fine
- key frames
- event detection