Multimodal-enhanced hierarchical attention network for video captioning.
Maosheng Zhong, Youde Chen, Hao Zhang, Hao Xiong, Zhixiang Wang
Published in: Multim. Syst. (2023)
Keyphrases
- multimedia
- network structure
- video data
- video delivery
- network conditions
- video analysis
- multi modal
- video sequences
- video frames
- online video
- computer networks
- wireless networks
- real time
- video processing
- video content
- peer to peer
- wireless sensor networks
- video database
- video images
- scalable video
- real time video
- visual attention
- spatial and temporal