Enhancing Ocean Scene Video Captioning with Multimodal Pre-Training and Video-Swin-Transformer.
Xinyu Chen, Meng Zhao, Fan Shi, Meng'en Zhang, Yu He, Shengyong Chen
Published in: IECON (2023)
Keyphrases
- video sequences
- multimedia
- dynamic scenes
- video data
- video images
- space time
- multi modal
- video scene
- video database
- video content
- three dimensional
- multiple video streams
- moving camera
- video clips
- video frames
- multimedia data
- human activities
- spatial and temporal
- video streams
- surveillance videos
- training examples
- motion features
- high resolution
- input video
- video footage
- moving objects