Video-Grounded Dialogues with Joint Video and Image Training.
Hangyu Zhang, Yingming Li, Zhongfei Zhang
Published in: ICIP (2022)
Keyphrases
- video images
- pre-trained
- video data
- multimedia
- video sequences
- image data
- video frames
- image segmentation
- images and video sequences
- visual data
- video content
- temporal continuity
- image content
- image frames
- image classification
- visual cues
- video streams
- input image
- space time
- image retrieval
- video files
- multiscale
- static images
- high resolution
- image analysis
- video clips
- real time
- single image
- camera movement
- image features
- low level
- weakly labeled
- layered representation
- textual descriptions
- higher resolution
- video analysis
- dynamic scenes
- key frames
- segmentation algorithm
- multiresolution
- training set
- computer vision