VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions.
Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, Dongyan Zhao
Published in: ACL (1) (2023)
Keyphrases
- video sequences
- human actions
- dynamic scenes
- natural language
- video images
- video data
- semantic video retrieval
- video scene
- semantic concepts
- visual data
- image frames
- scene change detection
- video frames
- semantic context
- video event
- three dimensional
- surveillance videos
- image sequences
- input video
- stationary camera
- video content
- video streams
- 3D scene
- image mosaics
- moving camera
- object detectors
- live video
- space time
- scene analysis
- dynamic textures
- high level
- single image
- action recognition
- photo collections
- semantic information
- outdoor images
- scene categories
- sports video
- input image
- multimedia
- single frame
- motion features
- topic models
- video database
- video analysis
- video clips
- cognitive systems
- video annotation
- multimedia data
- scene categorization
- distributed cognition
- moving objects
- dialogue system
- video surveillance
- key frames