VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation.
Jialu Li, Aishwarya Padmakumar, Gaurav S. Sukhatme, Mohit Bansal
Published in: CoRR (2024)
Keyphrases
- video frames
- video content
- video sequences
- video data
- video database
- video analysis
- video clips
- video editing
- key frames
- real time
- video surveillance
- YouTube videos
- video event
- video images
- temporal coherence
- video dataset
- video indexing
- input video
- video streams
- outdoor environments
- spatiotemporal features
- natural language descriptions
- online video
- high definition
- semantic concept detection
- computer vision
- video representation
- dynamic scenes
- event recognition
- content based copy detection
- programming language
- surveillance system
- video search
- video material
- moving camera
- motion features
- human actions
- video browsing
- successive frames
- sports video
- video annotation
- instructional videos
- vision system
- video sharing
- visual analysis
- spatial and temporal
- lecture videos
- video shots
- natural language
- video segments
- video retrieval
- dynamic textures
- video scene
- video classification
- video processing
- human activities
- video copy detection
- foreground background segmentation
- news video
- TV series
- spatio temporal
- surveillance videos
- stationary camera
- space time
- stereoscopic video
- action recognition
- web videos
- temporal domain
- indoor environments
- low frame rate
- video objects
- closed captions
- camera motion
- video summarization