VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation.
Jialu Li, Aishwarya Padmakumar, Gaurav S. Sukhatme, Mohit Bansal
Published in: AAAI (2024)
Keyphrases
- video content
- video sequences
- video frames
- video data
- video database
- video clips
- video analysis
- key frames
- online video
- input video
- temporal coherence
- natural language descriptions
- real time
- video editing
- youtube videos
- outdoor environments
- video surveillance
- event recognition
- video representation
- content based copy detection
- video dataset
- video event
- video images
- natural language
- video search
- video retrieval
- dynamic scenes
- human activities
- high definition
- video indexing
- video streams
- computer vision
- spatiotemporal features
- space time
- video annotation
- video segments
- vision system
- moving camera
- video sharing
- surveillance system
- successive frames
- video material
- video shots
- spatial and temporal
- semantic concept detection
- instructional videos
- video copy detection
- web videos
- traffic scenes
- video classification
- human actions
- temporal domain
- video objects
- video scene
- motion features
- action recognition
- low frame rate
- sports video
- stationary camera
- video summarization
- spatio temporal
- video browsing
- programming language
- dynamic textures
- video signals
- video collections
- visual features
- event detection
- image sequences
- lecture videos
- semantic concepts
- textual descriptions
- multimedia
- indoor environments
- user generated
- foreground background segmentation
- news video
- background subtraction
- action classification
- camera motion