L-STAP: Learned Spatio-Temporal Adaptive Pooling for Video Captioning.
Danny Francis, Benoit Huet
Published in: AI4TV@MM (2019)
Keyphrases
- spatio temporal
- spatial and temporal
- space time
- spatial temporal
- video representation
- human actions
- computer simulation
- video streams
- video sequences
- video data
- video frames
- video content
- unsupervised manner
- spatio temporally
- video database
- real time video
- key frames
- real time
- action recognition
- scalable video
- video clips
- video analysis
- video processing
- temporal structure
- spatio temporal data
- temporal domain
- motion patterns
- dynamic textures
- digital video
- video surveillance
- image classification
- motion trajectories
- video images
- video indexing
- dynamic scenes
- multimedia data
- temporal segmentation
- moving objects
- image sequences
- temporal filtering
- video copy detection
- three dimensional