Efficient Video Representation Learning via Masked Video Modeling with Motion-centric Token Selection.
Sunil Hwang, Jaehong Yoon, Youngwan Lee, Sung Ju Hwang
Published in: CoRR (2022)
Keyphrases
- video representation
- space time
- spatio temporal
- dynamic textures
- video content
- video streams
- video analysis
- key frames
- video frames
- video database
- video data
- video sequences
- video summarization
- spatial and temporal
- generative model
- prior knowledge
- moving objects
- image sequences
- motion analysis
- motion patterns
- motion estimation
- moving camera
- optical flow
- video synopsis