It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training.
Yuxin Song, Min Yang, Wenhao Wu, Dongliang He, Fu Li, Jingdong Wang
Published in: CoRR (2022)
Keyphrases
- object motion
- static images
- space time
- dynamic textures
- video sequences
- appearance features
- temporal filtering
- image sequences
- moving objects
- video data
- motion features
- spatial and temporal
- visual cues
- motion analysis
- motion cues
- layered representation
- key frames
- optical flow
- video frames
- motion model
- motion segmentation
- single frame
- multimedia
- surveillance videos
- shot change detection
- visual data
- motion estimation
- camera motion
- video content
- moving camera
- video analysis
- motion parameters
- fuzzy logic
- low frame rate
- temporal continuity
- fault diagnosis
- input video
- successive frames
- video signals
- dynamic scenes
- visual input
- target object
- temporal coherence
- human motion
- video summarization
- motion trajectories