SAM: Modeling Scene, Object and Action With Semantics Attention Modules for Video Recognition.
Xing Zhang, Zuxuan Wu, Yu-Gang Jiang
Published in: IEEE Trans. Multim. (2022)
Keyphrases
- object models
- human actions
- video sequences
- human object interactions
- atomic actions
- action recognition
- video scene
- object detection and tracking
- moving objects
- object motion
- human activities
- dynamic scenes
- multiple objects
- stationary camera
- static images
- object hypotheses
- occluded objects
- visual data
- objects in cluttered scenes
- video images
- image sequences
- partial occlusion
- video data
- focus of attention
- object model
- target object
- complex scenes
- range finder
- ground plane
- combining information from multiple
- object recognition
- 3D objects
- 3D scene
- object tracking
- space time
- single image
- images depicting
- visual input
- video streams
- human motion
- image frames
- object representation
- camera movement
- motion features
- motion history images
- video analysis
- scene interpretation
- range data
- moving camera
- action classification
- real world objects
- video content
- three dimensional objects
- object classes
- visual features
- multiple images
- scene understanding
- video shots
- three dimensional
- computer vision