VicTR: Video-conditioned Text Representations for Activity Recognition.
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo
Published in: CoRR (2023)
Keyphrases
- activity recognition
- human activities
- motion features
- event recognition
- event detection
- human actions
- body motions
- human activity recognition
- action recognition
- visual surveillance
- smart home
- video sequences
- video data
- video content
- video frames
- wearable sensors
- smart environments
- activity analysis
- sensory data
- video database
- multimedia
- video analysis
- text mining
- accelerometer data
- daily activities
- space time
- video surveillance
- video shots
- dynamic scenes
- data mining
- key frames
- transfer learning
- computer vision
- information retrieval