OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos.
Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem
Published in: CVPR Workshops (2023)
Keyphrases
- temporal context
- audio visual
- human actions
- human activities
- human behaviour
- activity recognition
- temporal information
- visual data
- spatial context
- spatio temporal
- video clips
- visual context
- visual information
- action recognition
- video sequences
- multi modal
- video content
- video frames
- spatial and temporal
- video retrieval
- image sequences
- pose estimation
- video data