OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context.
Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem
Published in: CoRR (2022)
Keyphrases
- temporal context
- audio visual
- temporal information
- human behaviour
- human actions
- spatial context
- visual context
- activity recognition
- human activities
- spatio temporal
- video sequences
- multimedia
- video retrieval
- video content
- visual data
- action recognition
- visual information
- video frames
- video data
- multi modal
- spatial and temporal
- multimedia data
- key frames
- contextual information
- state space