Multi-Modal Few-Shot Temporal Action Detection via Vision-Language Meta-Adaptation.
Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Pérez-Rúa, Bernard Ghanem, Yi-Zhe Song, Tao Xiang
Published in: CoRR (2022)
Keyphrases
- multi modal
- action detection
- spatial and temporal
- pattern search
- computer vision
- audio visual
- action recognition
- spatio temporal
- temporal information
- natural language
- image annotation
- temporal reasoning
- video shots
- atomic actions
- visual features
- temporal patterns
- low level
- action classification
- high dimensional
- medical images
- machine learning
- object detection
- key frames
- semantic concepts
- feature selection