Deconfounded Multimodal Learning for Spatio-temporal Video Grounding.
Jiawei Wang, Zhanchang Ma, Da Cao, Yuquan Le, Junbin Xiao, Tat-Seng Chua
Published in: ACM Multimedia (2023)
Keyphrases
- spatio temporal
- learning algorithm
- learning process
- learning systems
- spatial and temporal
- reinforcement learning
- learning community
- learning problems
- video data
- supervised learning
- prior knowledge
- video sequences
- visual features
- multimedia
- temporal information
- learning tasks
- computer vision
- video content
- neural network