Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos.
Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo
Published in: CoRR (2023)
Keyphrases
- learning algorithm
- supervised learning
- visual representation
- computer vision
- learning process
- spatio-temporal
- semi-supervised
- real time
- online learning
- high level
- reinforcement learning
- prior knowledge
- programming language
- knowledge acquisition
- image representation
- learning tasks
- learning analytics
- deeper understanding
- language acquisition