VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning.
Kashu Yamazaki, Sang Truong, Khoa Vo, Michael Kidd, Chase Rainwater, Khoa Luu, Ngan Le
Published in: CoRR (2022)
Keyphrases
- real time
- learning algorithm
- language acquisition
- programming language
- learning process
- learning problems
- video content
- computer vision
- video sequences
- decision trees
- prior knowledge
- active learning
- object oriented
- image processing
- unsupervised learning
- background knowledge
- mobile learning
- multimedia data
- inductive inference
- positive examples
- hybrid learning