Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation.
Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung
Published in: ACL (Findings) (2022)
Keyphrases
- image processing
- computer vision
- multimodal
- domain knowledge
- data mining techniques
- neural network
- knowledge extraction
- language learning
- background knowledge
- learning systems
- vision system
- formal languages
- real time
- expert systems
- video sequences
- knowledge base
- programming language
- knowledge representation
- prior knowledge
- natural language
- multimedia
- knowledge sources
- video clips
- generation process
- representation language
- specification language
- data mining