CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor.
Shuyang Sun, Runjia Li, Philip H. S. Torr, Xiuye Gu, Siyang Li
Published in: CoRR (2023)
Keyphrases
- visual concepts
- learning tasks
- nearest neighbor
- image collections
- supervised learning
- image content
- visual content
- training set
- training data
- semantic gap
- positive examples
- semantic concepts
- object categories
- image annotation
- video content
- training examples
- training samples
- visual information
- video clips
- visual features
- visual data
- multi-modal
- semi-supervised
- video sequences
- similarity measure
- video segments