Textual Tokens Classification for Multi-Modal Alignment in Vision-Language Tracking.
Zhongjie Mao, Yucheng Wang, Xi Chen, Jia Yan
Published in: ICASSP (2024)
Keyphrases
- multi modal
- cross modal
- natural language
- computer vision
- multi modality
- feature vectors
- classification accuracy
- audio visual
- feature space
- high dimensional
- image classification
- single modality
- image annotation
- machine learning
- semantic concepts
- keywords
- high level
- text classification
- particle filter
- humanoid robot
- medical images
- mutual information
- uni modal