LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models.
Yanwei Li, Chengyao Wang, Jiaya Jia
Published in: CoRR (2023)
Keyphrases
- language model
- language modeling
- image data
- image features
- language modelling
- probabilistic model
- n-gram
- image content
- speech recognition
- low level
- retrieval model
- image representation
- document retrieval
- statistical language models
- information retrieval
- image regions
- query expansion
- statistical model
- test collection
- image classification
- vector space model
- image retrieval
- co-occurrence