LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models.

Yanwei Li, Chengyao Wang, Jiaya Jia

Published in: CoRR (2023)

Keyphrases

language model
language modeling
image data
image features
language modelling
probabilistic model
n-gram
image content
speech recognition
low-level
retrieval model
image representation
document retrieval
statistical language models
information retrieval
image regions
query expansion
statistical model
test collection
image classification
vector space model
image retrieval
co-occurrence