Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding.
Peng Jin, Ryuichi Takanobu, Caiwan Zhang, Xiaochun Cao, Li Yuan
Published in: CoRR (2023)
Keyphrases
- language model
- visual representation
- image data
- language modeling
- image classification
- n-gram
- image features
- retrieval model
- image content
- probabilistic model
- speech recognition
- low level
- image retrieval
- language modelling
- document retrieval
- key frames
- statistical language models
- video sequences
- image representation
- query expansion
- smoothing methods
- multimedia
- video data
- test collection
- web search
- video content
- context sensitive
- user interface
- visual data
- query specific
- information retrieval