Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration.
Chenyang Lyu, Minghao Wu, Longyue Wang, Xinting Huang, Bingshuai Liu, Zefeng Du, Shuming Shi, Zhaopeng Tu
Published in: CoRR (2023)
Keyphrases
- multi modal
- language modeling
- language model
- audio video
- multiple modalities
- information retrieval
- uni modal
- video search
- image data
- image classification
- image content
- low level
- image retrieval
- image representation
- multi modality
- image segmentation
- audio visual
- query expansion
- image annotation
- retrieval model
- mean shift
- probabilistic model
- active learning
- high dimensional
- keywords
- single modality
- metadata