DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qin, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He
Published in: CoRR (2023)
Keyphrases
- multi modal
- uni modal
- image segmentation
- image data
- image analysis
- multi modality
- image collections
- input image
- image annotation
- auto annotation
- single modality
- cross modal
- image representation
- image classification
- image features
- image retrieval
- high dimensional
- multiscale
- edge detection
- fusing multiple
- markov random field
- segmentation method
- image content
- audio visual
- semantic concepts
- video search
- similarity measure
- multimedia