BVA-Transformer: Image-text multimodal classification and dialogue model architecture based on Blip and visual attention mechanism.
Kaiyu Zhang
Fei Wu
Guowei Zhang
Jiawei Liu
Min Li
Published in:
Displays (2024)
Keyphrases
attention mechanism
similarity measure
image classification
image segmentation
multiscale
image features
multimodal
high level
keypoints
real time
information retrieval
low level
image representation
video data
image content