BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions.

Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu

Published in: CoRR (2023)

Keyphrases

high-level
open-domain
information retrieval
text mining
multimedia
database
semantic content
visual information
web images
medical images
text information
cross-modal
visual appearance
key concepts
co-occurrence
low-level
neural network
data sets