BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions.
Wenbo Hu
Yifan Xu
Yi Li
Weiyue Li
Zeyuan Chen
Zhuowen Tu
Published in:
AAAI (2024)
Keyphrases
high level
web images
information retrieval
multi modal
visual information
text retrieval
string matching
open domain
low level
visual features
free text
news video
reading comprehension