BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions.

Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu
Published in: AAAI (2024)
Keyphrases
  • high level
  • web images
  • information retrieval
  • multimodal
  • visual information
  • text retrieval
  • string matching
  • open domain
  • low level
  • visual features
  • free text
  • news video
  • reading comprehension