Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs.

Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie
Published in: CoRR (2024)
Keyphrases
  • visual features
  • cross modal
  • multi modal
  • visual information
  • neural network
  • visual perception
  • multimodal information
  • narrow field of view
  • low level
  • wide range
  • feature space
  • feature vectors
  • multimodal data