Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models.
Mingwei Zhu, Leigang Sha, Yu Shu, Kangjia Zhao, Tiancheng Zhao, Jianwei Yin
Published in: CoRR (2023)
Keyphrases
- language model
- visual input
- language modeling
- n-gram
- probabilistic model
- query expansion
- vision system
- information retrieval
- visual information
- multimodal
- test collection
- visual attention
- visual perception
- visual field
- relevance model
- smoothing methods
- visual data
- feature selection
- computer vision
- ego motion
- low-level