Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models.
Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski
Published in: CoRR (2023)
Keyphrases
- question answering
- language model
- passage retrieval
- information retrieval
- language modeling
- document retrieval
- n-gram
- retrieval model
- question answering systems
- speech recognition
- cross language
- test collection
- information extraction
- probabilistic model
- question classification
- audio-visual
- sentence retrieval
- natural language processing
- visual information
- query expansion
- natural language
- named entities
- multimodal
- vector space model
- translation model
- visual features
- relevance model
- query terms
- pseudo relevance feedback
- low-level