Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models.
Navid Rajabi, Jana Kosecka. Published in: CoRR (2023)
Keyphrases
- multi modal
- language model
- spatial reasoning
- spatial relations
- cross modal
- language modeling
- document retrieval
- n gram
- probabilistic model
- video search
- query expansion
- retrieval model
- information retrieval
- computer vision
- single modality
- multi modality
- test collection
- speech recognition
- temporal reasoning
- image annotation
- high dimensional
- smoothing methods
- mixture model
- visual information
- audio visual
- spatial information
- visual cues
- multiple modalities
- translation model
- image classification
- visual features
- relevance model
- text retrieval
- closely related
- multiscale