GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models.
Haicheng Liao, Huanming Shen, Zhenning Li, Chengyue Wang, Guofa Li, Yiming Bie, ChengZhong Xu
Published in: CoRR (2023)
Keyphrases
- cross-modal
- language model
- autonomous driving
- multimodal
- language modeling
- grand challenge
- n-gram
- probabilistic model
- multimedia retrieval
- document retrieval
- information retrieval
- retrieval model
- query expansion
- stereo vision
- image retrieval
- visual recognition
- test collection
- multimedia databases
- visual data
- relevance model
- text retrieval
- visual similarity
- low-level
- feature space
- vision algorithms
- keywords
- similarity measure