Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond.
Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Tianyu Liu, Baobao Chang
Published in: CoRR (2023)
Keyphrases
- end to end
- multi modal
- language model
- language modeling
- n gram
- probabilistic model
- information retrieval
- computer vision
- multi modality
- speech recognition
- context sensitive
- document retrieval
- congestion control
- retrieval model
- smoothing methods
- high dimensional
- image annotation
- mixture model
- test collection
- uni modal
- query expansion
- image processing
- translation model
- audio visual
- ad hoc information retrieval
- relevance model
- dirichlet prior
- multimedia