Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models.
Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
Published in: CoRR (2024)
Keyphrases
- multi modal
- language model
- cross modal
- language modeling
- n gram
- probabilistic model
- speech recognition
- video search
- query expansion
- document retrieval
- retrieval model
- statistical language models
- language modelling
- multi modality
- single modality
- information retrieval
- auto annotation
- visual information
- relevance model
- audio visual
- pseudo relevance feedback
- test collection
- high dimensional
- translation model
- smoothing methods
- language models for information retrieval
- semantic concepts
- image annotation
- uni modal
- high level