Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts.
Haodong Hong, Sen Wang, Zi Huang, Qi Wu, Jiajun Liu
Published in: CoRR (2024)
Keyphrases
- multi-modal
- language generation
- video search
- multiple modalities
- computer vision
- multi-modality
- high-dimensional
- natural language
- audio-visual
- text mining
- cross-modal
- text retrieval
- uni-modal
- information retrieval
- text data
- semantic concepts
- image annotation
- text documents
- image processing
- mutual information
- co-occurrence
- fusing multiple