Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information.
Jialu Li, Hao Tan, Mohit Bansal
Published in: CoRR (2021)
Keyphrases
- cross modal
- syntactic information
- multi modal
- question answering
- semantic information
- multimedia retrieval
- part of speech
- semantic role labeling
- natural language
- image retrieval
- computer vision
- visual recognition
- multimedia databases
- visual similarity
- machine learning
- visual data
- parse tree
- text documents
- knowledge base