Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation.
Siying Wu, Xueyang Fu, Feng Wu, Zheng-Jun Zha
Published in: ACM Multimedia (2022)
Keyphrases
- cross-modal
- semantic representations
- multi-modal
- natural language
- computer vision
- multimedia retrieval
- image retrieval
- visual recognition
- semantic concepts
- high level
- supervised learning
- multimedia databases
- perceptual information
- semantic gap
- semantic information
- training examples
- visual features
- high dimensional
- training set
- feature extraction
- image sequences