A Cross-Modal Object-Aware Transformer for Vision-and-Language Navigation.

Han Ni, Jia Chen, DaYong Zhu, Dianxi Shi
Published in: ICTAI (2022)
Keyphrases
  • cross modal
  • multi modal
  • natural language
  • 3D objects
  • spatial relationships
  • visual recognition
  • multimedia retrieval
  • visual similarity
  • computer vision
  • multimedia databases
  • low level
  • low dimensional