Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation.

Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan
Published in: ECCV (36) (2022)
Keyphrases
  • variable length
  • fixed length
  • n gram
  • computer vision
  • vision system
  • text compression
  • statistical dependencies
  • bitstream
  • machine learning
  • image processing
  • natural language
  • fault diagnosis
  • convolutional codes