Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation.

Published in: CoRR (2021)

Keyphrases