X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers.

Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi
Published in: EMNLP (1) (2020)
Keyphrases
  • multi modal
  • answer questions
  • visual features
  • multi modality
  • audio visual
  • cross modal
  • high dimensional
  • video retrieval
  • video search
  • computer vision
  • image processing
  • uni modal