TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog.
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
Published in:
CoRR (2020)
Keyphrases
visual scene
visual information
audio visual
multimedia
visual attention
multi modal
object recognition
vision system
multimodal fusion
complex scenes
natural images
low level
user interface
natural language
domain knowledge
knowledge base
machine learning
real time