Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances.
Thao Le Minh
Nobuyuki Shimizu
Takashi Miyazaki
Koichi Shinoda
Published in:
IJCAI (2018)
Keyphrases
multi-modal
deep learning
visual scene
unsupervised learning
object recognition
audio visual
machine learning
mental models
high dimensional
object detection
visual information
complex scenes
visual attention
image understanding
image annotation
weakly supervised
pairwise