Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances.
Thao Le Minh
Nobuyuki Shimizu
Takashi Miyazaki
Koichi Shinoda
Published in:
CoRR (2018)
Keyphrases
multi modal
deep learning
visual scene
object recognition
unsupervised learning
machine learning
audio visual
mental models
weakly supervised
natural images
image annotation
high dimensional
vision system
visual information
complex scenes
natural language
feature extraction
text mining
pairwise
spatial relations