Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning.
Tanzila Rahman
Bicheng Xu
Leonid Sigal
Published in:
CoRR (2019)
Keyphrases
multi-modal
weakly supervised
object class
superpixels
relation extraction
topic models
semi-supervised
object detectors
named entities
high dimensional
high level
object detection
viewpoint
domain specific
text mining
image annotation
input image
feature extraction
information retrieval