From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning.
Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen
Published in: CoRR (2017)
Keyphrases
- multi modal
- semantic concepts
- video search
- recurrent neural networks
- multiple modalities
- audio visual
- generative model
- multi modality
- video data
- multimedia
- video sequences
- video streams
- video clips
- cross modal
- video analysis
- video content
- video retrieval
- video frames
- high dimensional
- neural network
- fusing multiple
- video shots
- face recognition
- metadata