Hierarchical attention-based multimodal fusion for video captioning.
Chunlei Wu, Yiwei Wei, Xiaoliang Chu, Weichen Sun, Fei Su, Leiquan Wang
Published in: Neurocomputing (2018)
Keyphrases
- multimodal fusion
- audio-visual
- video data
- high robustness
- relevance feedback
- video sequences
- video content
- multimedia
- multimodal interfaces
- gait recognition
- video retrieval
- video frames
- space-time
- real-time
- human-computer interaction
- learning algorithm
- spatio-temporal
- three-dimensional
- high accuracy
- multimodal interaction
- neural network