Multimodal Deep Neural Network with Image Sequence Features for Video Captioning.
Soichiro Oura, Tetsu Matsukawa, Einoshin Suzuki
Published in: IJCNN (2018)
Keyphrases
- neural network
- video sequences
- image sequences
- real time
- multimedia
- video clips
- spatio temporal
- feature extraction
- low level
- video retrieval
- video data
- temporal information
- multimodal fusion
- depth map
- artificial neural networks
- video streams
- key frames
- neural network model
- feature set
- optical flow
- moving camera
- audio visual
- video database
- image frames
- audio features
- wire frame