Multimodal Deep Neural Network with Image Sequence Features for Video Captioning.

Soichiro Oura, Tetsu Matsukawa, Einoshin Suzuki

Published in: IJCNN (2018)

Keyphrases

neural network
video sequences
image sequences
real time
multimedia
video clips
spatio temporal
feature extraction
low level
video retrieval
video data
temporal information
multimodal fusion
depth map
artificial neural networks
video streams
key frames
neural network model
feature set
optical flow
moving camera
audio visual
video database
image frames
audio features
wire frame