Video captioning with recurrent networks based on frame- and video-level features and visual content classification.
Rakshith Shetty, Jorma Laaksonen
Published in: CoRR (2015)
Keyphrases
- key frames
- visual content
- feature vectors
- video retrieval
- video data
- video frames
- video content
- video clips
- video shots
- video sequences
- semantic concepts
- feature extraction
- video search
- visual features
- video streams
- instructional videos
- class labels
- multimodal
- multimedia
- low-level
- feature selection
- machine learning
- visual information
- video summarization
- low-level features
- image classification
- text classification
- multimedia content
- high-dimensional
- training set
- feature space
- visual concepts
- recurrent networks
- multimedia data