Research on Feature Extraction and Multimodal Fusion of Video Caption Based on Deep Learning.
Hongjun Chen, Hengyi Li, Xueqin Wu
Published in: ICMSS (2020)
Keyphrases
- deep learning
- multimodal fusion
- feature extraction
- unsupervised learning
- machine learning
- high robustness
- video retrieval
- audio visual
- video data
- relevance feedback
- video sequences
- feature selection
- visual features
- gait recognition
- weakly supervised
- multimodal interfaces
- feature vectors
- principal component analysis
- image classification
- image processing
- video frames
- pattern recognition
- multimedia data
- mental models
- multimedia
- high accuracy
- learning algorithm
- feature space
- data sets