A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions.
Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut
Published in: CoRR (2019)
Keyphrases
- visual features
- key frames
- semantic concepts
- video shots
- visual content
- visual data
- human actions
- image classification
- visual information
- multimedia
- content based video retrieval
- motion features
- image retrieval
- image search
- video content
- low level features
- low level
- image annotation
- visual appearance
- semantic gap
- video data
- video sequences
- video database
- keywords
- video streams
- video clips
- audio features
- image collections
- visual properties
- web images
- speech recognition
- semantic features
- automatic speech recognition
- bag of features
- visual patterns
- image representation
- visual descriptors
- video frames
- global features
- noisy environments
- multimedia data
- temporal information