Automated Audio Captioning via Fusion of Low- and High-Dimensional Features.
Jianyuan Sun, Xubo Liu, Xinhao Mei, Mark D. Plumbley, Volkan Kilic, Wenwu Wang
Published in: CoRR (2022)
Keyphrases
- high dimensional
- multimodal fusion
- feature space
- feature extraction
- classification accuracy
- high dimensional data
- visual data
- high dimensionality
- data fusion
- multi modal
- feature set
- dimensionality reduction
- multimedia
- keypoints
- principal component analysis
- visual information
- image features
- hidden markov models
- soccer video
- audio features
- semantic context
- person authentication
- support vector
- cepstral features