Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Jianyuan Sun, Xubo Liu, Xinhao Mei, Volkan Kiliç, Mark D. Plumbley, Wenwu Wang
Published in: CoRR (2023)
Keyphrases
- multimodal fusion
- multiple features
- multimedia
- network model
- low level
- feature extraction
- feature space
- image features
- feature set
- feature vectors
- peer to peer
- network traffic
- computational complexity
- multiresolution
- wireless sensor networks
- fuzzy logic
- fault diagnosis
- data fusion
- visual information
- audio features
- neural network