Towards Generating Diverse Audio Captions via Adversarial Training.
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
Published in:
IEEE/ACM Trans. Audio Speech Lang. Process. (2024)
Keyphrases
multimedia
training process
multi agent
training set
multimedia information
wide variety
supervised learning
visual data
visual features
cross modal
data sets
training phase
training algorithm
test set
signal processing
training data
high level
metadata
real world