Towards cross-modal pre-training and learning tempo-spatial characteristics for audio recognition with convolutional and recurrent neural networks.
Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Lukas Stappen, Alice Baird, Lukas Koebe, Björn W. Schuller
Published in: EURASIP J. Audio Speech Music. Process. (2020)
Keyphrases
- recurrent neural networks
- cross modal
- recurrent networks
- visual recognition
- feedforward neural networks
- echo state networks
- cascade correlation
- multi modal
- learning algorithm
- supervised learning
- perceptual information
- neural network
- artificial neural networks
- object recognition
- learning tasks
- image data
- training set