One Perceptron to Rule Them All: Language, Vision, Audio and Speech.
Xavier Giró-i-Nieto
Published in: ICMR (2020)
Keyphrases
- text to speech
- human language
- audio visual
- audio stream
- language acquisition
- broadcast news
- text to speech synthesis
- audio signals
- speech recognition
- emotion recognition
- speaker identification
- multimedia
- language learning
- computer vision
- language processing
- speech synthesis
- english text
- speech processing
- neural network
- multi modal
- natural language
- spoken language
- prosodic features
- acoustic signals
- cepstral features
- digital audio
- audio recordings
- spoken dialog systems
- audio video
- learning algorithm
- multimodal interfaces
- audio features
- programming language
- classification rules
- speech music discrimination
- multi stream
- association rules
- automatic transcription
- signal processing
- language generation
- speech signal
- vision system
- machine translation
- spoken documents
- voice activity detection
- dialogue system
- acoustic features