Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech.
Gaoussou Youssouf Kebe, Luke E. Richards, Edward Raff, Francis Ferraro, Cynthia Matuszek
Published in: CoRR (2021)
Keyphrases
- vowel phonemes
- text to speech
- text to speech synthesis
- language acquisition
- programming language
- English text
- spoken language
- natural language
- language learning
- speech recognition
- deep learning
- human language
- prosodic features
- raw data
- semantic representations
- formant frequencies
- pattern recognition
- speaker independent
- speech recognition systems
- language processing
- speech synthesis
- human communication
- acoustic features
- hidden Markov models
- spontaneous speech
- unsupervised feature learning
- emotional speech
- bird species
- lexical features
- audio visual