VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features.
Tomoki Koriyama
Published in: CoRR (2024)
Keyphrases
- acoustic features
- automatic speech recognition
- speech recognition
- music genre classification
- speech signal
- speaker verification
- visual speech
- semi-supervised learning
- hidden Markov models
- visual features
- music information retrieval
- environmental sounds
- broadcast news
- cross correlation
- noisy environments
- artificial neural networks
- audio-visual
- Gaussian mixture model
- audio features
- frequency domain
- image classification
- low level
- feature space
- search engine