Deep Audio-Visual Singing Voice Transcription based on Self-Supervised Learning Models.
Xiangming Gu, Wei Zeng, Jianan Zhang, Longshen Ou, Ye Wang
Published in: CoRR (2023)
Keyphrases
- learning models
- audio-visual
- emotion recognition
- audio features
- multi-modal
- machine learning
- learning algorithm
- semi-supervised learning
- loss function
- visual information
- multimedia
- learning tasks
- conditional random fields
- speaker verification
- multi-stream
- visual data
- acoustic features
- learning problems
- machine learning algorithms
- classification models
- data sets
- music information retrieval
- supervised learning