PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings.
Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin
Published in: Odyssey (2024)
Keyphrases
- speaker diarization
- speech recognition
- broadcast news
- audio stream
- speech activity detection
- speaker verification
- Bayesian information criterion
- speaker identification
- training set
- automatic speech recognition
- audio visual
- supervised learning
- acoustic features
- language model
- noisy environments
- speech signal
- back propagation
- speaker recognition
- artificial neural networks
- learning algorithm