Target Speaker Voice Activity Detection with Transformers and Its Integration with End-To-End Neural Diarization.
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu
Published in: ICASSP (2023)
Keyphrases
- end to end
- voice activity detection
- speaker diarization
- speech recognition
- noisy environments
- wireless ad hoc networks
- congestion control
- speaker verification
- ad hoc networks
- network architecture
- multipath
- speaker identification
- admission control
- neural network
- internet protocol
- automatic speech recognition
- real time
- scalable video