Target speaker filtration by mask estimation for source speaker traceability in voice conversion.
Junfei Zhang, Xiongwei Zhang, Meng Sun, Xia Zou, Chong Jia, Yihao Li
Published in: Eng. Appl. Artif. Intell. (2024)
Keyphrases
- speech recognition
- synthesized speech
- speaker verification
- speaker recognition
- audio visual
- prosodic features
- automatic speech recognition
- speaker identification
- emotion recognition
- life cycle
- speaker dependent
- data sets
- real time
- mel frequency cepstral coefficients
- neural network
- bounding box
- density estimation
- accurate estimation
- gaussian mixture model
- multi modal