Time Delay Neural Network-based Speaker Embedding Function Fine-tuned with Triplet Loss for Distance-based Speaker Recognition (以三元組損失微調時延神經網路語者嵌入函數之語者辨識系統).
Chih-Ting Yeh, Po-Chin Wang, Su-Yu Zhang, Chia-Ping Chen, Shan-Wen Hsiao, Bo-Cheng Chan, Chung-Li Lu
Published in: ROCLING (2019)
Keyphrases
- speaker recognition
- fine-tuned
- Gaussian mixture model
- speaker verification
- vector quantization
- speaker identification
- probabilistic neural network
- fine-tuning
- speech recognition
- neural network
- speech signal
- multiresolution
- mixture model
- audio-visual
- Mel-frequency cepstral coefficients
- Gaussian process regression
- image compression
- feature space
- multimedia