ASTT: acoustic spatial-temporal transformer for short utterance speaker recognition.
Xing Wu, Ruixuan Li, Bin Deng, Ming Zhao, Xingyue Du, Jianjia Wang, Kai Ding
Published in: Multim. Tools Appl. (2023)
Keyphrases
- spatial temporal
- speaker recognition
- speech recognition
- Gaussian mixture model
- vector quantization
- speaker identification
- speaker verification
- probabilistic neural network
- spatio temporal
- action recognition
- temporal information
- video shots
- spatial and temporal
- speech signal
- acoustic features
- human actions
- noisy environments
- computer vision
- machine learning
- image compression
- pattern recognition
- high level
- image processing
- hidden Markov models
- automatic speech recognition