VCSE: Time-Domain Visual-Contextual Speaker Extraction Network.
Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang
Published in: INTERSPEECH (2022)
Keyphrases
- frequency domain
- speech recognition
- network structure
- high level
- hidden markov models
- wireless sensor networks
- visual features
- link prediction
- visual information
- computer networks
- communication networks
- distributed network
- real time
- speaker recognition
- visual perception
- network management
- eye movements
- peer to peer
- information extraction
- feature extraction
- data sets