Time-domain Transformer-based Audiovisual Speaker Separation.
Vahid Ahmadi Kalkhorani, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang
Published in: INTERSPEECH (2023)
Keyphrases
- audio visual
- frequency domain
- visual information
- speaker verification
- fuzzy logic
- multi modal
- sound source
- fault diagnosis
- emotion recognition
- speaker recognition
- speech recognition
- multimedia content
- artificial intelligence
- high voltage
- distribution network
- visual data
- power system
- video retrieval
- multimedia
- image processing
- data sets
- prosodic features