Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.
Peng Zhang, Jiaming Xu, Jing Shi, Yunzhe Hao, Lei Qin, Bo Xu
Published in: IJCNN (2021)
Keyphrases
- audio visual
- visual features
- visual information
- visual data
- image classification
- multi modal
- visual content
- audio features
- low level
- image retrieval
- image collections
- low level features
- multi stream
- emotion recognition
- speaker verification
- keywords
- image annotation
- eye movements
- training set
- acoustic features
- multimedia
- image representation
- sound source
- search engine
- human actions
- action recognition