Binaural Sound Localization in Noisy Environments Using Frequency-Based Audio Vision Transformer (FAViT).
Waradon Phokhinanan, Nicolas Obin, Sylvain Argentieri
Published in: INTERSPEECH (2023)
Keyphrases
- edge detection
- noisy environments
- noise reduction
- sound source
- speaker identification
- speech signal
- audio features
- audio signal
- environmental sounds
- image processing
- speaker verification
- speech enhancement
- voice activity detection
- acoustic features
- multiscale
- single channel
- audio visual
- background noise
- speech recognition
- digit recognition
- computer vision
- multimedia
- wiener filter
- vision system
- signal processing
- fault diagnosis
- automatic speech recognition
- audio content
- feature transformation
- vocal tract
- mel frequency cepstral coefficients
- transfer function
- additive noise
- visual data
- visual information
- gaussian mixture model
- non stationary
- multi modal
- pattern recognition
- broadcast news
- emotion recognition
- neural network
- feature extraction
- information retrieval
- machine learning