Following the Embedding: Identifying Transition Phenomena in Wav2vec 2.0 Representations of Speech Audio.
Patrick Cormac English, Erfan A. Shams, John D. Kelleher, Julie Carson-Berndsen
Published in: ICASSP (2024)
Keyphrases
- audio stream
- audio visual
- broadcast news
- speaker identification
- audio signals
- text to speech
- audio features
- speech recognition
- cepstral features
- speech processing
- emotion recognition
- multimedia
- digital audio
- multi modal
- automatic transcription
- data embedding
- speech synthesis
- multi stream
- content based video retrieval
- speech signal
- recognition engine
- audio recordings
- prosodic features
- acoustic features
- linear predictive coding
- human language
- audio files
- visual speech
- speech music discrimination
- automatic speech recognition
- digital video
- vector space
- visual information
- feature extraction
- speaker verification
- hidden Markov models
- voice activity detection
- feature set
- signal processing
- spoken documents
- low dimensional
- acoustic signals
- human computer interaction
- visual data
- mel frequency cepstral coefficients
- symbolic representation
- language acquisition
- audio video
- spoken document retrieval