Synthetic speech attribution using self supervised audio spectrogram transformer.
Amit Kumar Singh Yadav, Emily R. Bartusiak, Kratika Bhagtani, Edward J. Delp
Published in: Media Watermarking, Security, and Forensics (2023)
Keyphrases
- speech signal
- audio stream
- speaker identification
- audio visual
- automatic speech recognition
- broadcast news
- speech recognition
- audio signals
- emotion recognition
- acoustic features
- text to speech
- audio features
- speech music discrimination
- cepstral features
- digital audio
- linear predictive coding
- speech processing
- multimedia
- automatic transcription
- pattern analysis
- audio recordings
- prosodic features
- spoken documents
- visual information
- speech synthesis
- mel frequency cepstral coefficients
- audio video
- power system
- spoken language
- noisy environments
- multi stream
- acoustic signals
- visual data
- real world
- content based video retrieval
- speaker recognition
- non stationary
- spontaneous speech
- fuzzy logic
- multi modal
- human language
- signal processing
- multimedia data
- voice activity detection
- speaker diarization
- visual speech
- audio files
- multimodal interfaces
- hidden markov models
- visual features
- audio signal
- power transformers