Towards Audio Codec-based Speech Separation.
Jia Qi Yip, Shengkui Zhao, Dianwen Ng, Eng Siong Chng, Bin Ma
Published in: CoRR (2024)
Keyphrases
- audio stream
- audio visual
- broadcast news
- cepstral features
- audio signals
- speaker identification
- audio features
- emotion recognition
- digital audio
- text-to-speech
- prosodic features
- speech music discrimination
- audio recordings
- speech recognition
- sound source
- audio video
- speech processing
- automatic transcription
- acoustic signals
- spoken documents
- multimedia
- speech synthesis
- linear predictive coding
- video coding
- multi modal
- speech signal
- automatic speech recognition
- multi stream
- audio signal
- acoustic features
- visual information
- human language
- bitstream
- music information retrieval
- language acquisition
- visual data
- speaker recognition
- voice activity detection
- video streams
- inter frame
- video codec
- spontaneous speech
- content-based video retrieval
- coding method
- mel-frequency cepstral coefficients
- audio files
- video search
- Gaussian mixture model
- speaker diarization
- visual features
- low level