TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition.
Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
Published in: INTERSPEECH (2023)
Keyphrases
- recognition engine
- speech recognition
- automatic speech recognition systems
- speech corpus
- speech signal
- speech synthesis
- automatic speech recognition
- audio visual
- word recognition
- spoken language
- recognition rate
- phoneme recognition
- hidden markov models
- speech sounds
- noisy environments
- speaker recognition
- information retrieval
- text recognition
- pattern recognition
- speech recognition systems
- speaker independent
- continuous speech recognition
- endpoint detection
- english text
- speaker dependent
- spontaneous speech
- digit recognition
- vocal tract
- recognition process
- gesture recognition
- feature extraction