TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition.
Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
Published in: CoRR (2023)
Keyphrases
- recognition engine
- speech recognition
- speech corpus
- speech signal
- automatic speech recognition systems
- speech synthesis
- automatic speech recognition
- text to speech
- speech recognition systems
- word recognition
- spoken language
- object recognition
- speech sounds
- noisy environments
- facial gestures
- continuous speech recognition
- digit recognition
- broadcast news
- endpoint detection
- text recognition
- spoken words
- speaker recognition
- image recognition
- dialogue system
- phoneme recognition
- pattern recognition
- speaker dependent
- sign language
- spontaneous speech
- vocal tract
- automatic recognition
- neural network
- emotion recognition