USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder.
Bolaji Yusuf, Ankur Gandhe, Alex Sokolov
Published in: CoRR (2022)
Keyphrases
- automatic speech recognition
- spontaneous speech
- speech recognition
- video codec
- conversational speech
- low complexity
- distributed video coding
- decoding process
- speech signal
- spoken words
- text to speech synthesis
- text to speech
- text recognition
- video coding
- Wyner-Ziv video coding
- rate distortion
- MPEG AVC
- English text
- hidden markov models
- text input
- error control
- word error rate
- noisy channel
- broadcast news
- speech retrieval
- lexical features
- motion estimation
- turbo codes
- information retrieval
- human machine interaction
- successive approximation
- Wyner-Ziv
- distributed source coding
- spoken language
- noisy environments
- bit rate
- video coding scheme
- computational complexity
- speech synthesis
- macroblock
- video quality
- finite state transducers
- vocal tract
- pixel domain
- spoken document retrieval
- video encoder
- error concealment
- speech corpus
- transform domain
- bit rate reduction