Improving Joint Speech-Text Representations Without Alignment.
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
Published in: CoRR (2023)
Keyphrases
- text to speech
- text to speech synthesis
- text recognition
- english text
- lexical features
- spontaneous speech
- word level
- multi lingual
- web documents
- information retrieval
- speech synthesis
- automatic speech recognition
- language generation
- text input
- speech recognition
- semantic representations
- speech signal
- free text
- textual data
- dynamic time warping
- question answering
- language acquisition
- automatically discovering
- document analysis
- conversational speech
- image alignment
- keywords