Improving Joint Speech-Text Representations Without Alignment.
Cal Peyser, Zhong Meng, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, Ke Hu
Published in: INTERSPEECH (2023)
Keyphrases
- text to speech synthesis
- text to speech
- English text
- text input
- text recognition
- lexical features
- semantic representations
- information retrieval
- database
- speech recognition
- audio visual
- language generation
- speech signal
- word level
- web documents
- speech synthesis
- text mining
- automatically discovering
- free text
- multi lingual
- higher level
- spontaneous speech
- recognition engine
- multimedia
- image alignment
- textual data
- dynamic time warping