Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur
Published in: ASRU (2023)
Keyphrases
- data sets
- training samples
- pairwise
- machine learning
- training phase
- real time streaming
- real time
- automatic speech recognition
- test set
- keywords
- multimedia
- natural language
- supervised learning
- higher level
- decision trees
- speech recognition
- training algorithm
- textual data
- levels of abstraction
- metadata
- feature selection
- genetic algorithm