Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka
Published in: INTERSPEECH (2022)
Keyphrases
- automatic speech recognition
- training phase
- real time
- levels of abstraction
- training process
- training examples
- supervised learning
- pattern recognition
- data streams
- training set
- semi-supervised
- video sequences
- input data
- training data
- high level
- lower level
- streaming data
- video streaming
- genetic algorithm
- databases
- database
- stream processing