Improving short-video speech recognition using random utterance concatenation.
Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Yist Lin, Tao Han, Tze Yuan Chong, Yi He, Zejun Ma
Published in: CoRR (2022)
Keyphrases
- speech recognition
- spoken language
- video data
- video sequences
- real time
- multimedia
- broadcast news
- video streams
- audio video
- real time video
- video analysis
- digital audio
- automatic speech recognition
- video clips
- video frames
- video content
- speech synthesis
- speech signal
- space time
- multimedia data
- pattern recognition
- key frames
- recognition engine
- content based video retrieval
- online video
- hidden markov models
- digital video
- video database
- event detection
- video retrieval
- temporal coherence
- speaker identification
- visual data
- temporal information
- natural language
- regular expressions
- spontaneous speech