Target Speech Diarization with Multimodal Prompts.
Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li
Published in: CoRR (2024)
Keyphrases
- audio visual
- speaker diarization
- multimodal interfaces
- speaker identification
- speech recognition
- multi modal
- multi stream
- speech signal
- target detection
- emotion recognition
- multimodal interaction
- automatic speech recognition systems
- information retrieval
- multimodal information
- recognition engine
- speech synthesis
- noisy environments
- automatic speech recognition
- pattern recognition