Cross-utterance context for multimodal video transcription.
Roshan Sharma, Bhiksha Raj
Published in: IEEECONF (2022)
Keyphrases
- multimedia
- real time
- contextual information
- video sequences
- video processing
- video data
- spatial and temporal
- video frames
- spoken language
- video segmentation
- context sensitive
- speech recognition
- space time
- pattern recognition
- video clips
- dynamic scenes
- context dependent
- video database
- multi modal
- image sequences
- multimodal interaction