Detection of Inconsistency Between Subject and Speaker Based on the Co-occurrence of Lip Motion and Voice Towards Speech Scene Extraction from News Videos.
Shogo Kumagai, Keisuke Doman, Tomokazu Takahashi, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase
Published in: ISM (2011)
Keyphrases
- co-occurrence
- video scene
- audio-visual
- news video
- speech recognition
- multi-modal
- video data
- video analysis
- automatic speech recognition
- WordNet
- broadcast news
- moving objects
- image sequences
- visual speech
- speaker identification
- visual data
- text-to-speech
- multimedia
- news stories
- speech signal
- visual information
- named entities
- video sequences
- visual words
- topic models
- video database
- video clips
- human motion
- semantic similarity
- noisy environments
- natural language processing
- knowledge representation
- three-dimensional
- computer vision