Multi-modal speaker diarization of real-world meetings using compressed-domain video features.
Gerald Friedland, Hayley Hung, Chuohao Yeo
Published in: ICASSP (2009)
Keyphrases
- multi modal
- speaker diarization
- compressed domain
- video analysis
- video search
- multiple modalities
- audio visual
- semantic concepts
- broadcast news
- low level
- multimedia
- video data
- feature vectors
- feature space
- video frames
- image features
- high dimensional
- machine learning
- bitstream
- image annotation
- optical flow
- audio features
- feature extraction
- face recognition
- metadata