Multi-modal speaker diarization of real-world meetings using compressed-domain video features.

Gerald Friedland, Hayley Hung, Chuohao Yeo

Published in: ICASSP (2009)

Keyphrases

multi modal
speaker diarization
compressed domain
video analysis
video search
multiple modalities
audio visual
semantic concepts
broadcast news
low level
multimedia
video data
feature vectors
feature space
video frames
image features
high dimensional
machine learning
bitstream
image annotation
optical flow
audio features
feature extraction
face recognition
metadata