Predicting audio-visual salient events based on visual, audio and text modalities for movie summarization.
Petros Koutras, Athanasia Zlatintsi, Elias Iosif, Athanasios Katsamanis, Petros Maragos, Alexandros Potamianos
Published in: ICIP (2015)
Keyphrases
- audio visual
- visual data
- visual information
- sports video
- video summarization
- multimodal fusion
- multi modal
- multiple modalities
- cross modal
- soccer video
- video search
- visual features
- audio features
- multi stream
- audio visual speech recognition
- visual content
- video sequences
- semantic information
- image data
- temporal information
- text data
- low level
- multimedia data
- emotion recognition
- video data
- contextual information
- multimedia
- eye movements
- event detection
- image retrieval
- information retrieval
- audio visual content
- text documents
- high dimensional data
- high dimensional
- high level
- video analysis
- keywords
- search engine