GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization.
Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring
Published in: ICMR (2021)
Keyphrases
- multi-modal
- video summarization
- pre-trained
- audio-visual
- training data
- generative model
- motion vectors
- training examples
- video content
- video sequences
- video retrieval
- video data
- key frames
- image annotation
- high-dimensional
- motion estimation
- video streams
- neural network
- reinforcement learning
- event detection
- visual information
- visual features
- semi-supervised