A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot.
Aanisha Bhattacharya, Yaman Kumar Singla, Balaji Krishnamurthy, Rajiv Ratn Shah, Changyou Chen
Published in: CoRR (2023)
Keyphrases
- user generated
- video sharing
- web videos
- video frames
- video data
- youtube videos
- video database
- video content
- event detection
- image sequences
- video analysis
- video clips
- multimedia
- video streams
- object detection
- video sequences
- video retrieval
- lecture videos
- spatiotemporal features
- video event detection
- video copy detection
- spatio temporal
- video editing
- input video
- high definition
- event recognition
- video shots
- human activities
- key frames