HowToCaption: Prompting LLMs to Transform Video Annotations at Scale.
Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne
Published in: CoRR (2023)
Keyphrases
- video data
- video streams
- video sequences
- natural language descriptions
- video frames
- video content
- video material
- real time video
- multimedia
- scale space
- digital video
- video database
- video analysis
- semantic annotation
- online video
- spatial and temporal
- rotation and scale invariant
- event detection
- scale invariant
- video shots
- neural network
- space time
- multi view
- keywords
- metadata