HowToCaption: Prompting LLMs to Transform Video Annotations at Scale.

Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

Published in: CoRR (2023)

Keyphrases

video data
video streams
video sequences
natural language descriptions
video frames
video content
video material
real time video
multimedia
scale space
digital video
video database
video analysis
semantic annotation
online video
spatial and temporal
rotation and scale invariant
event detection
scale invariant
video shots
neural network
space time
multi view
keywords
metadata