Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought.

Published in: EMNLP (2023)

Keyphrases