VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation.

Xilun Chen, Lili Yu, Wenhan Xiong, Barlas Oguz, Yashar Mehdad, Wen-tau Yih

Published in: CoRR (2023)

Keyphrases

text generation
natural language generation
video data
video sequences
multimedia
video clips
video analysis
video retrieval
real time
video streams
online video
training phase
video content
video frames
spatial and temporal
training process
video database
training set
test set
training examples
domain knowledge
spatio-temporal
digital video
Bayesian networks
neural network