VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation.
Xilun Chen, Lili Yu, Wenhan Xiong, Barlas Oguz, Yashar Mehdad, Wen-tau Yih
Published in: CoRR (2023)
Keyphrases
- text generation
- natural language generation
- video data
- video sequences
- multimedia
- video clips
- video analysis
- video retrieval
- real time
- video streams
- online video
- training phase
- video content
- video frames
- spatial and temporal
- training process
- video database
- training set
- test set
- training examples
- domain knowledge
- spatio temporal
- digital video
- bayesian networks
- neural network