Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization.
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
Hao Jiang
Quzhe Huang
Chengru Song
Yuliang Liu
Di Zhang
Yang Song
Kun Gai
Yadong Mu
Published in:
CoRR (2024)
Keyphrases
video data
video sequences
video content
multimedia
video streams
video search
visual data
video analysis
real time
video clips
video database
visual analysis
high level
language learning
video frames
digital video
text mining