Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization.

Published in: CoRR (2024)

Keyphrases