ε-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer.

Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu
Published in: WACV (Workshops) (2024)
Keyphrases