E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer.
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
Published in:
CoRR (2023)
Keyphrases
language model
multimedia
video sequences
video data
video content
video frames
language modeling
information retrieval
n-gram
query expansion
retrieval model
multiscale
co-occurrence
coding scheme
ad hoc information retrieval