UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation.

Published in: CoRR (2020)
