Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding.

Hang Zhang, Xin Li, Lidong Bing
Published in: CoRR (2023)
Keyphrases