Is a Video worth n×n Images? A Highly Efficient Approach to Transformer-based Video Question Answering.

Published in: CoRR (2023)

Keyphrases