Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations.

Published in: CoRR (2022)
