Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation.

Philipp Harzig, Moritz Einfalt, Rainer Lienhart
Published in: ICIP (2022)