A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer.
Vladimir Iashin
Esa Rahtu
Published in:
CoRR (2020)
Keyphrases
visual cues
visual information
lecture videos
low level
visual data
multimedia
key frames
mid level
visual features
scene change detection
audio video
audio visual
audio features
depth cues
video shots
digital video
multimedia information
video frames
multimedia processing
video content
domain knowledge
video sequences