A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer.
Vladimir Iashin
Esa Rahtu
Published in:
BMVC (2020)
Keyphrases
visual cues
visual information
lecture videos
low level
mid level
multimedia
visual data
key frames
visual features
audio video
audio features
audio stream
audio visual
scene change detection
depth cues
video segments
video sequences
multiple modalities
digital video
multiple visual cues
knowledge base