Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.
Heeseung Yun, Youngjae Yu, Wonsuk Yang, Kangil Lee, Gunhee Kim
Published in: CoRR (2021)
Keyphrases
- audio visual
- question answering
- passage retrieval
- visual data
- multi modal
- visual information
- natural language processing
- video sequences
- video data
- information retrieval
- multimedia
- video search
- information extraction
- named entities
- video frames
- natural language
- video content
- machine learning
- key frames
- human activities
- visual features