AVQA: A Dataset for Audio-Visual Question Answering on Videos.
Pinci Yang, Xin Wang, Xuguang Duan, Hong Chen, Runze Hou, Cong Jin, Wenwu Zhu
Published in: ACM Multimedia (2022)
Keyphrases
- audio visual
- question answering
- passage retrieval
- multi modal
- visual data
- visual information
- human actions
- natural language
- information extraction
- action recognition
- multimedia
- video sequences
- natural language processing
- information retrieval
- named entities
- video data
- video search
- video frames
- visual features
- feature extraction
- feature selection
- artificial intelligence