Progressive Spatio-temporal Perception for Audio-Visual Question Answering.
Guangyao Li, Wenxuan Hou, Di Hu
Published in: ACM Multimedia (2023)
Keyphrases
- audio visual
- question answering
- spatio temporal
- passage retrieval
- multi modal
- visual information
- information retrieval
- visual data
- image sequences
- natural language
- multimedia
- natural language processing
- space time
- named entities
- human actions
- knowledge representation
- information extraction
- high dimensional
- low level
- language model
- keywords
- data sets