Progressive Spatio-temporal Perception for Audio-Visual Question Answering.
Guangyao Li, Wenxuan Hou, Di Hu
Published in: CoRR (2023)
Keyphrases
- audio visual
- question answering
- spatio temporal
- passage retrieval
- multi modal
- visual information
- visual data
- information retrieval
- information extraction
- space time
- natural language processing
- multimedia
- named entities
- natural language
- image sequences
- question answering systems
- image data
- human actions
- language model
- low level
- human motion
- databases