Multi-Granularity Relational Attention Network for Audio-Visual Question Answering.
Linjun Li, Tao Jin, Wang Lin, Hao Jiang, Wenwen Pan, Jian Wang, Shuwen Xiao, Yan Xia, Weihao Jiang, Zhou Zhao
Published in: IEEE Trans. Circuits Syst. Video Technol. (2024)
Keyphrases
- question answering
- audio visual
- passage retrieval
- multi granularity
- multi modal
- visual information
- natural language
- natural language processing
- visual data
- information extraction
- named entities
- data model
- document retrieval
- information retrieval
- relational databases
- data mining
- document collections
- multimedia
- metadata