Spatio-Temporal Two-stage Fusion for video question answering.
Feifei Xu, Yitao Zhu, Chun Wang, Yangze Cao, Zheng Zhong, Xiongmin Li
Published in: Comput. Vis. Image Underst. (2023)
Keyphrases
- question answering
- spatio-temporal
- information extraction
- video sequences
- video data
- cross language
- multimedia
- question classification
- natural language processing
- information retrieval
- video frames
- human actions
- passage retrieval
- named entities
- natural language
- open domain question answering
- video content
- video database
- question answering systems
- answer extraction
- natural language questions
- key frames
- relation extraction
- semantic roles
- answer validation
- data mining
- candidate answers
- sentence retrieval