Towards Video Text Visual Question Answering: Benchmark and Baseline.
Minyi Zhao, Bingjia Li, Jie Wang, Wanqing Li, Wenjing Zhou, Lan Zhang, Shijie Xuyang, Zhihang Yu, Xinkun Yu, Guangze Li, Aobotao Dai, Shuigeng Zhou
Published in: NeurIPS (2022)
Keyphrases
- question answering
- syntactic information
- video search
- information retrieval
- news video
- text summarization
- textual entailment recognition
- information extraction
- visual data
- question classification
- video content
- qa clef
- passage retrieval
- named entities
- natural language processing
- video sequences
- video data
- free text
- natural language
- answer validation
- text mining
- cross language
- natural language questions
- video retrieval
- video frames
- multi modal
- multimedia
- text documents
- text retrieval
- relation extraction
- visual information
- answer extraction
- answering questions
- video shots
- key frames
- question answering systems
- candidate answers
- low level
- visual features
- keywords