Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling.
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
Published in: BMVC (2022)
Keyphrases
fine-grained
spatial-temporal
question answering
coarse-grained
video shots
spatio-temporal
natural language processing
information extraction
information retrieval
spatial and temporal
natural language
video data
action recognition
multimedia
access control