LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling.

Dongsheng Chen, Chaofan Tao, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu

Published in: CoRR (2022)

Keyphrases

spatial temporal
language learning
temporal information
video shots
spatial and temporal
action recognition
mobile learning
english language
foreign language
computer assisted language learning
language acquisition
mobile language learning
spatio temporal
video retrieval
spatial information
video sequences
vocabulary learning
metadata
human actions
learning experience
video data
human computer interaction
language learners