LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling.
Dongsheng Chen, Chaofan Tao, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu
Published in: CoRR (2022)
Keyphrases
- spatial temporal
- language learning
- temporal information
- video shots
- spatial and temporal
- action recognition
- mobile learning
- english language
- foreign language
- computer assisted language learning
- language acquisition
- mobile language learning
- spatio temporal
- video retrieval
- spatial information
- video sequences
- vocabulary learning
- metadata
- human actions
- learning experience
- video data
- human computer interaction
- language learners