LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling.
Dongsheng Chen, Chaofan Tao, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu
Published in: EMNLP (2022)
Keyphrases
- spatial temporal
- language learning
- spatio temporal
- action recognition
- video shots
- temporal information
- foreign language
- spatial and temporal
- language acquisition
- english language
- computer assisted language learning
- human actions
- mobile language learning
- mobile learning
- digital libraries
- video sequences
- spatial information
- space time
- low level
- language skills
- multimedia