Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training.
Chenyi Lei, Shixian Luo, Yong Liu, Wanggui He, Jiamang Wang, Guoxin Wang, Haihong Tang, Chunyan Miao, Houqiang Li
Published in: CoRR (2021)
Keyphrases
- story segmentation
- multimedia
- English text
- training set
- Chinese language
- broadcast news
- video data
- training examples
- video clips
- video sequences
- language learning
- real time
- training phase
- video content
- natural language
- video streams
- multimodal
- video frames
- programming language
- space time
- key frames
- text summarization
- video analysis
- video retrieval
- training process
- action recognition
- news video
- video surveillance
- multimodal interaction
- feature vectors
- human activities