EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model.
Guozhang Li, Xinpeng Ding, De Cheng, Jie Li, Nannan Wang, Xinbo Gao
Published in: CoRR (2023)
Keyphrases
- language model
- weakly supervised
- n-gram
- probabilistic model
- multimedia
- relation extraction
- video sequences
- topic models
- object class
- information retrieval
- superpixels
- video data
- mixture model
- video frames
- semi-supervised
- domain-specific
- object detection
- object detectors
- expectation maximization
- multimodal
- unsupervised learning
- segmentation algorithm
- image segmentation
- decision trees
- statistical models
- key frames
- feature vectors