Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning.
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
Published in: CVPR (2023)
Keyphrases
- language model
- language modeling
- n gram
- language modelling
- probabilistic model
- retrieval model
- document retrieval
- speech recognition
- test collection
- information retrieval
- video data
- ad hoc information retrieval
- query expansion
- multimedia
- visual features
- video search
- context sensitive
- visual information
- video content
- mixture model
- visual analysis
- key frames
- language model for information retrieval
- statistical language models
- document ranking
- news video
- relevance model
- translation model
- smoothing methods
- query terms
- vector space model
- statistical machine translation
- document length
- pseudo relevance feedback