Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning.
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
Published in: CoRR (2023)
Keyphrases
- language model
- language modeling
- n-gram
- probabilistic model
- document retrieval
- speech recognition
- retrieval model
- query expansion
- information retrieval
- language modelling
- test collection
- video data
- visual analysis
- visual features
- visual information
- video content
- statistical language models
- key frames
- video search
- multimedia
- context sensitive
- query terms
- mixture model
- smoothing methods
- bayesian networks
- translation model
- ad hoc information retrieval
- language model for information retrieval
- vector space model
- relevance model
- news video
- document ranking
- video retrieval
- word clouds
- retrieval effectiveness
- web search
- clustering algorithm
- search engine