LAVA: Language Audio Vision Alignment for Contrastive Video Pre-Training.
Sumanth Gurram, Andy Fang, David Chan, John F. Canny
Published in: CoRR (2022)
Keyphrases
- multimedia
- audio video
- digital video
- multimedia processing
- visual data
- video data
- real time
- scene change detection
- multimedia information
- digital audio
- video content analysis
- video content
- video sequences
- audio files
- natural language
- programming language
- video database
- audio signals
- computer vision
- video material
- audio stream
- video analysis
- signal processing
- video frames
- media streams
- video files
- video streams
- training process
- closed captions
- image processing
- story segmentation
- video clips
- training set
- online video
- long video
- video indexing
- audio features
- image sequences
- language learning
- space time
- lecture videos
- video indexing and retrieval
- broadcast news
- event detection
- audio content
- video scene
- content based video retrieval
- video retrieval
- multi modal
- audio visual content