TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation.
Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang
Published in: CoRR (2024)
Keyphrases
- scene text
- text detection
- video content
- text regions
- natural scene images
- video frames
- video data
- video streams
- video sequences
- video analysis
- connected components
- multimedia
- temporal information
- key frames
- natural scenes
- visual features
- bounding box
- event detection
- document images
- object recognition
- lecture videos
- image search
- text information
- video annotation
- optical character recognition
- video database
- input image
- object detection