CAMERA: A Multimodal Dataset and Benchmark for Ad Text Generation.
Masato Mita, Soichiro Murakami, Akihiko Kato, Peinan Zhang
Published in: CoRR (2023)
Keyphrases
- text generation
- natural language generation
- hand held
- structure from motion
- field of view
- vision system
- natural language
- benchmark datasets
- real time
- single camera
- video camera
- camera calibration
- multi modal
- camera motion
- multiple cameras
- focal length
- position and orientation
- synthetic datasets
- surveillance system
- camera parameters
- theorem prover
- domain knowledge
- expert systems
- computer vision