CAMERA: A Multimodal Dataset and Benchmark for Ad Text Generation.

Masato Mita, Soichiro Murakami, Akihiko Kato, Peinan Zhang

Published in: CoRR (2023)

Keyphrases

text generation
natural language generation
hand held
structure from motion
field of view
vision system
natural language
benchmark datasets
real time
single camera
video camera
camera calibration
multimodal
camera motion
multiple cameras
focal length
position and orientation
synthetic datasets
surveillance system
camera parameters
theorem prover
domain knowledge
expert systems
computer vision