3D-VLA: A 3D Vision-Language-Action Generative World Model.
Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan
Published in: CoRR (2024)
Keyphrases
- world model
- vision system
- action language
- language learning
- semantic interpretation
- generative model
- programming language
- natural language
- computer vision
- real time
- semantic constraints
- domain knowledge
- specification language
- image processing
- database
- action selection
- representation language
- speech acts
- reasoning about actions