A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter.

Published in: CoRR (2023)