Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers.
Vidhi Jain, Maria Attarian, Nikhil J. Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R. Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi
Published in: CoRR (2024)