Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies.
Jianing Qian, Anastasios Panagopoulos, Dinesh Jayaraman. Published in: CoRR (2024)
Keyphrases
- multiple objects
- complex scenes
- moving objects
- visual scene
- spatial relations
- target object
- real world objects
- real world scenes
- 3D scene
- object features
- image regions
- 3D objects
- camera images
- object tracking
- location and orientation
- uncalibrated images
- object models
- computer vision
- three dimensional
- ground plane
- real objects
- dynamic scenes
- acquired images
- object model
- image sequences
- object motion
- geometric information
- object classes
- video scene
- optimal policy
- single image
- video sequences
- vision system
- wire frame
- background pixels
- image segments
- visual input
- input image
- viewing angle
- rigid body motion
- multiple images
- scene understanding
- viewing position
- laser scanner
- object appearance
- relative position
- real scenes