Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies.
Jianing Qian, Anastasios Panagopoulos, Dinesh Jayaraman
Published in: ICRA (2024)
Keyphrases
- multiple objects
- moving objects
- complex scenes
- visual scene
- spatial relations
- visual input
- vision system
- target object
- reference object
- d objects
- real world scenes
- object models
- d scene
- image regions
- computer vision
- real world objects
- real time
- video sequences
- real objects
- scene understanding
- input image
- uncalibrated images
- intensity images
- acquired images
- multiple images
- three dimensional
- relative position
- geometric information
- object appearance
- object features
- individual objects
- laser scanner
- viewing direction
- image segments
- location and orientation
- real scenes
- camera images
- geometric constraints
- viewing position
- object recognition
- background subtraction
- spatial relationships
- video scene
- dynamic scenes
- object segmentation
- object model