SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation.
Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra
Published in: CoRR (2021)
Keyphrases
- multiple objects
- complex scenes
- moving objects
- real world scenes
- visual scene
- 3D scene
- computer vision
- visual input
- real objects
- fuzzy logic
- 3D objects
- object models
- real time
- programming language
- rigid body motion
- spatial relations
- object motion
- vision system
- scene understanding
- ground plane
- video sequences
- natural language
- object model
- image regions
- geometric information
- object parts
- relative position
- real world objects
- location and orientation
- multiple images
- object tracking
- uncalibrated images
- single image
- background pixels
- fault diagnosis
- object features
- target object
- individual objects
- viewing angle
- image sequences
- object recognition
- video scene
- three dimensional
- camera images
- image segments
- acquired images
- dynamic scenes
- object segmentation
- viewing position