Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation.
Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang
Published in: CoRR (2018)
Keyphrases
- cross modal
- imitation learning
- multi modal
- image retrieval
- vision system
- computer vision
- multimedia retrieval
- robotic systems
- visual recognition
- humanoid robot
- natural language
- multimedia databases
- visual data
- maximum margin
- relational structures
- reinforcement learning
- information extraction
- low level
- spatio temporal