The Embodied World Model Based on LLM with Visual Information and Prediction-Oriented Prompts.
Wakana Haijima, Kou Nakakubo, Masahiro Suzuki, Yutaka Matsuo
Published in: CoRR (2024)
Keyphrases
- visual information
- visual features
- low level
- visual content
- audio visual
- eye movements
- visual cues
- visual data
- physical world
- content based image
- content based image retrieval systems
- textual information
- image collections
- human visual system
- visual descriptors
- cognitive processes
- multimedia data
- visual similarity
- high level
- databases