WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs.
Deshun Yang, Luhui Hu, Yu Tian, Zihao Li, Chris Kelly, Bang Yang, Cindy Yang, Yuexian Zou
Published in: CoRR (2024)
Keyphrases
- input image
- image data
- image features
- single image
- random fields
- Bayesian framework
- multiscale
- image frames
- segmentation method
- test images
- image segmentation
- image retrieval
- image analysis
- textual descriptions
- edge detection
- image classification
- image regions
- image content
- low level
- high level
- semantic labels
- artificial intelligence
- information retrieval
- static images
- machine learning
- autonomous agents
- caption text
- probabilistic model
- high resolution
- multiagent systems
- video frames
- intelligent agents
- image representation
- image collections
- feature points
- visual data
- motion estimation
- text detection
- multimedia
- computer vision