SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs.
Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin P. Murphy, Alexander G. Hauptmann, Lu Jiang
Published in: NeurIPS (2023)
Keyphrases
- image pyramids
- semantic information
- semantic analysis
- natural language
- multiscale
- multiresolution
- domain specific
- semantic network
- semantic representation
- information extraction
- input image
- multi modal
- semantically equivalent
- semantic annotation
- semantic similarity
- semantic knowledge
- genetic algorithm
- multimodal interaction
- semantic description
- semantic level
- generation process
- image segmentation
- data sets
- action recognition
- semantic web
- high level