SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs.
Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
Published in: CoRR (2023)
Keyphrases
- multiresolution
- natural language
- generation process
- semantic information
- semantic knowledge
- high level
- multiscale
- multi modal
- scale space
- semantic search
- coarse to fine
- low level features
- semantic web
- semi supervised
- semantic level
- image pyramids
- semantic representation
- neural network
- co occurrence
- multimedia
- machine learning