DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment.
Shentong Mo, Jing Shi, Yapeng Tian
Published in: CoRR (2023)
Keyphrases
- visual information
- text graphics
- cross-modal
- visual data
- text generation
- textual information
- visual features
- cross-media retrieval
- audio content
- text retrieval
- e-learning
- keywords
- text documents
- audio-visual
- database
- semantic content
- multimedia
- multi-modal
- free text
- text data
- text-to-speech
- information retrieval
- visually impaired users
- image alignment
- video search
- web images
- news video
- natural language generation
- semantic context
- human language
- personalized recommendation
- content-based video retrieval
- video indexing and retrieval
- document analysis
- multimedia documents