Publication: DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment.