PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation.
Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu
Published in: CoRR (2024)
Keyphrases
- text graphics
- multimedia
- audio video
- signal processing
- audio visual
- visual information
- semantic context
- audio signals
- soccer video
- database
- spoken documents
- information retrieval
- human language
- visual data
- event detection
- text generation
- video streams
- audio features
- speaker identification
- cross modal
- video search
- text retrieval