Read, Watch and Scream! Sound Generation from Text and Video.
Yujin Jeong, Yunji Kim, Sanghyuk Chun, Jiyoung Lee
Published in: CoRR (2024)
Keyphrases
- audio content
- text generation
- video sequences
- natural language descriptions
- video streams
- video search
- multimedia
- news video
- video database
- video content
- real time
- free text
- video data
- closed captions
- video segments
- text detection
- video clips
- generation process
- semantic information
- space time
- information retrieval
- natural language
- text data
- wordnet
- database
- semantic labels
- tv programs
- text information
- music information retrieval
- moving objects
- natural language generation
- multimedia documents
- digital video
- web documents
- multimedia content
- multimedia data
- spatial and temporal