Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos.
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux
Published in: CoRR (2023)
Keyphrases
- audio visual
- scene understanding
- robot navigation
- vision system
- video surveillance
- video summarization
- visual data
- multi modal
- human actions
- audio features
- object recognition
- object detection
- visual information
- 3D scene
- multi stream
- sound source
- multimedia
- computer vision
- action recognition
- video sequences
- real time
- audio visual speech recognition
- background subtraction
- video data
- visual features
- moving objects
- key frames
- video content
- event detection
- human activities
- single image
- multi class
- spatio temporal
- machine learning
- data sets