3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video.
Justin Wilson, Ming C. Lin
Published in: CoRR (2021)
Keyphrases
- audio visual
- multiple objects
- video summarization
- visual data
- multimedia
- multi modal
- audio features
- audio visual content
- multiple object tracking
- visual information
- video data
- multimodal fusion
- particle filter
- video content
- multiple images
- multi stream
- video streams
- audio visual speech recognition
- video sequences
- tracking of multiple objects
- multimedia data
- video frames
- computer vision
- three dimensional
- real time
- space time
- video retrieval
- key frames
- single image
- state space
- video surveillance
- visual features
- hidden markov models
- spatio temporal