AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection.
Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang
Published in: CoRR (2023)
Keyphrases
- audio visual
- video scene
- video summarization
- visual data
- temporal segmentation
- multimedia
- multi modal
- meeting room
- temporal context
- visual information
- sports video
- video sequences
- multi stream
- audio features
- audio visual content
- event detection
- video streams
- person authentication
- audio visual speech recognition
- neural network
- video content
- video data
- image classification
- feature vectors
- spatio temporal
- feature selection
- search engine
- multimodal fusion