AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection.
Sahibzada Adil Shahzad, Ammarah Hashmi, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang
Published in: CoRR (2023)
Keyphrases
- audio visual
- video scene
- multi modal
- multimedia
- visual data
- tv broadcast
- face detection and tracking
- multimodal fusion
- video sequences
- audio features
- activity detection
- video data
- detection algorithm
- false positives
- sports video
- mouth region
- real time
- shot boundary detection
- shot detection
- object detection and tracking
- detection rate
- event detection
- anomaly detection
- multi stream
- multi modal fusion
- false alarms
- detection method
- video analysis
- space time
- visual information
- video database
- video frames
- video streams
- visual analysis
- video indexing
- video content
- multimodal interaction
- computer vision
- object detection
- story segmentation
- automatic detection
- video surveillance
- video recordings
- text detection
- video clips
- soccer video
- motion features
- detection accuracy