Improving Audiovisual Active Speaker Detection in Egocentric Recordings with the Data-Efficient Image Transformer.
Jason Clarke, Yoshihiko Gotoh, Stefan Goetze
Published in: ASRU (2023)
Keyphrases
- image data
- data sets
- data sources
- high quality
- image classification
- input image
- video recordings
- audio visual
- image analysis
- digital images
- satellite images
- image regions
- low level
- training data
- image features
- data points
- image quality
- multi modal
- image representation
- single image
- image retrieval
- visual data
- data structure