Causal Interpretation of Self-Attention in Pre-Trained Transformers.

Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov

Published in: CoRR (2023)

Keyphrases

pre-trained
training data
training examples
control signals
focus of attention
data sets
neural network
bayesian networks
reinforcement learning
pairwise
active learning
generative model
face detection