On the rate of convergence of an over-parametrized Transformer classifier learned by gradient descent.

Michael Kohler, Adam Krzyzak
Published in: CoRR (2023)
Keyphrases