Implicit Bias and Fast Convergence Rates for Self-attention.

Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis
Published in: CoRR (2024)
Keyphrases
  • convergence rate
  • step size
  • learning rate
  • convergence speed
  • primal dual
  • global convergence
  • conjugate gradient
  • numerical stability
  • mutation operator
  • stopping criterion
  • particle swarm optimization