Implicit Bias and Fast Convergence Rates for Self-attention.

Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis
Published in: CoRR (2024)
Keyphrases
  • convergence rate
  • step size
  • learning rate
  • convergence speed
  • primal dual
  • global convergence
  • conjugate gradient
  • numerical stability
  • mutation operator
  • stopping criterion
  • particle swarm optimization