On the rate of convergence of an over-parametrized Transformer classifier learned by gradient descent.
Michael Kohler, Adam Krzyzak
Published in: CoRR (2023)
Keyphrases
- learning phase
- training data
- learning stage
- training set
- classifier systems
- fuzzy logic
- feature selection
- classification method
- support vector machine
- cost function
- update rule
- learning algorithm
- classification process
- learning classifier systems
- class labels
- convergence speed
- positive training examples
- objective function
- support vector
- feature space
- loss function
- training samples
- dependent features
- multiple classifiers
- nearest neighbor classifier
- iterative algorithms
- feature set
- fault diagnosis
- training examples
- linear classifiers
- classification rate
- power system
- convergence rate
- svm classifier