weight_optimizer – Selection of weight optimizers

Description
A weight optimizer is an algorithm that adjusts the synaptic weights in a network during training to minimize the loss function and thus improve the network’s performance on a given task.
Such an optimizer is an essential part of plasticity rules like e-prop plasticity.
Currently two weight optimizers are implemented: gradient descent and the Adam optimizer.
In gradient descent [1] the weights are optimized via:

\[W_t = W_{t-1} - \eta g_t \,,\]

where \(\eta\) denotes the learning rate and \(g_t\) the gradient of the current time step \(t\).
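For illustration, this update can be written as a few lines of NumPy. This is a minimal sketch of the rule above, combined with clipping to the Wmin/Wmax bounds listed under Parameters below; it is not NEST's internal implementation:

```python
import numpy as np

def gradient_descent_step(w, g, eta=1e-4, w_min=-100.0, w_max=100.0):
    """One gradient descent update: W_t = W_{t-1} - eta * g_t."""
    w = w - eta * g                  # apply the gradient step
    return np.clip(w, w_min, w_max)  # keep weight within [Wmin, Wmax]
```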
In the Adam scheme [2] the weights are optimized via:

\[\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1) g_t \,,\\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \,,\\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t} \,,\\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^t} \,,\\
W_t &= W_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \,,
\end{aligned}\]

where \(\beta_1\) and \(\beta_2\) denote the exponential decay rates for the first and second moment estimates \(m_t\) and \(v_t\), and \(\epsilon\) is a small constant for numerical stability.
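The same caveat applies to the following Adam sketch: a minimal single-weight NumPy version of the equations above, with \(m\) and \(v\) initialized to 0.0 (see the state variables below) and \(t\) counting update steps from 1:

```python
import numpy as np

def adam_step(w, g, m, v, t, eta=1e-4, beta_1=0.9, beta_2=0.999,
              epsilon=1e-8, w_min=-100.0, w_max=100.0):
    """One Adam update for a single weight; t counts update steps from 1."""
    m = beta_1 * m + (1.0 - beta_1) * g       # first moment estimate
    v = beta_2 * v + (1.0 - beta_2) * g ** 2  # second moment raw estimate
    m_hat = m / (1.0 - beta_1 ** t)           # bias-corrected first moment
    v_hat = v / (1.0 - beta_2 ** t)           # bias-corrected second moment
    w = w - eta * m_hat / (np.sqrt(v_hat) + epsilon)
    return np.clip(w, w_min, w_max), m, v    # clip to [Wmin, Wmax], carry state
```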
Parameters
The following parameters can be set in the status dictionary.
Common optimizer parameters

| Parameter  | Unit | Math equivalent       | Default | Description                       |
|------------|------|-----------------------|---------|-----------------------------------|
| batch_size |      |                       | 1       | Size of batch                     |
| eta        |      | \(\eta\)              | 1e-4    | Learning rate                     |
| Wmax       | pA   | \(W_{ji}^\text{max}\) | 100.0   | Maximal value for synaptic weight |
| Wmin       | pA   | \(W_{ji}^\text{min}\) | -100.0  | Minimal value for synaptic weight |
Gradient descent parameters (default optimizer)

| Parameter | Unit | Math equivalent | Default          | Description    |
|-----------|------|-----------------|------------------|----------------|
| type      |      |                 | gradient_descent | Optimizer type |
Adam optimizer parameters

| Parameter | Unit | Math equivalent | Default | Description                                       |
|-----------|------|-----------------|---------|---------------------------------------------------|
| type      |      |                 | adam    | Optimizer type                                    |
| beta_1    |      | \(\beta_1\)     | 0.9     | Exponential decay rate for first moment estimate  |
| beta_2    |      | \(\beta_2\)     | 0.999   | Exponential decay rate for second moment estimate |
| epsilon   |      | \(\epsilon\)    | 1e-8    | Small constant for numerical stability            |
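These parameters are passed through the status dictionary of an e-prop synapse model. The following sketch assumes the eprop_synapse_bsshslm_2020 model name used in NEST's e-prop examples; the exact model name and available keys depend on your NEST version:

```python
import nest

# Select the Adam optimizer for an e-prop synapse model. The model name
# "eprop_synapse_bsshslm_2020" is an assumption and may differ by version.
nest.SetDefaults(
    "eprop_synapse_bsshslm_2020",
    {
        "optimizer": {
            "type": "adam",
            "batch_size": 1,
            "eta": 1e-4,
            "Wmin": -100.0,
            "Wmax": 100.0,
            "beta_1": 0.9,
            "beta_2": 0.999,
            "epsilon": 1e-8,
        }
    },
)
```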
The following state variables evolve during simulation.
Adam optimizer state variables for individual synapses

| State variable | Unit | Math equivalent | Initial value | Description                |
|----------------|------|-----------------|---------------|----------------------------|
| m              |      | \(m\)           | 0.0           | First moment estimate      |
| v              |      | \(v\)           | 0.0           | Second moment raw estimate |
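If the per-synapse optimizer state is exposed through the connection status dictionary (an assumption; the exact key may differ between NEST versions), it can be inspected after simulation along these lines:

```python
import nest

# Assumption: per-synapse optimizer state is readable via an "optimizer" entry.
conns = nest.GetConnections(synapse_model="eprop_synapse_bsshslm_2020")
print(conns.get("optimizer"))  # expected to include the evolving "m" and "v"
```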