weight_optimizer – Selection of weight optimizers

Description

A weight optimizer is an algorithm that adjusts the synaptic weights in a network during training to minimize the loss function and thus improve the network’s performance on a given task.

Weight optimization is an essential part of plasticity rules such as e-prop plasticity.

Currently two weight optimizers are implemented: gradient descent and the Adam optimizer.

In gradient descent [1] the weights are optimized via:

\[W_t = W_{t-1} - \eta \, g_t \,,\]

where \(\eta\) denotes the learning rate and \(g_t\) the gradient at the current time step \(t\).
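
This update rule can be sketched in a few lines of Python. The sketch is purely illustrative of the equation above; the names gradient_descent_step, w, and grad are chosen here and are not part of the model's interface:

    def gradient_descent_step(w, grad, eta=1e-4):
        """One plain gradient-descent update: W_t = W_{t-1} - eta * g_t."""
        return w - eta * grad

    # Example: one update of a single synaptic weight (arbitrary values).
    w = gradient_descent_step(0.5, 0.2, eta=1e-4)
    print(w)  # 0.49998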

In the Adam scheme [2] the weights are optimized via:

\[\begin{split}
m_0 &= 0, \quad v_0 = 0, \quad t = 1 \,, \\
m_t &= \beta_1 \, m_{t-1} + \left(1-\beta_1\right) \, g_t \,, \\
v_t &= \beta_2 \, v_{t-1} + \left(1-\beta_2\right) \, g_t^2 \,, \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t} \,, \\
\hat{v}_t &= \frac{v_t}{1-\beta_2^t} \,, \\
W_t &= W_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \,.
\end{split}\]
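
The Adam update can likewise be written as a short Python sketch that mirrors the equations line by line. Again, this is only an illustration; the function and variable names are chosen for this sketch and are not part of the model's interface:

    import math

    def adam_step(w, grad, m, v, t,
                  eta=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
        """One Adam update; t is the 1-based step count, m and v are carried over."""
        m = beta_1 * m + (1.0 - beta_1) * grad
        v = beta_2 * v + (1.0 - beta_2) * grad**2
        m_hat = m / (1.0 - beta_1**t)
        v_hat = v / (1.0 - beta_2**t)
        w = w - eta * m_hat / (math.sqrt(v_hat) + epsilon)
        return w, m, v

    # Example: two consecutive updates of one weight (arbitrary gradients).
    w, m, v = 0.5, 0.0, 0.0
    for t, g in enumerate([0.2, -0.1], start=1):
        w, m, v = adam_step(w, g, m, v, t)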

Parameters

The following parameters can be set in the status dictionary.

Common optimizer parameters

==========  ====  =====================  =======  =================================
Parameter   Unit  Math equivalent        Default  Description
==========  ====  =====================  =======  =================================
batch_size                               1        Size of batch
eta               \(\eta\)               1e-4     Learning rate
Wmax        pA    \(W_{ji}^\text{max}\)  100.0    Maximal value for synaptic weight
Wmin        pA    \(W_{ji}^\text{min}\)  -100.0   Minimal value for synaptic weight
==========  ====  =====================  =======  =================================
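
Wmin and Wmax bound the synaptic weight. A minimal sketch of such a bound, assuming the weight is simply clipped after each optimizer step (the clipping shown here is an assumption based on the parameter descriptions above, not a transcription of the implementation):

    def clip_weight(w, Wmin=-100.0, Wmax=100.0):
        """Keep a synaptic weight within [Wmin, Wmax] (values in pA)."""
        return min(max(w, Wmin), Wmax)

    print(clip_weight(150.0))   # 100.0, clipped to Wmax
    print(clip_weight(-20.0))   # -20.0, unchanged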

Gradient descent parameters (default optimizer)

=========  ====  ===============  ================  ==============
Parameter  Unit  Math equivalent  Default           Description
=========  ====  ===============  ================  ==============
type                              gradient_descent  Optimizer type
=========  ====  ===============  ================  ==============

Adam optimizer parameters

=========  ====  ===============  =======  ==================================================
Parameter  Unit  Math equivalent  Default  Description
=========  ====  ===============  =======  ==================================================
type                              adam     Optimizer type
beta_1           \(\beta_1\)      0.9      Exponential decay rate for first moment estimate
beta_2           \(\beta_2\)      0.999    Exponential decay rate for second moment estimate
epsilon          \(\epsilon\)     1e-8     Small constant for numerical stability
=========  ====  ===============  =======  ==================================================
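
In PyNEST, parameters of this kind are typically passed as a nested dictionary when configuring the learning synapse. The sketch below assumes an e-prop synapse model that exposes the optimizer through an optimizer entry in its defaults; the model name and the optimizer key used here are assumptions, so consult the synapse model documentation of your NEST release for the exact interface:

    import nest

    # Assumed interface: the model name "eprop_synapse_bsshslm_2020" and the
    # "optimizer" key are illustrative and may differ between NEST releases.
    nest.SetDefaults(
        "eprop_synapse_bsshslm_2020",
        {
            "optimizer": {
                "type": "adam",
                "eta": 1e-4,
                "beta_1": 0.9,
                "beta_2": 0.999,
                "epsilon": 1e-8,
                "Wmin": -100.0,
                "Wmax": 100.0,
            }
        },
    )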

The following state variables evolve during simulation.

Adam optimizer state variables for individual synapses

==============  ====  ===============  =============  ==========================
State variable  Unit  Math equivalent  Initial value  Description
==============  ====  ===============  =============  ==========================
m                     \(m\)            0.0            First moment estimate
v                     \(v\)            0.0            Second moment raw estimate
==============  ====  ===============  =============  ==========================

References

See also

E-Prop Plasticity

Examples using this model