Forget gate of an LSTM

Prerequisites

Description

This equation represents the forget gate of an LSTM. It transforms an external signal, \(x^{\htmlId{tooltip-forgetGateLSTM}{g^\text{forget}}}\), consisting of the outputs of other LSTM blocks and other neurons in the network, in the same way as a layer of a typical multi-layer perceptron.

Equation

\[\htmlId{tooltip-forgetGateLSTM}{g^\text{forget}}(\htmlId{tooltip-wholeNumber}{n}+1) = \htmlId{tooltip-sigmoid}{\sigma}(\htmlId{tooltip-weightMatrix}{\mathbf{W}}^{\htmlId{tooltip-forgetGateLSTM}{g^\text{forget}}}[1;x^{\htmlId{tooltip-forgetGateLSTM}{g^\text{forget}}}])\]
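
Below is a minimal NumPy sketch of this computation. It is only an illustration, not code from any particular library: the helper name forget_gate, the shapes, and the random example values are assumptions. It also makes the \([1;x]\) notation concrete, since prepending a 1 to the input lets \(\htmlId{tooltip-weightMatrix}{\mathbf{W}}\) hold both the weights and the biases in one matrix.

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic function; squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(W_f, x):
    # W_f has shape (m, d + 1): one bias column plus d input weights per gate unit.
    # Prepending a 1 to x folds the biases into a single matrix product,
    # which is what the [1; x] notation in the equation expresses.
    x_aug = np.concatenate(([1.0], x))
    return sigmoid(W_f @ x_aug)

# Illustrative example: 3 forget-gate units reading a 4-dimensional external signal.
rng = np.random.default_rng(0)
W_f = rng.standard_normal((3, 5))   # 3 units, 1 bias + 4 weights each
x = rng.standard_normal(4)          # external signal x^{g^forget}
print(forget_gate(W_f, x))          # every entry lies strictly between 0 and 1
```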

Symbols Used

\(g^\text{forget}\)

This symbol represents the state of the forget gate of the LSTM.

\(\mathbf{W}\)

This symbol represents the matrix containing the weights and biases of a layer in a neural network.

\(\sigma\)

This symbol represents the sigmoid function.

\(n\)

This symbol represents any given whole number, \( n \in \htmlId{tooltip-setOfWholeNumbers}{\mathbb{W}}\).

Derivation

Notice that the equation is analogous to the activation of a single layer \[\htmlId{tooltip-layerActivation}{x^{\htmlId{tooltip-integer}{k}}} = \htmlId{tooltip-activationFunction}{\sigma}(\htmlId{tooltip-weightMatrix}{\mathbf{W}}[1;x^{\htmlId{tooltip-integer}{k} - 1}])\].

The derivation of this equation follows the same steps as the Activation of a layer, except that the activation function is strictly the sigmoid; no other activation can be used.

The task of this gate is to compute a vector of "weights" that judge how important each part of the memory is: the larger the weight, the more of that component is retained; the smaller the weight, the more of it is forgotten. This requires the weights to lie between 0 and 1, which forces the sigmoid as the activation function.
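
As a brief illustration of how these weights act (the values below are hypothetical, and the elementwise update shown is the standard way the forget gate is applied to the cell state, not something derived on this page), the gate output multiplies the stored memory componentwise, so entries near 0 erase their memory component and entries near 1 preserve it:

```python
import numpy as np

# Stored memory of an LSTM block (values chosen purely for illustration).
c_old = np.array([2.0, -1.5, 0.7])

# Hypothetical forget-gate output: nearly erase the first component, keep the last.
g_forget = np.array([0.05, 0.60, 0.98])

# The gate acts by elementwise multiplication with the memory
# (shown here without the input-gate contribution of the full cell update).
c_gated = g_forget * c_old
print(c_gated)   # [ 0.1   -0.9    0.686]
```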

