This equation represents the forget gate of an LSTM, \(\htmlId{tooltip-forgetGateLSTM}{g^\text{forget}}\). It transforms an external signal, \(x\), consisting of the outputs of other LSTM blocks and of other neurons in the neural network, in the same way a typical multi-layer perceptron would.
| Symbol | Description |
| --- | --- |
| \(g^\text{forget}\) | This symbol represents the state of the forget gate of the LSTM. |
| \(\mathbf{W}\) | This symbol represents the matrix containing the weights and biases of a layer in a neural network. |
| \(\sigma\) | This symbol represents the sigmoid function. |
| \(n\) | This symbol represents any given whole number, \( n \in \htmlId{tooltip-setOfWholeNumbers}{\mathbb{W}}\). |
Notice that the equation is analogous to the activation of a single layer \[\htmlId{tooltip-layerActivation}{x^k} = \htmlId{tooltip-activationFunction}{\sigma}(\htmlId{tooltip-weightMatrix}{\mathbf{W}}[1;x^{\htmlId{tooltip-integer}{k} - 1}])\].
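As a rough illustration, the forget gate can be computed exactly like a single sigmoid layer. The sketch below assumes NumPy, treats the first column of \(\mathbf{W}\) as the bias (following the \([1;x]\) convention above), and uses illustrative names and shapes rather than any fixed API.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic sigmoid, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(W, x):
    """Sketch of g_forget = sigmoid(W [1; x]).

    W : (m, d + 1) matrix holding the gate's biases (first column) and weights.
    x : (d,) external signal (outputs of other LSTM blocks and other neurons).
    """
    x_aug = np.concatenate(([1.0], x))  # prepend 1 so the bias folds into W
    return sigmoid(W @ x_aug)

# Hypothetical shapes: a 3-dimensional memory gated from a 4-dimensional signal.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=4)
g = forget_gate(W, x)
print(g)  # every entry lies strictly between 0 and 1
```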
The derivation of this equation follows the same steps as the activation of a layer, except that the activation function is strictly the sigmoid; no other activation can be used.
The task of this gate is to calculate a vector of "weights" that judge how important each part of the memory is: the larger a weight, the more important the corresponding entry; the smaller a weight, the less important it is and the more it should be forgotten. Naturally, this requires the weights to lie between 0 and 1, which forces the use of the sigmoid as the activation function.
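To make the gating concrete, here is a minimal sketch of how such a weight vector would scale a memory vector elementwise. The section above does not derive the cell update itself, so treating the gating as an elementwise product with \(g^\text{forget}\) is an assumption about how the gate is used downstream, with purely illustrative values.

```python
import numpy as np

# Minimal sketch: apply the forget weights to a memory vector elementwise.
# A weight near 1 keeps that entry of the memory; a weight near 0 erases it.
memory = np.array([2.0, -1.5, 0.5])  # hypothetical previous memory contents
g = np.array([0.95, 0.10, 0.60])     # forget-gate output, each entry in (0, 1)
gated_memory = g * memory            # entries with small weights are mostly forgotten
print(gated_memory)                  # -> [ 1.9  -0.15  0.3 ]
```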