Output of an LSTM

Prerequisites

Update of a memory cell in an LSTM
Output gate of an LSTM

Description

This equation represents the output of an LSTM. It need not be the output of the whole network; it can instead serve as input to other parts of a neural network, e.g. linear or recurrent layers.

Equation

\[\htmlId{tooltip-outputActivationVector}{\mathbf{y}}(\htmlId{tooltip-wholeNumber}{n}) = \htmlId{tooltip-outputGateLSTM}{g^\text{output}}(\htmlId{tooltip-wholeNumber}{n}) \cdot \htmlId{tooltip-memoryCellLSTM}{c}(\htmlId{tooltip-wholeNumber}{n})\]

Symbols Used

\(g^\text{output}\)

This symbol represents the state of the output gate of the LSTM.

\(\mathbf{y}\)

This symbol represents the output activation vector of a neural network.

\(c\)

This symbol represents the memory cell of an LSTM.

\(n\)

This symbol represents any given whole number, \( n \in \htmlId{tooltip-setOfWholeNumbers}{\mathbb{W}}\).

Derivation

Remember that the memory cell, \(\htmlId{tooltip-memoryCellLSTM}{c}(\htmlId{tooltip-wholeNumber}{n})\), is a vector representing the network's memory. It holds the current state of the network and is updated as follows:

\[\htmlId{tooltip-memoryCellLSTM}{c}(\htmlId{tooltip-wholeNumber}{n}+1)=\htmlId{tooltip-forgetGateLSTM}{g^\text{forget}}(\htmlId{tooltip-wholeNumber}{n}+1)\cdot \htmlId{tooltip-memoryCellLSTM}{c}(\htmlId{tooltip-wholeNumber}{n}) + \htmlId{tooltip-inputGateLSTM}{g^\text{input}} (\htmlId{tooltip-wholeNumber}{n}+1) \cdot \htmlId{tooltip-inputNeuronLSTM}{u}(\htmlId{tooltip-wholeNumber}{n}+1)\]
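
As a concrete sketch, this update is just a few element-wise operations in NumPy; the names (`update_cell`, `c_prev`, `u`) are illustrative and not part of the source notation:

```python
import numpy as np

def update_cell(c_prev, g_forget, g_input, u):
    # c(n+1) = g_forget(n+1) * c(n) + g_input(n+1) * u(n+1), all element-wise.
    return g_forget * c_prev + g_input * u

# Toy usage with 2-dimensional vectors:
c_next = update_cell(np.array([0.7, 0.3]), np.array([0.9, 0.5]),
                     np.array([0.2, 0.8]), np.array([0.1, 0.6]))
```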

Further, the output gate \(\htmlId{tooltip-outputGateLSTM}{g^\text{output}}(\htmlId{tooltip-wholeNumber}{n})\) is obtained by transforming the external input signal, e.g. from previous layers, with the sigmoid function:

\[\htmlId{tooltip-outputGateLSTM}{g^\text{output}}(\htmlId{tooltip-wholeNumber}{n}+1) = \htmlId{tooltip-sigmoid}{\sigma}(\htmlId{tooltip-weightMatrix}{\mathbf{W}}^{\htmlId{tooltip-outputGateLSTM}{g^\text{output}}}[1;x^{\htmlId{tooltip-outputGateLSTM}{g^\text{output}}}])\]
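
A minimal sketch of this gate computation, assuming `W_out` already includes the bias column so that prepending a constant 1 to the input realizes \([1; x^{g^\text{output}}]\) (all names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_gate(W_out, x):
    # sigma(W_out @ [1; x]); the leading 1 realizes the bias term.
    return sigmoid(W_out @ np.concatenate(([1.0], x)))

# Toy usage: W_out maps a bias-extended 2-vector to a 2-dimensional gate.
g_out = output_gate(W_out=np.array([[0.1, 0.5, -0.3],
                                    [0.2, -0.4, 0.6]]),
                    x=np.array([1.0, 2.0]))
```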

See Update of a memory cell in an LSTM and Output gate of an LSTM for more details.

Intuitively, we want the output of an LSTM to be influenced by both its state (memory) and the newly acquired signal. We combine the two in the simplest way possible: by multiplying these vectors element-wise.
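
Putting the two pieces together, one forward step of this simplified LSTM block might be sketched as follows, with the gate vectors assumed to be precomputed elsewhere (names are illustrative):

```python
import numpy as np

def lstm_step(c_prev, g_forget, g_input, u, g_output):
    # Memory-cell update: gated old state plus gated new input (element-wise).
    c = g_forget * c_prev + g_input * u
    # Output: element-wise product of output gate and memory cell.
    y = g_output * c
    return y, c

# Toy usage with 2-dimensional state vectors:
y, c = lstm_step(c_prev=np.array([0.5, 0.1]),
                 g_forget=np.array([0.9, 0.2]),
                 g_input=np.array([0.3, 0.8]),
                 u=np.array([0.4, 0.7]),
                 g_output=np.array([0.4, 0.6]))
```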

Example

Let the current memory cell be

\[\htmlId{tooltip-memoryCellLSTM}{c}(\htmlId{tooltip-wholeNumber}{n}) = \begin{bmatrix}0.7 \\0.3\end{bmatrix}\]

and the value of the output gate

\[\htmlId{tooltip-outputGateLSTM}{g^\text{output}}(\htmlId{tooltip-wholeNumber}{n}) = \begin{bmatrix}0.4 \\0.6\end{bmatrix}\]

Then, the output of the whole LSTM block is:

\[\htmlId{tooltip-outputActivationVector}{\mathbf{y}}(\htmlId{tooltip-wholeNumber}{n}) = \htmlId{tooltip-outputGateLSTM}{g^\text{output}}(\htmlId{tooltip-wholeNumber}{n}) \cdot \htmlId{tooltip-memoryCellLSTM}{c}(\htmlId{tooltip-wholeNumber}{n})\]

After substituting the values, we obtain the result:

\[\begin{bmatrix}0.4 \\0.6\end{bmatrix} \cdot\begin{bmatrix}0.7 \\0.3\end{bmatrix} = \begin{bmatrix}0.28 \\0.18\end{bmatrix}\]
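
As a quick arithmetic check, the same element-wise product in NumPy:

```python
import numpy as np

g_output = np.array([0.4, 0.6])  # output gate g^output(n)
c = np.array([0.7, 0.3])         # memory cell c(n)

print(g_output * c)              # [0.28 0.18]
```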

References

  1. Jaeger, H. (n.d.). Neural Networks (AI) (WBAI028-05): Lecture Notes, BSc Program in Artificial Intelligence. Retrieved April 27, 2024, from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf