Neural Networks

This is a listing of the equations in the BSc Artificial Intelligence: Neural Networks course. All equations from the lecture notes are included, with the exception of those on Reservoir Computing.

Good luck with the exam!


Below are links to pages explaining the equations, followed by pages explaining the symbols used. The website is designed to be interactive, so please do explore it.

You will find pop-up boxes when hovering over (most) symbols, clickable buttons linking to symbol pages for more information, and clickable buttons linking to the prerequisites.

The interactivity is best experienced on a laptop/desktop.


Please give us feedback using the smiley/sad faces at the bottom of most pages, optionally with a short message.

We would also really appreciate more detailed feedback at equationtology@gmail.com.


Equations List

Counting loss | \(L(h(u), y) = \begin{cases} 0, \text{if } h(u) = y\\ 1, \text{if } h(u) \ne y \end{cases}\)
General Form of a Loss Function | \(L : \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}_{\geq 0}\)
Quadratic Loss (L2) | \(L(h(u), y) = \Vert h(u) - y \Vert ^2\)
Risk of a Model | \(R(\hat{f}) = E [L(\hat{f}(U), Y)]\)
Mean Squared Error Loss (MSE) | \(L_\text{MSE}(h(u), y) = \frac{1}{N} \sum_{i=1}^{N} (h(u_i) - y_i)^2\)
MSE Minimization | \(\hat{f} = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \frac{1}{N} \sum_{i=1}^{N} \left( h(u_i) - y_i \right)^2\)
Empirical Risk of a Model | \(R^{emp}(h) = \frac{1}{N} \sum^{N}_{i=1} L (h(u_i), y_i)\)
Risk of Optimal Model | \(\hat{f} = h_{opt} = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \hspace{0.2cm} \frac{1}{N} \sum^{N}_{i=1} L (h(u_i), y_i)\)
Operationalization of Supervised Learning | \(\mathcal{A}(S) = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \hspace{0.2cm} \frac{1}{N} \sum^{N}_{i=1} L (h(u_i), y_i)\)
Loss Minimization with Regularization | \(\hat{f} = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \left[ \frac{1}{N} \sum_{i=1}^{N} L\left( h(u_i), y_i \right) + \alpha^2 \textup{reg}(\theta_{h}) \right]\)
General Form of a Regularization Function | \(\textup{reg} : \Theta \rightarrow \mathbb{R}_{\geq 0}\)
Approximation of Performance Landscape | \(R(\theta) = \sum_{i=1}^Dx_i\theta_i^2\)
Update rule of the Gradient Descent | \(\theta \leftarrow \theta - \mu \nabla R (\theta)\)
L2 Regularization | \(\textup{reg}(\theta) = \sum_{\mathbf{W} \in \theta} \mathbf{W}^2 = \Vert \theta \Vert^2\)
Goal of Supervised Learning | \(\hat{f} = h_{opt} = \underset{h \in \mathcal{H}}{\operatorname{argmin}} \hspace{0.2cm} E[L(h(U), Y)]\)
Polynomial Curve Fitting | \(p(u) = \omega_0 + \omega_1 u + ... + \omega_{k} u^{k}\)
RNN Update Equation | \(\mathbf{h}(n+1)=\sigma\left(\mathbf{W} \mathbf{h}(n)\right)\)
Function Estimated by Perceptron | \(f(u) = \begin{cases} 1, \text{ if } \;\displaystyle \sum_{i=1}^{K} w_{i} u_{i} \geq 0 \\ 0, \text{ otherwise} \end{cases}\)
Risk Minimization for MLPs | \(\theta_\text{opt} = \underset{\theta \in \Theta}{\operatorname{argmin}} \frac{1}{N} \sum_{i=1}^{N} L(\mathcal{N}_{\theta}(u_i), y_i)\)
Gradient of the performance surface | \(\nabla R(\theta)=\left(\frac{\partial R}{\partial \theta_1}, \dots, \frac{\partial R}{\partial \theta_D}\right)\)
Activation of a neuron | \(x^\kappa_i = \sigma\left(\sum_{j=1}^{L^{\kappa - 1}}\theta_{i j}^{\kappa} x_{j}^{\kappa-1} + \theta_{i 0}^{\kappa}\right)\)
General form of an activation function | \(\sigma: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}\)
Rectified Linear Unit | \(\sigma(u) = \max(0,u)\)
Activation of a layer | \(x^\kappa = \sigma(\mathbf{W}^{\kappa}[1;x^{\kappa - 1}])\)
Activation of the output layer | \(\mathbf{y}=\mathcal{x}^{k}=\mathbf{W}^{k}\mathcal{x}^{k-1}\)
Recurrent Neural Network Update Equations | \( \mathcal{x}(n) = \sigma (\mathbf{W} \mathcal{x}(n-1) + \mathbf{W}^{in} u(n) + \mathcal{b}) \\ \mathbf{y}(n) = f(\mathbf{W}^{out} \mathcal{x}(n)) \)
Gradient Empirical Risk | \(\nabla R^\text{emp}(\mathcal{N}_{\theta^{(n)}})=\left(\frac{\partial R^\text{emp}}{\partial \theta_1}(\theta^{(n)}),\dots,\frac{\partial R^\text{emp}}{\partial \theta_{L}}(\theta^{(n)})\right)\)
Gradient Empirical Risk (sum of gradients) | \(\nabla R^\text{emp}(\mathcal{N}_{\theta})=\frac{1}{N}\sum_{{i}=1,\dots,N}\nabla L\left(\mathcal{N}_{\theta}(u_{i}), \mathbf{y}_{i}\right)\)
Recurrent Neural Network with Output Feedback | \(\mathcal{x}(n) = \sigma (\mathbf{W} \mathcal{x}(n-1) + \mathbf{W}^{in} u(n) + \mathbf{W}^{fb} \mathbf{y}(n-1) + \mathcal{b})\)
Input neuron of an LSTM | \(u(n+1) = \sigma(\mathbf{W}^u[1;x^u])\)
Output gate of an LSTM | \(g^\text{output}(n+1) = \sigma(\mathbf{W}^{g^\text{output}}[1;x^{g^\text{output}}])\)
Forget gate of an LSTM | \(g^\text{forget}(n+1) = \sigma(\mathbf{W}^{g^\text{forget}}[1;x^{g^\text{forget}}])\)
Input gate of an LSTM | \(g^\text{input}(n+1) = \sigma(\mathbf{W}^{g^\text{input}}[1;x^{g^\text{input}}])\)
Update of a memory cell in an LSTM | \(c(n+1)=g^\text{forget}(n+1)\cdot c(n) + g^\text{input} (n+1) \cdot u(n+1)\)
Recursive Definition of Recurrent Neural Networks | \( \mathcal{x}(n) = \\ = \sigma (\mathbf{W} \mathcal{x}(n - 1) + \mathbf{W}^{in} u(n)) \\ = \sigma (\mathbf{W} ( \sigma (\mathbf{W} \mathcal{x}(n - 2) + \mathbf{W}^{in} u(n - 1))) + \mathbf{W}^{in} u(n)) \\ = \sigma (\mathbf{W} (\sigma (\mathbf{W} ( \sigma (\mathbf{W} \mathcal{x}(n - 3) + \mathbf{W}^{in} u(n-2))) + \mathbf{W}^{in} u(n - 1))) + \mathbf{W}^{in} u(n)) \\ = ... \)
Temporal Evolution of Dynamical System | \(z ' = f(z)\)
General Form of an Update Operator | \(T : \mathcal{X} \rightarrow \mathcal{X}\)
Discrete-Time Update Operator | \(\mathbf{x}(n+1) = T( \mathbf{x}(n) )\)
Stochastic Discrete-Time Update Operator | \(p\left( \mathbf{X}_{n+1} = \mathbf{x}_{j} \,\vert\, \mathbf{X}_{n} = \mathbf{x}_{i} \right) = T(i, j)\)
Continuous-Time Update Operator (ODE) | \(\dot{\mathbf{x}}(t) = \frac{d}{dt}\mathbf{x}(t) = T(\mathbf{x}(t))\)
Discrete-Time System with Input | \(\mathbf{x}(n+1) = T\left( \mathbf{x}(n), \mathbf{u}(n) \right)\)
Discrete-Time Dynamical System | \(\begin{cases} \mathbf{x}(n+1) = T( \mathbf{x}(n), \mathbf{u}(n) ) \\ \mathbf{y}(n) = O( \mathbf{x}(n) ) \end{cases}\)
Stochastic Discrete-Time System with Input | \(p\left( \mathbf{X}_{n+1} = \mathbf{x}_{j} \,\vert\, \mathbf{X}_{n} = \mathbf{x}_{i}, \mathbf{U}_{n} = a \right) = T_a(i, j)\)
Stochastic Discrete-Time Dynamical System | \(\begin{cases} p\left( \mathbf{X}_{n+1} = \mathbf{x}_{j} \,\vert\, \mathbf{X}_{n} = \mathbf{x}_{i}, \mathbf{U}_n = a \right) = T_a(i, j) \\ p\left( \mathbf{Y}_n = \mathbf{y}_k \,\vert\, \mathbf{X}_n = \mathbf{x}_{i} \right) = O( i, k ) \end{cases}\)
Continuous-Time Dynamical System | \(\begin{cases} \dot{\mathbf{x}}(t) = T(\mathbf{x}(t), \mathbf{u}(t)) \\ \mathbf{y}(t) = O(\mathbf{x}(t)) \end{cases}\)
Continuous-Time System with Input | \(\dot{\mathbf{x}}(t) = T(\mathbf{x}(t), \mathbf{u}(t))\)
Backpropagation - Unit Potential | \(a_{i}^\kappa=\sum_{j=1,\dots,L^{\kappa-1}}\mathbf{W}_{ij}^{\kappa}\mathcal{x}_{j}^{\kappa-1}\)
Markov Transition Matrix Entries | \(\left[ T \right]_{i,j} = p\left( \mathbf{X}_{n+1} = \mathbf{x}_{j} \,\vert\, \mathbf{X}_{n} = \mathbf{x}_{i} \right)\)
Output of an LSTM | \(\mathbf{y}(n) = g^\text{output}(n) \cdot c(n)\)
Energy of a state in a Hopfield Network | \(E(\mathbf{x}) = -\frac{1}{2}\sum_{i,j=1,\dots,L}\mathbf{W}_{i j}\mathbf{x}_{i} \mathbf{x}_{j} = -\frac{1}{2}\mathbf{x}^T \mathbf{W} \mathbf{x}\)
Activation of a neuron in a Hopfield Network | \(\mathcal{x}_{i}(n + 1) = \text{sign}\left(\sum_{j \neq i} \mathbf{W}_{i j} \mathcal{x}_{j}(n)\right)\)
Weight update of a Hopfield Network | \(\mathbf{W}(n) = \mathbf{W}(n - 1) + \mu (\xi\xi^{T} - I)\)
Analytical solution of a Hopfield Network | \(\mathbf{W} = \frac{1}{L}(\sum_{i=1,...,N} \xi_i \xi_i^{T} - NI)\)
Delta Equation | \(\delta_{i}^\kappa\dot{=}\frac{\partial L\left(\mathcal{N}_{\theta}(u),y\right)}{\partial a_{i}^\kappa}\)
Delta Equation by Backpropagation | \(\delta_{i}^\kappa=\sigma'(a_{i}^\kappa)\sum_{j=1,\dots,L^{\kappa+1}}\delta_{j}^{\kappa+1}\mathbf{W}_{ji}^{\kappa+1}\)
Neuron Potential to Activation | \(\sigma(a)=x\)
Weight update of a Heteroassociative Hopfield Network | \(\mathbf{W}(n) = \mathbf{W}(n - 1) + \mu\xi^{(i + 1)}\xi^{(i)T}\)
Energy of a Specific State in a Boltzmann Machine | \(E(\mathbf{s}) = -\sum_{i < j}w_{i j}\mathbf{s}_{i} \mathbf{s}_{j}\)
Boltzmann Normalization Constant/Partition Function | \(Z = Z(T) = \int_{\mathbf{s} \in S} \exp\left\{ - \frac{ E(\mathbf{s}) }{ T } \right\} d\mathbf{s}\)
Boltzmann Normalization Constant/Partition Function (Discrete) | \(Z = Z(T) = \sum_{\mathbf{s} \in S } \exp\left\{ - \frac{ E(\mathbf{s}) }{ T } \right\}\)
Boltzmann Distribution of Microstates | \(p(\mathbf{s}) = \frac{1}{Z} \exp\left\{ - \frac{ E(\mathbf{s}) }{ T } \right\}\)
Boltzmann Acceptance Function | \(P_{\text{accept}}(\mathbf{x}^* \,\vert\; \mathbf{x}_n) = \frac{ F(\mathbf{x}^*) }{ F(\mathbf{x}^*) + F(\mathbf{x}_n) }\)
Metropolis Acceptance Function | \(P_{\text{accept}}(\mathbf{x}^* \,\vert\; \mathbf{x}_n) = \begin{cases} 1, \textup{ if } F(\mathbf{x}^*) \geq F(\mathbf{x}_n) \\ \frac{F(\mathbf{x}^*)}{F(\mathbf{x}_n)}, \textup{ if } F(\mathbf{x}^*) < F(\mathbf{x}_n)\end{cases}\)
Ratio Metropolis Acceptance Function | \(\exp\left(\frac{E(\mathbf{x}_{n}) - E(\mathbf{x}^*)}{T}\right)\)
Energy Change When One Unit in Boltzmann Machine Changes | \(\Delta E_{i} = \sum_{j} w_{i j} \mathbf{s}_{j}\)
Kullback-Leibler Divergence | \(KL(P, \hat{P}) = \sum_{\mathbf{s} \in S} P(\mathbf{s}) \log\frac{P(\mathbf{s})}{\hat{P}(\mathbf{s})}\)
Probability of Setting a Unit to 1 in a BM | \(P(\mathbf{s}_{i}^{n + 1} = 1 \,\vert\, \mathbf{s}^{n}) = \frac{1}{1 + e^{- \Delta E_{i} /T}}\)
Gradients of KL Divergence with Respect to Weights | \(\frac{\partial KL(P_{target}(\mathbf{s}),P_{\mathbf{W}}(\mathbf{s}))}{\partial w_{i j}} = - \frac{1}{T}(p_{ij} - q_{ij})\)
Weight Update Rule for Boltzmann Machines | \(w_{i j}(n + 1) = w_{i j}(n) + \mu(p_{ij} - q_{ij})\)
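
The Counting loss, Quadratic Loss (L2) and Empirical Risk of a Model equations translate directly into code. The following is a minimal Python sketch, not taken from the course; the function names are hypothetical, and model outputs and targets are assumed to be plain Python lists. With scalar outputs and the squared error, `empirical_risk` is exactly the Mean Squared Error loss.

```python
# Illustrative sketch; function names are hypothetical, not from the course notes.

def counting_loss(prediction, target):
    """Counting (0/1) loss: 0 if h(u) == y, 1 otherwise."""
    return 0 if prediction == target else 1


def quadratic_loss(prediction, target):
    """Quadratic (L2) loss ||h(u) - y||^2 for equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


def empirical_risk(model, loss, inputs, targets):
    """R^emp(h) = (1/N) * sum_i L(h(u_i), y_i) over a sample of size N."""
    n = len(inputs)
    return sum(loss(model(u), y) for u, y in zip(inputs, targets)) / n


# The model h(u) = [2u] fits the first two samples exactly and misses the third by 1.
model = lambda u: [2 * u]
risk = empirical_risk(model, quadratic_loss, [1, 2, 3], [[2], [4], [7]])
```

Here the per-sample losses are 0, 0 and 1, so the empirical risk is 1/3.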
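
The Approximation of Performance Landscape, Gradient of the performance surface and Update rule of the Gradient Descent equations can be combined into a small experiment. This sketch assumes the quadratic landscape \(R(\theta) = \sum_i x_i \theta_i^2\), whose gradient has components \(2 x_i \theta_i\); the curvatures, step sizes and function names are hypothetical.

```python
# Illustrative sketch; function names and numbers are hypothetical.

def risk(theta, x):
    """Quadratic performance landscape R(theta) = sum_i x_i * theta_i^2."""
    return sum(xi * ti ** 2 for xi, ti in zip(x, theta))


def grad_risk(theta, x):
    """Gradient of R: dR/dtheta_i = 2 * x_i * theta_i."""
    return [2 * xi * ti for xi, ti in zip(x, theta)]


def gradient_descent(theta, x, mu, steps):
    """Repeat theta <- theta - mu * grad R(theta) for a fixed number of steps."""
    for _ in range(steps):
        g = grad_risk(theta, x)
        theta = [ti - mu * gi for ti, gi in zip(theta, g)]
    return theta


x = [1.0, 10.0]  # the landscape is 10x steeper along theta_2
theta_small_mu = gradient_descent([1.0, 1.0], x, mu=0.04, steps=200)
theta_large_mu = gradient_descent([1.0, 1.0], x, mu=0.11, steps=50)
```

Each step multiplies \(\theta_i\) by \(1 - 2\mu x_i\), so on this landscape the iteration converges only if \(\mu < 1/\max_i x_i\): \(\mu = 0.04\) converges, while \(\mu = 0.11\) oscillates with growing amplitude along the steep direction.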
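
Polynomial Curve Fitting combined with Loss Minimization with Regularization and L2 Regularization gives ridge-style polynomial regression. The sketch below minimises \(\frac{1}{N}\sum_i (p(u_i) - y_i)^2 + \alpha^2 \Vert \omega \Vert^2\) by plain gradient descent; the step size, number of steps, toy data and function names are assumptions chosen for illustration (a closed-form solver would work as well).

```python
# Illustrative sketch; function names, step size and data are hypothetical.

def poly(omega, u):
    """p(u) = omega_0 + omega_1 * u + ... + omega_k * u^k."""
    return sum(w * u ** j for j, w in enumerate(omega))


def l2_reg(omega):
    """reg(omega) = ||omega||^2."""
    return sum(w ** 2 for w in omega)


def fit_poly(data, degree, alpha, mu=0.2, steps=2000):
    """Gradient descent on MSE(omega) + alpha^2 * ||omega||^2."""
    omega = [0.0] * (degree + 1)
    n = len(data)
    for _ in range(steps):
        # Gradient of the regularisation term: 2 * alpha^2 * omega_j.
        grad = [2 * alpha ** 2 * w for w in omega]
        # Gradient of the MSE term: (2/N) * sum_i (p(u_i) - y_i) * u_i^j.
        for u, y in data:
            err = poly(omega, u) - y
            for j in range(degree + 1):
                grad[j] += 2 * err * u ** j / n
        omega = [w - mu * g for w, g in zip(omega, grad)]
    return omega


data = [(u, 1 + 2 * u) for u in (0.0, 0.25, 0.5, 0.75, 1.0)]  # noise-free line y = 1 + 2u
omega_plain = fit_poly(data, degree=1, alpha=0.0)
omega_reg = fit_poly(data, degree=1, alpha=1.0)
```

Without regularisation the fit recovers \(\omega \approx (1, 2)\); with \(\alpha = 1\) the penalty pulls the coefficients towards zero, trading a worse fit for a smaller norm.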
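
The Function Estimated by Perceptron, Rectified Linear Unit, Activation of a layer and Activation of the output layer equations together describe a forward pass. This is a minimal pure-Python sketch, assuming each hidden weight matrix stores the bias \(\theta_{i0}\) in its first column (so it multiplies the leading 1 in \([1; x^{\kappa-1}]\)) and, as in the output-layer equation, that the output layer is linear without a bias. The hand-picked weights and function names are hypothetical.

```python
# Illustrative sketch; function names and weights are hypothetical.

def relu(a):
    """Rectified Linear Unit: max(0, a)."""
    return max(0.0, a)


def perceptron(w, u):
    """Perceptron output: 1 if sum_i w_i * u_i >= 0, else 0."""
    return 1 if sum(wi * ui for wi, ui in zip(w, u)) >= 0 else 0


def layer_forward(W, x_prev, act):
    """x^k = sigma(W [1; x^{k-1}]); row i of W is [theta_i0, theta_i1, ...]."""
    x_ext = [1.0] + list(x_prev)
    return [act(sum(w * x for w, x in zip(row, x_ext))) for row in W]


def mlp_forward(hidden_weights, W_out, u, act=relu):
    """Propagate u through the hidden layers, then apply the linear output y = W^k x^{k-1}."""
    x = list(u)
    for W in hidden_weights:
        x = layer_forward(W, x, act)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_out]


# Two ReLU units compute max(0, u1 - u2) and max(0, u2 - u1); their sum is |u1 - u2|.
W_hidden = [[0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]
W_output = [[1.0, 1.0]]
```

For example, `mlp_forward([W_hidden], W_output, [1.0, 3.0])` evaluates to `[2.0]`.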
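
The Backpropagation - Unit Potential, Delta Equation and Delta Equation by Backpropagation equations can be checked numerically. The sketch below assumes a network with one logistic-sigmoid hidden layer, a linear output layer, no bias terms (matching the unit-potential equation) and the quadratic loss for a single sample. It computes \(\partial L / \partial \mathbf{W}_{ij}^{\kappa} = \delta_i^{\kappa} x_j^{\kappa-1}\) with the delta recursion; the weights and function names are hypothetical.

```python
# Illustrative sketch; network shape, weights and function names are hypothetical.
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(W1, W2, u):
    """Return hidden potentials a^1, hidden activations x^1 and linear outputs y."""
    a1 = [sum(w * x for w, x in zip(row, u)) for row in W1]
    x1 = [sigmoid(a) for a in a1]
    y = [sum(w * x for w, x in zip(row, x1)) for row in W2]
    return a1, x1, y


def loss(W1, W2, u, target):
    """Quadratic loss ||y - target||^2 for one sample."""
    _, _, y = forward(W1, W2, u)
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def backprop(W1, W2, u, target):
    """Gradients of the loss with respect to W1 and W2 via the delta recursion."""
    a1, x1, y = forward(W1, W2, u)
    # The output layer is linear, so delta^2_i = dL/da^2_i = 2 * (y_i - target_i).
    delta2 = [2 * (yi - ti) for yi, ti in zip(y, target)]
    # delta^1_i = sigma'(a^1_i) * sum_j delta^2_j * W2_ji, with sigma' = sigma * (1 - sigma).
    delta1 = [
        sigmoid(a) * (1 - sigmoid(a)) * sum(delta2[j] * W2[j][i] for j in range(len(W2)))
        for i, a in enumerate(a1)
    ]
    # dL/dW^k_ij = delta^k_i * x^{k-1}_j
    grad_W2 = [[d * x for x in x1] for d in delta2]
    grad_W1 = [[d * x for x in u] for d in delta1]
    return grad_W1, grad_W2


W1 = [[0.5, -0.3], [0.2, 0.8]]
W2 = [[1.0, -1.5]]
u, target = [1.0, 2.0], [0.5]
grad_W1, grad_W2 = backprop(W1, W2, u, target)
```

A central finite difference of `loss` in any single weight should agree with the corresponding entry of `grad_W1` or `grad_W2`. For a whole data set, the Gradient Empirical Risk (sum of gradients) equation averages these per-sample gradients over all \(N\) samples.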
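
The Recurrent Neural Network Update Equations (and their unrolled form in the Recursive Definition of Recurrent Neural Networks) can be run over an input sequence with a simple loop. In this sketch \(\sigma = \tanh\) and the readout \(f\) defaults to the identity; both are assumptions, as are the helper names `matvec` and `rnn_run`.

```python
# Illustrative sketch; sigma = tanh, identity readout and helper names are assumptions.
import math


def matvec(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_run(W, W_in, W_out, b, inputs, x0, f=lambda v: v):
    """x(n) = tanh(W x(n-1) + W_in u(n) + b), y(n) = f(W_out x(n)) for each input u(n)."""
    x = list(x0)
    outputs = []
    for u in inputs:
        pre = [r + i + bias for r, i, bias in zip(matvec(W, x), matvec(W_in, u), b)]
        x = [math.tanh(p) for p in pre]
        outputs.append(f(matvec(W_out, x)))
    return outputs


outputs = rnn_run(W=[[0.5]], W_in=[[1.0]], W_out=[[1.0]], b=[0.0],
                  inputs=[[1.0], [0.0]], x0=[0.0])
```

The second output equals \(\tanh(0.5 \tanh(1))\): the input has stopped, but the state still carries it forward through \(\mathbf{W}\). Adding a term \(\mathbf{W}^{fb}\mathbf{y}(n-1)\) inside the sum gives the Recurrent Neural Network with Output Feedback variant.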
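
The LSTM equations (input neuron, input/forget/output gates, memory-cell update and output) can be stepped through for a single memory cell. The notes write \(\sigma(\mathbf{W}[1; x])\) for every unit; this sketch assumes a logistic sigmoid for the three gates, \(\tanh\) for the input neuron, and that every unit sees the same vector formed from the external input and the previous output \(\mathbf{y}(n)\). These choices, the dictionary keys and the function names are illustrative assumptions.

```python
# Illustrative sketch; activation choices, shared inputs and names are assumptions.
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def potential(weights, z):
    """W [1; z] for one unit; weights[0] multiplies the constant 1 (bias)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], z))


def lstm_step(params, c, y_prev, u_ext):
    """One LSTM step: returns the new memory cell c(n+1) and output y(n+1)."""
    z = list(u_ext) + [y_prev]  # assumed shared input to all four units
    u = math.tanh(potential(params["input"], z))
    g_input = sigmoid(potential(params["g_input"], z))
    g_forget = sigmoid(potential(params["g_forget"], z))
    g_output = sigmoid(potential(params["g_output"], z))
    c_new = g_forget * c + g_input * u  # memory-cell update
    y = g_output * c_new  # output y = g_output * c
    return c_new, y


# Forget gate saturated open, input gate saturated closed: the cell keeps its content.
remember = {
    "input": [0.0, 1.0, 0.0],
    "g_input": [-50.0, 0.0, 0.0],
    "g_forget": [50.0, 0.0, 0.0],
    "g_output": [50.0, 0.0, 0.0],
}
# Forget gate closed, input gate open: the cell is overwritten by the input neuron.
overwrite = dict(remember, g_input=[50.0, 0.0, 0.0], g_forget=[-50.0, 0.0, 0.0])
```

With `remember`, a stored value of 3.0 survives the step unchanged; with `overwrite`, it is replaced by \(\tanh(1)\) for an external input of 1.0.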
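
The Stochastic Discrete-Time Update Operator and Markov Transition Matrix Entries equations can be explored with a two-state chain. The sketch below propagates the state distribution with \(p_{n+1}(j) = \sum_i p_n(i)\,T(i, j)\) and samples individual trajectories; the matrix `T`, the seed and the helper names are hypothetical examples.

```python
# Illustrative sketch; the transition matrix and helper names are hypothetical.
import random


def propagate(p, T):
    """One step of the state distribution: p_{n+1}(j) = sum_i p_n(i) * T[i][j]."""
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(T[0]))]


def sample_path(T, start, steps, rng=random):
    """Draw X_0, ..., X_steps with p(X_{n+1} = j | X_n = i) = T[i][j]."""
    path = [start]
    for _ in range(steps):
        i = path[-1]
        path.append(rng.choices(range(len(T[i])), weights=T[i])[0])
    return path


T = [[0.9, 0.1], [0.5, 0.5]]  # rows sum to 1
p = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(100):
    p = propagate(p, T)
path = sample_path(T, start=0, steps=10, rng=random.Random(0))
```

For this `T` the distribution settles at \((5/6, 1/6)\), which satisfies \(\pi = \pi T\).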
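
The Continuous-Time Update Operator (ODE) has no update step of its own. One common way to simulate it (not taken from the notes) is the forward Euler method, \(\mathbf{x}(t + \Delta t) \approx \mathbf{x}(t) + \Delta t\, T(\mathbf{x}(t))\), which turns it into a Discrete-Time Update Operator. The example system \(T(x) = -x\), whose exact solution is \(x(t) = x(0)e^{-t}\), and the function names are assumptions chosen so the result can be checked by hand.

```python
# Illustrative sketch; forward Euler and the example operator are assumptions.
import math


def euler(T, x0, dt, steps):
    """Approximate x(steps * dt) for dx/dt = T(x) with forward Euler steps."""
    x = list(x0)
    for _ in range(steps):
        dx = T(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


def decay(x):
    """Example operator T(x) = -x."""
    return [-xi for xi in x]


x_end = euler(decay, [1.0], dt=0.001, steps=1000)  # approximates x(1) = e^{-1}
```

Smaller \(\Delta t\) brings `x_end` closer to \(e^{-1}\) at the cost of more steps.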
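
The Hopfield equations (analytical weight solution, neuron update and energy) can be tried on a small example. This sketch stores two orthogonal \(\pm 1\) patterns of length \(L = 8\), corrupts one bit of the first, and updates neurons asynchronously until nothing changes. Treating \(\text{sign}(0)\) as \(+1\), the patterns and the helper names are assumptions.

```python
# Illustrative sketch; patterns, sign(0) = +1 and helper names are assumptions.

def store(patterns):
    """W = (1/L) * (sum_i xi_i xi_i^T - N I)."""
    L, N = len(patterns[0]), len(patterns)
    return [
        [(sum(p[i] * p[j] for p in patterns) - (N if i == j else 0)) / L for j in range(L)]
        for i in range(L)
    ]


def energy(W, x):
    """E(x) = -1/2 * x^T W x."""
    n = len(x)
    return -0.5 * sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def recall(W, x, max_sweeps=10):
    """Asynchronous updates x_i <- sign(sum_{j != i} W_ij x_j) until a fixed point."""
    x = list(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(x)):
            h = sum(W[i][j] * x[j] for j in range(len(x)) if j != i)
            new = 1 if h >= 0 else -1
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            break
    return x


xi1 = [1, 1, 1, 1, -1, -1, -1, -1]
xi2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = store([xi1, xi2])
corrupted = [-xi1[0]] + xi1[1:]  # flip the first bit
restored = recall(W, corrupted)
```

With symmetric weights and a zero diagonal, an asynchronous flip never raises the energy, which is why recall settles in a minimum; here \(E\) drops from \(-1.5\) for the corrupted state to \(-3\) for the stored pattern.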
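
For a Boltzmann machine small enough to enumerate, the discrete partition function and the Boltzmann Distribution of Microstates can be computed exactly, and the unit-wise rule (Probability of Setting a Unit to 1 in a BM) can be checked against them. This sketch assumes binary \(0/1\) units, a symmetric weight matrix with zero diagonal and no bias terms; the example weights, seed and function names are hypothetical.

```python
# Illustrative sketch; 0/1 units, no biases, weights and names are assumptions.
import itertools
import math
import random


def bm_energy(w, s):
    """E(s) = -sum_{i<j} w_ij s_i s_j."""
    n = len(s)
    return -sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))


def boltzmann_distribution(w, T):
    """Exact p(s) = exp(-E(s)/T) / Z over all binary states, Z the partition function."""
    states = list(itertools.product((0, 1), repeat=len(w)))
    weights = [math.exp(-bm_energy(w, s) / T) for s in states]
    Z = sum(weights)
    return {s: x / Z for s, x in zip(states, weights)}


def prob_unit_on(w, s, i, T):
    """P(s_i = 1 | other units) = 1 / (1 + exp(-Delta E_i / T)), Delta E_i = sum_j w_ij s_j."""
    delta_E = sum(w[i][j] * s[j] for j in range(len(s)) if j != i)
    return 1.0 / (1.0 + math.exp(-delta_E / T))


def gibbs_sweep(w, s, T, rng):
    """Resample every unit once, in order, using prob_unit_on."""
    s = list(s)
    for i in range(len(s)):
        s[i] = 1 if rng.random() < prob_unit_on(w, s, i, T) else 0
    return s


w = [[0.0, 1.0, -2.0], [1.0, 0.0, 0.5], [-2.0, 0.5, 0.0]]
p = boltzmann_distribution(w, T=1.0)
rng = random.Random(0)
sample = [0, 0, 0]
for _ in range(100):
    sample = gibbs_sweep(w, sample, 1.0, rng)
```

Because \(\Delta E_i = E(\mathbf{s}_i = 0) - E(\mathbf{s}_i = 1)\), the conditional probability obtained from the full distribution, for example `p[(1, 1, 1)] / (p[(1, 1, 1)] + p[(0, 1, 1)])`, matches the local rule exactly.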
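
The Metropolis and Boltzmann acceptance functions can be compared on a three-state toy system. Taking \(F(\mathbf{x}) = \exp(-E(\mathbf{x})/T)\), as in the Ratio Metropolis Acceptance Function, the sketch below runs a Metropolis chain with a uniform (hence symmetric) proposal distribution. The energies, temperature, seed, number of steps and function names are assumptions; the visit frequencies should approach the Boltzmann distribution.

```python
# Illustrative sketch; energies, proposal, seed and names are assumptions.
import math
import random


def metropolis_accept(E_current, E_proposed, T):
    """Metropolis rule with F(x) = exp(-E(x)/T): min(1, exp((E(x_n) - E(x*)) / T))."""
    if E_proposed <= E_current:
        return 1.0
    return math.exp((E_current - E_proposed) / T)


def boltzmann_accept(E_current, E_proposed, T):
    """Boltzmann rule F(x*) / (F(x*) + F(x_n)) with the same F."""
    return 1.0 / (1.0 + math.exp((E_proposed - E_current) / T))


def metropolis_frequencies(energies, T, steps, rng):
    """Run a Metropolis chain over state indices and return visit frequencies."""
    x = 0
    counts = [0] * len(energies)
    for _ in range(steps):
        proposal = rng.randrange(len(energies))  # uniform, hence symmetric, proposal
        if rng.random() < metropolis_accept(energies[x], energies[proposal], T):
            x = proposal
        counts[x] += 1
    return [c / steps for c in counts]


energies = [0.0, 1.0, 2.0]
freqs = metropolis_frequencies(energies, T=1.0, steps=200_000, rng=random.Random(1))
```

Replacing `metropolis_accept` with `boltzmann_accept` in the chain samples the same distribution, but rejects proposals more often.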
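
The Kullback-Leibler Divergence, its gradient with respect to the weights, and the Weight Update Rule for Boltzmann Machines can be demonstrated on a fully visible Boltzmann machine (no hidden units and no biases; both simplifications are assumptions of this sketch). Here \(p_{ij}\) is the average of \(\mathbf{s}_i \mathbf{s}_j\) under the target distribution and \(q_{ij}\) the same average under the model, both computed exactly by enumeration. The target distribution, step size and helper names are hypothetical.

```python
# Illustrative sketch; fully visible BM without biases, target and names are assumptions.
import itertools
import math


def kl_divergence(P, Q):
    """KL(P, Q) = sum_s P(s) * log(P(s) / Q(s)); states with P(s) = 0 contribute nothing."""
    return sum(p * math.log(p / Q[s]) for s, p in P.items() if p > 0)


def model_distribution(w, T):
    """Exact Boltzmann distribution of a fully visible BM with 0/1 units and no biases."""
    n = len(w)
    states = list(itertools.product((0, 1), repeat=n))

    def energy(s):
        return -sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))

    weights = [math.exp(-energy(s) / T) for s in states]
    Z = sum(weights)
    return {s: x / Z for s, x in zip(states, weights)}


def coactivation(dist, i, j):
    """Average of s_i * s_j under a distribution over states."""
    return sum(prob * s[i] * s[j] for s, prob in dist.items())


def train_step(w, target, T, mu):
    """w_ij <- w_ij + mu * (p_ij - q_ij) for all i != j (keeps w symmetric)."""
    model = model_distribution(w, T)
    n = len(w)
    new_w = [row[:] for row in w]
    for i in range(n):
        for j in range(n):
            if i != j:
                p_ij = coactivation(target, i, j)  # target (sample) statistics
                q_ij = coactivation(model, i, j)  # free-running model statistics
                new_w[i][j] = w[i][j] + mu * (p_ij - q_ij)
    return new_w


T = 1.0
target = {(1, 1, 0): 0.4, (0, 0, 0): 0.3, (1, 1, 1): 0.1, (0, 0, 1): 0.2}
w = [[0.0] * 3 for _ in range(3)]
kl_before = kl_divergence(target, model_distribution(w, T))
for _ in range(50):
    w = train_step(w, target, T, mu=0.5)
kl_after = kl_divergence(target, model_distribution(w, T))
```

Since \(\partial KL / \partial w_{ij} = -\frac{1}{T}(p_{ij} - q_{ij})\), adding \(\mu(p_{ij} - q_{ij})\) is a gradient-descent step on the KL divergence with the factor \(1/T\) absorbed into \(\mu\). Without biases or hidden units this model cannot match the target exactly, but the divergence still decreases.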

Symbols List

Loss Function | \(L\)
Model | \(h\)
Input | \(u\)
Ground Truth | \(y\)
Activation Function | \(\sigma\)
Optimal Model | \(\hat{f}\)
Expectation | \(E\)
Random Variable Input | \(U\)
Random Variable Output | \(Y\)
Risk | \(R\)
Hypothesis Space | \(\mathcal{H}\)
Parameter Space | \(\Theta\)
Sample | \(S\)
Weight Vector | \(\theta\)
Regularization | \(\textup{reg}\)
Gradient | \(\nabla\)
Learning Rate | \(\mu\)
RNN Hidden State | \(\mathbf{h}\)
Network (Function Approximator) | \(\mathcal{N}\)
Polynomial | \(p\)
Polynomial Coefficient | \(\omega\)
Neuron activation | \(x^\kappa_i\)
Layer size | \(L\)
Layer activation | \(x^\kappa\)
Weight matrix | \(\mathbf{W}\)
Output activation vector | \(\mathbf{y}\)
Sigmoid | \(\sigma\)
Bias | \(\mathcal{b}\)
Activation Vector | \(\mathcal{x}\)
Number of Inputs | \(K\)
Number of Neurons | \(L\)
Number of Outputs | \(M\)
State of the input neuron of LSTM | \(u\)
State of the input gate neuron in LSTM | \(g^\text{input}\)
State of the output gate of an LSTM | \(g^\text{output}\)
Memory cell of an LSTM | \(c\)
State of the forget gate in an LSTM | \(g^\text{forget}\)
Update Operator | \(T\)
State Space of Dynamical System | \(\mathcal{X}\)
System State | \(\mathbf{x}\)
Output Function | \(O\)
System Output | \(\mathbf{y}\)
System Input | \(\mathbf{u}\)
Potential of a Unit | \(a\)
Energy | \(E\)
Training pattern | \(\xi\)
Random Variable | \(X\)
Transpose | \(T\)
Error of a Neuron | \(\delta\)
Generic Neuron Activation | \(x\)
Number of Samples | \(N\)
Temperature | \(T\)
Partition Function (Normalization constant) | \(Z\)
Microstate | \(\mathbf{s}\)
Space of Possible Microstates | \(S\)
Proposed Next State | \(\mathbf{x}^*\)
Proposal Distribution | \(P_{\text{prop}}\)
Acceptance Probability (Acceptance Function) | \(P_{\text{accept}}\)
General Measure Function | \(F\)
Weight in BM | \(w\)
Average Probability Overall (units i and j both on, free-running network) | \(q_{ij}\)
Average Probability Samples (units i and j both on, clamped to training samples) | \(p_{ij}\)
