Gradient Empirical Risk (sum of gradients)

Prerequisites

Description

We can write the gradient of the empirical risk as a sum of gradients. This is also exactly how the gradient is computed in practice: in one sweep through the training set (an 'epoch'), we accumulate the gradient of the loss function evaluated at each training sample and average the result.
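As a minimal sketch of such a sweep (assuming, purely for illustration, a scalar linear model \(\mathcal{N}_\theta(u)=\theta u\) with squared-error loss; the data and the helper `per_sample_grad` are not from the lecture notes):

```python
import numpy as np

# Toy setup (illustrative): a scalar linear model N_theta(u) = theta * u
# with squared-error loss L(N_theta(u), y) = (theta * u - y) ** 2.
theta = 0.5
inputs = np.array([1.0, 2.0, 3.0])   # u_1, ..., u_N
targets = np.array([2.0, 4.0, 6.0])  # y_1, ..., y_N
N = len(inputs)

def per_sample_grad(theta, u, y):
    """Gradient of L(N_theta(u), y) = (theta*u - y)^2 with respect to theta."""
    return 2.0 * (theta * u - y) * u

# One sweep ('epoch') through the training set: accumulate per-sample gradients.
grad_sum = 0.0
for u_i, y_i in zip(inputs, targets):
    grad_sum += per_sample_grad(theta, u_i, y_i)

grad_emp_risk = grad_sum / N  # gradient of the empirical risk
print(grad_emp_risk)
```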

Equation

\[\htmlId{tooltip-vectorGradient}{\nabla} \htmlId{tooltip-risk}{R}^\text{emp}(\htmlId{tooltip-network}{\mathcal{N}}_{\htmlId{tooltip-weightVector}{\theta}})=\frac{1}{N}\sum_{{\htmlId{tooltip-iter}{i}}=1,\dots,N}\htmlId{tooltip-vectorGradient}{\nabla} \htmlId{tooltip-loss}{L}\left(\htmlId{tooltip-network}{\mathcal{N}}_{\htmlId{tooltip-weightVector}{\theta}}(\htmlId{tooltip-input}{u}_{\htmlId{tooltip-iter}{i}}), \htmlId{tooltip-outputActivationVector}{\mathbf{y}}_{\htmlId{tooltip-iter}{i}}\right)\]
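As an illustrative instance (not from the lecture notes), take \(N=2\), a scalar model \(\mathcal{N}_\theta(u)=\theta u\) and the squared-error loss \(L(\hat{y},y)=(\hat{y}-y)^2\); the sum-of-gradients form then reads

\[\nabla R^\text{emp}(\mathcal{N}_{\theta})=\frac{1}{2}\Big(\nabla L(\theta u_1, y_1)+\nabla L(\theta u_2, y_2)\Big)=\frac{1}{2}\Big(2(\theta u_1-y_1)\,u_1+2(\theta u_2-y_2)\,u_2\Big).\]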

Symbols Used

\(\mathcal{N}\)

This is the symbol used for a function approximator, typically a neural network.

\(i\)

This is the symbol for an iterator, a variable that changes value to refer to a sequence of elements.

\(R\)

This symbol denotes the risk of a model.

\(\theta\)

This is the symbol we use for model weights/parameters.

\(\mathbf{y}\)

This symbol represents the output activation vector of a neural network.

\(L\)

This is the symbol for a loss function: a function that quantifies how far a model's prediction is from the desired output.

\(\nabla\)

This symbol represents the gradient of a function.

\(u\)

This symbol denotes the input of a model.

Derivation

  1. Recall the definition of the empirical risk \(\htmlId{tooltip-risk}{R}^\text{emp}\) of a model \(\htmlId{tooltip-model}{h}\). \[\htmlId{tooltip-risk}{R}^\text{emp}(\htmlId{tooltip-model}{h}) = \frac{1}{N} \sum^{N}_{{\htmlId{tooltip-iter}{i}}=1} \htmlId{tooltip-loss}{L} \left(\htmlId{tooltip-model}{h}(\htmlId{tooltip-input}{u}_{\htmlId{tooltip-iter}{i}}), \htmlId{tooltip-groundTruth}{y}_{\htmlId{tooltip-iter}{i}}\right)\]
  2. Recall the definition of the gradient of the empirical risk (here evaluated at a weight vector \(\htmlId{tooltip-weightVector}{\theta}^{(\htmlId{tooltip-wholeNumber}{n})}\)): \[\htmlId{tooltip-vectorGradient}{\nabla} \htmlId{tooltip-risk}{R}^\text{emp}(\htmlId{tooltip-network}{\mathcal{N}}_{\htmlId{tooltip-weightVector}{\theta}^{(\htmlId{tooltip-wholeNumber}{n})}})=\left(\frac{\partial \htmlId{tooltip-risk}{R}^\text{emp}}{\partial \htmlId{tooltip-weightVector}{\theta}_1}(\htmlId{tooltip-weightVector}{\theta}^{(\htmlId{tooltip-wholeNumber}{n})}),\dots,\frac{\partial \htmlId{tooltip-risk}{R}^\text{emp}}{\partial \htmlId{tooltip-weightVector}{\theta}_{\htmlId{tooltip-numNeurons}{L}}}(\htmlId{tooltip-weightVector}{\theta}^{(\htmlId{tooltip-wholeNumber}{n})})\right)\]
  3. We can plug in the definition of the empirical risk to obtain \[\htmlId{tooltip-vectorGradient}{\nabla} \htmlId{tooltip-risk}{R}^\text{emp}(\htmlId{tooltip-network}{\mathcal{N}}_{\htmlId{tooltip-weightVector}{\theta}})=\htmlId{tooltip-vectorGradient}{\nabla}\left(\frac{1}{N}\sum_{{\htmlId{tooltip-iter}{i}}=1,\dots,N} \htmlId{tooltip-loss}{L}\left(\htmlId{tooltip-network}{\mathcal{N}}_{\htmlId{tooltip-weightVector}{\theta}}(\htmlId{tooltip-input}{u}_{\htmlId{tooltip-iter}{i}}), \htmlId{tooltip-outputActivationVector}{\mathbf{y}}_{\htmlId{tooltip-iter}{i}}\right)\right).\]
  4. By the linearity of differentiation, the gradient of a sum is the sum of the gradients; this is the sum rule familiar from single-variable calculus, applied to gradients. Likewise, the constant factor \(\frac{1}{N}\) can be pulled out of the gradient. We obtain: \[\htmlId{tooltip-vectorGradient}{\nabla} \htmlId{tooltip-risk}{R}^\text{emp}(\htmlId{tooltip-network}{\mathcal{N}}_{\htmlId{tooltip-weightVector}{\theta}})=\frac{1}{N}\sum_{{\htmlId{tooltip-iter}{i}}=1,\dots,N}\htmlId{tooltip-vectorGradient}{\nabla} \htmlId{tooltip-loss}{L}\left(\htmlId{tooltip-network}{\mathcal{N}}_{\htmlId{tooltip-weightVector}{\theta}}(\htmlId{tooltip-input}{u}_{\htmlId{tooltip-iter}{i}}), \htmlId{tooltip-outputActivationVector}{\mathbf{y}}_{\htmlId{tooltip-iter}{i}}\right)\] as required. A small numerical check of this identity is sketched below.
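As a minimal numerical check of this identity (a sketch reusing the same illustrative toy model and squared-error loss as above; not part of the lecture notes), one can compare the gradient of the averaged loss against the average of the per-sample gradients:

```python
import numpy as np

# Toy model N_theta(u) = theta * u with squared-error loss (illustrative).
theta = 0.5
inputs = np.array([1.0, 2.0, 3.0])
targets = np.array([2.0, 4.0, 6.0])

def emp_risk(theta):
    """Empirical risk: mean squared-error loss over the training set."""
    return np.mean((theta * inputs - targets) ** 2)

# Left-hand side: numerical gradient of the empirical risk (central difference).
eps = 1e-6
lhs = (emp_risk(theta + eps) - emp_risk(theta - eps)) / (2 * eps)

# Right-hand side: average of the per-sample loss gradients.
rhs = np.mean(2.0 * (theta * inputs - targets) * inputs)

print(lhs, rhs)  # the two values agree up to numerical error
```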

References

  1. Jaeger, H. (2024, May 4). Neural Networks (AI) (WBAI028-05) Lecture Notes BSc program in Artificial Intelligence. Retrieved from https://www.ai.rug.nl/minds/uploads/LN_NN_RUG.pdf
