We can write the gradient of the empirical risk as a sum of gradients over the training samples. This is also exactly how the gradient is computed in practice: in one sweep through the training set (an 'epoch'), we compute the gradient by aggregating, over all training samples, the gradient of the loss function with respect to the parameters.
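One way to state this decomposition, as a sketch using the symbols defined in the table below (the sample count \(n\) and the per-sample target \(t_i\) are notation introduced here for illustration, not part of the glossary):

\[
\nabla_\theta R(\theta) \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta\, L\!\left(\mathcal{N}(u_i;\, \theta),\; t_i\right)
\]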
| Symbol | Meaning |
| --- | --- |
| \(\mathcal{N}\) | A function approximator, typically a neural network. |
| \(i\) | An iterator: a variable that changes value to index a sequence of elements. |
| \(R\) | The risk of a model. |
| \(\theta\) | The model weights/parameters. |
| \(\mathbf{y}\) | The output activation vector of a neural network. |
| \(L\) | A loss function: it measures how far a model's prediction is from the target. |
| \(\nabla\) | The gradient of a function. |
| \(u\) | The input of a model. |
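As a concrete illustration of the epoch sweep, here is a minimal sketch in Python. It assumes a linear model standing in for \(\mathcal{N}\) and a squared-error loss; the function names, shapes, and loss choice are hypothetical, not taken from the text above.

```python
import numpy as np

def model(theta, u):
    """Hypothetical stand-in for the network N(u; theta): a linear map."""
    return theta @ u

def loss_grad(theta, u, t):
    """Gradient w.r.t. theta of L = 0.5 * ||N(u; theta) - t||^2 for one sample."""
    residual = model(theta, u) - t
    return np.outer(residual, u)

def empirical_risk_grad(theta, inputs, targets):
    """One sweep (epoch) through the training set, aggregating per-sample gradients."""
    grad = np.zeros_like(theta)
    for u, t in zip(inputs, targets):
        grad += loss_grad(theta, u, t)
    return grad / len(inputs)  # average over the n training samples

# Usage: 5 training pairs, 3-dimensional inputs, 2-dimensional outputs.
rng = np.random.default_rng(0)
theta = rng.normal(size=(2, 3))
inputs = rng.normal(size=(5, 3))
targets = rng.normal(size=(5, 2))
print(empirical_risk_grad(theta, inputs, targets))
```

The loop makes the sum-of-gradients structure explicit; in practice frameworks compute the same quantity by differentiating the averaged loss directly, typically over mini-batches rather than the full set.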