Risk of Optimal Model

Description

The risk of the optimal model describes an empirical method for determining the optimal model \( \htmlId{tooltip-optimalModel}{\hat{f}} \) for a given problem: every candidate model \( \htmlId{tooltip-model}{h} \) from the hypothesis space \( \htmlId{tooltip-hypothesisSpace}{\mathcal{H}} \) is evaluated on a sampled dataset, and the model with the minimum empirical risk is selected.

Equation

\[\htmlId{tooltip-optimalModel}{\hat{f}} = h_{opt} = \underset{h \in \htmlId{tooltip-hypothesisSpace}{\mathcal{H}}}{argmin} \hspace{0.2cm} \frac{1}{N} \sum^{N}_{i=1} L (\htmlId{tooltip-model}{h}(\htmlId{tooltip-input}{u}_i), \htmlId{tooltip-groundTruth}{y}_i)\]
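
When the hypothesis space is a small finite set, this equation can be evaluated directly by computing the empirical risk of every candidate and taking the argmin. The following is a minimal Python sketch of that idea; the function names empirical_risk and select_optimal_model, the constant models, and the squared loss are illustrative assumptions, not part of the definition above.

```python
from typing import Callable, Sequence

def empirical_risk(h: Callable, inputs: Sequence, targets: Sequence,
                   loss: Callable) -> float:
    """Mean loss of model h over the N sampled pairs (u_i, y_i)."""
    return sum(loss(h(u), y) for u, y in zip(inputs, targets)) / len(inputs)

def select_optimal_model(hypothesis_space: Sequence[Callable],
                         inputs: Sequence, targets: Sequence,
                         loss: Callable) -> Callable:
    """Return the candidate h in H with the lowest empirical risk (the argmin)."""
    return min(hypothesis_space,
               key=lambda h: empirical_risk(h, inputs, targets, loss))

# Example usage with two hypothetical constant models and a squared loss.
models = [lambda u: 0.0, lambda u: 1.0]
f_hat = select_optimal_model(models, [1, 2, 3], [0.9, 1.1, 1.0],
                             loss=lambda p, t: (p - t) ** 2)
```

In practice \( \htmlId{tooltip-hypothesisSpace}{\mathcal{H}} \) is rarely a small finite set, so the argmin is usually approximated by an optimization procedure (for example gradient-based training) rather than by enumerating every candidate.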

Symbols Used

\(\hat{f}\)

This symbol denotes the optimal model for a problem.

\(y\)

This symbol stands for the ground truth of a sample. In supervised learning this is often paired with the corresponding input.

\(\mathcal{H}\)

This is the symbol representing the set of possible models.

\(h\)

This symbol denotes a model in machine learning.

\(u\)

This symbol denotes the input of a model.

Derivation

  1. Recall the definition of the empirical risk of a model:
    \[\htmlId{tooltip-risk}{R}^{emp}(\htmlId{tooltip-model}{h}) = \frac{1}{N} \sum^{N}_{i=1} L (\htmlId{tooltip-model}{h}(\htmlId{tooltip-input}{u}_i), \htmlId{tooltip-groundTruth}{y}_i)\]
  2. Now suppose that all our models \( \htmlId{tooltip-model}{h} \) are drawn from a hypothesis space \( \htmlId{tooltip-hypothesisSpace}{\mathcal{H}} \):

    The symbol \( \mathcal{H} \) denotes the set of possible models, often from a particular class like "polynomials of any degree" or "multi-layer perceptron networks". For any learning algorithm, \( \mathcal{H} \) indicates the space where an optimal model may be found.

  3. Using the definition of the optimal model \( \htmlId{tooltip-optimalModel}{\hat{f}} \):

    The symbol \(\hat{f}\) denotes the optimal model for a problem. It yields the lowest risk \( \htmlId{tooltip-risk}{R} \) for pairs of inputs and outputs. The goal of machine learning is to optimize \( \htmlId{tooltip-model}{h} \) until it becomes \(\hat{f}\).


    We observe that we need to select the model \( \htmlId{tooltip-model}{h} \) with the lowest empirical risk. This selection is expressed using the argmin operator.
  4. Therefore, we obtain
    \[\htmlId{tooltip-optimalModel}{\hat{f}} = \underset{h \in \htmlId{tooltip-hypothesisSpace}{\mathcal{H}}}{argmin} \hspace{0.2cm} \frac{1}{N} \sum^{N}_{i=1} L (\htmlId{tooltip-model}{h}(\htmlId{tooltip-input}{u}_i), \htmlId{tooltip-groundTruth}{y}_i) \]
    as required.
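
As a concrete (and entirely hypothetical) illustration of these steps, the sketch below builds a finite hypothesis space of fitted polynomials of degree 0, 1 and 2, computes the empirical risk of each candidate under a squared loss (step 1), and selects the argmin (steps 3 and 4); the dataset values are invented for demonstration only.

```python
import numpy as np

# Toy dataset of N = 5 pairs (u_i, y_i); values are illustrative only.
u = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 4.2, 8.8, 16.3])

def squared_loss(prediction, target):
    return (prediction - target) ** 2

# Step 2: a finite hypothesis space H, here polynomials of degree 0, 1 and 2
# fitted to the sampled data.
hypothesis_space = {
    f"degree-{d}": np.poly1d(np.polyfit(u, y, d)) for d in (0, 1, 2)
}

# Step 1: R^emp(h), the mean loss of h over the N samples.
def empirical_risk(h):
    return float(np.mean([squared_loss(h(u_i), y_i) for u_i, y_i in zip(u, y)]))

# Steps 3-4: the optimal model is the candidate with the lowest empirical risk.
best = min(hypothesis_space, key=lambda name: empirical_risk(hypothesis_space[name]))
print(best, empirical_risk(hypothesis_space[best]))  # the degree-2 fit wins here
```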

Example

Suppose we have the following models, with their empirical risks calculated on an arbitrary dataset of samples:

\[\begin{align*}
\htmlId{tooltip-risk}{R}^{emp}(\htmlId{tooltip-model}{h}_1) &= 3 \\
\htmlId{tooltip-risk}{R}^{emp}(\htmlId{tooltip-model}{h}_2) &= 2.3 \\
\htmlId{tooltip-risk}{R}^{emp}(\htmlId{tooltip-model}{h}_3) &= 6
\end{align*}\]
Using the equation described above, we observe that the optimal model \( \htmlId{tooltip-optimalModel}{\hat{f}} \) is the model \( \htmlId{tooltip-model}{h} \) with the lowest empirical risk.

Therefore, we obtain \( \htmlId{tooltip-optimalModel}{\hat{f}} \) = \(\htmlId{tooltip-model}{h}_2\).
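
The same selection can be written out mechanically; a tiny sketch, assuming only the three risk values listed above:

```python
# Empirical risks from the example; the key with the smallest value is f_hat.
risks = {"h1": 3.0, "h2": 2.3, "h3": 6.0}
f_hat = min(risks, key=risks.get)
print(f_hat)  # h2
```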

