Chapter 1 • Exercise 1
Part 1
By law of iterative expectation
\begin{align}
\mathbb{E}\left[Z_i(f)\right] &= \mathbb{E}\left[\mathbb{E}\left[Z_i(f) \mid x_i\right]\right] \\
&= \mathbb{E}\left[\mathbb{E}\left[f(x_i)^2 - f^*(x_i)^2 - 2 y f(x_i) + 2 y f^*(x_i) \mid x_i\right]\right] \\
&= \mathbb{E}\left[\mathbb{E}\left[f(x_i)^2 - f^*(x_i)^2 - 2 f^*(x_i) f(x_i) + 2 f^*(x_i) f^*(x_i)\right]\right] \\
&= \mathbb{E}\left[\mathbb{E}\left[\left(f(x_i) - f^*(x_i)\right)^2 \mid x_i\right]\right] \\
&= \mathbb{E}\left[\left(f(x_i) - f^*(x_i)\right)^2\right]
\end{align}
Part 2
We have
\begin{align}
\mathrm{var}\left[Z_i\right] &\leq \mathbb{E}\left[Z_i^2\right] \\
&\leq \mathbb{E}\left[\left((f(x_i) - f^*(x_i))( f(x_i) + f^*(x_i) - 2 y)\right)^2\right] \\
&\leq 4\mathbb{E}\left[\left(f(x_i) - f^*(x_i)\right)^2\right]
\end{align}
Here we use the fact that all \(f(x), f^*(x), y \in [0, 1]\)
Part 3
We apply Bernstein's inequality on \(-Z_i\). Note that \(|\mathbb{E}[Z_i] - Z_i| \leq 1\). Hence with probability at least \(1-\delta\), we have
\begin{align}
\frac{1}{T} \sum_{i=1}^T \left(\mathbb{E}[Z_i] - Z_i\right) &\leq \sqrt{\frac{2 \mathbb{V}[Z_i] \log {1 / \delta}}{T}} + \frac{\log{1 / \delta}}{3T} \\
\end{align}
Since \(\mathbb{E}[Z_i] = \mathcal{E}(f)\) and \(\mathbb{V}[Z_i] \leq 4 \mathcal{E}(f)\), we have
\begin{align}
\mathcal{E}(f) - \left(\hat{L}(f) - \hat{L}(f^\star)\right) &\leq \sqrt{\frac{8 \mathcal{E}(f) \log {1 / \delta}}{T}} + \frac{\log{1 / \delta}}{3T} \\
&\leq \frac{\mathcal{E}(f)}{2} + \frac{4 \log {1 / \delta}}{T} + \frac{\log{1 / \delta}}{3T}\\
&= \frac{1}{2} \mathcal{E}(f) + \frac{13}{3} \frac{\log(1 / \delta)}{T}
\end{align}
Note here we used AM_GM in step 2 and that \(\frac{1}{T} \sum_{i=1}^{T} Z_i = \hat{L}(f) - \hat{L}(f^*)\). Rearranging, we have that with probability at least \((1 - \delta)\)
\begin{align}
\mathcal{E}(f) &\leq 2(\hat{L}(f) - \hat{L}(f^*)) + C\frac{\log(1 / \delta)}{T}
\end{align}
where \(C = 26/3\).
Part 4
By union bound, we have with probability at least \(1 - |\mathcal{F}|\delta\),
\begin{align}
\forall f \in \mathcal{F}, \ \mathcal{E}(f) \leq 2 (\hat{L}(f) - \hat{L}(f^*)) + \frac{C \log (1 / \delta)}{T}
\end{align}
Setting \(\delta = \delta / |\mathcal{F}|\), we have that with probability at least \(1 - \delta\)
\begin{align}
\forall f \in \mathcal{F}, \ \mathcal{E}(f) \leq 2 (\hat{L}(f) - \hat{L}(f^*)) + \frac{C \log (|\mathcal{F}| / \delta)}{T}
\end{align}
For \(f = \hat{f}\), since \(\hat{L}(\hat{f}) \leq \hat{L}(f^*)\), we must have that with probability at least \(1 - \delta\):
\begin{align}
\mathcal{E}(\hat{f}) \leq \frac{C \log (|\mathcal{F}| / \delta)}{T}
\end{align}
as desired.
Return home