Chapter 1 • Exercise 1

Part 1

By law of iterative expectation \begin{align} \mathbb{E}\left[Z_i(f)\right] &= \mathbb{E}\left[\mathbb{E}\left[Z_i(f) \mid x_i\right]\right] \\ &= \mathbb{E}\left[\mathbb{E}\left[f(x_i)^2 - f^*(x_i)^2 - 2 y f(x_i) + 2 y f^*(x_i) \mid x_i\right]\right] \\ &= \mathbb{E}\left[\mathbb{E}\left[f(x_i)^2 - f^*(x_i)^2 - 2 f^*(x_i) f(x_i) + 2 f^*(x_i) f^*(x_i)\right]\right] \\ &= \mathbb{E}\left[\mathbb{E}\left[\left(f(x_i) - f^*(x_i)\right)^2 \mid x_i\right]\right] \\ &= \mathbb{E}\left[\left(f(x_i) - f^*(x_i)\right)^2\right] \end{align}

Part 2

We have \begin{align} \mathrm{var}\left[Z_i\right] &\leq \mathbb{E}\left[Z_i^2\right] \\ &\leq \mathbb{E}\left[\left((f(x_i) - f^*(x_i))( f(x_i) + f^*(x_i) - 2 y)\right)^2\right] \\ &\leq 4\mathbb{E}\left[\left(f(x_i) - f^*(x_i)\right)^2\right] \end{align} Here we use the fact that all \(f(x), f^*(x), y \in [0, 1]\)

Part 3

We apply Bernstein's inequality on \(-Z_i\). Note that \(|\mathbb{E}[Z_i] - Z_i| \leq 1\). Hence with probability at least \(1-\delta\), we have \begin{align} \frac{1}{T} \sum_{i=1}^T \left(\mathbb{E}[Z_i] - Z_i\right) &\leq \sqrt{\frac{2 \mathbb{V}[Z_i] \log {1 / \delta}}{T}} + \frac{\log{1 / \delta}}{3T} \\ \end{align} Since \(\mathbb{E}[Z_i] = \mathcal{E}(f)\) and \(\mathbb{V}[Z_i] \leq 4 \mathcal{E}(f)\), we have \begin{align} \mathcal{E}(f) - \left(\hat{L}(f) - \hat{L}(f^\star)\right) &\leq \sqrt{\frac{8 \mathcal{E}(f) \log {1 / \delta}}{T}} + \frac{\log{1 / \delta}}{3T} \\ &\leq \frac{\mathcal{E}(f)}{2} + \frac{4 \log {1 / \delta}}{T} + \frac{\log{1 / \delta}}{3T}\\ &= \frac{1}{2} \mathcal{E}(f) + \frac{13}{3} \frac{\log(1 / \delta)}{T} \end{align} Note here we used AM_GM in step 2 and that \(\frac{1}{T} \sum_{i=1}^{T} Z_i = \hat{L}(f) - \hat{L}(f^*)\). Rearranging, we have that with probability at least \((1 - \delta)\) \begin{align} \mathcal{E}(f) &\leq 2(\hat{L}(f) - \hat{L}(f^*)) + C\frac{\log(1 / \delta)}{T} \end{align} where \(C = 26/3\).

Part 4

By union bound, we have with probability at least \(1 - |\mathcal{F}|\delta\), \begin{align} \forall f \in \mathcal{F}, \ \mathcal{E}(f) \leq 2 (\hat{L}(f) - \hat{L}(f^*)) + \frac{C \log (1 / \delta)}{T} \end{align} Setting \(\delta = \delta / |\mathcal{F}|\), we have that with probability at least \(1 - \delta\) \begin{align} \forall f \in \mathcal{F}, \ \mathcal{E}(f) \leq 2 (\hat{L}(f) - \hat{L}(f^*)) + \frac{C \log (|\mathcal{F}| / \delta)}{T} \end{align} For \(f = \hat{f}\), since \(\hat{L}(\hat{f}) \leq \hat{L}(f^*)\), we must have that with probability at least \(1 - \delta\): \begin{align} \mathcal{E}(\hat{f}) \leq \frac{C \log (|\mathcal{F}| / \delta)}{T} \end{align} as desired.

Return home