Chapter 3 • Exercise 5

We have the following results:

Lemma 1: With probability \(1 - \delta\), for any \(t \in [T]\), any \(x \in \mathcal{X}\) and any \(\pi \in \Pi\), we have \begin{align} |\hat{f}^t(x, \pi) - f^\star(x, \pi)| \leq \sqrt{\frac{2 \log (2T^2 A |\mathcal{X}|/ \delta)}{n^t(x, \pi)}} \end{align}
Proof: This follows from just applying union bound to hoeffding's inequalities over all \(x \in \mathcal{X}\) and \(\pi \in \Pi\).

Lemma 2: \begin{align} \sum_{i=1}^{T} \frac{1}{\sqrt{n^t(x^t, \pi^t)}} \land 1 \lesssim \sqrt{A|\mathcal{X}|T} \end{align}
Proof: This is just the confidence width potential lemma applied to \(\mathcal{X} \times [A]\) instead of \([A]\).

Coming back to main proof, we just combine both lemmas and lemma 7 from the book to get \begin{align} \sum_{i=1}^T f^\star(x^T, \pi^\star(x^t)) - f^\star(\pi^t) &\leq \sum_{x \in \mathcal{X}} \sum_{i = 1}^T \left( f^\star(x, \pi^\star(x^t)) - f^\star(x, \pi^t) \right) \bm{1}_{x^t = x} \\ &\leq \sum_{x \in \mathcal{X}} \sum_{t=1}^{T} \left(\bar{f}^t(x, \pi^t) - \underline{f}^t(x, \pi^t)\right) \bm{1}_{x^t = x} \\ &\leq \sum_{x \in \mathcal{X}} \sum_{t=1}^{T} \bm{1}_{x^t = x} \cdot 2 \sqrt{\frac{2 \log (2T^2 A |\mathcal{X}|/ \delta)}{n^t(x^t, \pi^t)}} \land 1 \\ &\lesssim \sqrt{|A| \mathcal{X} T \log (T^2 A / \delta)} \end{align} as desired.

Return home