Chapter 3 • Exercise 5
We have the following results:
Lemma 1: With probability \(1 - \delta\), for any \(t \in [T]\), any \(x \in \mathcal{X}\) and any \(\pi \in \Pi\), we have
\begin{align}
|\hat{f}^t(x, \pi) - f^\star(x, \pi)| \leq \sqrt{\frac{2 \log (2T^2 A |\mathcal{X}|/ \delta)}{n^t(x, \pi)}}
\end{align}
Proof: This follows from just applying union bound to hoeffding's inequalities over all \(x \in \mathcal{X}\) and \(\pi \in \Pi\).
Lemma 2:
\begin{align}
\sum_{i=1}^{T} \frac{1}{\sqrt{n^t(x^t, \pi^t)}} \land 1 \lesssim \sqrt{A|\mathcal{X}|T}
\end{align}
Proof: This is just the confidence width potential lemma applied to \(\mathcal{X} \times [A]\) instead of \([A]\).
Coming back to main proof, we just combine both lemmas and lemma 7 from the book to get
\begin{align}
\sum_{i=1}^T f^\star(x^T, \pi^\star(x^t)) - f^\star(\pi^t) &\leq \sum_{x \in \mathcal{X}} \sum_{i = 1}^T \left( f^\star(x, \pi^\star(x^t)) - f^\star(x, \pi^t) \right) \bm{1}_{x^t = x} \\
&\leq \sum_{x \in \mathcal{X}} \sum_{t=1}^{T} \left(\bar{f}^t(x, \pi^t) - \underline{f}^t(x, \pi^t)\right) \bm{1}_{x^t = x} \\
&\leq \sum_{x \in \mathcal{X}} \sum_{t=1}^{T} \bm{1}_{x^t = x} \cdot 2 \sqrt{\frac{2 \log (2T^2 A |\mathcal{X}|/ \delta)}{n^t(x^t, \pi^t)}} \land 1 \\
&\lesssim \sqrt{|A| \mathcal{X} T \log (T^2 A / \delta)}
\end{align}
as desired.
Return home