Goal: for a function f:X→[0,1] where ∣X∣=K<∞ find x∈Xmaxf(x) based on noisy observations f(xt)+εxt where εxt is random.
Regret: R(T)=t=1∑Tf(x∗)−f(xt) where x∗=x∈Xargmaxf(x).
Theorem. For any algorithm there is an f such that E[R(T)]≥Ω(KT).
Result (Hoeffding). Let y1,..,yt be an IID sequence of random variables with values in [0,1]. Let yˉt be their sample mean. Then P(E(y1)>yˉt+δ)≤e−2tδ2.
Assume f(xt)+εxt satisfy Hoeffding's inequality. Choose δ,η so that P(∣yˉt(x)−f(x)∣≤σt(x)nt(x)2lnT)≥1−η. where
Theorem. Hoeffding–UCB's regret satisfies E[R(T)]≤O(KT) uniformly for all f.
Key idea. With sufficiently high probability, we have Δ(xt)=f(x∗)−f(xt)≤ft+(x∗)−ft−(xt)≤ft+(xt)−ft−(xt)=2σt(xt)=2nt(x)2lnT=t=TO(nT−1/2) using ft−(x)≤f(x)≤ft+(x), and ft+(xt)≥ft+(x), ∀x,t.
This gives E[R(T)]=t=1∑TΔ(xt)f(x∗)−f(xt)=x∈X∑O(nT−1/2)Δ(x)nT(x)≤x∈X∑O(nT(x))≤O(Kx∈X∑nT(x))=O(KT).