Reminder: check out our Gittins Index tutorial! Monday 2:45pm: Building A, Level 4, Room A410
Many problems in statistical learning and game theory:
Learning in games via repeated play
Sample-efficient reinforcement learning
Confidence sets and generalization bounds
Under-the-hood technical tool: adversarial online learning
This talk: non-discrete action spaces
Adversary behaves arbitrarily $\leadsto$ actions need to be randomized
| | Discrete Actions | Continuous Actions |
| --- | --- | --- |
| Representation | Array of probabilities | Unnormalized density? |
| Sampling | Straightforward | Markov Chain Monte Carlo? |
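To make the contrast concrete, here is a minimal illustrative sketch (an assumed setup, not the method developed in this talk): drawing a discrete action is a single call, while an unnormalized continuous density already pushes us toward something like random-walk Metropolis. All function names and parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete actions: sampling from an array of probabilities is one line.
probs = np.array([0.2, 0.5, 0.3])
action = rng.choice(len(probs), p=probs)

# Continuous actions: with only an unnormalized density, a generic option
# is Markov chain Monte Carlo, e.g. random-walk Metropolis.
def unnormalized_density(x):
    # Illustrative two-bump density on [0, 1].
    return np.exp(-50 * (x - 0.3) ** 2) + np.exp(-50 * (x - 0.8) ** 2)

def metropolis_sample(n_steps=1000, step=0.1):
    x = rng.uniform(0.0, 1.0)
    for _ in range(n_steps):
        proposal = np.clip(x + step * rng.normal(), 0.0, 1.0)
        accept_prob = min(1.0, unnormalized_density(proposal) / unnormalized_density(x))
        if rng.uniform() < accept_prob:
            x = proposal
    return x
```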
New result: novel algorithm and regret bound for bounded Lipschitz online learning
New analysis technique: probabilistic bounds for follow-the-perturbed-leader with correlated Gaussians
Joint work with Jeff Negrea
At each time $t = 1,\ldots,T$: the learner plays $x_t \sim p_t$, the adversary plays a reward function $y_t \sim q_t$, and the learner receives reward $y_t(x_t)$ and observes $y_t$ in full.
Regret: $$ R(p,q) = \E_{\substack{x_t\sim p_t\\y_t\sim q_t}} \ubr{\sup_{x\in X} \sum_{t=1}^{T} y_t(x)}{\substack{\t{single best action}\\\t{in hindsight}}} - \ubr{\sum_{t=1}^{T} y_t(x_t)}{\substack{\t{total reward of}\\\t{learner's actions}}} $$
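As a concrete reading of this definition, the sketch below computes the realized (un-averaged) counterpart of the regret for one sequence of plays, with the supremum over $X$ replaced by a maximum over a finite grid. The function and argument names are illustrative only.

```python
import numpy as np

def realized_regret(actions, reward_fns, grid):
    """Best fixed action on a finite grid (stand-in for X) minus the
    learner's total reward, for one realized sequence of plays."""
    cumulative_on_grid = np.zeros(len(grid))
    learner_reward = 0.0
    for x_t, y_t in zip(actions, reward_fns):
        cumulative_on_grid += np.array([y_t(x) for x in grid])
        learner_reward += y_t(x_t)
    return cumulative_on_grid.max() - learner_reward
```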
Standard approaches and challenges:
Core method design toolkit: convex optimization
Our starting point: follow-the-perturbed-leader (FTPL) methods
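For intuition, here is a minimal sketch of the FTPL template on a finite action set with independent Gaussian perturbations; the perturbation law and its scaling are placeholder choices, not the algorithm analyzed in this talk.

```python
import numpy as np

def ftpl_actions(reward_matrix, scale, rng):
    """Follow-the-perturbed-leader on a finite action set.
    reward_matrix[t, i]: reward of action i at round t, revealed to the
    learner after it commits to its round-t action (full information)."""
    T, n_actions = reward_matrix.shape
    cumulative = np.zeros(n_actions)
    actions = []
    for t in range(T):
        perturbation = scale * rng.normal(size=n_actions)  # fresh independent noise
        actions.append(int(np.argmax(cumulative + perturbation)))
        cumulative += reward_matrix[t]
    return actions
```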
Start with adversary's perspective:
Algorithm: FTPL in the form of Thompson sampling
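A discretized sketch of this idea, assuming actions on a 1-d grid: each round, the cumulative rewards are perturbed by a single Gaussian draw whose covariance comes from a Matérn-$\frac{1}{2}$ kernel, so the noise is correlated across actions rather than independent. Names and parameter choices below are illustrative, not the exact construction from the paper.

```python
import numpy as np

def matern_half_cov(grid, variance, length_scale):
    """Matérn-1/2 (exponential) covariance matrix on a 1-d action grid."""
    dists = np.abs(grid[:, None] - grid[None, :])
    return variance * np.exp(-dists / length_scale)

def thompson_ftpl_step(cumulative_rewards, cov, rng):
    """One round: perturb the cumulative rewards with one draw from the
    correlated Gaussian prior and play the perturbed leader."""
    noise = rng.multivariate_normal(np.zeros(len(cumulative_rewards)), cov)
    return int(np.argmax(cumulative_rewards + noise))

# Usage sketch (hypothetical parameters):
# grid = np.linspace(0.0, 1.0, 200)
# cov = matern_half_cov(grid, variance=1.0, length_scale=0.2)
# x_index = thompson_ftpl_step(np.zeros(len(grid)), cov, np.random.default_rng(0))
```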
Regret: $\forall p$, $$ \ubr{R(p,q^{(\gamma)})}{\substack{\t{strong adversary's}\\\t{lower bound}}} \;\leq\; \ubr{R(p^{(\gamma)}_\infty,q)}{\substack{\t{true regret of}\\\t{our algorithm}}} \;=\; \ubr{R(p,q^{(\gamma)}_\infty)}{\substack{\t{prior regret:}\\\t{what we thought}\\\t{adversary would do}}} \;+\; \ubr{E_{q^{(\gamma)}_\infty}(p^{(\gamma)}_\infty,q)}{\substack{\t{excess regret:}\\\t{what adversary}\\\t{actually did}}} $$ where:
The hard part: smooth adversaries lead to correlated priors
Our approach: follow Bayesian intuition $$ \begin{aligned} \t{linear algebra} \quad &\leftrightsquigarrow \quad \t{conditional probability} \\ \t{traces of matrices} \quad &\leftrightsquigarrow \quad \t{conditional expectations} \\ L_{\infty,1}\ \t{matrix norm identities} \quad &\leftrightsquigarrow \quad \t{balance of probability in truncated Gaussians} \end{aligned} $$
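One concrete instance of this dictionary, stated here only for orientation: for a zero-mean Gaussian prior with kernel $k$, standard Gaussian conditioning gives $$ \E\del{f(x) \mid f(x') = c} = c \, \frac{k(x,x')}{k(x',x')} , $$ and this kernel ratio is exactly the quantity appearing in the criterion below.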
New result: general criterion connecting adversary's smoothness with learner's prior and correlations
Sanity check: in the classical setting, do we recover optimal rates?
Adversarial problem: not obvious that Bayesian learning makes sense
Theorem. Let $X = [0,1]^d$, let $Y$ be the set of bounded Lipschitz functions with $\norm{y}_\infty\leq\beta$ and $\abs{y}_{\f{Lip}}\leq\lambda$, and let the prior be Matérn-$\frac{1}{2}$ with variance $\beta$ and length scale $\frac{1}{\lambda}$. Then Thompson sampling achieves a regret of $$ R(p,q) \leq \beta \del{32 + \frac{32}{1 - \frac{1}{e}}}\sqrt{T d\log\del{1 + \sqrt{d}\frac{\lambda}{\beta}}} . $$
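For a sense of scale, the bound can be evaluated directly; the snippet below is a plain plug-in of the displayed expression, with hypothetical parameter values in the comment.

```python
import numpy as np

def regret_bound(T, d, beta, lam):
    """Plug-in evaluation of the displayed regret bound."""
    constant = beta * (32.0 + 32.0 / (1.0 - 1.0 / np.e))
    return constant * np.sqrt(T * d * np.log(1.0 + np.sqrt(d) * lam / beta))

# Example (hypothetical values): regret_bound(T=10_000, d=2, beta=1.0, lam=5.0)
```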
Criterion: $\displaystyle\sup_{y\in Y} \del{y(x) - y(x')\frac{k(x,x')}{k(x',x')}} \leq C_{Y,k} \del{1 - \frac{k(x,x')}{k(x',x')}} , \quad C_{Y,k} = \frac{2\beta}{1 - e^{-\frac{2\beta}{\lambda\kappa+\beta}}}$
Fundamental theoretical insights:
Next steps:
Longer-term:
A. Terenin and J. Negrea. An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces. arXiv:2502.14790, 2025.
Reminder: check out our Gittins Index tutorial! Monday 2:45pm: Building A, Level 4, Room A410