Talk for ML Scholars Workshop 2026
Joint work with Qian Xie, Linda Cai, Theo Brown, Raul Astudillo, Peter Frazier, and Ziv Scully
Alexander Terenin
Automatic explore-exploit tradeoff
Motivating application: neural network hyperparameter tuning
Goal: maximize test accuracy under expected compute budget $$ \htmlClass{fragment}{ \begin{aligned} &\max_{t=1,..,T} f(x_t) & &\t{subject to} & &\E\sum_{\smash{t=1}}^{\smash{T}} c(x_t) \leq B \end{aligned} } $$
Research question: how to choose the acquisition function $\alpha(x)$?
Broader goal: from a fundamental perspective, how should we think about questions like this?
Expected improvement per unit cost $$ \begin{aligned} \htmlClass{fragment}{ \alpha_{t}^{\f{EIPC}}(x) } & \htmlClass{fragment}{ = \frac{\f{EI}_{f\given y_1,..,y_t}(x; \max_{1\leq\tau\leq t} y_\tau)}{c(x)} } & \htmlClass{fragment}{ \f{EI}_\psi(x; y) } & \htmlClass{fragment}{ = \E \max(0, \psi(x) - y) } \end{aligned} $$
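A minimal sketch of evaluating EIPC, assuming a Gaussian posterior at the candidate point (the closed form below is the standard Gaussian EI formula; the names `mu_x`, `sigma_x`, `incumbent`, and `cost_x` are illustrative, not from the talk):

```python
# Expected improvement per unit cost at a candidate x, assuming the
# posterior f(x) | y_1, ..., y_t is Gaussian with mean mu_x, std sigma_x.
from scipy.stats import norm

def expected_improvement(mu_x, sigma_x, incumbent):
    """EI_psi(x; y) = E max(0, psi(x) - y) for psi(x) ~ N(mu_x, sigma_x^2)."""
    z = (mu_x - incumbent) / sigma_x
    return sigma_x * norm.pdf(z) + (mu_x - incumbent) * norm.cdf(z)

def eipc(mu_x, sigma_x, incumbent, cost_x):
    return expected_improvement(mu_x, sigma_x, incumbent) / cost_x
```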
Direct extension of the non-cost-aware state of the art
Challenge: this heuristic can perform poorly, so what should one do instead?
My work: a new cost-aware acquisition function with strong performance
Key insight: novel connection to a fundamental decision problem
Two assumptions:
Costs are subtracted from the objective per sample: replaces the budget constraint
Values at different points are independent: looks really restrictive, but leads to an interesting general solution
This setting: Pandora's Box!
Key difficulty: rewards are stochastic – boxes are hard to compare
Approach: compute a deterministic fair value
Should we open the closed box?
Small $g$: yes
Large $g$: no
If both opening and not opening are co-optimal: $g$ is a fair value
Define: $\alpha^\star(x) = g$ where $g$ solves $\f{EI}_f(x; g) = c(x)$
co-optimality equation
Gittins index
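Why the co-optimality equation is the fair value (a short derivation from the definitions above): with an outside reward $g$ available, not opening is worth $g$, while opening is worth $-c(x) + \E\max(g, f(x))$; setting these equal and using $\max(g, f) = g + \max(0, f - g)$ gives $$ g = -c(x) + \E \max(g, f(x)) \quad\Longleftrightarrow\quad \E \max(0, f(x) - g) = c(x) \quad\Longleftrightarrow\quad \f{EI}_f(x; g) = c(x) $$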
Theorem (Weitzman, 1979). The index policy, which opens boxes in decreasing order of fair value and stops once the best observed reward exceeds every remaining fair value, is optimal in expectation.
Caveat: not true with correlated boxes!
...but the definition of a Gittins index still makes sense
Additional difference: expected budget constraint
Gittins index $\alpha^\star$: optimal for the cost-per-sample problem
Theorem. Assume the expected budget constraint is feasible and active. Then there exists a $\lambda \gt 0$ and a tie-breaking rule such that the policy defined by maximizing the Gittins index acquisition function $\alpha^\star(\.)$, defined using costs $\lambda c(x)$, is Bayesian-optimal.
Proof idea: Lagrangian duality
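A one-line sketch of the duality step (my paraphrase of the standard argument): dualize the expected budget constraint with multiplier $\lambda$, so that by linearity of expectation the relaxed objective is the cost-per-sample problem with costs $\lambda c(x)$, shifted by the constant $\lambda B$: $$ \E \max_{t=1,..,T} f(x_t) - \lambda \left( \E \sum_{t=1}^{T} c(x_t) - B \right) = \E \left[ \max_{t=1,..,T} f(x_t) - \sum_{t=1}^{T} \lambda\, c(x_t) \right] + \lambda B $$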
Bayesian optimization: posterior distribution is correlated
Define the Pandora's Box Gittins Index acquisition function:
$\alpha^{\f{PBGI}}_t(x) = g\quad$ where $g$ solves $\quad\f{EI}_{\htmlClass{anchor-1}{f\given y_1,..,y_t}}(x; g) = \htmlClass{anchor-2}{\lambda}\,c(x)$
correlated posterior
Lagrange multiplier from budget constraint
Controls risk-averse vs. risk-seeking behavior
should be tuned to match budget
Limit as $\lambda\to0$: converges to UCB
with automatic learning rate
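A minimal sketch of computing $\alpha^{\f{PBGI}}_t(x)$ at one point, assuming a Gaussian posterior with mean `mu_x` and standard deviation `sigma_x` (names and bracketing scheme are my own, not the paper's implementation). Since EI is continuous and strictly decreasing in $g$, the equation has a unique root and a bracketing solver applies:

```python
# Solve the co-optimality equation EI(x; g) = lam * c(x) for g,
# assuming a Gaussian posterior N(mu_x, sigma_x^2) at the candidate x.
from scipy.optimize import brentq
from scipy.stats import norm

def pbgi(mu_x, sigma_x, cost_x, lam):
    def residual(g):
        z = (mu_x - g) / sigma_x
        ei = sigma_x * norm.pdf(z) + (mu_x - g) * norm.cdf(z)
        return ei - lam * cost_x

    # Expand a bracket until it contains the unique root, then solve.
    lo, hi = mu_x - sigma_x, mu_x + sigma_x
    while residual(lo) < 0:
        lo -= sigma_x
    while residual(hi) > 0:
        hi += sigma_x
    return brentq(residual, lo, hi)
```

Consistent with the fragments above: smaller `lam` inflates every index, giving more exploratory, UCB-like behavior; larger `lam` deflates them, giving more risk-averse behavior.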
Not an optimal policy. But recall the caveat: maybe still a strong one?
Looks promising! What about in practice?
[Figures: benchmark results on the Ackley, Levy, and Rosenbrock test functions]
Lessons for the bigger picture: role of costs and budget, early stopping, and more...
this work
follow-up
Neural network training: proceeds over time
New goal: maximize test accuracy, allowing for early stopping $$ \htmlClass{fragment}{ \begin{aligned} &\max_{k=1,..,K} f(x_{k,T_k}) & &\t{subject to} & &\E\sum_{\smash{k=1}}^{\smash{K}} \sum_{\smash{t=1}}^{\smash{T_k}} c(x_{k,t}) \leq B \end{aligned} } $$
Cost-aware freeze-thaw Bayesian optimization: this setting is novel!
Transient Markov chain: closed box $\leadsto$ open box $\leadsto$ selected box
Instead of choosing between Pandora's Boxes, we choose between Markov chains!
Pandora's Box: solved by considering one closed and one open box
This is the general form of the Gittins index!
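One way to write the general definition, extrapolating the fair-value recipe above (my paraphrase, not a quote from the tutorial): give the agent an outside option worth $g$, with value function $$ V_g(s) = \max\Big( g,\; -c(s) + \E\big[ V_g(S') \given S = s \big] \Big) $$ and define $\alpha^\star(s)$ as the $g$ at which both branches of the maximum are co-optimal at $s$, exactly as in the single-box case.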
Theorem. In Markov chain selection, choosing the Markov chain of maximal Gittins index is optimal.
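A runnable toy illustrating the index policy (my construction, not the paper's code): Pandora's Box as the simplest instance, where each box is a two-state chain, closed $\leadsto$ open, with the index of a closed box solving $\f{EI}(g) = c$ and the index of an open box equal to its revealed value:

```python
# Weitzman's index policy on independent Gaussian boxes: open boxes in
# decreasing index order, stop once the best revealed value exceeds
# every remaining index, then select the best revealed value.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def closed_box_index(mu, sigma, c):
    """Fair value g solving EI(g) = c for a box with N(mu, sigma^2) reward."""
    def residual(g):
        z = (mu - g) / sigma
        return sigma * norm.pdf(z) + (mu - g) * norm.cdf(z) - c

    lo, hi = mu - sigma, mu + sigma
    while residual(lo) < 0:
        lo -= sigma
    while residual(hi) > 0:
        hi += sigma
    return brentq(residual, lo, hi)

def weitzman_policy(boxes, rng):
    """boxes: list of (mu, sigma, c). Returns (best reward, total cost paid)."""
    best, total_cost = -np.inf, 0.0
    order = sorted(range(len(boxes)),
                   key=lambda i: -closed_box_index(*boxes[i]))
    for i in order:
        mu, sigma, c = boxes[i]
        if closed_box_index(mu, sigma, c) <= best:
            break  # selecting the best open box is co-optimal
        total_cost += c
        best = max(best, rng.normal(mu, sigma))  # open box i
    return best, total_cost

rng = np.random.default_rng(0)
print(weitzman_policy([(0.0, 1.0, 0.1), (0.5, 0.5, 0.3), (1.0, 2.0, 1.0)], rng))
```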
Caveat! Optimality is fragile. Definitions are not!
Example goal: maximize test accuracy, allowing for early stopping $$ \htmlClass{fragment}{ \begin{aligned} &\max_{k=1,..,K} f(x_{k,T_k}) & &\t{subject to} & &\E\sum_{\smash{k=1}}^{\smash{K}} \sum_{\smash{t=1}}^{\smash{T_k}} c(x_{k,t}) \leq B \end{aligned} } $$
Admits a well-defined Gittins index:
open problem: regret analysis
open problem: non-discrete numerics
Long-term question: how do we generate synthetic data for training decision foundation models?
for applications in different domains
Q. Xie, R. Astudillo, P. Frazier, Z. Scully, and A. Terenin. Cost-aware Bayesian Optimization via the Pandora's Box Gittins Index. NeurIPS, 2024. INFORMS Data Mining Best Paper Finalist.
Q. Xie, L. Cai, A. Terenin, P. Frazier, and Z. Scully. Cost-aware Stopping for Bayesian Optimization. In review at NeurIPS, 2025.
Z. Scully and A. Terenin. Gittins Index: A Design Principle for Decision-making Under Uncertainty. INFORMS Tutorials in Operations Research, 2025.