Talk for Cornell ORIE

The Gittins Index: A Design Principle for

Decision-making Under Uncertainty

Alexander Terenin

https://avt.im/ · @avt_im

Decision-making Under Uncertainty

via Bayesian Optimization

Motivation: hyperparameter optimization

Hyperparameter Optimization

Machine learning algorithms have hyperparameters:

  • Neural network: layer width and depth
  • Decision tree: number of splits
  • Training via optimization: learning rate and batch size

Goal: find best hyperparameters to minimize test loss $$ \htmlClass{fragment}{ \min_{\theta\in\Theta} L_{\f{test}}(\theta) } $$

This setup: an example of black-box global optimization

  • Feedback: choose hyperparameter $\leadsto$ observe test loss

Performance Impact

Bayesian Optimization

Automatic explore-exploit tradeoff

Bayesian Optimization

Goal: optimize unknown function $\phi$ in as few evaluations as possible

  1. Build posterior $f \given y$ using data $(x_1,\phi(x_1)),..,(x_t,\phi(x_t))$
  2. Choose $$ \htmlClass{fragment}{ x_{t+1} = \argmax_{x\in\c{X}} \alpha_{f\given y}(x) } $$ using the acquisition function $\alpha_{f\given y}$, built from the posterior

Useful property: separation of modeling from decision-making
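The two-step loop above can be sketched in code. This is a minimal illustration, not the talk's implementation: it assumes a zero-mean Gaussian process posterior with an RBF kernel, expected improvement as the acquisition function, and a toy 1-D objective `phi`; all kernel parameters and grid choices are illustrative.

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    d = A[:, None] - B[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at test points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(X, Xs)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf_kernel(Xs, Xs)) - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Closed-form EI for Gaussian marginals: E max(0, f(x) - best)."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

phi = lambda x: np.sin(3 * x) + x          # toy black-box objective (maximized)
grid = np.linspace(0, 2, 201)              # candidate set
X = np.array([0.3, 1.5])                   # initial design
y = phi(X)
for t in range(10):
    mu, sigma = gp_posterior(X, y, grid)                       # 1. build posterior
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]  # 2. argmax acquisition
    X, y = np.append(X, x_next), np.append(y, phi(x_next))
```

The separation of modeling from decision-making is visible here: step 1 only touches the model, and step 2 only touches the posterior through the acquisition function.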

Thompson Sampling

Same model, different decisions, similar performance

The Pandora's Box Gittins Index:

a new acquisition function design principle

Joint work with Qian Xie, Linda Cai, Raul Astudillo, Peter Frazier, and Ziv Scully

Motivation: handling evaluation costs

Some machine learning algorithms cost more than others to train:

  • Neural networks: width and depth change needed compute
  • Optimization: batch size affects GPU utilization

Goal: find best hyperparameters to minimize test loss $$ \htmlClass{fragment}{ \begin{aligned} &\min_{t=1,..,T} L_{\f{test}}(\theta_t) & &\t{subject to} & &\E\sum_{t=1}^T c(\theta_t) \leq B \end{aligned} } $$

Black-box global optimization under expected budget constraint:

  • Choose hyperparameter $\leadsto$ incur training cost $\leadsto$ observe test loss

Cost-aware Bayesian Optimization

Goal: minimize unknown function $\phi$ in as few evaluations as possible

  • $\phi$: for formulation purposes, assumed drawn randomly from a known prior
  • $c(x_t)$: cost of getting new data point, expected budget constraint

Algorithm:

  1. Build posterior $f \given y$ using data $(x_1,\phi(x_1)),..,(x_t,\phi(x_t))$
  2. Find optimum of acquisition function $\alpha_{f\given y}$ and evaluate $\phi$ at $$ \htmlClass{fragment}{ x_{t+1} = \argmax_{x\in\c{X}} \alpha_{f\given y}(x) } $$

Optimal choice $x_{t+1}$ and when to stop: intractable dynamic program

Expected improvement per unit cost

Cost-aware baseline: expected improvement per unit cost $$ \begin{aligned} \htmlClass{fragment}{ \alpha_{t}^{\f{EIPC}}(x) } & \htmlClass{fragment}{ = \frac{\f{EI}_{f\given y_1,..,y_t}(x; \max_{1\leq\tau\leq t} y_\tau)}{c(x)} } & \htmlClass{fragment}{ \f{EI}_\psi(x; y) } & \htmlClass{fragment}{ = \E \max(0, \psi(x) - y) } \end{aligned} $$

Often strong in practice, but can perform arbitrarily badly

  • Issue: high-cost high-variance points (Astudillo et al., 2021)
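A small numerical sketch of the failure mode above, under Gaussian marginals (all numbers are illustrative, not from the talk): a high-cost, high-variance point can dominate EIPC over a cheap, reliable one.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Closed-form EI for a Gaussian belief N(mu, sigma^2) at a point."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def eipc(mu, sigma, best, cost):
    """Expected improvement per unit cost: EI(x; best) / c(x)."""
    return expected_improvement(mu, sigma, best) / cost

# Cheap point with modest upside vs. a 10x-more-expensive pure gamble:
low_risk  = eipc(mu=0.1, sigma=0.1, best=0.0, cost=1.0)
high_risk = eipc(mu=0.0, sigma=5.0, best=0.0, cost=10.0)
```

Here `high_risk > low_risk`: the ratio rewards variance faster than it penalizes cost, which is the mechanism behind the counterexample of Astudillo et al. (2021).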

Derivation: one-step approximation to intractable dynamic program

What if I told you this dynamic program can sometimes be solved exactly?

Expected improvement: time-based simplification

Gittins index: space-based simplification

Cost-aware Bayesian Optimization: a simplified setting

Assumptions:

  • Cost-per-sample problem: algorithm decides when to stop
  • Reward once stopped: best observed point (simple regret)
  • Distribution over objective functions is known
  • $\c{X}$ is discrete, $f(x_i)$ and $f(x_j)$ for $x_i \neq x_j$ are independent

These are restrictive! But they lead to an interesting, general solution

This setting: Pandora's Box problem from economics

Whether to open Pandora's Box?

Pandora's Box: formulation

We're given a collection of boxes with unknown rewards:

  • Rewards: $f(x_i) \~ p_i$ (where $p_i$ is known, different boxes independent)
  • Cost to open each box: $c(x_i)$
  • Can only take the reward from one open box

Goal: maximize $$ \E \max_{1 \leq t \leq T} f(x_t) - \E \smash{\sum_{t=1}^T} c(x_t) $$

Key difficulty

One can show: obvious greedy policies are suboptimal

...and yet, the optimal policy is surprisingly simple

Solving Pandora's Box

Consider: one closed vs. one open box

Should we open the closed box? Maybe!

Depends on the cost $c$, the distribution of the reward $f$, and the value $g$ of the open box

Consider: one closed vs. one open box

One closed vs. open box: Markov decision process

Optimal policy: open if $\f{EI}_f(x; g) \gt c(x)$

Consider: one closed vs. one open box

Consider how optimal policy changes as a function of $g$

If both opening and not opening are co-optimal: $g$ is a fair value

Define: $\alpha^\star_t(x) = g$ where $g$ solves $\f{EI}_f(x; g) = c(x)$
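Since $\f{EI}_f(x; g)$ is continuous and strictly decreasing in $g$, the fair value is the unique root of $\f{EI}_f(x; g) - c(x)$ and can be found by bisection. A sketch for a Gaussian reward distribution (the bracket and all numbers are illustrative):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, g):
    """EI_f(x; g) = E max(0, f(x) - g) for f(x) ~ N(mu, sigma^2)."""
    z = (mu - g) / sigma
    return (mu - g) * norm.cdf(z) + sigma * norm.pdf(z)

def fair_value(mu, sigma, cost, lo=-20.0, hi=20.0, tol=1e-12):
    """Solve EI_f(x; g) = c(x) for g by bisection.
    EI is strictly decreasing in g, so the root is unique."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_improvement(mu, sigma, mid) > cost:
            lo = mid            # EI still too large: fair value lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

g = fair_value(mu=0.0, sigma=1.0, cost=0.1)
```

Plugging `g` back in recovers the defining equation: `expected_improvement(0.0, 1.0, g)` equals the cost `0.1` up to bisection tolerance.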

Back to many boxes

Theorem (Weitzman, 1979). The policy that opens boxes in decreasing order of $\alpha^\star$, stopping once the best observed reward exceeds every remaining index, is optimal in expectation.
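Weitzman's policy is easy to simulate. The sketch below uses two boxes with Gaussian rewards (a hypothetical instance, chosen only for illustration): compute each box's fair value, open boxes in decreasing index order, and stop once the best observed reward beats every remaining index.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, g):
    """EI for a Gaussian reward N(mu, sigma^2) relative to level g."""
    z = (mu - g) / sigma
    return (mu - g) * norm.cdf(z) + sigma * norm.pdf(z)

def fair_value(mu, sigma, cost, lo=-20.0, hi=20.0):
    """Bisection solve of EI(g) = cost (EI is strictly decreasing in g)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_improvement(mu, sigma, mid) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two boxes: a cheap safe one and an expensive risky one (illustrative numbers).
mus, sigmas, costs = [0.0, 0.5], [1.0, 2.0], [0.1, 0.5]
idx = [fair_value(m, s, c) for m, s, c in zip(mus, sigmas, costs)]
order = np.argsort(-np.array(idx))   # open in decreasing index order

def run_once(rng):
    best, spent = -np.inf, 0.0
    for i in order:
        if best >= idx[i]:           # stopping rule: beat all remaining indices
            break
        spent += costs[i]
        best = max(best, rng.normal(mus[i], sigmas[i]))
    return best - spent              # net reward of this run

rng = np.random.default_rng(0)
mean_value = np.mean([run_once(rng) for _ in range(2000)])
```

Note the policy opens the risky box first: despite its higher cost, its fair value is larger, which is exactly the kind of behavior that simple greedy rules miss.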

Caveat! This result and its analogs are fragile.

Cost-aware Bayesian Optimization:

Correlated Pandora's Box?

Difference so far: expected budget constraint

...turns out not to matter: Lagrange multipliers

Pandora's Box Gittins Index for Bayesian Optimization

Bayesian optimization: posterior distribution is correlated

Define Pandora's Box Gittins Index acquisition function:

$\alpha^{\f{PBGI}}(x) = g\quad$ where $g$ solves $\quad\f{EI}_{f\given y}(x; g) = \htmlClass{fragment}{\ubr{\lambda}{\htmlClass{fragment}{\substack{\t{Lagrange multiplier}\\\t{from budget constraint}}}}} c(x)$

Correlations: incorporated into acquisition function via the posterior
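Computing $\alpha^{\f{PBGI}}$ reduces to one root-find per candidate point, using that point's posterior marginal. A sketch assuming Gaussian posterior marginals, with illustrative means, standard deviations, costs, and $\lambda$ (not values from the talk):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, g):
    """EI_{f|y}(x; g) under a Gaussian posterior marginal N(mu, sigma^2)."""
    z = (mu - g) / sigma
    return (mu - g) * norm.cdf(z) + sigma * norm.pdf(z)

def pbgi(mu, sigma, cost, lam, lo=-20.0, hi=20.0):
    """alpha^PBGI(x): the g solving EI_{f|y}(x; g) = lam * c(x),
    found by bisection since EI is strictly decreasing in g."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_improvement(mu, sigma, mid) > lam * cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Posterior marginals at three candidate points (illustrative numbers):
mus    = np.array([0.2, 0.0, 0.5])
sigmas = np.array([0.3, 1.5, 0.2])
costs  = np.array([1.0, 1.0, 2.0])
lam    = 0.1                          # Lagrange multiplier from the budget
alpha  = np.array([pbgi(m, s, c, lam) for m, s, c in zip(mus, sigmas, costs)])
x_next = int(np.argmax(alpha))        # next evaluation point
```

Correlations enter only through the posterior marginals `(mu, sigma)` at each point; the root-find itself is one-dimensional and pointwise.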

Does it work?

Not an optimal policy. But maybe a strong one?

An initial sanity-check

Performance counterexample for EIPC baseline:

  • Random objective with high-variance high-cost region

Pandora's Box Gittins Index: performs well

What does $\lambda$ do?

Controls risk-averse vs. risk-seeking behavior

Optimal tuning of $\lambda$: depends on the expected budget

Limit as $\lambda\to0$: converges to UCB with automatic learning rate

Objective functions: sampled from the prior

Synthetic benchmark functions

Ackley function

Levy function

Rosenbrock function

Synthetic benchmark functions

Empirical objectives

Conclusion: Gittins indices can perform well, even where they are not optimal

Future work: handling gradual feedback

Hyperparameter optimization: test loss decreases during training

  • Can sometimes predict final loss from early iterations

Gittins index: generalizes to this setting, including with evaluation costs

Challenge: numerical computation

Thank you!

https://avt.im/ · @avt_im

Q. Xie, R. Astudillo, P. Frazier, Z. Scully, and A. Terenin. Cost-aware Bayesian optimization via the Pandora's Box Gittins index. NeurIPS, 2024. INFORMS Data Mining Best Paper Finalist.

Q. Xie, L. Cai, A. Terenin, P. Frazier, and Z. Scully. Cost-aware stopping for Bayesian optimization. In review at NeurIPS, 2025.

Z. Scully and A. Terenin. The Gittins index: a design principle for decision-making under uncertainty. INFORMS Tutorials in Operations Research, 2025.