Machine learning algorithms have hyperparameters:
Goal: find best hyperparameters to minimize test loss $$ \htmlClass{fragment}{ \min_{\theta\in\Theta} L_{\f{test}}(\theta) } $$
This setup: an example of black-box global optimization
Goal: optimize unknown function $\phi$ in as few evaluations as possible
Joint work with Qian Xie, Linda Cai, Raul Astudillo, Peter Frazier, and Ziv Scully
Some machine learning algorithms cost more than others to train:
Goal: find best hyperparameters to minimize test loss $$ \htmlClass{fragment}{ \begin{aligned} &\min_{t=1,..,T} L_{\f{test}}(\theta_t) & &\t{subject to} & &\E\sum_{t=1}^T c(\theta_t) \leq B \end{aligned} } $$
Black-box global optimization under expected budget constraint:
Goal: minimize unknown function $\phi$ at as little total evaluation cost as possible
Algorithm: at each step, use the posterior given $y_1,..,y_t$ to choose $x_{t+1}$, observe $y_{t+1}$, and decide whether to stop
Optimal choice $x_{t+1}$ and when to stop: intractable dynamic program
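A minimal sketch of this loop (maximization convention, matching the Pandora's Box slides below); `fit_posterior`, `acquisition`, and `stop` are hypothetical placeholders for the model, acquisition function, and stopping rule:

```python
import numpy as np

def optimize(phi, cost, candidates, budget, fit_posterior, acquisition, stop,
             rng=np.random.default_rng(0)):
    # Cost-aware Bayesian optimization loop: evaluate, update the posterior,
    # score candidates with an acquisition function, and decide when to stop.
    xs = [candidates[rng.integers(len(candidates))]]  # random initial point
    ys, spent = [], 0.0
    while True:
        ys.append(phi(xs[-1]))                 # evaluate the black box
        spent += cost(xs[-1])                  # accumulate evaluation cost
        mean, std = fit_posterior(xs, ys, candidates)  # e.g. a GP posterior
        costs = np.array([cost(x) for x in candidates])
        scores = acquisition(mean, std, costs, max(ys))
        if spent >= budget or stop(scores, max(ys)):
            return max(ys), spent
        xs.append(candidates[int(np.argmax(scores))])  # acquisition maximizer
```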
Cost-aware baseline: expected improvement per unit cost $$ \begin{aligned} \htmlClass{fragment}{ \alpha_{t}^{\f{EIPC}}(x) } & \htmlClass{fragment}{ = \frac{\f{EI}_{f\given y_1,..,y_t}(x; \max_{1\leq\tau\leq t} y_\tau)}{c(x)} } & \htmlClass{fragment}{ \f{EI}_\psi(x; y) } & \htmlClass{fragment}{ = \E \max(0, \psi(x) - y) } \end{aligned} $$
Often strong in practice, but can perform arbitrarily badly (see sketch below)
Derivation: one-step approximation to the intractable dynamic program
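A minimal sketch of EIPC under a Gaussian posterior, using the standard closed form for $\f{EI}$; `mean`, `std`, and `cost` are assumed to be arrays over candidate points, and `best` is $\max_{1\leq\tau\leq t} y_\tau$:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, best):
    # Gaussian closed form for EI_psi(x; best) = E max(0, psi(x) - best).
    z = (mean - best) / std
    return std * (z * norm.cdf(z) + norm.pdf(z))

def eipc(mean, std, cost, best):
    # Expected improvement per unit cost.
    return expected_improvement(mean, std, best) / cost
```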
Assumptions: function values at different points are independent, with known distributions and known evaluation costs
These are restrictive! But they lead to an interesting, general solution
We're given a collection of boxes with unknown rewards:
Goal: maximize $$ \E \max_{1 \leq t \leq T} f(x_t) - \E \smash{\sum_{t=1}^T} c(x_t) $$
Policy: for each box $x$, compute the index $g(x)$ solving $\f{EI}_f(x; g(x)) = c(x)$; open the box with the largest index, stopping once the best reward found exceeds every remaining index
Theorem (Weitzman, 1979). This policy is optimal in expectation.
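A sketch of this policy for boxes with independent Gaussian rewards. The index equation matches the $\f{EI}$ definition above; the $\pm 10$-standard-deviation root bracket is an assumption that holds for moderate costs:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def gaussian_ei(mean, std, y):
    z = (mean - y) / std
    return std * (z * norm.cdf(z) + norm.pdf(z))

def weitzman_index(mean, std, cost):
    # The index g solves EI_f(x; g) = c(x); EI is strictly decreasing in g,
    # so a root-finder on a wide bracket suffices for this sketch.
    return brentq(lambda g: gaussian_ei(mean, std, g) - cost,
                  mean - 10 * std, mean + 10 * std)

def pandora(means, stds, costs, rng=np.random.default_rng(0)):
    # Open boxes in decreasing index order; stop once the best observed
    # reward exceeds every remaining index.
    indices = [weitzman_index(m, s, c) for m, s, c in zip(means, stds, costs)]
    best, total_cost = -np.inf, 0.0
    for i in np.argsort(indices)[::-1]:
        if best >= indices[i]:
            break
        total_cost += costs[i]
        best = max(best, rng.normal(means[i], stds[i]))  # open box i
    return best, total_cost
```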
Caveat! This result and its analogs are fragile.
Bayesian optimization: posterior distribution is correlated
Define Pandora's Box Gittins Index acquisition function:
$\alpha^{\f{PBGI}}(x) = g\quad$ where $g$ solves $\quad\f{EI}_{f\given y_1,..,y_t}(x; g) = \htmlClass{fragment}{\ubr{\lambda}{\htmlClass{fragment}{\substack{\t{Lagrange multiplier}\\\t{from budget constraint}}}}} c(x)$
Correlations: incorporated into acquisition function via the posterior
Performance counterexample for EIPC baseline:
Hyperparameter optimization: test loss decreases during training
Gittins index: generalizes to this setting, including with evaluation costs
Challenge: numerical computation, since $\alpha^{\f{PBGI}}$ is defined only implicitly via root-finding (see sketch below)
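One possible approach, sketched below: $\f{EI}$ is strictly decreasing in $g$, so bisection converges even in the flat near-zero tail where Newton-type solvers can stall; the bracket width is an assumption:

```python
import numpy as np
from scipy.stats import norm

def pbgi(mean, std, cost, lam):
    # Vectorized bisection for alpha^PBGI: for each candidate x, solve
    # EI_{f | y_1,..,y_t}(x; g) = lam * c(x) in g, using the Gaussian
    # closed form for EI under the posterior mean and standard deviation.
    mean, std, cost = map(np.asarray, (mean, std, cost))
    target = lam * cost
    lo = mean - 10 * std   # assumed: EI(lo) ~ 10 * std exceeds target
    hi = mean + 10 * std   # assumed: EI(hi) ~ 0 is below target
    for _ in range(100):   # halves a 20-std bracket far past double precision
        mid = 0.5 * (lo + hi)
        z = (mean - mid) / std
        ei = std * (z * norm.cdf(z) + norm.pdf(z))
        lo = np.where(ei > target, mid, lo)   # root lies above mid
        hi = np.where(ei > target, hi, mid)   # root lies at or below mid
    return 0.5 * (lo + hi)
```

The next point is then the index maximizer, $x_{t+1} = \operatorname{argmax}_x \alpha^{\f{PBGI}}(x)$, mirroring Weitzman's open-highest-index rule.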
Q. Xie, R. Astudillo, P. Frazier, Z. Scully, and A. Terenin. Cost-aware Bayesian optimization via the Pandora's Box Gittins index. NeurIPS, 2024. INFORMS Data Mining Best Paper Finalist.
Q. Xie, L. Cai, A. Terenin, P. Frazier, and Z. Scully. Cost-aware Stopping for Bayesian Optimization. In review at NeurIPS, 2025.
Z. Scully and A. Terenin. Gittins Index: A Design Principle for Decision-making Under Uncertainty. INFORMS Tutorials in Operations Research, 2025.