Goal: minimize unknown function $\phi$ in as few evaluations as possible

- Build posterior $f \given y$ using data $(x_1,\phi(x_1)),\ldots,(x_n,\phi(x_n))$
- Choose $$ \htmlClass{fragment}{ x_{n+1} = \argmax_{x\in\c{X}} \alpha_{f\given y}(x) } $$ using the acquisition function $\alpha_{f\given y}$, built from the posterior

Modeling:

- Uncertainty and generalization
- Symmetries and geometry
- Smoothness and non-uniformity
- Causal information

Decision-making:

- Multi-stage feedback
- Scheduling and asynchronicity
- What kind of uncertainty is needed?
- Adversarial objectives
- Theoretical guarantees and empirical performance

Joint work with Qian Xie, Raul Astudillo, Peter Frazier, and Ziv Scully

Goal: minimize unknown function $\phi$ in as few evaluations as possible

- $\phi$: drawn randomly from the prior
- $c(x_t)$: cost of getting new data point, expected budget constraint

Algorithm:

- Build posterior $f \given y$ using data $(x_1,\phi(x_1)),\ldots,(x_t,\phi(x_t))$
- Find optimum of acquisition function $\alpha_{f\given y}$ and evaluate $\phi$ at $$ \htmlClass{fragment}{ x_{t+1} = \argmax_{x\in\c{X}} \alpha_{f\given y}(x) } $$
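As a hedged, minimal sketch of this loop (a toy instance, not the method from the talk): assume a discrete domain, an independent Gaussian belief at each point, noiseless evaluations, and expected improvement as the acquisition function. The helper `gaussian_ei` and the prior means are made up for illustration.

```python
import math

def gaussian_ei(mu, sigma, best):
    # Closed-form expected improvement for a N(mu, sigma^2) belief,
    # measured over the incumbent value `best` (maximization convention).
    if sigma <= 0.0:
        return max(0.0, mu - best)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def phi(x):
    return 1.0 - (x - 0.3) ** 2  # the "unknown" objective

domain = [i / 10 for i in range(11)]
mu = {x: math.cos(3.0 * x) for x in domain}  # arbitrary prior means
sigma = {x: 1.0 for x in domain}
observed = {}
for t in range(4):
    best = max(observed.values(), default=0.0)
    # evaluate phi at the maximizer of the acquisition function
    x_next = max((x for x in domain if x not in observed),
                 key=lambda x: gaussian_ei(mu[x], sigma[x], best))
    observed[x_next] = phi(x_next)
    # independence: observing x_next only updates the belief at x_next
    mu[x_next], sigma[x_next] = observed[x_next], 0.0
```

With a correlated posterior (e.g. a Gaussian process), the update step would instead change the belief at every point.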

Optimal choice of $x_{t+1}$ and of when to stop: intractable dynamic program

Cost-aware baseline: expected improvement per unit cost $$ \begin{aligned} \htmlClass{fragment}{ \alpha_{t}^{\f{EIPC}}(x) } & \htmlClass{fragment}{ = \frac{\f{EI}_{f\given y_1,\ldots,y_t}(x; \max_{1\leq\tau\leq t} y_\tau)}{c(x)} } & \htmlClass{fragment}{ \f{EI}_\psi(x; y) } & \htmlClass{fragment}{ = \E \max(0, \psi(x) - y) } \end{aligned} $$
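For concreteness: under a Gaussian posterior marginal $\psi(x) \sim \mathrm{N}(\mu, \sigma^2)$, expected improvement has a closed form, so the EIPC baseline can be sketched as below (a minimal sketch; the helper names are mine, not from the paper).

```python
import math

def gaussian_ei(mu, sigma, best):
    # EI_psi(x; y) = E max(0, psi(x) - y) for psi(x) ~ N(mu, sigma^2):
    # EI = (mu - y) * Phi(z) + sigma * phi(z), where z = (mu - y) / sigma.
    if sigma <= 0.0:
        return max(0.0, mu - best)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def eipc(mu, sigma, best, cost):
    # Expected improvement per unit cost: EI(x; best) / c(x).
    return gaussian_ei(mu, sigma, best) / cost
```

Multiplying every cost by a scalar $\lambda$ divides every EIPC value by $\lambda$, leaving the argmax unchanged: this is the invariance questioned just below.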

Invariant to multiplying costs by a scalar: do we want this behavior?

Cost-aware analog of expected improvement:

- Expected improvement: derived in non-cost-aware setting via one-step approximation to intractable dynamic program

Assumptions:

- Cost-per-sample problem: algorithm decides when to stop
- Reward once stopped: best observed point (simple regret)
- Distribution over objective functions is known
- $X$ is discrete, $f(x_i)$ and $f(x_j)$ for $x_i \neq x_j$ are independent

These are restrictive! But they lead to an interesting, general solution

**Theorem** (Weitzman, 1979).
Let:

- $X$ be a finite set,
- $f : X \to \R$ be a finite-mean random function for which $f(x)$ is independent of $f(x')$ for $x \neq x'$,
- $c : X \to \R_+$, without loss of generality, be deterministic.

Then, for the cost-per-sample problem, the policy defined by maximizing the Gittins index acquisition function $\alpha^\star$ with its associated stopping rule is Bayesian-optimal.

Gittins index $\alpha^\star$: optimal for cost-per-sample problem

- What about expected budget-constrained problem?

**Theorem.**
Assume the expected budget constraint is feasible and active.
Then there exists a $\lambda \gt 0$ and a tie-breaking rule such that the policy defined by maximizing the Gittins index acquisition function $\alpha^\star(\.)$, defined using costs $\lambda c(x)$, is Bayesian-optimal.

Proof idea: Lagrangian duality
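A hedged sketch of the argument: writing $B$ for the expected budget, dualize the constraint $\E\left[\sum_t c(x_t)\right] \leq B$ to obtain

$$ \max_\pi \; \E[\text{reward}(\pi)] - \lambda \left( \E\left[ \sum\nolimits_t c(x_t) \right] - B \right) $$

For each fixed $\lambda > 0$, this is a cost-per-sample problem with costs $\lambda c(x)$, which the Gittins index policy solves exactly by the preceding theorem; complementary slackness at the optimal $\lambda$ carries optimality back to the constrained problem, with the tie-breaking rule handling the boundary case.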

Our work: extends a special case of a result of Aminian et al. (2024) to non-discrete reward distributions

Bayesian optimization: posterior distribution is correlated

Define *Pandora's Box Gittins Index* acquisition function:

$\alpha^{\f{PBGI}}(x) = g$ where $g$ solves $\f{EI}_{f\given y}(x; g) = c(x)$
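$\f{EI}_{f\given y}(x; g)$ is continuous and strictly decreasing in $g$, so for $c(x) > 0$ the defining equation has a unique root, which bisection finds. A minimal sketch assuming a Gaussian posterior marginal at $x$ (function names are mine):

```python
import math

def gaussian_ei(mu, sigma, y):
    # Closed-form EI for a Gaussian posterior marginal N(mu, sigma^2).
    if sigma <= 0.0:
        return max(0.0, mu - y)
    z = (mu - y) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - y) * cdf + sigma * pdf

def pbgi(mu, sigma, cost):
    # Solve EI(x; g) = c(x) for g: EI is strictly decreasing in g,
    # so the root is unique whenever cost > 0.
    step = max(sigma, 1.0)
    lo = hi = mu
    while gaussian_ei(mu, sigma, lo) < cost:  # bracket from below
        lo -= step
    while gaussian_ei(mu, sigma, hi) > cost:  # bracket from above
        hi += step
    for _ in range(200):  # bisection
        mid = 0.5 * (lo + hi)
        if gaussian_ei(mu, sigma, mid) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the expected budget-constrained setting, the same solver applies with costs $\lambda c(x)$.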

Correlations: incorporated into acquisition function via the posterior

Novel acquisition function: Pandora's Box Gittins Index

- Settings: expected budget-constrained and cost-per-sample
- Works in the heterogeneous-cost setting and for uniform costs
- Closely related to both expected improvement and UCB
- Exact optimality in simplified problem: orthogonal insights

Performance of Pandora's Box Gittins Index

- Sufficiently easy low-dimensional problems: comparable to baselines
- Too-difficult high-dimensional problems: similar to random search
- Medium-hard problems of moderate dimension: strong performance
- Can compete with state-of-the-art non-myopic approaches

Gittins Index Theory

- Workhorse tool in queuing theory
- Minimize expected wait time: serve short jobs first
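A toy single-server illustration of that principle, with hypothetical job sizes:

```python
from itertools import accumulate

def mean_completion_time(sizes):
    # Jobs run back-to-back on one server in the given order,
    # so job k completes at the sum of the first k sizes.
    finish = list(accumulate(sizes))
    return sum(finish) / len(finish)

jobs = [5.0, 1.0, 3.0]                      # hypothetical job sizes
fifo = mean_completion_time(jobs)           # serve in arrival order
sjf = mean_completion_time(sorted(jobs))    # shortest job first
```

Serving the short jobs first lowers the mean completion time (here $14/3$ vs $20/3$): short jobs stop waiting behind long ones.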

Bayesian Optimization: high dimensions and complex feedback models

- Freeze-thaw
- Continuous-time and asynchronous
- Bayesian quadrature
- Function networks
- Exact optimality in simplified problems without dependence

Q. Xie, R. Astudillo, P. Frazier, Z. Scully, and A. Terenin. Cost-aware Bayesian optimization via the Pandora's Box Gittins index. *arXiv:2406.20062*, 2024.