Talk for Alan Turing Institute

Non-Euclidean Matérn

Gaussian Processes

Alexander Terenin

https://avt.im/ · @avt_im

Gaussian Processes

Bayesian Optimization

Goal: minimize unknown function $\phi$ in as few evaluations as possible.

  1. Build posterior $f \given \v{y}$ using data $(x_1,\phi(x_1)),..,(x_n,\phi(x_n))$.
  2. Choose $$ \htmlClass{fragment}{ x_{n+1} = \argmin_{x\in\c{X}} \alpha_n(x) } $$ to optimize the acquisition function $\alpha_n$, for instance Thompson sampling $\alpha_n \~ f\given\v{y}$.

Automatic explore-exploit tradeoff.

Bayesian Optimization

Modeling Dynamical Systems with Uncertainty

$$ \htmlData{fragment-index=0,class=fragment}{ x_0 } \qquad \htmlData{fragment-index=1,class=fragment}{ x_1 = x_0 + f(x_0)\Delta t } \qquad \htmlData{fragment-index=2,class=fragment}{ x_2 = x_1 + f(x_1)\Delta t } \qquad \htmlData{fragment-index=3,class=fragment}{ .. } $$

Bayesian Optimization in Robotics

Orientation: sphere
 

Manipulability:
SPD manifold

Joint postures: torus
 


Jaquier et al. (2020)

Geometry-aware Gaussian Processes

Matérn Gaussian Processes

$\nu = 1/2$

$\nu = 3/2$

$\nu = 5/2$

$\nu = \infty$

$$ \htmlData{class=fragment fade-out,fragment-index=9}{ \footnotesize \mathclap{ k_\nu(x,x') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \del{\sqrt{2\nu} \frac{\norm{x-x'}}{\kappa}}^\nu K_\nu \del{\sqrt{2\nu} \frac{\norm{x-x'}}{\kappa}} } } \htmlData{class=fragment d-print-none,fragment-index=9}{ \footnotesize \mathclap{ k_\infty(x,x') = \sigma^2 \exp\del{-\frac{\norm{x-x'}^2}{2\kappa^2}} } } $$ $\sigma^2$: variance $\kappa$: length scale $\nu$: smoothness
$\nu\to\infty$: recovers squared exponential kernel

Riemannian Geometry

How should Matérn kernels generalize to this setting?

Geodesics

$$ k_\infty^{(d_g)}(x,x') = \sigma^2\exp\del{-\frac{d_g(x,x')^2}{2\kappa^2}} $$

Theorem. (Feragen et al.) Let $M$ be a complete Riemannian manifold without boundary. If $k_\infty^{(d_g)}$ is positive semi-definite for all $\kappa$, then $M$ is isometric to a Euclidean space.

Need a different candidate generalization


Feragen et al. (2015)

Stochastic Partial Differential Equations

$$ \htmlData{class=fragment,fragment-index=0}{ \underset{\t{Matérn}}{\undergroup{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}} f = \c{W}}} } \qquad \htmlData{class=fragment,fragment-index=1}{ \underset{\t{squared exponential}}{\undergroup{\vphantom{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}}} e^{-\frac{\kappa^2}{4}\Delta} f = \c{W}}} } $$ $\Delta$: Laplacian $\c{W}$: (rescaled) white noise

Generalizes well to the Riemannian setting

Whittle (1963)
Lindgren et al. (2011)

Riemannian Matérn Kernels: compact spaces

$$ k_\nu(x,x') = \frac{\sigma^2}{C_\nu} \sum_{n=0}^\infty \del{\frac{2\nu}{\kappa^2} - \lambda_n}^{\nu-\frac{d}{2}} f_n(x) f_n(x') $$

$\lambda_n, f_n$: Laplace–Beltrami eigenpairs

Analytic expressions for circle, sphere, ..

Riemannian Matérn Kernels

$k_{1/2}(\htmlStyle{color:rgb(0, 0, 0)!important}{\bullet},\.)$

Example: regression on the surface of a dragon

(a) Ground truth

(b) Posterior mean

(c) Std. deviation

(d) Posterior sample

Weighted Undirected Graphs

$$ f : G \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g1.svg}}\Big) \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g2.svg}}\Big) \to \R $$

$$ f\Big(\smash{\includegraphics[height=2.75em,width=1.5em]{figures/g3.svg}}\Big) \to \R $$

Stochastic Partial Differential Equations

$$ \htmlClass{fragment}{ \underset{\t{Matérn}}{\undergroup{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}} f = \c{W}}} } \qquad \htmlClass{fragment}{ \underset{\t{squared exponential}}{\undergroup{\vphantom{\del{\frac{2\nu}{\kappa^2} - \Delta}^{\frac{\nu}{2}+\frac{d}{4}}} e^{-\frac{\kappa^2}{4}\Delta} f = \c{W}}} } $$ $\Delta$: Laplacian $\c{W}$: (rescaled) white noise
$e^{-\frac{\kappa^2}{4}\Delta}$: (rescaled) heat semigroup

The Graph Laplacian

$$ \htmlClass{fragment}{ (\m\Delta\v{f})(x) = \sum_{x' \~ x} w_{xx'} (f(x) - f(x')) } $$ $$ \htmlClass{fragment}{ \m\Delta = \m{D} - \m{W} } $$ $\m{D}$: degree matrix $\m{W}$: (weighted) adjacency matrix

Note: different sign convention, analogous to Euclidean $-\Delta$

Graph Matérn Gaussian Processes

$$ \htmlClass{fragment}{ \underset{\t{Matérn}}{\undergroup{\del{\frac{2\nu}{\kappa^2} + \m\Delta}^{\frac{\nu}{2}} \v{f} = \c{W}\mathrlap{\hspace*{-2.42ex}\c{W}\hspace*{-2.42ex}\c{W}}}} } \qquad \htmlClass{fragment}{ \underset{\t{squared exponential}}{\undergroup{\vphantom{\del{\frac{2\nu}{\kappa^2} - \m\Delta}^{\frac{\nu}{2}+\frac{d}{4}}} e^{\frac{\kappa^2}{4}\m\Delta} \v{f} = \c{W}\mathrlap{\hspace*{-2.42ex}\c{W}\hspace*{-2.42ex}\c{W}}}} } $$ $\m\Delta$: graph Laplacian $\c{W}\mathrlap{\hspace*{-2.42ex}\c{W}\hspace*{-2.42ex}\c{W}}$: standard Gaussian

Graph Matérn Gaussian Processes

$$ \htmlClass{fragment}{ \underset{\t{Matérn}}{\undergroup{\vphantom{\v{f} \~\f{N}\del{\v{0},e^{-\frac{\kappa^2}{4}\m\Delta}}} \v{f} \~\f{N}\del{\v{0},{\textstyle\del{\frac{2\nu}{\kappa^2} + \m\Delta}^{-\nu}}}}} } \qquad \htmlClass{fragment}{ \underset{\t{squared exponential}}{\undergroup{\v{f} \~\f{N}\del{\v{0},e^{-\frac{\kappa^2}{2}\m\Delta}}}} } $$

Graph Fourier Features

$$ \htmlClass{fragment}{ k_\nu(x,x') = \frac{\sigma^2}{C_\nu} \sum_{n=1}^{|G|} \del{\frac{2\nu}{\kappa^2} + \lambda_n}^{-\nu} \v{f}_n(x)\v{f}_n(x') } $$ $\lambda_n,\v{f}_n$: eigenvalues and eigenvectors of graph Laplacian

Prior Variance

(a) Complete graph

(b) Star graph

(c) Random graph

(d) Random graph

Prior variance depends on geometry

Example: Graph Interpolation of Traffic

(a) Predictive mean

(b) Standard deviation

Example: Graph Interpolation of Traffic

(a) Predictive mean

(b) Standard deviation

Riemannian Limits

Gaussian Vector Fields

Frames

$ \large\x\, $

$ \large\x\, $

$ \large= $

Same vector field: represented differently in different frames

Projected Kernels

Construct vector fields by embedding and flattening scalar fields

Geometry-aware Bayesian Optimization

Ackley function

Levy function

Rosenbrock function

Geometry-aware Bayesian Optimization

Geometry-aware Bayesian Optimization

Geometry-aware Bayesian Optimization

Thank you!

https://avt.im/· @avt_im

V. Borovitskiy,* P. Mostowsky,* A. Terenin,* M. P. Deisenroth. Matérn Gaussian Processes on Riemannian Manifolds. Advances in Neural Information Processing Systems, 2020.

N. Jaquier,* V. Borovitskiy,* A. Smolensky, A. Terenin, T. Asfour, L. Rozo. Geometry-aware Bayesian Optimization in Robotics using Riemannian Matérn Kernels. Conference on Robot Learning, 2021.

V. Borovitskiy,* I. Azangulov,* P. Mostowsky,* A. Terenin,* M. P. Deisenroth, N. Durrande. Matérn Gaussian Processes on Graphs. Artificial Intelligence and Statistics, 2021.

M. J. Hutchinson,* A. Terenin,* V. Borovitskiy,* S. Takao,* Y. W. Teh, M. P. Deisenroth. Vector-valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels. Advances in Neural Information Processing Systems, 2021.

J. T. Wilson,* V. Borovitskiy,* P. Mostowsky,* A. Terenin,* M. P. Deisenroth. Efficiently Sampling Functions from Gaussian Process Posteriors. International Conference on Machine Learning, 2020.

J. T. Wilson,* V. Borovitskiy,* P. Mostowsky,* A. Terenin,* M. P. Deisenroth. Pathwise Conditioning of Gaussian Process. Journal of Machine Learning Research, 2021.

*Equal contribution