Modern Gaussian Process Regression

Gaussian Process Regression can be used to learn a multitude of periodic and aperiodic signals, such as those depicted in this figure. Photo by Ryan Stone on Unsplash

Unlimited Model Expression + Modern Computing

Ever wonder how you can create non-parametric supervised learning models with unlimited expressive power? Look no further than Gaussian Process Regression (GPR), an algorithm that learns to make predictions almost entirely from the data itself (with a little help from hyperparameters). Combining this algorithm with recent advances in computing, such as automatic differentiation, allows for applying GPRs to solve a variety of supervised machine learning problems in near-real-time.

In this article, we’ll discuss:

A brief overview/recap of the theory behind GPR
The types of problems we can use GPR to solve, and some examples
How GPR compares to other supervised learning algorithms
Modern programming packages and tools we can use to implement GPR

This is the second article in my GPR series. For a rigorous, ab initio introduction to Gaussian Process Regression, please check out my previous article here.

Recap: Gaussian Process Regression (GPR) Concepts

Before we dive into how we can implement and use GPR, let’s quickly review the mechanics and theory behind this supervised machine learning algorithm. For more detailed derivations/discussion of the following concepts, please check out my previous article on GPR here. GPR:

i. Predicts the posterior distribution of test targets, conditioned on the observed training points.

ii. Computes the mean of predicted test point targets as linear combinations of observed target values, with the weights of these linear combinations determined by the kernel distance from the training inputs to the test points.

iii. Uses covariance functions to measure the kernel distance between inputs.

iv. Interpolates novel points from existing points by treating each novel point as part of a Gaussian Process, i.e. parameterizing the novel point, jointly with the observed points, as a Gaussian distribution.
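In standard notation, the four points above can be written as follows. The notation is an assumption on my part, following Rasmussen and Williams [5]: training inputs X with targets y, test inputs X_*, latent test values f_*, a zero prior mean, and noise variance σ_n². The RBF kernel in (iii) is just one example of a covariance function.

```latex
\begin{aligned}
\text{(i)}\quad
  & \mathbf{f}_* \mid X, \mathbf{y}, X_* \sim
    \mathcal{N}\!\left(\bar{\mathbf{f}}_*,\ \operatorname{cov}(\mathbf{f}_*)\right) \\[4pt]
\text{(ii)}\quad
  & \bar{\mathbf{f}}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1}\mathbf{y} \\
  & \operatorname{cov}(\mathbf{f}_*) = K(X_*, X_*)
    - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*) \\[4pt]
\text{(iii)}\quad
  & k(\mathbf{x}, \mathbf{x}') = \sigma_f^2
    \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell^2}\right) \\[4pt]
\text{(iv)}\quad
  & \begin{bmatrix}\mathbf{y} \\ \mathbf{f}_*\end{bmatrix} \sim
    \mathcal{N}\!\left(\mathbf{0},\
    \begin{bmatrix}
      K(X, X) + \sigma_n^2 I & K(X, X_*) \\
      K(X_*, X) & K(X_*, X_*)
    \end{bmatrix}\right)
\end{aligned}
```

Conditioning the joint Gaussian in (iv) on the observed targets y gives the posterior in (i), with the mean and covariance in (ii).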

An example of interpolation in 1D using a noisy sinusoidal time series dataset. Image source: Author.

What Problems Can I Solve With GPR?

GPR can be applied to a variety of supervised machine learning problems (and in some cases, can be used as a subroutine in unsupervised machine learning). Here are just a few classes of problems that can be solved with this machine learning technique:

A. Interpolation/kriging

Interpolation is a key task in a variety of fields, such as signal processing, spatial statistics, and control. This application is particularly common in fields that leverage spatial statistics, such as geostatistics. As a concrete example, consider the problem of generating a surface corresponding to the mountain below, given only a limited number of defined points on the mountain. If you’re interested in seeing a specific implementation of this, please check out my article here.

Kriging and interpolation are often used in geostatistics, and can be used for interpolating surfaces in high-dimensional spaces! Photo by Markos Mant on Unsplash
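As a rough sketch of what surface interpolation can look like in code, the snippet below fits a GPR to scattered samples of a synthetic "mountain" and predicts elevation on a regular grid. The `elevation` function, the sample count, and the kernel settings are all illustrative assumptions, not taken from the article linked above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)


def elevation(xy):
    """Hypothetical terrain: a single smooth bump centred at the origin."""
    return np.exp(-np.sum(xy ** 2, axis=1))


# A limited number of surveyed (x, y) locations and their elevations
xy_train = rng.uniform(-2.0, 2.0, size=(60, 2))
z_train = elevation(xy_train)

# RBF kernel for smooth terrain, plus a small noise term for numerical stability
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(xy_train, z_train)

# Interpolate the surface on a regular 25 x 25 grid
grid = np.linspace(-2.0, 2.0, 25)
gx, gy = np.meshgrid(grid, grid)
xy_grid = np.column_stack([gx.ravel(), gy.ravel()])
z_mean, z_std = gpr.predict(xy_grid, return_std=True)

surface = z_mean.reshape(gx.shape)      # interpolated elevation
uncertainty = z_std.reshape(gx.shape)   # predictive standard deviation
```

The `uncertainty` grid tends to be largest far from the surveyed locations, which is one way to decide where additional measurements would help most.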

B. Time series forecasting

This class of problems looks at projecting a time series into the future using historical data. Like kriging, time series forecasting allows for predicting unseen values. Rather than predicting unseen values at different locations, however, this problem applies GPR for predicting the mean and variance of unseen points in the future. This is highly applicable for tasks such as predicting electricity demand, stock prices, or the state-space evolution of a linear dynamical system.

Furthermore, not only does GPR predict the mean of a future point, but it also outputs a predicted variance, enabling decision-making systems to factor uncertainty into their decisions.

Example of time series forecasting with a noisy sinusoid. The dark blue line represents the predicted mean, while the lighter blue interval represents the confidence interval of the model. Image source: Author.
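A minimal sketch of this kind of forecast, assuming a synthetic noisy sinusoid as the historical data. The periodic `ExpSineSquared` kernel, its initial values, and the 1.96 multiplier for an approximate 95% interval are my own illustrative choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

rng = np.random.default_rng(1)

# Historical data: a noisy sinusoid observed for t in [0, 10]
t_hist = np.linspace(0.0, 10.0, 80).reshape(-1, 1)
y_hist = np.sin(t_hist).ravel() + 0.1 * rng.standard_normal(80)

# A periodic kernel lets the learned structure carry past the last observation
kernel = ExpSineSquared(length_scale=1.0, periodicity=6.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5, random_state=0)
gpr.fit(t_hist, y_hist)

# Forecast the unseen future interval t in [10, 15]
t_future = np.linspace(10.0, 15.0, 50).reshape(-1, 1)
mean, std = gpr.predict(t_future, return_std=True)

# Approximate 95% interval around the predicted mean
lower = mean - 1.96 * std
upper = mean + 1.96 * std
```

With a non-periodic kernel such as RBF alone, the forecast would typically revert toward the prior mean, with growing variance, as the test times move away from the data.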

C. Predicting Uncertainty

More generally, because GPR allows for predicting variance at test points, GPR can be used for a variety of uncertainty quantification tasks – i.e. any task for which it is relevant to estimate both an expected value, and the uncertainty, or variance, associated with this expected value.

You may be wondering: Why is uncertainty important? To motivate the answer, consider predicting the trajectory of a pedestrian for an autonomous navigation safety system. If the predicted trajectory of a pedestrian has high predicted uncertainty, an autonomous vehicle should exercise increased caution to account for having low confidence in the pedestrian’s intention. If, on the other hand, the predicted variance of the pedestrian’s trajectory is low, then the vehicle can be more confident about the pedestrian’s intentions, and can more easily proceed with its current driving plan.

In a sense, by predicting uncertainty, decision-making systems can "weight" the expected values they estimate according to how uncertain they predict these expected values to be.
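To make this concrete, here is a small sketch of one way a decision rule could consume predicted uncertainty. The function `needs_caution`, the tolerance, and the numbers are hypothetical. The mean and standard deviation would come from a fitted GPR model in practice.

```python
import numpy as np


def needs_caution(mean, std, tolerance, z=1.96):
    """Flag predictions whose approximate 95% interval is wider than `tolerance`."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    lower = mean - z * std
    upper = mean + z * std
    return (upper - lower) > tolerance


# Hypothetical predicted lateral positions (in metres) of a pedestrian
# at three future time steps, with their predicted standard deviations
pred_mean = np.array([1.0, 1.2, 1.5])
pred_std = np.array([0.05, 0.20, 0.60])

flags = needs_caution(pred_mean, pred_std, tolerance=0.5)
# flags -> [False, True, True]: only the first prediction is tight enough to act on
```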

Predicting the uncertainty of a pedestrian's intentions for self-driving car systems is an example application of GPR. Photo by Fallon Michael on Unsplash

Why GPR Over Other Supervised Learning Models?

You may be wondering – why should I consider using GPR instead of a different supervised learning model? Below, I enumerate a few comparative reasons.

GPR is non-parametric. This means it learns largely from the data itself, rather than by learning an extensive set of parameters. This is especially advantageous because this results in GPR models not being as data-hungry as highly parametric models, such as neural networks, i.e. they don’t need as many samples to achieve strong generalizability.
For interpolation and prediction tasks, GPR estimates both expected values and uncertainty. This is especially beneficial for decision-making systems that take this uncertainty into account when making decisions.
GPR is a linear smoother [5] – from a supervised learning lens, this can be conceptualized as a regularization technique. From a Bayesian lens, this is equivalent to imposing a prior on your model that all targets on test points must be linear combinations of existing training targets. This attribute helps GPR to generalize to unseen data, so long as the true unseen targets can be represented as linear combinations of training targets.
With automatic differentiation backend frameworks such as torch and tensorflow, which are integrated through GPR packages such as gpytorch and gpflow, GPR hyperparameters can be optimized quickly, and models can be scaled with hardware acceleration such as GPUs. This is particularly true for batched models. For an example case study of this, please see my previous article on batched, multi-dimensional GPR here!
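The linear smoother point above can be checked numerically. The sketch below, written in plain NumPy with a hand-rolled RBF kernel and fixed (not optimized) hyperparameters, builds the weight matrix that maps training targets to predicted means. All names and values are illustrative assumptions.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    """RBF kernel matrix between 1-D input arrays a (n,) and b (m,)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale ** 2)


rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(0.0, 5.0, 7)
noise_var = 0.1 ** 2

K = rbf(x_train, x_train) + noise_var * np.eye(20)   # K(X, X) + sigma_n^2 I
K_star = rbf(x_test, x_train)                        # K(X_*, X)

# Smoother weights W = K(X_*, X) [K(X, X) + sigma_n^2 I]^{-1}.
# Row i holds the weights test point i places on each training target.
W = np.linalg.solve(K, K_star.T).T   # valid because K is symmetric

# The predicted mean is a linear combination of the training targets
mean = W @ y_train
```

Note that `W` depends only on the inputs and the kernel, not on the targets, so doubling `y_train` doubles `mean`.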

How Can I Implement GPR?

Below, we introduce several Python machine learning packages for scalable, efficient, and modular implementations of Gaussian Process Regression. Let’s walk through each of them!

1. Scikit-Learn [1]

This is a great package for getting started with GPR. It allows for some model flexibility, and handles hyperparameter optimization and likelihood definition under the hood. To use sklearn with your datasets, please make sure your datasets can be represented numerically with np.array objects. The main steps for using GPR with sklearn:

Preprocess your data. Training data (np.array) can be represented as a (x_train, y_train) tuple with x_train shape (N, D) and y_train shape (N, 1), where N is the number of samples, and D is the dimension of the features. Your test points (np.array) can be represented as x_test with shape (M, D), where M is the number of test points.
Define your covariance function. In the code segment below, we use a Radial Basis Function (RBF) kernel RBF along with additive noise using a WhiteKernel.
Define your GaussianProcessRegressor object using your covariance function, and a random state that seeds your GPR. This random_state is important for ensuring reproducibility.
Fit your gpr object using the method gpr.fit(x_train, y_train). This "trains your model", and optimizes the hyperparameters of your gpr object by maximizing the log marginal likelihood with a gradient-based method. By default this is L-BFGS-B, a quasi-Newton routine that approximates second-order (Hessian) information from gradients.
Predict the mean and standard deviation of the targets on your test points x_test using the method gpr.predict(x_test, return_std=True) (pass return_cov=True instead if you need the full covariance). This gives you both a predicted value, as well as a measure of the uncertainty for this predicted point.

To install dependencies for the example below using pip:

pip install scikit-learn numpy matplotlib

Here is an example that fits and predicts a one-dimensional sinusoid using sklearn:
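A minimal sketch of the five steps above, assuming synthetic data drawn from a noisy sine wave. The variable names, sample counts, and initial kernel values are illustrative choices. Note that y_train is passed here as a 1-D array of shape (N,), which sklearn also accepts.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# 1. Preprocess: x_train has shape (N, D) = (40, 1); y_train has shape (40,)
x_train = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=(40, 1)), axis=0)
y_train = np.sin(x_train).ravel() + 0.1 * rng.standard_normal(40)
x_test = np.linspace(0.0, 2.0 * np.pi, 100).reshape(-1, 1)

# 2. Covariance function: RBF kernel plus additive white noise
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

# 3. Regressor, seeded for reproducibility
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0)

# 4. Fit: optimizes the kernel hyperparameters on the training data
gpr.fit(x_train, y_train)

# 5. Predict the mean and standard deviation at the test points
y_mean, y_std = gpr.predict(x_test, return_std=True)

# The fitted kernel, with optimized hyperparameters, is stored on the model
fitted_kernel = gpr.kernel_
```

From here, `y_mean` and `y_mean ± 1.96 * y_std` can be plotted against `x_test` with matplotlib to visualize the fit and an approximate 95% interval.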

2. GPyTorch [2] (PyTorch backend)

This package is great for creating fully customizable, advanced, and accelerated GPR models that scale. This package supports everything from GPR model optimization via auto-differentiation to hardware acceleration via CUDA and PyKeOps.

It’s recommended you have some familiarity with PyTorch and/or auto-differentiation packages in Python before working with GPyTorch, but the tutorials make this framework easy to learn and use. Data for GPRs in GPyTorch are represented as torch.Tensor objects. Here are the steps for fitting a GPR model in GPyTorch:

Preprocess your data. Training data can be represented as a (x_train, y_train) tuple with x_train shape (B, N, D) and y_train shape (B, N), where B is the batch size, N is the number of samples, and D is the dimension of the features. Your test points can be represented as x_test with shape (B, M, D), where M is the number of test points.
Define your ExactGPModel by subclassing the gpytorch.models.ExactGP class. To subclass this model, you’ll need to define: (i) The constructor method, which specifies the mean and covariance functions of the model, (ii) The forward method, which describes how the GPR model makes predictions. To use batching, check out this tutorial here. To use prior distributions on your hyperparameters, check out this tutorial [here](https://docs.gpytorch.ai/en/v1.1.1/examples/00_Basic_Usage/Hyperparameters.html?highlight=priors).
Specify your likelihood function, which your model uses to relate latent variables f to observed targets y.
Instantiate your model using your likelihood and training data (x_train, y_train).
Perform hyperparameter optimization ("training") of your model using PyTorch auto-differentiation. Once finished, ensure your model and likelihood are placed in posterior mode with model.eval() and likelihood.eval().
Compute mean and variance predictions on your test points using your model by calling likelihood(model(x_test)). The inner call predicts the distribution of latent test values f* from test inputs x*, and the outer call maps the latent test values f* to a predictive distribution over observed targets, from which you read the mean and variance.

To install dependencies for the example below using pip:

pip install gpytorch torch matplotlib numpy

# (Optional) - Installs pykeops
pip install pykeops

Here is an example to fit a noisy one-dimensional sinusoid using gpytorch:

3. GPflow [3] (TensorFlow backend)

Another GPR package that supports automatic differentiation (this time in tensorflow), GPflow has extensive functionality built-in for creating fully-customizable models, likelihood functions, kernels, and optimization and inference routines. In addition to GPR, GPflow has built-in functionality for a variety of other state-of-the-art Gaussian Process methods, such as Variational Fourier Features and Convolutional Gaussian Processes.

It’s recommended you have some familiarity with TensorFlow and/or auto-differentiation packages in Python before working with GPflow. Data for GPRs in GPflow are represented as tf.Tensor objects. To get started with GPflow, please check out this examples link.

4. GPy [4]

This package has Python implementations for a multitude of GPR models, likelihood functions, and inference procedures. Though this package doesn’t have the same auto-differentiation backends that power gpytorch and gpflow, this package’s versatility, modularity, and customizability make it a valuable resource for implementing GPR.

5. Pyro [6]

Pyro is a probabilistic programming package written in Python on top of PyTorch. It also supports Gaussian Process Regression, as well as advanced applications such as Deep Kernel Learning.

6. Gen [7]

Gen is another probabilistic programming package built on top of Julia. Gen offers several advantages with Gaussian Process Regression: (i) It builds in proposal distributions, which can help to narrow down a search space by effectively imposing a prior on the set of possible solutions, (ii) It has an easy API for sampling traces from fit GPR models, (iii) As is the goal for many probabilistic programming languages, it makes it easy to create hierarchical models for tuning the priors of GPR hyperparameters.

7. Stan [8]

Stan is another probabilistic programming package that can be integrated with Python, but also supports other languages such as R, MATLAB, Julia, and Stata. In addition to having functionality built-in for Gaussian Process Regression, Stan also supports a variety of other Bayesian inference and sampling functionality.

8. BoTorch [9]

Built by the creators of GPyTorch, BoTorch is a Bayesian Optimization library that supports many of the same GPR techniques as GPyTorch, as well as advanced Bayesian Optimization techniques and analytic test suites.

Wrap-Up And Review

In this article, we reviewed the theory behind Gaussian Process Regression (GPR), introduced and discussed the types of problems GPR can be used to solve, discussed how GPR compares to other supervised learning algorithms, and walked through how we can implement GPR using sklearn, gpytorch, or gpflow.

To see more articles in reinforcement learning, machine learning, computer vision, robotics, and teaching, please follow me! Thank you for reading!

Acknowledgments

Thank you to CODECOGS for their inline equation rendering tool, Carl Edward Rasmussen for open-sourcing the textbook Gaussian Processes for Machine Learning [5], and the teams behind Scikit-Learn, GPyTorch, GPflow, and [GPy](https://gpy.readthedocs.io/en/deploy/GPy.models.html) for open-sourcing their Gaussian Process Regression Python libraries.

References

[1] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." Journal of Machine Learning Research 12 (2011): 2825–2830.

[2] Gardner, Jacob R., et al. "GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration." arXiv preprint arXiv:1809.11165 (2018).

[3] Matthews, Alexander G. de G., et al. "GPflow: A Gaussian Process Library using TensorFlow." J. Mach. Learn. Res. 18.40 (2017): 1–6.

[4] GPy, "GPy." http://github.com/SheffieldML/GPy.

[5] Carl Edward Rasmussen and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.

[6] Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D. Goodman. 2019. Pyro: deep universal probabilistic programming. J. Mach. Learn. Res. 20, 1 (January 2019), 973–978.

[7] Gen: A General-Purpose Probabilistic Programming System with Programmable Inference. Cusumano-Towner, M. F.; Saad, F. A.; Lew, A.; and Mansinghka, V. K. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’19).

[8] Stan Development Team. 2021. Stan Modeling Language Users Guide and Reference Manual, VERSION. https://mc-stan.org.

[9] Balandat, Maximilian, et al. "BoTorch: A framework for efficient Monte-Carlo Bayesian optimization." Advances in Neural Information Processing Systems 33 (2020).