A Gaussian Process can be used to approximate a non-linear function \(y = f(x)\)
. To get started with the Gaussian Process package, first import the library:
import com.github.jonnylaw.gp._
Finitely many points of a Gaussian process are distributed according to a
multivariate Normal (MVN) distribution. The MVN is parameterised by a mean and
covariance function. A covariance function is
determined by the distance between points and some static parameters. The
distance function can be any suitable function from a pair of locations to a
distance measurement (Location[Double], Location[Double]) => Double
. Euclidean
distance is a suitable distance function when dealing with small scale spatial data:
Other suitable distance functions include Manhattan distance or Great Circle distance for measurements on global scale.
In order to simulate data from a Gaussian Process, first specify a vector of locations:
scala> val xs = GaussianProcess.samplePoints(-10.0, 10.0, 100).
| map(One.apply)
xs: scala.collection.immutable.Vector[com.github.jonnylaw.gp.One[Double]] = Vector(One(-9.859552146780675), One(-9.832488280812374), One(-9.768698045015466), One(-9.442795800797992), One(-8.908371260519562), One(-8.639547218148369), One(-8.596512364554968), One(-8.481551204523905), One(-8.033944278630187), One(-7.8309854004202295), One(-7.684292383471911), One(-7.624921971461109), One(-7.210476690810554), One(-7.005713594412382), One(-6.869650718857394), One(-6.570876382304092), One(-6.059245751021551), One(-5.805565083562594), One(-5.7235159505558375), One(-5.7021098915442), One(-5.649566529769858), One(-5.518758081800166), One(-5.314373893566979), One(-5.1505070184288115), One(-4.99831136861129), One(-4.612459379259848), One(-4.522017634102178), One(-4.222217751791009), One(-4.0820258...
The function samplePoints
uniformly samples a vector of 300 Doubles between -10
and 10. The sampled points are then put into a Location
object One
. Next a
matrix representing the pairwise distances between each point can be calculated:
scala> val m = GaussianProcess.distanceMatrix(xs, Location.euclidean)
m: breeze.linalg.DenseMatrix[Double] =
0.0 0.02706386596830157 0.0908541017652098 ... (100 total)
0.02706386596830157 0.0 0.06379023579690823 ...
0.0908541017652098 0.06379023579690823 0.0 ...
0.41675634598268374 0.38969248001438217 0.32590224421747394 ...
0.9511808862611133 0.9241170202928117 0.8603267844959035 ...
1.2200049286323065 1.192941062664005 1.1291508268670967 ...
1.2630397822257073 1.2359759162574058 1.1721856804604975 ...
1.3780009422567705 1.350937076288469 1.2871468404915607 ...
1.8256078681504881 1.7985440021821866 1.7347537663852783 ...
2.028566746360446 2.0015028803921444 1.9377126445952362 ...
2.1752597633087642 2.1481958973404627 2.0844056615435544 ...
2.2346301...
Next a covariance function can be selected, for instance the squared exponential covariance function:
\(\sigma\)
is known as the length scale, large values of sigma indicate that large
changes in distance are required for the covariance to change significantly.
\(h\)
controls the amount of change between two points. In addition some white
noise (Normally distributed with mean zero) representing independent measurement
noise can be added to the covariance function. Then the covariance function can be applied to the distance matrix using a map
:
scala> val covFn = (distance: Double) => KernelFunction.squaredExponential(h = 3.0,
| sigma = 5.0)(distance) + KernelFunction.white(sigma = 0.5)(distance)
covFn: Double => Double = $$Lambda$4405/1175160170@3e9d0ba0
scala> val covMat = m.map(covFn)
covMat: breeze.linalg.DenseMatrix[Double] =
3.5 2.999912106946619 ... (100 total)
2.999912106946619 3.5 ...
2.9990096273720694 2.9995117364359003 ...
2.9792299306558157 2.9818320082942282 ...
2.893371657316043 2.899251493478395 ...
2.826603481470843 2.833996633890227 ...
2.814547517848851 2.8221720812532487 ...
2.7805724528445404 2.788799033739133 ...
2.6255713385404995 2.635892587604198 ...
2.5446903353209436 2.5558164882011836 ...
2.4826886032255286 2.4943357570881437 ...
2.4568238745753272 2.4686669985480343 ...
2.265755500926901 2.2787214030820113 ...
2.1659011476835404 2.1792615558343345 ...
2.0980993568059008 2.1116635135745234...
Then a draw from a zero-mean Gaussian Process prior with covariance matrix
covMat
at locations xs
can be performed:
import breeze.linalg._
import breeze.stats.distributions._
val nugget = diag(DenseVector.fill(xs.size)(1e-3))
val root = eigSym(covMat + nugget)
val x = DenseVector.rand(covMat.cols, Gaussian(0, 1))
val ys = root.eigenvectors * diag(root.eigenvalues.mapValues(math.sqrt)) * x
Then the data can be plotted and saved:
import com.cibo.evilplot.plot.aesthetics.DefaultTheme._
val sims = GaussianProcess.vecToData(ys, xs)
Plot.scatterPlot(sims).
standard().
render().
write(new java.io.File("docs/src/main/resources/figures/simulated_gp.png"))