New Paper - Accuracy of Inference

A couple of weeks ago I put my newest paper onto the arXiv, which you can find here: https://arxiv.org/abs/1711.06287

The paper describes how much we can hope to learn from gravitational-wave observations of black holes, focusing in particular on how well we can constrain our models of the evolution of binary stars. To calculate this I used COMPAS, the binary evolution simulation code we are developing here in Birmingham, together with a technique called Fisher information matrices.

If you have a model which depends on some parameters \{\lambda\} and makes predictions about some observable random variable, then the Fisher information quantifies how sensitive the expected observations are to changes in the underlying parameters. If a parameter has a large effect on the expected observations, it makes sense that those observations can teach us a lot about that parameter.
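
In symbols, for a likelihood \mathcal{L}(\{\lambda\}) of the data given the parameters, one common way of writing the Fisher matrix is

 F_{ij} = \mathrm{E}\left[\frac{\partial \ln \mathcal{L}}{\partial \lambda_i}\,\frac{\partial \ln \mathcal{L}}{\partial \lambda_j}\right]

where the expectation is taken over realisations of the data; the inverse of F then gives a lower bound (the Cramér-Rao bound) on the covariance of the parameter estimates.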

I calculated the Fisher matrix by running COMPAS (a lot!) and seeing how the output changed with respect to four different bits of physics: the strength of supernova kicks, the ferocity of stellar winds during two different stellar evolutionary phases, and the frictional energetics during the common envelope phase. Below is the money plot from the paper.

The plot shows how accurately we will be able to measure each of these parameters after 1000 observations of binary black holes. The fuzziness comes from a quantification of the various sources of uncertainty in this process. It also shows how the different parameters behave with respect to each other (their correlations). The way I like to think about this plot is in terms of null hypothesis testing: if we find the truth is a massive outlier with respect to these distributions, then we reject the null hypothesis that our model is correct. The plot shows that the model doesn't have to be wrong by more than a few percent before we can rule it out.
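
To give a flavour of the "seeing how the output changed" step, here's a toy numpy sketch of estimating a Fisher matrix by finite differences. The model function, parameter values and the assumption of a Gaussian likelihood with independent uncertainties are all placeholders for illustration; this is not the actual pipeline from the paper.

import numpy as np

def model(lambdas):
	# stand-in for an expensive simulation mapping physics parameters
	# to a vector of predicted observables
	a, b = lambdas
	return np.array([a + b, a * b, a - 2.0 * b])

def fisher_matrix(lambdas, sigma, eps=1e-4):
	"""F_ij = sum_k dmu_k/dlambda_i dmu_k/dlambda_j / sigma_k^2,
	estimated with central finite differences (Gaussian likelihood assumed)."""
	lambdas = np.asarray(lambdas, dtype=float)
	derivs = []
	for i in range(len(lambdas)):
		step = np.zeros(len(lambdas))
		step[i] = eps
		derivs.append((model(lambdas + step) - model(lambdas - step)) / (2.0 * eps))
	derivs = np.array(derivs)                 # shape (n_params, n_observables)
	weights = 1.0 / np.asarray(sigma) ** 2    # independent Gaussian uncertainties
	return derivs @ np.diag(weights) @ derivs.T

F = fisher_matrix([1.0, 2.0], sigma=[0.1, 0.1, 0.1])
covariance = np.linalg.inv(F)                 # Cramer-Rao bound on the parameters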

Intense Collaboration à Clermont-Ferrand

I've spent the last week working in the beautiful (and intensely hot) Clermont-Ferrand, France, at the fourth edition of the COIN residence program. These residence programs aim to bring together a diverse collection of astrophysicists (and others!) to live together in a house for a week and collaborate intensely on a couple of statistical and/or machine learning problems in astrophysics. The house itself is super luxurious (by British standards, at least).

I had no idea how much work could be done in a single week: we have two papers in a pretty good state (from an almost cold start), and we expect to submit them in the next couple of weeks. I'm not sure how much I can say about them before then, but I'll write a summary here once they appear on the arXiv.

I would thoroughly recommend future incarnations of this meeting to anyone. The website is here: https://iaacoin.wixsite.com/crp2017

Greek Sieves and Corrupted Data

The largest simulation of my PhD finished a couple of days ago on our local cluster. It was a simulation of a total of 360 million binaries, combining datasets using around 300 different sets of physical assumptions. The whole lot took around 12 days running simultaneously on around 120 cores. That's around 35000 CPU hours, or just shy of 4 CPU years of processing! I will be using the simulations to try and pin down exactly how much we can hope to learn about our models of binary physics using gravitational wave observations. The paper is coming along nicely, and I'll write a summary here when it's done.

Alas, when I ran my postprocessing code to collect the results from the different parts of the cluster I sent the jobs to, I found that I only had 359999189 binaries in my dataset (811 had gone missing!). This isn't a problem I can just ignore, since I make use of the fact that I simulate the same number of binaries, using the same initial conditions, for each setting of the physics. I need to work out what's missing. This turns out to be quite a challenge when you have a 40GB data file with 3.6\times 10^8 binaries to look through!

The main challenge is memory management. I can't load the whole file into memory (I even dip into swap when loading a single column). Luckily, I kept track of an ID-type variable when constructing my dataset, so for one of the columns I know that there should only be 1.2\times 10^6 unique values, repeated for each variation of the physics.

So, in summary, I have a column which should contain 300 copies of the same 1.2\times 10^6 IDs. The problem is to go through the dataset and find out which IDs have been skipped in some of the physics variations (in a way that doesn't kill the RAM on my machine).

The solution I came up with is partly based on the Sieve of Eratosthenes, which is a very simple and efficient way of finding all the prime numbers up to a given limit. You can see details of the algorithm here; when I first saw an implementation of it, it blew my mind. The trick is to make a boolean array and progressively mark off the elements whose indices are multiples of smaller numbers, so that only the prime indices are left marked as true. This is easier to see in code; here's my implementation from a few years ago in C++:

#include <vector>

using std::vector;

// Returns a boolean array where sieve[n] is true if and only if n is prime,
// for 0 <= n <= max (assumes max >= 1).
vector<bool> primeSieveUpTo(int max)
{
	// start by assuming every number is prime...
	vector<bool> sieve(max+1,true);

	// ...except 0 and 1, which aren't
	sieve[0] = false;
	sieve[1] = false;

	for (int i=2; i<=max; i++)
	{
		if (sieve[i])
		{
			// i is prime, so mark off all of its multiples
			for (int j=2*i; j<=max; j+=i) sieve[j] = false;
		}
	}

	return sieve;
}

I realised I could use a similar trick (using the indices of an array to keep track of things) in my code. Here's my whole Python script, which goes through my 360000000-line file line by line and counts instances of each unique seed.

import numpy as np

# these are the 1.2e6 unique values I know will be in the 2nd column of
# my data
uniqueSeeds = np.loadtxt('uniqueSeeds.dat',dtype=np.int64)

# one counter per unique seed, using the array index to identify the seed
seedUsageCounts = np.zeros_like(uniqueSeeds)

# my data file
with open('allInitial.dat') as f:

	# skip the headers
	f.readline()
	f.readline()

	for l in f:
		# the seed lives in the second (comma-separated) column
		seed = int(l.rstrip().split(',')[1])
		# add 1 to the counter corresponding to the observed seed
		seedUsageCounts[uniqueSeeds == seed] += 1

np.savetxt('seedUsageCounts.dat', seedUsageCounts,fmt='%i')

I'm sure I could improve performance a little by doing things in batches, but it's memory I was trying to manage, not time. I can leave it running overnight!
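
(If I ever do need the speed, one easy tweak, sketched below, would be to build a dictionary mapping each seed to its index up front, so that each line costs a single hash lookup instead of a comparison against all 1.2\times 10^6 unique seeds; the extra memory is just one smallish dictionary. The file and array names are the same as in the script above.)

import numpy as np

uniqueSeeds = np.loadtxt('uniqueSeeds.dat', dtype=np.int64)
seedUsageCounts = np.zeros_like(uniqueSeeds)

# map each seed to its position in the counts array, once
seedToIndex = {seed: i for i, seed in enumerate(uniqueSeeds)}

with open('allInitial.dat') as f:
	# skip the headers
	f.readline()
	f.readline()

	for l in f:
		seed = int(l.rstrip().split(',')[1])
		# O(1) lookup instead of scanning every unique seed
		seedUsageCounts[seedToIndex[seed]] += 1

np.savetxt('seedUsageCounts.dat', seedUsageCounts, fmt='%i')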

Villiers Park

I was recently accepted to teach a week-long residential astrophysics course at Villiers Park, just outside Cambridge, UK. It sounds like it's going to be loads of fun.

The basic idea is that they invite some of the best A-Level (17-18 year old) students to stay at Villiers Park for a week in September, whilst another physicist and I teach them a series of topics broadly related to astrophysics, as well as generally hanging out with them and showing them that we're not all Sheldon Coopers.

I'm currently pencilled in to give 90-minute sessions on each of:

  • Universe at the largest scales
  • Special Relativity (with a little GR at the end)
  • Gravitational Waves
  • Exoplanets

I'm genuinely excited for this!

Random Samples From a Sphere

During my work today I had to draw samples uniformly from within a sphere, and made an observation I hadn't really appreciated before. The name of the game is to make sure that every infinitesimal shell has, on average, the same density of samples.

 \frac{\left< N_{\mathrm{samp}}(r) \right>}{4\pi r^2 \, \mathrm{d}r} = \mathrm{constant}

Or in other words, the shells get bigger with r, so we need more samples at larger radii. The density should also be constant with respect to direction, which means constant per unit solid angle: the azimuthal angle can be drawn uniformly, but the polar angle picks up a \sin\phi weighting from the solid angle element (otherwise the samples pile up at the poles). So, to get the correct distribution we need to sample from

 \theta \sim \mathcal{U}(0,2\pi)

 \cos\phi \sim \mathcal{U}(-1,1)

 p(r) \propto r^2

Or in other words, the radius is just a sample from a power-law distribution with index 2. Elementary observation? Maybe, but it makes it easier to code up!
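
Here's a minimal numpy sketch of the recipe. The radius R and the number of samples are arbitrary, and the radial part uses inverse-transform sampling: if u is uniform on (0,1), then r = R u^{1/3} has p(r) \propto r^2.

import numpy as np

def sample_uniform_ball(n, R=1.0):
	"""Draw n points uniformly from inside a sphere of radius R."""
	theta = np.random.uniform(0.0, 2.0 * np.pi, n)   # azimuthal angle
	cos_phi = np.random.uniform(-1.0, 1.0, n)        # uniform in cos(polar angle)
	sin_phi = np.sqrt(1.0 - cos_phi**2)
	r = R * np.random.uniform(0.0, 1.0, n) ** (1.0 / 3.0)  # p(r) proportional to r^2

	x = r * sin_phi * np.cos(theta)
	y = r * sin_phi * np.sin(theta)
	z = r * cos_phi
	return np.column_stack((x, y, z))

samples = sample_uniform_ball(10000)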

Gaussian Process Update Equations

When you start to learn about Gaussian processes, you come across the update equations fairly early on. They are fairly intimidating to look at, and are typically dismissed as trivial to derive (for example, Rasmussen and Williams simply point you towards a statistics book from the 1930s, which is neither available online nor in our university library...). I had a go at the derivation, and promptly realised it wasn't trivial at all from a cold start.

Fortuitously, at the PEN emulators workshop I recently attended, there was an introductory lecture from Jochen Voss, where he went through a univariate case, and then gave us the structure of the derivation for the full multivariate case. Luckily, this gave me the push I needed to try the derivation again, so I went away and filled in the gaps.

So here it is, in all its glory: the derivation of the update equations for Gaussian processes.

The overall endgame of Gaussian process regression is to write down a conditional distribution P(y_p | y_t) for a set of predicted outputs y_p given a set of training (observed) outputs y_t. By the product rule

 P(y_p|y_t) = \frac{P(y_p,y_t)}{P(y_t)}

Since we have a constant set of data, P(y_t) is just a constant in this expression.

The underlying assumption in Gaussian process regression is that the outputs are jointly Gaussian distributed (I'll also assume a zero mean function throughout, which is the standard convention), so that

 P(y_p|y_t) \propto P(y_p,y_t) \propto \exp\left[-\frac{1}{2} \left(\begin{array}{c} y_t \\ y_p \end{array}\right)^T\Sigma^{-1}\left(\begin{array}{c} y_t \\ y_p \end{array}\right)\right]

Where \Sigma is the joint covariance matrix. Remember that under the Gaussian process model we have trained a function which computes the elements of the covariance matrix purely from the inputs, so the only unknowns left in this expression are the predicted outputs y_p. We can write the covariance matrix blockwise

 \Sigma = \left(\begin{array}{cc} T & C^T \\ C & P \end{array}\right)

Where T is the covariance matrix computed using only the training inputs x_t, P is the covariance matrix computed using the prediction inputs x_p, and C is the matrix of cross terms (i.e. the covariance between y_t and y_p, computed using x_t and x_p). It is a well-known result (it's in Numerical Recipes, or on Wikipedia) that you can invert a matrix blockwise:

 \Sigma^{-1} = \left(\begin{array}{cc}T^{-1} + T^{-1}C^T M CT^{-1} & -T^{-1}C^TM \\ -MCT^{-1} & M\end{array}\right)

Where M = (P-CT^{-1}C^T)^{-1}. So, we can directly compute our Gaussian density

 P(y_p|y_t) \propto \exp\left[-\frac{1}{2} y_t^T(T^{-1} + T^{-1}C^T M CT^{-1})y_t + \frac{1}{2}y_t^T (T^{-1}C^TM)y_p + \frac{1}{2}y_p^T (MCT^{-1})y_t - \frac{1}{2}y_p^TMy_p\right]

However, the only thing here that isn't a constant is y_p, so we can drop a bunch of terms (since we're only interested in the shape of the density as a function of y_p, not its normalisation)

 P(y_p|y_t) \propto \exp\left[\frac{1}{2}y_t^T (T^{-1}C^TM)y_p + \frac{1}{2}y_p^T (MCT^{-1})y_t - \frac{1}{2}y_p^TMy_p\right]

If we take the transpose of the middle term (a scalar is unchanged by transposition), we can group the terms together a bit more

 P(y_p|y_t) \propto \exp\left[\frac{1}{2}y_t^T (T^{-1}C^TM + (MCT^{-1})^T)y_p - \frac{1}{2}y_p^TMy_p\right]

Now, in general, a multivariate Gaussian has the form;

 \mathcal{N}(\tilde{y},\tilde{\Sigma}) \propto \exp\left[-\frac{1}{2}(y-\tilde{y})^T\tilde{\Sigma}^{-1}(y-\tilde{y})\right]

If we remember that covariance matrices are symmetric, we can expand, drop some constant terms and then rearrange this to

 \mathcal{N}(\tilde{y},\tilde{\Sigma}) \propto \exp\left[-\frac{1}{2}y^T\tilde{\Sigma}^{-1}y + \tilde{y}^T\tilde{\Sigma}^{-1}y\right]

We can therefore see that P(y_p|y_t) and \mathcal{N}(\tilde{y},\tilde{\Sigma}) have exactly the same form, so we can straightforwardly match expressions for \tilde{\Sigma}.

 \tilde{\Sigma} = M^{-1} = P-CT^{-1}C^T

The expression for \tilde{y} requires a little more work. We start by matching terms

 \tilde{y}^T\tilde{\Sigma}^{-1} = \frac{1}{2}y_t^T (T^{-1}C^TM + (MCT^{-1})^T)

We can rearrange this a little bit

 \tilde{y}^T\tilde{\Sigma}^{-1} = \frac{1}{2}y_t^T T^{-1}C^T (M + M^T)

We know that M is symmetric (we just showed that its inverse is a covariance matrix), so \frac{1}{2}(M + M^T) = M. If we then right-multiply by the covariance matrix \tilde{\Sigma} = M^{-1} and take the transpose (remembering that T is also symmetric), we finally arrive at

\tilde{y} = CT^{-1}y_t

And, so, in conclusion we know that

y_p|y_t \sim \mathcal{N}(CT^{-1}y_t,\; P-CT^{-1}C^T)

So, not quite as trivial as the textbooks claim!
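
For completeness, here's a minimal numpy sketch of these update equations in action. The squared-exponential kernel, its hyperparameters, the toy training data and the jitter term are all just illustrative choices, not anything specific from the references above.

import numpy as np

def kernel(x1, x2, lengthscale=1.0, variance=1.0):
	"""Squared-exponential covariance between two sets of 1D inputs."""
	diff = x1[:, None] - x2[None, :]
	return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

# toy training data and prediction locations
x_t = np.array([-2.0, -1.0, 0.0, 1.5])
y_t = np.sin(x_t)
x_p = np.linspace(-3.0, 3.0, 50)

# the blocks of the joint covariance matrix
T = kernel(x_t, x_t) + 1e-8 * np.eye(len(x_t))  # small jitter for numerical stability
P = kernel(x_p, x_p)
C = kernel(x_p, x_t)                            # covariance between y_p and y_t

# the update equations derived above
T_inv = np.linalg.inv(T)
mean = C @ T_inv @ y_t      # \tilde{y} = C T^{-1} y_t
cov = P - C @ T_inv @ C.T   # \tilde{\Sigma} = P - C T^{-1} C^T

(In practice you'd solve the linear systems with a Cholesky factorisation rather than forming the explicit inverse, but the explicit form matches the algebra above.)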

EWASS and PEN

Over the last couple of weeks I've attended both the PEN (Past Earth Network) emulators workshop in the ultra-glamorous Leeds, UK, and the European Week of Astronomy and Space Science (EWASS) in gorgeous Prague, Czech Republic.

I made a new poster for EWASS, with a slightly more detailed account of my emulators work. I'm more or less at the paper writing stage for that project (and another), so watch this space for arXiv numbers and accessible reviews in the coming couple of months!

SAMSI Workshop on Emulation

At the beginning of April I attended the SAMSI workshop on the emulation of astrophysical populations, just outside Durham, North Carolina. I helped co-author the official SAMSI blog post, which can be found here.

I also finally submitted my proceedings paper from the Astroinformatics conference last year to the arXiv, which you can look at over at arXiv 1704.03781.

I may get around to writing a summary post about it, although things are quite busy these days. I'll see what I can do.

Chats and Virtual Networks

The Azure job manager I mentioned last time worked rather nicely. I found myself running into various usage limits, which the people on the service desk happily raised for me, until I ran up against the public IP limit. It turns out I'd been misunderstanding how to structure my queue. I have now rewritten the code so that, instead of launching 60 virtual machines with 60 unique public IPs, I have a 'head node' of sorts with a public IP, and a virtual network of virtual machines hidden behind it (a rough sketch of the idea is below). I just submitted a big run using this new queuing system; we'll see how it pans out.
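
To give a flavour of the new layout, here's a rough sketch of the kind of thing the rewritten code does, driving the Azure CLI from Python. The resource group, network and VM names are placeholders rather than my actual setup, and the exact flags may differ from what the real code uses.

import subprocess

def az(*args):
	"""Run an Azure CLI command, raising an error if it fails."""
	subprocess.run(['az'] + list(args), check=True)

# one virtual network for the whole cluster
az('network', 'vnet', 'create', '--resource-group', 'compas-rg',
   '--name', 'compas-vnet', '--subnet-name', 'workers')

# the head node is the only machine that gets a public IP
az('vm', 'create', '--resource-group', 'compas-rg', '--name', 'head-node',
   '--image', 'UbuntuLTS', '--vnet-name', 'compas-vnet', '--subnet', 'workers',
   '--generate-ssh-keys')

# the workers sit behind it on the virtual network, with no public IPs at all
for i in range(60):
	az('vm', 'create', '--resource-group', 'compas-rg',
	   '--name', 'worker-{:02d}'.format(i),
	   '--image', 'UbuntuLTS', '--vnet-name', 'compas-vnet', '--subnet', 'workers',
	   '--public-ip-address', '', '--generate-ssh-keys')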

Our department was also visited by Iain Murray of Edinburgh this week. I had seen him speak a couple of years ago at the intractable likelihood conference in Bristol, and more recently I bumped into his PhD student (who, as it turns out, also works on the emulation of physical simulations) at a workshop in Warwick, so I suggested him as a speaker. He gave an excellent talk about density estimation and neural networks in astronomy contexts, we had some interesting conversations about my own work, and he offered some useful suggestions regarding Gaussian processes.

I also started reacquainting myself with the Gaussian process workflows I had developed before the COMPAS writing workload piled up, and I'll spend the next week working on these, as well as nursing simulations and data generation.

My Azure job manager

I've spent the last week or so investigating exactly what I can do with my Microsoft Azure grant. As I mentioned in my last blog post, I want to be able to split my job up into a bunch of small chunks, distribute them out to some humble Linux virtual machines, and then collect the results back again.

I started by investigating Azure Batch, which is Azure's built-in way of mimicking a gigantic cluster. However, after spending a couple of days wrestling with their broken tutorial, I discovered that they don't support the service for their simplest VMs, which means I would have to use a more expensive tier, effectively halving my CPU hours.

I then spent some time trying to work out how to deploy a virtual machine using their Python API. Again, the tutorials are largely broken or incomprehensible, and in the end I came across the Azure CLI (command-line interface). This was a lot more intuitive (or at least, one of their CLIs is; they appear to have two distinct ones), so I set about writing some code that uses the CLI to launch virtual machines.

The code took a couple of days to write and debug, but it's in a more or less complete state. You can find it here.

Alas, it turns out I now run up against limits imposed by Microsoft, which stop me from launching more than 20 VMs concurrently. The website explaining the limits is pretty difficult to follow, so I've raised a service desk ticket to ask them to help me out.

In other news, we locked in the first "stable" version of our population synthesis code - COMPAS, which is the version we used to write the first COMPAS paper (watch this space).