Plot of the Week 22nd - 25th August

This week I spent a lot of time worrying about inferring derivatives from observed data. This weeks plot was generated by the following method. Consider a Taylor series of a function of two variables

 f(x +\delta x, y +\delta y) = f(x,y) \; +\; \delta x \frac{\partial f}{\partial x} \; +\; \delta y \frac{\partial f}{\partial y} \; +\; \frac{1}{2}(\delta x)^2 \frac{\partial^2 f}{\partial x^2} \; +\; \frac{1}{2}(\delta y)^2 \frac{\partial^2 f}{\partial y^2} \; +\; \delta x\delta y \frac{\partial^2 f}{\partial x \partial y} + O(\delta^3)

And we have a set of observations \{f(x_i,y_i\}, then we can straightforwardly calculate a set of 'difference' observations \{f(x_i,y_i),f(x +\delta x, y +\delta y),\delta x_i,\delta y_i, \delta x_i^2, \delta y_i^2\}, then we can do a multiple linear regression on the derivatives

 f(x +\delta x, y +\delta y) = f(x,y) \; +\; \delta x A \; +\; \delta y B \; +\; \frac{1}{2}(\delta x)^2 C \; +\; \frac{1}{2}(\delta y)^2 D \; +\; \delta x\delta y E + F

solving for \{A,B,C,D,E,F\}. The results are quite encouraging, although quite sensitive to the 'window', or the range of distances used to calculate the derivatives. The plot below shows the fraction of merging binaries as a function of supernova kick velocity (the Newton's 3rd law style kick from asymmetric supernovae) and the common envelope efficiency (the fraction of orbital energy available to throw off the shared envelope of a binary), with the estimated direction of steepest ascent in the top panel.



Plot of the week 15th - 19th August

So I thought I'd try a new series, where I'll try and write a short paragraph about a plot from my research each week. Over the last week or so, I've been simultaneously recovering from a week off, and having another look at my first project, to do with Continuous Auto-Regressive Moving Average (CARMA) models. I had written some code to implement a new way of expressing and computing the likelihood function for CARMA as a model for correlated noise. We have known for a long time that our method is plagued by numerical instability, however we didn't know exactly why until now.

It turns out that since the CARMA model essentially says that you can capture the properties of correlated noise by looking only at the last few time samples (together with a couple of global statistics). What we discovered is that if these local windows are sampled at the Nyquist frequency (or some integer multiple thereof), then a certain matrix in our calculation goes singular, and all bets are off. The plot below shows how the log likelihood deviates from the correct value as you approach one of these points.


PhD Update - July 2016

So, that PhD month in review idea didn't work out so well... From now on I'll not be so ambitious and pledge to write a new blog post as and when I get time in between my real work.

In my April month in review I described random forests and Gaussian process regression which I was experimenting with in order to solve the problem of interpolation between population synthesis models. Back then, the project was in its infancy, and things have crystallised a great deal since then, so I'll do my best to give a summary of our progress.

Gaussian processes, whilst really cool, simply weren't suitable for this project, since the mapping from inputs to outputs is non-deterministic. I anticipate using them at some point for an extension of this project (keep your eyes peeled for that one). Similarly neural networks (which I promised to introduce in my review of May, sorry about that!) didn't really work out. Despite being really cool, they also struggle with non-deterministic functions. I did start to dabble with mixture density networks, however a stubborn bug in that code combined with the resounding success of a much simpler method led me to abandon it for the time being. That being said, I have an upcoming meeting about a project which may have me dusting off my neural network code. Watch this space.

k - Nearest Neighbours Regression (sort of)

Whilst reading around random forests, I realised that the way I was using them meant that I wasn't really doing random forests... I was doing bootstrap aggregated nearest neighbours regression, which is a bit of a mouthful but actually a very simple concept.

Nearest neighbours methods are quite popular in the machine learning community when it comes to discrete (classification) problems, where your observable outputs all fit nicely into discrete classes (cats, dogs, monkeys etc). Basically, you take the set of k training data whose inputs match the input of your test point as 'closely' as possible, and then combine their outputs in some way (e.g. majority voting).

Nearest neighbours regression takes this idea and applies it to continuous observable outputs, and uses some other combination (arithmetic mean, median etc) to combine the outputs into a point estimate.

Bootstrap aggregation ("bagging") is a means of squeezing down random fluctuations in your estimate (in much the same way as random forests do for decision trees). Basically, a nearest neighbours regression is performed on datasets drawn with replacement from the original dataset. My intuition for why this works is to imagine that I have a really really bad datapoint, that throws my estimate off. Using bootstrap aggregation, this datapoint will only show up ~2/3 of the time, so the final estimate will be less corrupted by it, even though the overall estimate will still converge on the truth for infinite data.

My Method

Notice that I said above that nearest neighbours regression combines outputs into a point estimate? This isn't what we want; for any particular combination of hyperparameters, the non-determinism of the binary evolution process means that there is a range of possible outputs, not a single set of numbers. Our problem is to interpolate between distributions of quantities, not point estimates.

Therefore, the method I have been employing is to densely sample hyperparameter space with binaries, and then to approximate the output distribution by taking the set of k binaries whose hyperparameters are closest  to the point we want to test. If we are sufficiently densely sampled (and we assume that whichever distribution we're interested in changes smoothly with the hyperparameters) then the set outputs from these k closest inputs will form a good approximation to distribution at that point. This is more or less what I was doing with random forests, but more efficient and interpretable.

It has been pointed out to me that this is in some sense similar to approximate Bayesian computation (ABC), except in reverse. ABC is a method of inference used in situations where the likelihood function is either too difficult or impossible to compute. Instead, you run a simulation for your sample point, and assess the output against your data via some distance measure (which is typically a bit of a dark art to define). Smaller distances are then proxies for high likelihoods. We're instead assessing binaries on the 'closeness' of their inputs, and then using their outputs as 'samples'.

I'm not completely sure how useful this analogy is, but it certainly sounds more impressive than simply saying 'nearest neighbours', as I demonstrated on my latest conference poster, which I presented at the intractable likelihood conference in Lancaster.

I'm still working on this method and how best to apply it to the population synthesis problem, so I'm reluctant to present any preliminary results yet, but they're coming (as is hopefully a paper, but I'm not holding my breath yet). Watch this space.

Other Stuff

I'm planning to write a complimentary blog post on the 'meta-work' I've been doing, developing a bunch of tools for dealing with large quantities of data. I've been messing around with HDF5 data formats, F2PY for wrapping python code around FORTRAN routines (which also involved learning FORTRAN...), as well as organising our group's journal club and a new thing; PhD book club, in which myself and a few colleagues collectively read through Thorne & Blandford's excellent textbook, before getting together to solve the exercises on the board.

I wont overdo it for this blog post, and I'll try to write another one soon.

It's Jim's life, but not as we know it!

This morning I participated in the Astronomy and Astrophysics group Science Jamboree 2016, where I managed to win a prize (chocolates!) for the best talk. I'm not quite sure how I managed it, since in the midst of a series of informative and engaging talks, I performed a silly little poem;

For my science Jamboree
I'll tell you a bit about me
My journey through physics
in a few clumsy limericks
Right through to my PhD

I was born in July '91
In the blistering east Norfolk sun
And... oh wait a minute,
there's a strict time limit
So I had best be getting along

We'll skip ahead 18 odd years
Whilst I, still wet 'round the ears
kicked off my degree,
eager to see,
How I felt after 2 dozen beers

But I did learn some physics as well
Einsten, Newton, Maxwell
But as some of you know,
not a lot of astro
Although I think you can probably all tell

To be honest I found space a bit dreary
I preferred condensed matter theory
my vague proclivity
for superconductivity
"I should do a PhD in this clearly!"

Alas, I left those low temperatures
And became a software developer
but the real world is crap
I resolved to come back
And I somehow became an Astronomer....

My first project involved an obsession
with continuous autoregression
take an impulse response
to stochastic nonsense
et voila! banded matrix compression!

And with that O(n) inference
Time to write a paper of significance!
Except instabilities
destroy all utility
Try selling that at a conference!

So my PhD project, take 2!
But what was I going to do?
I felt it was prudent
to become Ilya's student
since bless him, he's only got a few

Binary population synthesis;
i.e how did it end up like this?
which hyperparameters
make a chirp that glamorous?
The combinations seem virtually limitless

So here's where I come in
When parameter space coverage is thin
we can interpolate
perhaps extrapolate
what happens when things spiral in

Now this work isn't finished yet
Ive tried random forests and neural nets
but despite all my labour
it seems nearest neighbours
may well be our safest bet

To conclude my science Jamboree
I'll make a short summary
my project got switched
because CARMA's a bitch
and I write really shit poetry!

PhD Month in Review - April 2016

I've decided to try and be a bit more active on the blogging front, and to try and showcase my progress through my PhD. This month I started a whole new project, so it seems like the perfect time to start a new series of blog posts; PhD Month in review. The idea is pretty self evident, at the end of each month I'll write a post explaining the highlights of what I've been working on and what I've been learning about. So, without further ado, here's what I've been doing this April. I'll start by outlining my new project.

Population Synthesis

Two of my colleagues, Simon Stevenson and Alejandro Vigna-Gómez are currently working with Professor Ilya Mandel on a compact binary population synthesis code. The basic idea of the code is to spawn a bunch of pairs of stars, sit them next to each other and let them evolve for a hubble time, before noting down their final characteristics and, most importantly, whether or not they've formed compact objects and merged. I say 'most importantly' because these compact binary mergers are exactly what the guys down at aLIGO are looking for with their kick-ass interferometer; gravitational wave sources.

GW150914 showed us that there are definitely gravitational wave sources out there, but there is still a great deal of uncertainty about the full range of what is physically possible in terms of gravitational wave sources. What range of masses are possible in a black hole binary? What about spins? Does supernova kick velocity play a significant role? As more gravitational wave observations start rolling in, constraining these underlying bits of astrophysics becomes an important and interesting task.

A pair of merging black holes - Image from the SXS project (simulating extreme spacetimes)
A pair of merging black holes - Image from the SXS project (simulating extreme spacetimes)

Now we see why population synthesis codes are important. We can spawn a whole bunch of stars with, say, a wide distribution of masses, let them go for a while and then look at which ones have formed black holes and merged. Then we check against our gravitational wave observations and see if the range of masses we began with were sensible. If they weren't, then we go back and amend our code accordingly. For example, If our population synthesis code says that a pair of 30 solar mass black holes is wildly unlikely, we know we need to change our initial distribution of masses. That's an extreme case, but that's the basic idea. More technical details can be found in my colleague's paper.

Now, where do I come in?

Interpolation and Prediction

There are a few problems with simulating populations of compact binary mergers;

  1. They're expensive - Each pair of stars can take up to a couple of seconds to evolve. Doesn't sound like much, but it is when you think about the next problem...
  2. They're quite rare - you may need to evolve millions of pairs of stars to see an appreciable (statistically meaningful) number of mergers
  3. They come in many different flavours - I've mentioned mass, spin and supernova kicks already, but there a dozens of other characteristics that might affect merger rate.
  4. They're stochastic - Basically, even with two identical pairs of stars, one might merge and the other might not
  5. They're complicated - You can't write down a simple equation about what's going on!

So, in summary, these simulations are expensive to run and each simulation only gives you information about a tiny piece of a vast parameter space. My task; work out what we can say about the simulations that haven't been run yet, given the simulations that have. Most importantly, we need to be able to quantify our uncertainty in any predictions we make. This last month I've been doing a series of initial investigations of 3 different ways that I might tackle this problem; Gaussian processes, random forests and neural networks.

Stating the Problem

The problem I am hoping to tackle is actually a relatively general one. I start by drawing stars from some known distributions and sit them next to each other in some universe characterised by a series of hyperparameters, which we'll call \vec{\varphi}, evolve them according to some highly complicated and non-linear series of operations (with stochastic elements), which we'll encapsulate in a black box function f, and at the end we get a collection of output characteristics from the binary system, which we'll call \vec{y}. The goal of this project is to therefore characterise the conditional distribution


There are two rather different ways of visualising this problem, which each lend themselves to different methods. The first is closely related to the description given above, where we would try to develop some parametrised function g of the hyper parameters such that

\vec{y} = g(\vec{\varphi}|\vec{\theta})

Where \vec{\theta} are 'model parameters' specific to the type of model that we employ (random forest or neural network in my case). The aim would then be to come up with a function g which does a good job of explaining what we see.

The other approach is to consider both the inputs and the outputs on the same footing, as part of some hypersurface upon which we want to determine the population density of binaries, so that if we call our hypersurface z(\vec{y},\vec{\varphi}) then we want to know the binaries are distributed on that surface, i.e.

z(\vec{y},\vec{\varphi}) = \frac{1}{N}\frac{dN}{d(\vec{y},\vec{\varphi})}

Gaussian Processes

I am still working towards a good understanding of what Gaussian processes actually are. So far the picture in my head is that we make some assumption about how the outputs of a function vary with respect to the distance between the inputs of that function, which we encapsulate in a kernel function. In my own implementation (and most other peoples too) I use the squared exponential kernel

k(x,y)= A e^{\frac{-|x-y|^2}{2l}}

Which essentially says that the covariance of a pair points in output space falls off exponentially with their L^2 distance in inputs space. At a deeper level, Gaussian Processes work on a large space of possible functions f(x) and posit that these functions obey a Gaussian distribution with a mean function \mu(x)=\left<f(x)\right> and covariance over functions k(x,y)=\left<(f(x)-\mu(x))(f(y) - \mu(y)\right>. Using the nice properties of Gaussians it's then easy to make point predictions with uncertainties. See Rasmussen and Williams for more details.

I haven't really done so much work on this approach, but what little work I have done has thrown up some encouraging results (see the plot below). My basic method was to naively marginalise each dimension (by literally ignoring the others), make a histogram and use the bin edges as inputs and bin counts as outputs (so this is the hypersurface, population density approach described abaove). I  then use these to make predictions of unseen data.

I am largely sceptical that this approach is the right one for this project, since if we want to consider covariance between dimensions, either binning is going to become difficult or the computation is going to become intractable. That being said, I could well see this approach being part of a prediction pipeline somewhere. We shall see.

True (blue) and agnostically predicted (red) chirp mass distributions
True (blue) and agnostically predicted (red) chirp mass distributions

Random Forests

After playing around with Gaussian Processes for a while, I started looking for other regression methods popular in the machine learning community, and came across the concept of decision trees and, by extension, random forests. Decision trees take your data and partition it according to a binary decision. The decision could be 'has the binary merged or not?' or 'is the final chirp mass greater than 50 solar masses?'. After applying many of these decisions you end up with a bunch of 'leaf' nodes containing a small number of data points. Then, when you want to predict the output for a previously unknown input, you follow the tree down to the corresponding leaf node and your prediction is some average of the observed outputs there.

Random forests extend this idea by growing these decision trees for a random subset of the data until each leaf node contains only one observation, then a prediction is an average over the predictions made by all trees in the 'forest'.

This approach lends itself to the input/output way of formulating the problem, and is nice since it doesn't require any naive marginalisation. I used the scikit-learn implementation of a binary decision tree to make predictions, and got excellent preliminary results (see the plot below). I feel as though I may very well come back to this approach at some point in the future. My supervisor has warned that as I increase dimensionality beyond the simple test cases I'm currently using, tree based approaches may start to fall down since larger and larger fractions of the parameter volume is occupied by the surface. I will look into this further, but this was only the second regression algorithm that I tried...

True (blue) and predicted (red) mass distributions.
True (blue) and predicted (red) mass distributions.

Neural Networks (and Mixture Density Networks)

The rest of this month I have spent doing some preliminary reading around neural networks and working towards implementing one. I will leave a more detailed description for next month's blog post, when I have more idea of what's going on. For the time being, you can follow my development progress via the github repo I've set up to contain all of my work related to this project; Tools for Interpolation and Prediction of Population SYnthesis (TIPPSY).   (Update 25/07/2016 - I'm no longer maintaining this repository publicly, and will instead release my interpolation tools with my group's population synthesis code as a whole, when it's ready)

Chess Blog #3 - More Online Fun

I know, I know. I should probably blog about something that isn't chess, and in particular isn't my terrible attempts at playing chess. However, better a chess blog than no blog right? right..?

The 2015-2016 season is just around the corner. My first competitive match is about 4 weeks away on 23rd September, so I'm going to start weaning myself off of blitz chess, and start playing some slightly longer games. This started today, and my very first 30-0 game turned out to be quite a nice one. I had to put up quite a tough defence against some very scary moves with a rather exposed King. I'm very proud of my combination to finish it.

I'll admit that I didn't find that final combo all in one go after white's 33rd move, I was just playing to take the Knight without allowing check from his Queen. That doesn't mean it's not pretty though...

Chess Blog #2 - Online Fun

It's been a while since I've written a blog post, but whilst playing in a casual online tournament I played a game which I couldn't resist sharing. The tournament was the open section of a regular 15-10 tournament hosted on, meaning that each player started with 15 minutes on their clock, and gained an additional 10 seconds each time they made a move. Not exactly classical time controls, but enough time to play some reasonable moves.

The tournament wasn't especially strong; I was 7th seed with a rating in the mid 1500s. I finished joint 5th with a score of 3/5. The game I'm about to show was played against one of the top seeds of the tournament, rated around 1640. It was the 4th round, and I was on 2/3 (+2-1=0).

The game was won fairly easily in the end after my opponent blundered his queen, however I'm completely positionally winning, and it wont be long until I start winning material on the queenside. All in all, quite a nice little game, where I think a few unwise choices in the opening cost my opponent the tempi necessary to make something happen on my kingside before I overwhelmed his queenside. Of course any attack would have had to overcome my mighty bishop on h6!


PhD Blog #1 - Poster for Warwick IMPACT conference

This Friday I will be attending IMPACT (Impact Midlands Physics Alliance Conference Training) at Warwick university. It is a student led conference designed to ease graduate students into the business of networking, giving seminars and presenting posters. As a 1st year I am going to be presenting a poster. I will write another, more detailed blog post after the event, but for now, here is the poster I will be presenting (you can click it to get the pdf);


Chess Blog #1 - Shropshire Congress

When I designed my academic site, I thought it would be a nice idea to bolster it with a little content about my life outside work.

I used to play a little competitive chess as a teenager, but with the pressures of exams and coursework, I wasn't able to give it the consistency necessary to improve my game. I'm now in a position where I can play regularly, and I have joined the South Birmingham Chess Club, where I have been playing for both the division 6 sides. I have had a successful season so far, 7 wins, 2 draws and 2 losses, and I thought it would be a nice idea to document a few of my games in a blog setting.

Last weekend I played in the 2015 Shropshire Congress, a 5 round, classically time controlled tournament, where each player had 90 minutes to make 36 moves, with an extra 15 minutes added on move 37 to complete the remainder of the game. I played in the minor section, which had a grade limit of 125 ECF (~1635 elo). The arbiters had given me an estimated grade of 106 ECF based on my performance in the league.

Round 1 (0/0)

I was very nervous prior to round 1, where I was paired as black against Charlie Arkell, niece of none other than GM Keith Arkell. She had a low grade (44 ECF), but a lot of youngsters rise exponentially in their first few years. She played a very solid opening, and I think she certainly had a good position after 10 moves;

Luckily we both missed the tactic Bxf5. Unfortunately, shortly after this position, my opponent blundered a piece, and then I spotted a short combination to win her queen.

Round 2 (1/1)

I had settled my nerves now, and found myself paired as white against a man graded 115 ECF, the highest graded player I had ever faced in a serious game. I wasn't confident by any stretch especially when, on move 2, I was already out of book. My opponent played the Scandinavian (centre-counter) opening (1. e4   d5), although he made quite a few positional concessions in the opening, and I reached a position after 16 moves that I was reasonably comfortable with;

Unfortunately, I played a few fairly loose moves after this point, and my opponent was able to regain his momentum and I found myself in a position where I was about to go down a pawn.

I found the best move in the position, Qxd4, so I was two bishops for a rook and a pawn, a trade that should be equal on paper.  Alas, my opponent allowed me to trade off a pair of rooks. leaving him with all the problems to solve. I was able to effectively utilise my bishop pair and my opponent resigned in the following position;

I was surprised he didn't make me play on, I am perfectly capable of throwing a position like this, but I was happy to accept the point.

Round 3 (2/2)

Most people had elected to take their 1/2 point bye in round 3. so the conference centre was very quiet compared to earlier rounds. I was paired as black against another 115 ECF player. I was exhausted; it was 6.30pm and I had already played well over 5 hours of chess. I only had to hope my opponent felt the same. I'll present this game in full;

All of a sudden I'm one of only 4 players out of 65 with a maximum score. I can't quite believe it. There's a whole day of chess left yet though!

Round 4 (3/3)

Throughout the tournament, I had heard several rumours a guy called Stephen Crockett; that he was one of the most active players in the country, that he always won the minor, that he was too good for the minor section.

In round 4 I was drawn against Mr Crockett (124 ECF, the top seed), on board 1 with the white pieces, which was being broadcast live over the internet. My confidence was running high with 3/3, but not that high!

I had around 4 minutes left, my opponent about 90 seconds. I don't think either of us wanted to through the last 3.5 hours of play away, so I re-offered the draw and he accepted. I was disappointed, I'd fumbled a clear win, but hey, a draw against Crockett; not a bad scalp!

Round 5 (3.5/4)

I wasn't the only one to draw my round 4 game. In fact, there were no players on 4/4 at this stage, so I was sharing pole position with 7 others; it was all to play for!

Alas, I didn't play my finest game in round 5. I was losing it by a large margin for the majority of the game, and I spent a good portion of the latter half of the match ready to push over my king in resignation. It was by sheer luck that I managed to force a draw, I didn't really deserve it. You can find the whole game (and in fact all of my other games from this season) in my microbase.


Not a bad weekend's work! I won £25, which covered my entrance fee, not that I was there for the money. I was there for the chess 🙂