Tuesday, December 11, 2012

To Infinity... and Beyond! Observing Dark Worlds


Fresh off our first machine learning competition, we were hungry for more. We decided to enter another Kaggle competition called Observing Dark Worlds. This competition was way more advanced than the digit recognition competition, but we were up for the challenge. Because this was an actual competition with real money at stake, we withheld all blog posts until the end of the competition... we also wrote all of them at the end of the competition. So, past tense.


Here's the problem posed by the competition: 

Dark matter makes up most of the matter in the universe, but since it doesn't emit or absorb light, we can't see it. However, we can detect it because it warps and bends spacetime. Light emitted from stars in a galaxy that passes dark matter on its way will be distorted by the time it reaches Earth, causing galaxies to appear more elliptical than they normally would. Dark matter tends to appear in clusters, which are known as halos. Our job is to locate the position of these halos, given a sky of galaxies.

Effect of Dark Matter on Shapes of Galaxies
Dark matter bending light from a background galaxy.
Note that galaxies have some natural amount of ellipticity independent of the ellipticity caused by dark matter. So, the correlation between galaxies' ellipticities and the presence of dark matter is much less obvious than one might expect. The graphic below should help you to visualize this phenomenon:
(Both figures are from the project description at http://www.kaggle.com/c/DarkWorlds)


For this competition, we were given a training data set of 300 skies, each with a number of galaxies and between 1 and 3 dark matter halos. For each galaxy in a sky, we were given its position coordinates and its ellipticity. We were not given any information about the halo(s) in each sky. Our job was to determine 1) how many dark matter halos were in the sky, and 2) the position coordinates of each halo.
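To make the data description a little more concrete, here is a minimal Python sketch of one way to represent a galaxy and measure how strongly a sky "points around" a candidate halo position. It rests on assumptions that are not stated above: that each galaxy's ellipticity comes as two components (called e1 and e2 here), and that tangential ellipticity, a standard gravitational lensing quantity, is a useful signal. The Galaxy class and both function names are made up for illustration, and this is not necessarily the method we ended up using.

```python
import math
from dataclasses import dataclass


@dataclass
class Galaxy:
    # Hypothetical field names; the two-component ellipticity is an assumption.
    x: float
    y: float
    e1: float  # positive = stretched along the x-axis
    e2: float  # positive = stretched along the 45-degree diagonal


def tangential_ellipticity(galaxy, halo_x, halo_y):
    """Ellipticity of a galaxy measured tangentially to a candidate halo.

    Positive values mean the galaxy is stretched around the candidate
    position, which is the pattern lensing by a halo would produce.
    """
    phi = math.atan2(galaxy.y - halo_y, galaxy.x - halo_x)
    return -(galaxy.e1 * math.cos(2 * phi) + galaxy.e2 * math.sin(2 * phi))


def mean_tangential_signal(galaxies, halo_x, halo_y):
    """Average tangential ellipticity of a sky around a candidate position."""
    if not galaxies:
        raise ValueError("need at least one galaxy")
    total = sum(tangential_ellipticity(g, halo_x, halo_y) for g in galaxies)
    return total / len(galaxies)
```

Because each galaxy also has its own natural ellipticity, a single galaxy says very little; averaging over the whole sky is what lets any alignment around a candidate position stand out from the noise.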

And that's the problem we spent the next six weeks working on. The competition included four benchmark methods, and our personal goal was to beat all of them. The competition had its own scoring method for all entries, so to "beat" the benchmarks, we needed to achieve a better score than every one of them. Like golf, a lower score was a better score for this competition.

