Photo courtesy of GalaxyZoo.
There are times in my PhD when I feel overwhelmed by the amount of
data I have to process on my own, although compared to some
scientific disciplines, I have got off reasonably lightly! At the Gatsby
Annual Network Meeting, Chris Lintott, Professor of Astrophysics at
Oxford University, introduced us to one solution: take to the web and
recruit an army of virtual helpers!
"The problem of having too much data is now present across all the sciences
but astronomy got there slightly earlier" Chris began. Most famous for
presenting the BBC series The Sky at Night, Chris has spent his
research career investigating whether there is an underlying structure
to the universe. "In some parts of the universe, there is a lot of stuff
and in others, not so much" he said. "It may seem random but there does seem
to be some sort of honeycomb-like structure to it all..." One way to
test this is to look at the shape (or 'morphology' if you want a more
technical term!) of galaxies as "the shape of a galaxy can show us the
interactions that formed it - like an integrated history of 30 billion
years".
In the early days, when even the most modern telescopes could only
photograph so many galaxies at a time, the professors themselves would
study these images. In the 1980s, technology improved so that thousands
of galaxies could be captured at once, so the job was passed on to PhD
students, the traditional 'willing workhorse' of the lab. But now
millions of galaxies can be imaged and the field has hit a data
processing wall. You might ask, surely the minds that built these
telescopes and satellites could work out a way to automate the process?
Unfortunately not. "The job involves recognising fuzzy patterns and you
just cannot teach this to a machine with 99% accuracy" said Chris. But
PhD students can only do so much (and I should know!). Chris cited the
example of Kevin, who managed 50,000 galaxies in a week before "telling
us where to stuff it". After bribing Kevin with beer and begging the
Vice-Chancellor for more PhD students both failed, Chris knew he needed
a radically new solution.
Professor Chris Lintott in action! (Photo - Chris Lintott)
And so in 2007, GalaxyZoo was born. The idea was that interested members
of the public who wanted to do their bit for science could sign up,
classify a few galaxies in their lunch break and so help Chris and his
colleagues work their way through the mountain of images. Although their
expectations were low, within a day of launching, the website had
rocketed its way to the top of the BBC News story board (just pipped to
the post by 'Man flies to wedding a year early...'). Chris and his
colleagues could hardly believe it - people actually wanted to get
involved and help! "We were soon doing a Kevin-week's worth of
classifications in an hour!" he said. The researchers realised that they
had created a "distributed supercomputer" - albeit one with opinions
and which volunteered its time in unpredictable ways. In addition, they
realised that this collective approach significantly improved accuracy;
after all, if 7 out of 10 people think a galaxy has a spiral shape, that
verdict comes with an estimate of confidence, which a single person's
classification cannot provide.
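To make the "7 out of 10" idea concrete, here is a minimal sketch of how several volunteers' answers for one galaxy could be combined into a single label plus a rough confidence. This is only an illustration: the function name `aggregate_votes` and the label strings are made up for this example, and the real GalaxyZoo analysis is not described in the talk and is likely more sophisticated.

```python
from collections import Counter


def aggregate_votes(votes):
    """Combine volunteer classifications for a single galaxy.

    Returns the most common label and the fraction of volunteers who
    chose it. The fraction is a crude stand-in for confidence
    (an assumption of this sketch, not GalaxyZoo's actual method).
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    # most_common(1) gives the label with the highest count; ties go to
    # whichever label was seen first.
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


# Hypothetical example: 7 of 10 volunteers say 'spiral'.
votes = ["spiral"] * 7 + ["elliptical"] * 3
label, fraction = aggregate_votes(votes)
print(label, fraction)
```

Note that the fraction on its own ignores how many people voted: one lone volunteer would score a "perfect" 1.0, which is exactly why many classifications per galaxy are worth collecting.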
Another advantage of using people rather than machines soon became apparent.
Whilst computers can only look for what you tell them to, "people can
become distracted by the unusual". Perhaps the most famous example of
this is Hanny's Voorwerp, a rare astronomical phenomenon spotted by
Dutch school teacher Hanny van Arkel. After noticing a strange green
'blob' on one of the GalaxyZoo photographs, she mentioned it on the
website's discussion forum, where it caught Chris's attention. "Everyone
was referring to it as a 'Voorwerp', which we thought was a very
technical term, so we used it too" said Chris. "But it turns out it
actually just means 'thingy' in Dutch". It also turned out that the blob
in question was a 'quasar ionisation echo': a cloud of gas still glowing
from the light of a quasar that has since faded. In short, the astronomical world was
delighted, Hanny became famous and GalaxyZoo participants rushed to be
the next to put their name to a new Voorwerp.
Very pretty data... (Photo courtesy of Galaxy Zoo)
This might be helping the researchers to solve the riddles of the universe,
but now there was a new mystery: why exactly did ordinary people feel
compelled to spend time sorting through photographs of galaxies?
Surprisingly, when GalaxyZoo participants were surveyed, only 12.4% gave
an interest in astronomy as a reason. By far, the most powerful
incentive was a genuine desire to contribute to science. "There is
clearly nothing magical about galaxy morphology itself" said David.
"Rather, this was something that anyone could do in their lunch break to
feel useful".
This suggested that similar
online projects on different topics could also be a success. And so, in
December 2009, Chris helped to launch Zooniverse, a general platform for
'poor sod' projects (as in 'Which poor sod is going to have to sort
through all this data on their own?'). Since then, this repository has
only grown and grown - now you can count penguins, spot wildlife in the
Serengeti, identify rare orchids and help sort through the archives of
the Natural History Museum, to name but a few. These online interfaces
are also being put to uses beyond curiosity-led research: during the
disaster relief efforts following the Nepal earthquake, online
volunteers studied aerial photographs to spot affected villages that
were missing from official maps. Meanwhile, biodiversity surveys are
already demonstrating the impacts of climate change, including
documenting plants that are now flowering up to 10 days earlier.
But what SHAPE is it?! (Photo courtesy of Galaxy Zoo)
But won't people eventually get fed up with looking at galaxy
photographs (even the hardcore devotees who have clocked up a million
galaxies so far)? And surely the growing choice of projects will mean
that fewer people will participate in each of them? Possibly, but Chris
is already using the data generated from GalaxyZoo to preempt loss of
interest. "These projects are themselves fascinating studies of human
behaviour" Chris said. "We can now model with 70% accuracy a person's
drop out rate within the next five galaxies". Hence, at the critical
moment, the participant is automatically sent a grateful email,
reminding them of their valuable contribution and encouraging them to
keep going! Some projects try a 'gamification' approach, including the
protein-folding game, FoldIt. Here, participants manipulate a protein
structure to find the most energetically favourable shape (and thus the
one most likely to occur in nature). By solving these 3D puzzles,
participants can access higher levels and harder challenges.
But as satellite technology advances ever onwards, and the data mountains
grow ever higher, perhaps one day even online volunteer networks will
not be enough. So now researchers are investigating whether it is
possible to use the wealth of results generated so far to educate a
machine to do the job automatically. "After all, the face-recognition
system on Facebook uses a similar principle" said David. "Every time we
tag someone, we are training a machine to do it automatically next
time".
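As a toy illustration of what "educating a machine" with volunteer results might look like, the sketch below trains a very simple nearest-centroid classifier on galaxies that the crowd has already labelled. Everything specific here is an assumption made for the example: the two numeric features (a roundness score and an arm-strength score), the training values and the function names are hypothetical, and this is not the method used by the researchers or by Facebook.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Average the feature vectors of each crowd-assigned label.

    `examples` is a list of (features, label) pairs, where the label is
    the volunteers' consensus for that galaxy.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def predict(centroids, features):
    """Assign the label whose average feature vector is closest."""
    return min(
        centroids,
        key=lambda label: math.dist(centroids[label], features),
    )


# Hypothetical training data: (roundness, arm_strength) -> crowd label.
crowd_labelled = [
    ((0.2, 0.9), "spiral"),
    ((0.3, 0.8), "spiral"),
    ((0.7, 0.1), "elliptical"),
    ((0.6, 0.2), "elliptical"),
]
centroids = train_centroids(crowd_labelled)
print(predict(centroids, (0.25, 0.7)))
```

The point is only that every volunteer classification doubles as a training example: the more galaxies the crowd has labelled, the more a machine has to learn from.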
The Zooniverse Portal
It can only be a good thing to make
scientific research more accessible to the public who, after all, fund
many of these projects through their taxes. Furthermore, online
platforms demonstrate that you don't need a degree to help investigate
the unexplained. Now... I wonder if I can convince anyone to help me
count the number of parasites in my photographs?