Citizen scientists' roles change as Big Data sweeps astronomy

Two recent articles highlighted developments that will change the role of citizen scientists in astronomy. MIT Technology Review wrote about scientists who created a machine learning system for identifying galaxies (arXiv: 1503.07077). Astrobites wrote about an effort to automate the discovery of supernovae and other transient events (arXiv: 1504.02936). Both galaxy-spotting and supernova-detecting have been the focus of crowdsourced citizen science projects like Galaxy Zoo and Snapshot Supernova, but now software outperforms the world’s amateurs in both accuracy and the volume of data processed.

What does it mean for amateurs? Software is displacing citizen scientists for the same reasons citizen scientists displaced the professionals. The original Clickworkers project and its descendants at the Zooniverse and CosmoQuest grew out of scientists’ frustrations. In the past, they would get a few images at a time from an observatory, but turn-of-the-century projects like the Sloan Digital Sky Survey and the fleet of planetary missions produced millions of images. Software wasn’t up to the task. It couldn’t detect indistinct objects in those images, like galaxies and craters, as reliably as human vision. Scientists had to review each image by hand, a laborious process that barely made a dent in the growing pile of images.

Crowdsourcing lets the public do the work despite a lack of expertise. The projects overcome that gap by combining the contributions of thousands of volunteers: each image is shown to many people, and through the wisdom of the crowd, the consensus answer is as accurate as an expert analyst's judgement. The catalogs these citizen science projects create let astronomers conduct more sophisticated research. Consider, for example, the successes of the first Galaxy Zoo project:

  • 100,000 volunteers
  • 40,000,000 classifications
  • 300,000 galaxies

The four phases of Galaxy Zoo combined have yielded 59 peer-reviewed publications on the structure and evolution of galaxies.
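The wisdom-of-the-crowd idea described above can be illustrated with a simple majority vote. This is only a minimal sketch: the function name `aggregate_votes` and the sample votes are hypothetical, and real projects such as Galaxy Zoo use more careful schemes (for example, weighting volunteers by their track record) than the plain vote count shown here.

```python
from collections import Counter


def aggregate_votes(votes):
    """Return (winning_label, vote_fraction) for one object's volunteer votes.

    Hypothetical helper: a plain majority vote, not any project's actual pipeline.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


# Made-up votes from seven volunteers looking at the same galaxy image.
votes = ["spiral", "spiral", "elliptical", "spiral",
         "spiral", "elliptical", "spiral"]

label, fraction = aggregate_votes(votes)
print(f"consensus: {label} ({fraction:.0%} of votes)")
```

The vote fraction doubles as a rough confidence measure: an object that splits the crowd evenly is a natural candidate for a second look.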

The Thirty Meter Telescope, seen here in an artist's illustration, is one of several astronomy projects that will generate a deluge of data in the coming decades. Credit: TMT Observatory

A new generation of observatories like the Large Synoptic Survey Telescope will generate so much data that even the world’s citizen scientists couldn’t review it all. The Sloan Digital Sky Survey’s archives, for example, grew to over 120 terabytes in its first fifteen years. The LSST will generate 30 terabytes every night. The Square Kilometre Array, even in its early phase, will generate 160 terabytes of raw data every night.
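A little arithmetic puts those figures in perspective. The sketch below uses only the numbers quoted above; treating the SDSS archive as exactly 120 terabytes (the text says "over") and assuming 365 observing nights a year are simplifications made for illustration.

```python
# Figures quoted in the article (terabytes).
SDSS_ARCHIVE_TB = 120      # "over 120 terabytes", treated here as a lower bound
SDSS_YEARS = 15
LSST_TB_PER_NIGHT = 30
SKA_TB_PER_NIGHT = 160

# Simplifying assumption: data arrives every night of the year.
NIGHTS_PER_YEAR = 365


def nights_to_match(archive_tb, tb_per_night):
    """Nights a survey needs to produce as much data as a given archive."""
    return archive_tb / tb_per_night


sdss_tb_per_night = SDSS_ARCHIVE_TB / (SDSS_YEARS * NIGHTS_PER_YEAR)

print(f"SDSS average: {sdss_tb_per_night * 1000:.0f} GB per night")
print(f"LSST matches the SDSS archive in "
      f"{nights_to_match(SDSS_ARCHIVE_TB, LSST_TB_PER_NIGHT):.0f} nights")
print(f"SKA matches the SDSS archive in "
      f"{nights_to_match(SDSS_ARCHIVE_TB, SKA_TB_PER_NIGHT):.2f} nights")
print(f"LSST nightly rate is about "
      f"{LSST_TB_PER_NIGHT / sdss_tb_per_night:.0f}x the SDSS average")
```

Under these assumptions, the LSST would reproduce fifteen years of SDSS data in about four nights, and the SKA in less than one.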

Fortunately for science, the power of computing and the sophistication of software have increased just as dramatically as the flow of data. New algorithms like the ones published last week will process the flood of data in real time with greater accuracy than the wisdom of the crowd. Citizen scientists in their hundreds of thousands can’t hope to keep up with the data deluge any more than individual astronomers could a mere ten years ago.

When Dutch schoolteacher Hanny van Arkel saw a glowing green blob in one of the Galaxy Zoo images, she asked in the project's forum whether anyone knew what it was. That triggered a research project that involved scientists and observatories around the world. The green blobs are filaments of gas surrounding a galaxy. The black holes at the centers of those galaxies, now dormant but once Active Galactic Nuclei (AGNs), bombarded the filaments with intense radiation. The glowing ionized gas is an echo of the galaxy's distant past. Scientists use the filaments to study the history of the AGNs and the properties of their host galaxies. Credit: NASA/ESA/W. Keel (University of Alabama, Tuscaloosa)

But all is not lost for the citizen scientist. The new software isn’t that smart straight out of the box. It has to learn what to look for by analysing a large dataset that already has the right answers. And the only way to create the training set? Citizen science. Amateurs will continue to make new and interesting science happen by helping the professionals make next-generation algorithms better.
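To make the idea of a training set concrete, here is a toy example of software learning from answers that volunteers have already supplied. It is a deliberately simple nearest-centroid classifier, not the method used in the papers mentioned above; the two-number feature vectors, their values, and the function names `train_centroids` and `classify` are all hypothetical.

```python
import math


def train_centroids(examples):
    """Average the feature vectors for each label.

    examples: list of (feature_vector, label) pairs, e.g. crowd-labelled galaxies.
    Returns a dict mapping each label to its mean feature vector (centroid).
    """
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, features):
    """Assign the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up training set: (feature_a, feature_b) measurements with crowd labels.
training_set = [
    ((0.9, 0.1), "elliptical"),
    ((0.8, 0.2), "elliptical"),
    ((0.3, 0.7), "spiral"),
    ((0.2, 0.9), "spiral"),
]

centroids = train_centroids(training_set)
print(classify(centroids, (0.35, 0.75)))  # closest to the spiral centroid
print(classify(centroids, (0.70, 0.30)))  # closest to the elliptical centroid
```

However sophisticated the real algorithms are, the dependency is the same: without labelled examples there is nothing to learn from.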

Amateurs have another trait that software still can't match: curiosity. Software only looks for what it’s told to. Teach software to find spiral galaxies and it will churn through terabyte after terabyte to produce a list of every spiral galaxy. But it won’t stop and say “hmm, that’s different, what is it?” The often-told tale of Hanny’s Voorwerp is the archetype for citizen scientists’ role in discovery. From intergalactic gas clouds to yellow balls in stellar nurseries, amateurs' curiosity has led to discoveries no software could produce.

Amateurs’ roles are changing as technology advances, but they will remain just as crucial to astronomy research, both by making software smarter and by discovering new things in the Universe. As Galaxy Zoo co-founder Chris Lintott told me in an earlier email exchange:

I’m convinced the real future is in systems that combine humans and machines, deciding on the fly what’s missing.