Astrophysicists at MIT and the Pawsey supercomputing center in Western Australia have discovered a whole new role for supercomputers working on big-data science projects: They’ve figured out how to turn a supercomputer into a router.
Make that a really, really big router.
The supercomputer in this case is a Cray Cascade system with a top performance of 0.3 petaflops—to be expanded to 1.2 petaflops in 2014—running on a combination of Intel Ivy Bridge, Haswell and MIC processors.
The machine, which is still being installed at the Pawsey Centre in Kensington, Western Australia, and isn’t scheduled to become operational until later this summer, had to go to work early after researchers switched on the world’s most sensitive radio telescope June 9.
The Murchison Widefield Array is a 2,000-antenna radio telescope located at the Murchison Radio-astronomy Observatory (MRO) in Western Australia, built with the backing of universities in the U.S., Australia, India and New Zealand. Though it is the most powerful radio telescope in the world right now, it is only one-third of the Square Kilometer Array, a network of low-frequency antennas that will be spread across sites in Australia and Southern Africa and will offer roughly a square kilometer of collecting area. The SKA will be 50 times as sensitive as any other radio telescope and 10,000 times as quick to survey a patch of sky.
By comparison, the Murchison Widefield Array is a tiny little thing, sited as deep in the middle of nowhere as Australian authorities could manage in order to keep it away from terrestrial radio interference.
Tiny or not, the MWA can look farther into the past of the universe than any other human instrument to date. What it has found so far is data: lots and lots of data. More than 400 megabytes of data per second come from the array to the Murchison observatory, before being streamed across 500 miles of Australia’s National Broadband Network to the Pawsey Centre, which passes most of it along as quickly as possible.
Most of the data is stored at Australia’s International Centre for Radio Astronomy Research (ICRAR) in Perth, where it is available for use by researchers around the world. Quite a lot is sent directly and automatically to universities supporting the project, however. MIT’s Haystack Observatory in Cambridge, Mass.—nearly as far away as it is possible to get from Perth or Pawsey—has one direct high-speed connection to the data stores, through which it gets automatic transfers of as much as 4TB per day, more than 150TB total in the four days the MWA has been active.
Storing the full data feed would fill three 1TB hard drives roughly every two hours, though most university partners don’t want the whole thing.
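The drive count can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation, not anything the MWA team runs: it assumes a steady feed of exactly 400 megabytes per second (the article says "more than," so treat the results as a floor) and decimal units, where 1TB is 1,000,000MB. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the storage figures quoted in the article.
# Assumptions (not from the source): a constant feed rate and decimal units.

FEED_MB_PER_S = 400      # article: "more than 400 megabytes of data per second"
MB_PER_TB = 1_000_000    # decimal terabyte, as used on hard-drive labels


def terabytes_over(hours: float, rate_mb_per_s: float = FEED_MB_PER_S) -> float:
    """Return the volume in TB produced in `hours` at a steady feed rate."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / MB_PER_TB


two_hours_tb = terabytes_over(2)
per_day_tb = terabytes_over(24)

print(f"Two hours of the feed: {two_hours_tb:.2f} TB")
print(f"One day of the feed:   {per_day_tb:.2f} TB")
```

At that rate two hours of observing comes to about 2.9TB, which lines up with the three 1TB drives mentioned above, and a full day comes to roughly 35TB.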
“MIT researchers are interested in the early universe so we use filtering techniques to control what data is copied from the Pawsey Center archive to the MIT machines,” according to a statement from Professor Andreas Wicenec of The University of Western Australia node of ICRAR. “The technical challenge isn’t just in saving the observations but how you then distribute them to astronomers from the MWA team in far-flung places so they can start using it.”
The Pawsey Centre will store about 3 petabytes at a time and will distribute it using an open-source archive and storage system called the Next Generation Archive System (NGAS), which Wicenec developed. The system keeps track of where in the world copies of the data are stored and is designed to retrieve information from the nearest source. A data request from a researcher at MIT would be filled, as far as possible, from data already transferred to MIT. The rest would be brought up from Perth using a highly efficient dataflow management system.
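The nearest-copy idea described above can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not NGAS code and not its API: the catalog layout, the site names, the file identifiers and the `resolve_sources` function are all hypothetical, and "nearest" is reduced to a simple preference list ordered from closest site to farthest.

```python
# Illustrative only: a toy replica catalog, NOT the NGAS implementation.
# Site names and file ids below are invented for the example.


def resolve_sources(
    catalog: dict[str, set[str]],
    file_ids: list[str],
    site_preference: list[str],
) -> dict[str, str]:
    """Map each requested file to the most-preferred site that holds a copy.

    `catalog` records which sites hold each file; `site_preference` lists
    sites from nearest to farthest, as seen from the requester.
    """
    plan = {}
    for file_id in file_ids:
        holders = catalog.get(file_id, set())
        # Pick the first (nearest) site in the preference list with a copy.
        source = next((s for s in site_preference if s in holders), None)
        if source is None:
            raise LookupError(f"no reachable copy of {file_id!r}")
        plan[file_id] = source
    return plan


# One observation already mirrored to MIT, one held only in Perth.
catalog = {
    "obs_001": {"perth", "mit"},
    "obs_002": {"perth"},
}

# A requester at MIT prefers the local copy and falls back to Perth.
plan = resolve_sources(catalog, ["obs_001", "obs_002"], ["mit", "perth"])
print(plan)
```

In this toy run the first file is served from the local MIT copy and only the second has to cross the Pacific, which is the behavior the article attributes to the real system.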
“Controlling data for a widely distributed user group on this scale is a challenge that’s being faced more and more frequently in science and other fields, but nothing suitable existed that could solve this problem for us,” according to Chen Wu, associate professor at the University of Western Australia.
Only a portion of the Pawsey Centre is taken up with the work of storing and distributing the really big data from the Murchison Widefield Array. The rest of the capacity of the center and its supercomputer is used for other science projects. That may change as the other two segments of the Square Kilometer Array are built and come online, however, depending on whether the SKA hears enough from the universe to fill up all the rest of the supercomputing resources Pawsey has to offer.
Image: Paul Bourke/UWA Perth