Cray Builds Big Data Supercomputer

Cray’s modified CS300 big-data supercluster

Supercomputing pioneer Cray has changed course with a new high-performance computer designed to handle Big Data rather than simulate nuclear explosions.

Super-expensive, super-powerful high-performance computing (HPC) clusters from Cray are typically used for high-energy physics research, the design and virtual testing of nuclear weapons, and in off-center efforts such as paleontological reconstructions of the image and mechanical stride of the largest dinosaurs ever to walk the earth.

A new design called Catalyst is meant for more number crunching and less modeling and simulation, but not because of demand from corporations that need bigger computers to analyze ever-larger sets of customer-behavior data, according to Lawrence Livermore National Laboratory (LLNL), the first Catalyst customer.

Supercomputers have been able to simulate the mathematically consistent shapes and behavior of subatomic particles for years, but ran into problems with squishier scientific disciplines, according to Fred Streitz, director of the Institute for Scientific Computing Research and the High Performance Computing Innovation Center at LLNL.

Even supercomputers lacked “the number-crunching power” to take into account all the variables, idiosyncrasies and interdependent interactions involved in even simple biological processes, Streitz told attendees at a conference on biomedical research held in Napa, Calif. in September.

Unlike a decade ago, modern supercomputers can simulate the look and behavior of a beating human heart in close to real time, but the real impact of supercomputing on biomedical research will come from the same kind of analysis marketers use to prepare for Black Friday or identify new market opportunities, he said.

Medical researchers use statistical analysis as heavily as any other field, but only on carefully selected subsets of the data available. They might examine details of the chemical exchange of information between specific types of receptors on paired neurons, for example, rather than what might be going on with all the other receptors on a single neuron and its neighbors at the moment that one piece of data is passed.

That level of organic complexity has been far too great for large-scale analysis until recently, according to Ken Turteltaub, leader of the Bioscience and Biotechnology division at LLNL, speaking at the same conference.

“Bioscientists are realizing that they’re sitting on a gold mine of data and they don’t know quite how to get to it,” he said. “It’s all there and they paid for it all. How do you get the information back out?”

Catalyst is designed to provide modern levels of supercomputing, with a special ability to handle the number-crunching and vast data volumes more common in bioscience than physics, according to an announcement from LLNL, which helped design the new machine and took delivery of the first model in October.

The first Catalyst will be put to work on problems of environmental science and energy as well as high-energy physics, but will specialize in questions such as predicting the viability of mutated viruses, the pace of climate change, and the behavior of systems that are complex because they’re designed to do more than just explode.

Catalyst may be aimed at Big Data, but is still a supercomputer.

The modified Cray CS300 cluster can run at 150 teraflops on 324 nodes with a total of 7,776 cores, built from 12-core Intel Xeon processors, and 128 GB of dynamic RAM per node. It also comes with 800 GB of non-volatile memory per node in the form of Intel solid state drives (SSD) with high-speed PCIe connections.
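For a sense of scale, the per-node figures above can be multiplied out to system-wide totals. The short Python sketch below does only that arithmetic. The constant and function names are illustrative, and the derived totals are calculated from the numbers quoted in this article rather than taken from any Cray, Intel or LLNL specification.

```python
# Figures quoted in the article for the first Catalyst system.
NODES = 324
TOTAL_CORES = 7_776
CORES_PER_PROCESSOR = 12
DRAM_GB_PER_NODE = 128
NVRAM_GB_PER_NODE = 800
PEAK_TERAFLOPS = 150


def catalyst_totals():
    """Derive system-wide totals from the quoted per-node figures.

    These are back-of-the-envelope values, not vendor-published numbers.
    """
    cores_per_node = TOTAL_CORES // NODES
    return {
        "cores_per_node": cores_per_node,
        "processors_per_node": cores_per_node // CORES_PER_PROCESSOR,
        "total_dram_gb": NODES * DRAM_GB_PER_NODE,
        "total_nvram_gb": NODES * NVRAM_GB_PER_NODE,
        # 1 teraflops = 1,000 gigaflops
        "gigaflops_per_core": PEAK_TERAFLOPS * 1000 / TOTAL_CORES,
    }


if __name__ == "__main__":
    for name, value in catalyst_totals().items():
        print(f"{name}: {value}")
```

By this arithmetic each node holds 24 cores, or two 12-core processors, and the full system carries roughly 41 TB of DRAM and 259 TB of SSD-based non-volatile memory.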

Catalyst is the result of a collaborative redesign directed by LLNL, a major supercomputing customer, together with Cray and Intel Corp.

The most significant design change was a big increase in the amount of both volatile and nonvolatile memory for each node, which delivers more power for floating-point calculations and data analysis in bioinformatics, language processing and business intelligence, according to a statement from the companies. “To research new ways of translating Big Data into knowledge, we had to design a one-of-a-kind system,” said Raj Hazra, Intel vice president and general manager of the Technical Computing Group.

The design collaboration between vendors and customer also represents a new step in HPC, according to Matt Leininger, deputy of Advanced Technology Projects for LLNL. “The partnership between Intel, Cray and LLNL allows us to explore different approaches for utilizing large amounts of high performance non-volatile memory in HPC simulation and big data analytics,” he said in the lab’s published statement.

 

Image: Cray
