Utility Supercomputing: Powering Up Science

By the power of Grayskull.

Such a weedy-looking chap was Prince Adam. However, in the blink of an eye he could raise his magical sword to the sky and, poof, instantly transform into “He-Man: The Most Powerful Man in the Universe,” ready for action.

Alas, science is not like the cartoons. Very rarely (if ever) do you see physicists and biologists in their labs eagerly thrusting their ballpoints skywards and demanding “More Power!” (Well, you might—but if you do, they may well have been drinking on the job.)

Unfortunately, science has to fit inside the facilities we are given: a grant, an instrument, and sometimes just a regular old bucket. Once that science overflows the capacity, you are going to have to buy a bigger bucket. The problem is that, for large data-driven science, these buckets are extremely expensive and also can’t just be created magically from the ether!

Or can they?

Enter Stage Left: Utility Supercomputing

We have all seen the rise of clouds, IaaS, PaaS and the rest, but to tame the hardest and most challenging issues facing huge data-driven science, you clearly need a whole new approach. Science and analytics now consume petaflops of compute and petabytes of storage, spread over multiple machines, with algorithms that run for weeks, months and years at a time.

This does not sound like a good fit for a pay-as-you-go cloud infrastructure, does it? Well, it turns out it is possible to orchestrate servers at high speed—as long as you have really elegant configuration management systems, some equally clever algorithms and some even smarter people driving it all.
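The core idea behind those configuration management systems is desired-state convergence: describe what each host should look like, compare it against what the host actually looks like, and compute only the actions needed to close the gap. Here is a deliberately tiny sketch of that idea in Python; the package and service names are invented for illustration.

```python
# Toy desired-state convergence, the pattern behind configuration
# management tools such as Puppet or Chef. All item names below
# ("mpi", "scheduler", "scratch_fs") are hypothetical.

desired = {"mpi": "installed", "scheduler": "running", "scratch_fs": "mounted"}
current = {"mpi": "installed", "scheduler": "stopped"}

def converge(current, desired):
    """Return the list of actions needed to bring `current` to `desired`."""
    actions = []
    for item, want in desired.items():
        have = current.get(item, "absent")
        if have != want:
            actions.append(f"{item}: {have} -> {want}")
    return actions

for step in converge(current, desired):
    print(step)
# scheduler: stopped -> running
# scratch_fs: absent -> mounted
```

Because the plan is computed rather than scripted, running it twice is harmless: once the host matches the desired state, the plan is empty. That idempotency is what makes booting and configuring thousands of cloud hosts at once tractable.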

Because of public cloud we really do have the ability to rapidly boot, configure and execute science at scales we never thought possible before, and that is a really good thing. But we can’t do this alone—it takes a community.

Why?

Our research into the human genome, cancer, climate, finance, engineering and advanced physics and chemistry has one massive thing in common: these are all extremely hard problems. Regardless, we as a human race have to try to solve them.

We just can’t sit here idle. We have to do our part to make the world a more habitable and more awesome place for ourselves, our children, their children, and their children’s children. Novel therapeutics and compounds that allow us to live longer, have significantly better outcomes and survive major trauma have been designed using in-silico methods for years now. Each and every day, we all rely on more sophisticated and advanced computation to help us understand the world around us. This, coupled with massive cohort studies of people suffering from major public health issues, is generating ever-larger sets of disease traits that have to be analyzed for correlations and (hopefully) eventual cures.

As we continue to peer deeper into what makes us human, we are generating larger, more complex data that needs increasingly powerful computing to unravel. Such machines are incredibly expensive, complicated and difficult to manage and operate.

In particular, configuration management and the monitoring of system health attributes play a huge role. Being able to “take the temperature” of a large cluster is vital if it is to perform at scale. At that scale you also need substantial improvements in file system access: once you reach 10,000+ hosts, the filesystem itself has to operate in parallel. Two popular open-source approaches (there are many) are GlusterFS and Hadoop. The first aggregates machines and disks to provide POSIX file access (lots of science code is still written in FORTRAN!) at scale or with replication. The second lets map and reduce algorithms slice up huge datasets and make them available for search—which is fortunate, as much of our modern science is based on search.

We should also mention “eventual consistency” as another huge concept that is accelerating science. No longer are we tied to ACID models for relational database access; no longer do we need to queue up all the requests behind a single database system; no longer do we need to interact with grumpy DBAs. Riak, Cassandra and a number of others all enable this concept of “schema-free” databases, making rapid access possible from thousands of hosts at a time while still converging on consistent, reliable results.
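One simple mechanism behind that convergence is last-write-wins reconciliation: replicas that diverged (say, during a network partition) are merged by keeping, for each key, the value with the newest timestamp. The sketch below is a toy version of that idea, not the actual Riak or Cassandra machinery, and the keys, timestamps and values are invented.

```python
# Toy last-write-wins merge, one way eventually consistent stores
# reconcile divergent replicas. Each replica maps a key to a
# (timestamp, value) pair; all data here is hypothetical.

replica_a = {"sample:42": (1001, "aligned")}
replica_b = {"sample:42": (1007, "annotated"), "sample:43": (1002, "raw")}

def merge(*replicas):
    """Resolve each key to the value carrying the newest timestamp."""
    resolved = {}
    for replica in replicas:
        for key, (ts, value) in replica.items():
            if key not in resolved or ts > resolved[key][0]:
                resolved[key] = (ts, value)
    return resolved

state = merge(replica_a, replica_b)
print(state["sample:42"])  # (1007, 'annotated') -- the newest write wins
```

Because any replica can accept a write and reconcile later, thousands of hosts can read and write at once without queueing behind a single coordinator—exactly the property the paragraph above celebrates.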

We also have a lot for which to thank high-throughput websites. Without the load pressures placed on web servers by the inherent “millions to few” pattern (what folks have called “the Slashdot effect”), we would never have learned techniques to effectively balance and distribute scientific workloads over clusters of thousands of machines.
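One of those borrowed techniques is consistent hashing: place hosts on a hash ring and send each task to the first host clockwise from the task’s own hash position, so that adding or removing a host remaps only a small fraction of the work. A minimal sketch follows; the host and task names are made up, and a production ring would also use virtual nodes for smoother balance.

```python
import hashlib
from bisect import bisect

# A minimal consistent-hashing ring. Host names are hypothetical.
def ring_position(key):
    """Map any string to a position on the ring via a stable hash."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

hosts = [f"node{i:02d}" for i in range(4)]
ring = sorted((ring_position(h), h) for h in hosts)
positions = [p for p, _ in ring]

def assign(task):
    """Send a task to the first host clockwise from its hash position."""
    i = bisect(positions, ring_position(task)) % len(ring)
    return ring[i][1]

for task in ["chr1.bam", "chr2.bam", "chr3.bam"]:
    print(task, "->", assign(task))
```

The same few lines that kept a web farm standing under a traffic spike will happily spread genome chunks over a scientific cluster—the workload balancing problem is identical.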

No article is complete without mentioning virtualization. We have seen huge advances in rapid orchestration of scientific services because of virtualization. Being able to start up and clone machines in seconds makes for faster science. KVM has also allowed us to do this at very low cost. However, as progress on “virtualized networks,” or software-defined networking, accelerates, it is safe to say you ain’t seen nothing yet. Imagine a very near future where your scientific workload is free to move around the globe, locating the best compute and data resources for you while you go take a nap. It’s really not that far away.
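The decision at the heart of that globe-trotting workload can be sketched as a small cost model: for each candidate site, weigh the compute price against the cost of shipping whatever data is not already there, and pick the cheapest total. Every site name, price and data figure below is invented purely for illustration.

```python
# A toy global placement decision: compute cost plus data-movement
# cost, minimised over candidate sites. All numbers are hypothetical.

sites = {
    "us-east":  {"price_per_hour": 0.9, "data_gb_present": 500},
    "eu-west":  {"price_per_hour": 0.7, "data_gb_present": 0},
    "ap-south": {"price_per_hour": 0.5, "data_gb_present": 0},
}

def best_site(hours, data_gb, transfer_cost_per_gb=0.05):
    """Pick the site minimising compute cost plus missing-data transfer."""
    def total_cost(site):
        info = sites[site]
        missing = max(0, data_gb - info["data_gb_present"])
        return hours * info["price_per_hour"] + missing * transfer_cost_per_gb
    return min(sites, key=total_cost)

# A data-heavy job stays where its 500 GB already lives, even though
# the compute there is pricier; a data-light job chases cheap cycles.
print(best_site(hours=10, data_gb=500))  # us-east
print(best_site(hours=10, data_gb=0))    # ap-south
```

Software-defined networking matters here because it can re-plumb the connectivity after the move, so the workload finds its storage and peers wherever it lands.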

I call these combined advancements “The Rise of DevOps.”

Basically, the openness of computer operating systems (in particular GNU/Linux), advanced computer languages, configuration management, networks, file systems, database developments and wonderfully complex mathematics libraries (which we can now easily orchestrate) have finally allowed us to take on some of our largest scientific challenges.

So the next time you boot up a virtual machine, configure a network, install a spot of code to automate the tedium, write some Erlang, start a cluster, or debug a kernel module, remember that although on the inside you may be feeling slightly more “Adam” than “He-Man,” we are each helping society and the human race to be bigger, better and significantly stronger than we have ever been before.

So, yes, in summary, it turns out that we actually do “have the power” after all.


James Cuff is the Chief Technology Officer for Cycle Computing, which provides utility supercomputing and high-performance computing orchestration and management for science and analytics.

Image: Vintage Action Figures
