How Supercomputing Can Deliver Key COVID-19 Research

Supercomputing provides enormous computing power that data scientists have put to work during the COVID-19 pandemic to research new treatments and build models for contact tracing.

Data models powered by supercomputing can track infections stemming from social gatherings, noted John Kolb, vice president for information services and technology and CIO at Rensselaer Polytechnic Institute (RPI).

Supercomputing brings not only compute power but also large amounts of data storage to pandemic research, and the speed of data exchange is valuable during calculations. “One of the things that makes it such an amazing thing as a computer is getting the data very quickly to the computation part of the system, doing those calculations, and then getting the data back out again,” Kolb said. “So we can move an awful lot of data very quickly through that system.”

Companies such as IBM are actively involved in using supercomputing to fight the pandemic. Dave Turek, VP of Technical Computing for IBM Cognitive Systems, suggested that using supercomputing or high-performance computing (HPC) to fight the pandemic requires an understanding of science, mathematics and software. 

IBM has formed the COVID-19 High-Performance Computing Consortium to support COVID-19 research. The consortium is a collaboration between tech vendors such as Microsoft and Amazon, the White House Office of Science and Technology Policy, the U.S. Department of Energy and several higher education partners like Massachusetts Institute of Technology (MIT) and RPI.

Pandemic Supercomputing as Teaching Tool

Christopher D. Carothers, director of the Center for Computational Innovations and professor of computer science at RPI, teaches a course called Parallel Programming and Computing. He instructs students on how to create models of how COVID-19 spreads using the Artificial Intelligence Multiprocessing Optimized System (AIMOS) supercomputer, which has 252 compute nodes and a top processing speed of 1048.6 teraflops. 

A final student project on COVID-19 ties together elements students learned in the course about GPU programming (both individual and multi-GPU), compute nodes with multiple GPUs and a parallel file system. The goal of the student project: Use COVID as a driver for understanding parallel computing and how fast the simulation could go, Carothers said: “They took data that we knew publicly about how the virus was spreading and adapted that for their application, and were able to get some interesting, good performance results.”

Another project involves partners such as IBM, Harvard University and RPI using supercomputing to triage medical symptoms and help medical professionals decide on treatments, such as whether a patient needs a ventilator. Another RPI professor, Malik Magdon-Ismail, has also developed a model of how the virus spreads. The model uses the New York State Department of Health’s reports on COVID-19 infections in multiple counties. 
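The article does not detail Magdon-Ismail’s model, but a classic deterministic SIR compartment model is a common starting point for this kind of county-level projection from reported case counts. A minimal sketch, with purely illustrative county numbers and transmission parameters:

```python
def project_sir(population, infected, recovered, beta=0.25, gamma=0.1, days=30):
    """Project a county's active infections forward with a deterministic SIR model.

    beta  = transmission rate per day, gamma = recovery rate per day
    (both hypothetical; a real model fits them to reported data).
    """
    s = population - infected - recovered   # susceptible
    i, r = float(infected), float(recovered)
    history = [i]
    for _ in range(days):
        new_inf = beta * s * i / population
        new_rec = gamma * i
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        history.append(i)
    return history

# Example: a hypothetical county of 300,000 with 500 reported active cases.
curve = project_sir(population=300_000, infected=500, recovered=200)
print(f"projected active cases in 30 days: {curve[-1]:.0f}")
```

Fitting the rates to each county’s reported infections, as the Department of Health data allows, is what turns a textbook model like this into a usable forecast.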

Treatment Research

Oak Ridge National Laboratory (ORNL) and the University of Tennessee have used IBM’s Summit supercomputer to screen 8,000 compounds to find which ones will bind to the coronavirus “spike” protein so that the virus cannot infect host cells. Through computation, the University of Tennessee and ORNL recommended 77 small-molecule drug compounds that showed promise for experimental testing.
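At its core, this kind of virtual screen scores every compound against the target and keeps the best-ranked hits for lab testing. The toy version below uses random stand-in scores (a real pipeline runs a physics-based docking code per compound on the supercomputer), but the ranking step is the same:

```python
import heapq
import random

def dock_score(compound_id, rng):
    """Stand-in for a docking calculation: returns a fake predicted
    binding energy in kcal/mol (more negative = tighter binding)."""
    return rng.uniform(-12.0, -2.0)

rng = random.Random(42)
compounds = [f"CMPD-{n:04d}" for n in range(8000)]   # hypothetical IDs
scores = {c: dock_score(c, rng) for c in compounds}

# Keep the 77 compounds with the lowest (most favorable) binding energies.
top_hits = heapq.nsmallest(77, scores, key=scores.get)
print(top_hits[:5])
```

The expensive part on Summit is the scoring itself; since each compound is scored independently, the 8,000 docking jobs can be spread across the machine and ranked afterward.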

“One of the ways people are looking at trying to make an impact on COVID-19 is to examine the way in which molecules and atoms sort of interact with each other,” Turek said. “And the reason they want to do that is they want to apply certain molecules or proteins to the spike protein on the virus to inhibit its ability to attack a human cell.”

Jeremy C. Smith is the Governor’s Chair at the University of Tennessee and director of the UT/ORNL Center for Molecular Biophysics at ORNL. He said the lab uses the Summit supercomputer for two types of calculations: to simulate drug targets and to dock millions of chemicals to targets to find which ones will “stick.” In 2021, ORNL plans to introduce an exascale supercomputer called Frontier, which will improve on the speed of Summit by a factor of 10.

“That will usher in the exascale era in computing, which will lead to even more possibilities for doing these calculations quickly and in more detail,” Smith said. The additional speed will help with identifying proteins for COVID-19 “because it will allow us to do calculations on more chemicals and more accurately.”

The Value of Supercomputing Skills

Although supercomputing rests on a foundation of statistics and mathematics, finding cures for a deadly virus does not require a dedicated supercomputing degree. Turek said that students who want to develop the next vaccine should consider a degree such as molecular biochemistry or virology: “But in terms of the invocation of computing, to help you figure out what that is, you would take some courses in data science or machine learning or deep learning.”

Getting involved in pandemic research takes more than supercomputing itself, Smith noted. A data scientist would need to learn how to optimize codes for physics, chemistry and genomics and run them quickly on “massively parallel machines,” he said. 

Planning for a Future Pandemic

Going forward, scientists will continue to rely on computational models, as they do to forecast seasonal influenza each year, according to Turek. These methods include statistics to track how long it takes for a virus to mutate, how much time is required to manufacture a vaccine and what type of vaccine should be used. 

“I think one of the things that will happen is there’ll be a lot of focus on speculatively looking at the evolution of COVID-19, and how the therapies that people are working on now would need to change to accommodate that,” Turek said.

The data from millions of trials will help medical researchers plan for the COVID-19 pandemic as well as future pandemics, Carothers said. 

The ability to shrink the vaccine cycle is a valuable contribution of supercomputing. Vaccines usually take 10 years to develop, but some experts believe we could have a COVID-19 vaccine by the end of 2020, Carothers noted. Supercomputing will be able to help with managing the supply chain for drugs and vaccines in future pandemics. 

Another use of supercomputing will be to study evolutionary paths in illnesses to see if they present a risk, Turek said: “Then if you can begin to understand that, you could speculatively build therapeutic agents before they got to a point of creating that pandemic.” 

The key, of course, is using supercomputing to predict the next pandemic before it happens, Smith noted: “If we can predict, for example, how the coronavirus mutates, how it’s likely to mutate and what that will imply for how it functions with humans, I think we can predict what might inhibit mutants of the coronavirus… Then we would have an immediate answer when the pandemic starts.”
