"To boldly go where no one has gone before." With these words, Professor Stephen Hawking ended his April 21, 2008 NASA lecture, "Why Go into Space?"
By Tyler O’Neal -- In honor of the new Star Trek movie being released today, I wrote a story about spacecraft design in the real world. Star Trek takes place in a quasi-utopian society where the central role is played not by money but by the need for exploration and knowledge.
NASA has recruited the outstanding professionals at Orbital Sciences Corporation to perform fundamental research aimed at producing a novel understanding and design for the next-generation spacecraft.
The company wants to make space technology more affordable, accessible and useful to millions of people on Earth.
HP technology is driving the future of space exploration and knowledge at Orbital.
In order to deliver successful products on time to NASA, Orbital’s engineers have deployed an HP Cluster Platform 4000BL (CP4000BL) supercomputing cluster because it is powerful and reliable enough to run their highly complex computational fluid dynamics (CFD) analyses.
The system is among the most popular solutions in supercomputing today; in fact, readers voted HP BladeSystem c-Class blade servers the 2006 Product of the Year.
“The analyses we run are some of the most complicated equations in physics; therefore, they demand extraordinary computational power. The HP Cluster Platform provides us with the high performance and reliability we need to precisely simulate flight situations,” said Dr. Luis Bermudez, Orbital Engineer.
Prior to the HP blade cluster installation, Orbital ran its CFD analyses on an in-house symmetric multiprocessing (SMP) system. As the amount and complexity of the work grew, Orbital realized it needed more supercomputing power.
Primary application software
• Commercial CFD structured flow solvers
• NASA/US DoD CFD applications
• Pre/Post processing tools
• Moab Cluster Suite provides Web-based job management, graphical cluster administration and management reporting tools
Primary system software
• Red Hat Enterprise Linux 4/5
• Open MPI Message Passing Libraries
Primary hardware
• HP Cluster Platform CP4000BL
– 3x c7000 BladeSystem enclosures
– 24 nodes; each a dual-processor HP ProLiant BL465c
– InfiniBand interconnect
• HP Personal Workstations:
– HP xw8200 Workstations
– HP xw8400 Workstations
Primary HP services
• HP Care Pack Services
• Standard start-up & implementation services
Dr. Bermudez went on to explain that Orbital is responsible for validating the design assumptions developed by NASA. The design assumptions involve extremely complex flow fields, resulting in enormous computational problems.
“We must numerically simulate the flow fields, creating a virtual wind tunnel in our computer,” said Bermudez. “These analyses are some of the most complicated equations describing our physical world, therefore they demand extraordinary computation power.”
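The "most complicated equations describing our physical world" that Bermudez refers to are, in CFD practice, the Navier–Stokes equations. For a compressible Newtonian fluid, the mass and momentum balances read (body forces and the energy equation are omitted here for brevity):

```latex
% Conservation of mass (continuity):
\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0
% Conservation of momentum, with pressure p and viscous stress tensor \tau:
\frac{\partial (\rho \mathbf{u})}{\partial t}
  + \nabla \cdot (\rho \mathbf{u} \otimes \mathbf{u})
  = -\nabla p + \nabla \cdot \boldsymbol{\tau}
```

A structured flow solver of the kind listed in Orbital's software stack discretizes these equations over a mesh and marches them forward in time, which is why the problem sizes grow so quickly with resolution.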
“It is important for us to have an extremely reliable system. Some of the analyses we run can take two to three weeks, so we can’t tolerate downtime,” said Bermudez.
Before selecting HP, Orbital looked at several competing solutions. “In the past we have tested our application on different platforms, and HP always outperformed,” Bermudez stated. “We wanted a distributed memory system that was also powerful and reliable. The HP CP4000BL proved to be the best choice. It provides us with the high performance and reliability we need to precisely simulate flight situations.”
The HP Cluster Platform 4000BL is based on the HP BladeSystem c7000 infrastructure, which consolidates and packages all supporting computing elements—compute, storage, network, and power—into a single cabinet that streamlines data center integration and optimization. The c-Class enclosure features an innovative cooling technology that uses built-in instrumentation to monitor changes in workload demand and environment. An onboard administrator then adjusts power load and thermal controls automatically to maintain system performance within the constraints of the power and cooling capacity of a data center.
It was a quick and easy installation with immediate results. HP helped Orbital configure the system based on the demanding requirements of its CFD codes. The configuration consists of three rack-mounted BladeSystem c7000 enclosures with twenty-four compute nodes based on dual-core AMD Opteron processors. The c7000 enclosure is 10U high and holds up to 16 server and/or storage blades, plus optional redundant network and storage interconnect modules. It includes a shared, 5 terabit per second high-speed midplane for wire-once connectivity of server blades to network and shared storage.
Orbital decided to implement dual management nodes for redundancy and configured the cluster with an InfiniBand interconnect. The cluster is running Red Hat Enterprise Linux with a collection of fluid dynamics, numerical simulation, and commercial applications. The cluster interfaces to a workgroup of HP Personal Workstations running Red Hat Enterprise Linux.
“When we installed the new HP cluster, we were amazed to see how fast it really was—we started running larger jobs right away. As we move from preliminary reviews to critical reviews of the design specs, our analyses must match the increased accuracy required at those levels,” Bermudez stated. “The high performance CP4000BL allows us to provide a powerful solution that adapts to the size of the problem.”
For example, one CFD simulation of six million cells shows the CP4000BL cluster to be over five times faster than the previous SMP system (20.8 seconds/cycle with 64 processors vs. 111 seconds/cycle with 16 processors). But as Bermudez points out, “It’s not really the reduced compute time that’s the point; it’s that in the same amount of wall clock time, the higher performance system lets us solve problems that we couldn’t even touch previously.”
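As a sanity check, the quoted figures imply the following speedup arithmetic. This is a simple sketch; the processor counts differ between the two runs, so the per-processor ratio below is only a rough normalization, not a like-for-like comparison:

```python
# Figures quoted above: 111 s/cycle on 16 SMP processors vs.
# 20.8 s/cycle on 64 cluster processors.
smp_time, smp_procs = 111.0, 16
cluster_time, cluster_procs = 20.8, 64

# Wall-clock speedup per cycle -- this is the "over five times faster" claim.
speedup = smp_time / cluster_time
print(f"Speedup: {speedup:.1f}x")

# Normalizing by processor count gives a rough per-processor comparison;
# this ignores differences in per-core performance and scaling efficiency.
per_proc = (smp_time * smp_procs) / (cluster_time * cluster_procs)
print(f"Per-processor ratio: {per_proc:.2f}")
```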
“From a technical standpoint, the CP4000BL cluster is by far the best piece of hardware I have ever worked with. I’ve been a system administrator for more than twelve years, and I’ve managed hardware from all of the big names in the industry. The HP CP4000BL cluster is one of my favorite platforms,” said Don Collins of Orbital’s East Coast IT Operations.
Bermudez agrees, “All of the engineers love it; we haven’t had any problems. We are very pleased with the HP BladeSystem and the role it is helping us play in making future space travel safer.”
Orbital’s engineers are now producing simulations at much higher resolutions, with more detailed physics and a greater amount of input data. But does the extra supercomputing power guarantee that their predictions are more accurate, or more likely to be right, than those of a simpler model? Real lives, and business decisions with a high impact on society, will rest on these supercomputer predictions.
NASA supplies most of the code and data to Orbital. In addition, the company’s engineers have started using Monte-Carlo-class algorithms, among other techniques.
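The article doesn’t say which Monte-Carlo algorithms Orbital uses, but the general idea—sampling uncertain inputs many times to estimate the spread of an output—can be sketched in a few lines. The drag model and all of the numbers below are invented stand-ins for illustration only:

```python
import random
import statistics

# Hypothetical stand-in model: drag force F = 0.5 * rho * v^2 * Cd * A.
# Monte Carlo propagates uncertainty in the inputs to the output.
def drag(rho, v, cd, area):
    return 0.5 * rho * v**2 * cd * area

random.seed(42)
samples = []
for _ in range(100_000):
    rho = random.gauss(1.225, 0.01)   # air density, kg/m^3 (assumed spread)
    v = random.gauss(250.0, 5.0)      # airspeed, m/s (assumed spread)
    cd = random.gauss(0.30, 0.02)     # drag coefficient (assumed spread)
    samples.append(drag(rho, v, cd, area=10.0))

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"Drag: {mean:.0f} +/- {stdev:.0f} N")
```

The output distribution, not a single number, is what a design review can then compare against safety margins.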
Thankfully, most developers of supercomputer models test them before releasing them for real design work. They may compare the predictions with those from other models, perhaps using different principles or algorithms or with physical testing, historical data, or other known data points. Supercomputers are often called into use for this purpose. In fact, one of the biggest roles of supercomputer simulations in science and engineering is exploring the validity of the models. Models are pushed to extreme scales, data sets and boundary conditions, to help establish the confidence that they will be safe with less extreme parameters.
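Comparing model predictions against known data points can be as simple as checking relative error against a tolerance. A minimal sketch follows; the reference and predicted values are invented for illustration:

```python
# Minimal validation sketch: compare model predictions against reference
# data points (e.g. wind-tunnel measurements) within a relative tolerance.
# All numbers here are invented for illustration.
reference = {0.8: 0.021, 1.2: 0.043, 2.0: 0.038}   # Mach number -> drag coeff
predicted = {0.8: 0.022, 1.2: 0.041, 2.0: 0.039}

TOLERANCE = 0.10  # accept predictions within 10% of the reference value

def validate(pred, ref, tol):
    report = {}
    for mach, ref_val in ref.items():
        rel_err = abs(pred[mach] - ref_val) / ref_val
        report[mach] = (rel_err, rel_err <= tol)
    return report

results = validate(predicted, reference, TOLERANCE)
for mach, (err, ok) in results.items():
    print(f"Mach {mach}: relative error {err:.1%} -> {'PASS' if ok else 'FAIL'}")
```

The important discipline is checking every operating point the model will actually be used at, not just the ones where it is known to agree.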
Many users of models are rigorous about validating their predictions, especially those with a strong link to the development of the model or its underpinning science. Unfortunately, not all users are so scrupulous. Some assume the model must be right: after all, it is running at a higher resolution than before, or with physics algorithm v2.0, or some other enhancement, so the answers must be more accurate. Others assume it is the model supplier's job to make sure it is correct. And yes, it is; but how often do users check that their prediction relies on a validated region of parameter space?
Assumptions about increasing scale can be misleading. Higher resolution does not guarantee more accurate results. For example, is the code numerically capable of handling the smaller or larger floating-point numbers involved? Do the algorithms remain stable over the larger number of iterations? Does the code use a subroutine or library call that has not been certified? Fault tolerance is also becoming critical: not just node failures, but errors such as data corruption in memory or in the interconnect.
Modern engineers don’t just design spacecraft; they try to understand the underlying physical processes. NASA may find that a design works, but it needs to understand why and how. Physical tests can show that something is the case, but simulations produce very detailed data that reveal the why and how of the mechanics. The work of Orbital’s engineers is novel both in its high level of accuracy and in advancing the simulations themselves; it adds to the tools needed for understanding complex flows.
Wind tunnel experiments offer only limited insight into time-dependent processes, whereas supercomputer simulations with millions of grid points over hundreds of thousands of time steps provide high spatial and temporal resolution. In other words, simulations can provide far more information than experiments, but the experimental work is necessary to check that the simulations are correct. The two complement each other.
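To make "grid points over time steps" concrete, here is a toy 1-D explicit diffusion solver, far simpler than any production CFD code, but showing the same ingredients: a spatial grid, a time-stepping loop, and a stability limit that caps the usable time step:

```python
# Toy 1-D diffusion solver: u_t = alpha * u_xx on [0, 1], fixed cold
# boundaries, explicit finite differences. The explicit scheme is only
# stable when dt <= dx^2 / (2 * alpha), so resolution and time step
# cannot be chosen independently.
alpha = 0.01                  # diffusivity
nx = 101                      # grid points
dx = 1.0 / (nx - 1)
dt = 0.4 * dx * dx / alpha    # safely below the stability limit
steps = 1000

u = [0.0] * nx
u[nx // 2] = 1.0              # initial hot spot in the middle

for _ in range(steps):
    new = u[:]
    for i in range(1, nx - 1):
        new[i] = u[i] + alpha * dt / (dx * dx) * (u[i-1] - 2*u[i] + u[i+1])
    u = new

print(f"peak temperature after {steps} steps: {max(u):.4f}")
# Total heat decreases slowly as heat leaks through the cold boundaries.
print(f"total heat remaining: {sum(u):.4f}")
```

A real flow solver does this in three dimensions with coupled nonlinear equations, which is where the millions of cells and weeks of compute time come from.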
That’s why NASA supports the company: it is developing something completely new. HP’s expertise will also enable Orbital’s engineers to explore parts of the design space that are not accessible any other way.
The results of Orbital’s research may still be years away, but when you hear about (or step into) a future spacecraft, remember that its designs and space technology were put there by computational scientists like Bermudez and Collins, and by systems like their supercomputer.