Cloud computing isn’t just for running office productivity software or realising your startup idea. It can support high-performance computing (HPC) applications that crunch through large amounts of data to produce actionable results.
Using elastic cloud resources to process data in this way can have a real business impact. What might one of these applications look like, and how could the cloud support it?
Let’s take buildings as an example. London’s ‘Walkie Talkie’ skyscraper has suffered from a bad rap of late. First it gave the term ‘hot wheels’ a whole new meaning, melting cars by inadvertently focusing the sun’s rays on an unfortunate Jaguar, like a child turning a magnifying glass on a bug.
Then commuters blamed it for blowing them hither and thither. When wind hits the face of a tall building it is deflected downwards, creating gusts at street level in what is known as 'downwash'; where buildings channel that air through narrow gaps, the Venturi effect accelerates it further.
Not only does this generate complaints from passers-by, but it may also have an adverse effect on commercial activity in the street. Commuters will avoid walking down an especially windy street and will seek out alternative routes, meaning that businesses in the street (including tenants at the foot of the skyscraper itself) will suffer.
Let’s say that another group of hypothetical architects want to avoid this problem. They have been tasked with building a skyscraper in the middle of the City, but hope to reduce or eliminate these wind effects at street level.
CFD to the rescue
Computational fluid dynamics (CFD) is a perfect discipline to help solve this problem. It simulates the flow of fluid and air in high resolution and in complex environments. Our architects would like to use CFD calculations to determine the best configuration and alignment of the building in relation to those around it.
This will take a lot of computing power, which makes CFD a high-performance computing problem. Simulating many different airflows around a variety of buildings calls for parallel computing resources to speed up the process. The architecture firm needs to run several models based on different building designs and alignments to see which will be most appropriate for the space.
The problem for our architecture firm is that it doesn’t design skyscrapers very often, meaning that buying the expensive, multi-processor computers necessary for this task would be hard to justify. Instead of spending tens of thousands on machines that would mostly lie dormant and clutter up its shiny London offices, it would like to rent the power on demand. It needs a cloud computing solution.
Running CFD in the cloud involves a lot more than just dropping it into a virtual private server or two, though. High-performance computing applications impose specific requirements on infrastructure.
HPC setups typically need large amounts of memory to hold the data volumes involved in scientific applications, and many processing cores to divide large problems up and tackle them in parallel. HPC infrastructure also needs plenty of I/O bandwidth to move data between the cores and the memory.
CFD adds yet another layer of complexity to HPC infrastructure requirements. Some high-performance computing problems such as graphics rendering are what’s known as ‘embarrassingly parallel’, meaning that they can be easily separated into lots of smaller tasks and distributed across many processors.
Conversely, CFD calculations are interconnected. All of the processing cores working on the problem need to communicate with each other as they handle their calculations. It’s a specialised workflow in an already rarefied field of computing, with requirements for high-performance network flows to shunt the data between the cores.
In theory, this means that CFD computations would be best served by a local HPC cluster connected by high-speed InfiniBand networks. Rather than shouldering that cost, and the inefficiency of hardware that would sit idle most of the time, our architecture firm is determined to make it work in Amazon Web Services (AWS). Thanks to advances in the cloud, this is a tractable problem.
Doing CFD in the cloud
How would our architects run their CFD calculations? One of the most popular options is OpenFOAM, the CFD package released as open source in 2004. The software was acquired by SGI in 2011 and subsequently by ESI Group in 2012, but before the latter sale the original creators also founded the OpenFOAM Foundation, to ensure that it remained an open source project. They later founded CFD Direct, a UK company that makes most of the contributions to the open source development code.
OpenFOAM may have been released before cloud computing gained traction as a commercial concept, but development has kept pace with the cloud. CFD Direct runs OpenFOAM on AWS for the kind of CFD calculations that an architectural airflow analysis firm might need.
AWS provides basic cloud computing capabilities in the form of its EC2 instances, but also features virtual machines specifically tailored for HPC applications. There are several of these, including C4 instances, which customers can rent in five sizes.
C4 EC2 instances provide up to 36 virtual CPUs (18 physical Intel Xeon ‘Haswell’ cores, each presenting two hyperthreads) and offer features well suited to HPC. Their 2.9GHz base clock speed can increase to 3.5GHz using Intel’s Turbo Boost technology.
The C4 Xeons support version 2 of the Advanced Vector Extensions (AVX2), an evolution of the AVX instructions that first shipped with the Sandy Bridge processor architecture in 2011, and ultimately of the vector instruction sets first developed for supercomputers in the late 1960s and early 1970s. These instructions let a processor apply a single operation to multiple data points at once, creating significant efficiencies when performing similar calculations across the large numbers of data points involved in CFD workloads.
Incidentally, AWS will soon launch C5 instances based on a later generation of Xeons with Skylake cores. These will upgrade the AVX support to the wider AVX-512 version.
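To give a flavour of what vectorisation means in practice, here is a small sketch comparing a plain Python loop with a NumPy array operation. NumPy’s compiled kernels can use SIMD instructions such as AVX under the hood; the array size and the arithmetic are purely illustrative.

```python
# Illustrative sketch: applying the same arithmetic to many data points at once.
# NumPy's compiled kernels can use SIMD (e.g. AVX/AVX2) internally, processing
# several double-precision values per instruction.
import time
import numpy as np

n = 10_000_000
pressure = np.random.rand(n)          # stand-in for per-cell CFD values

# Scalar Python loop: one value per iteration
start = time.perf_counter()
scaled_loop = [p * 1.2 + 0.5 for p in pressure]
loop_time = time.perf_counter() - start

# Vectorised NumPy expression: the same arithmetic across the whole array
start = time.perf_counter()
scaled_vec = pressure * 1.2 + 0.5
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.2f}s  vectorised: {vec_time:.3f}s")
```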
After signing up to AWS, the architects would first spin up an instance of Ubuntu with OpenFOAM installed. They would then connect to the instance via SSH and begin running simulations. If they needed more than the 36 vCPUs in a single C4 instance, they could cluster several together, using a master instance linked to a number of slave instances.
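As a rough illustration of that first step, the sketch below uses the boto3 SDK to launch a single C4 instance and print an SSH command. The AMI ID, key pair and security group are placeholders; in practice the image would be an Ubuntu AMI with OpenFOAM pre-installed.

```python
# Hypothetical sketch: launch one c4.8xlarge instance for OpenFOAM work.
# The AMI ID, key pair and security group below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",          # placeholder: Ubuntu image with OpenFOAM
    InstanceType="c4.8xlarge",       # 36 vCPUs, as discussed above
    KeyName="cfd-keypair",
    SecurityGroupIds=["sg-xxxxxxxx"],
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]

# Wait until the instance is running, then look up its public address
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
desc = ec2.describe_instances(InstanceIds=[instance_id])
ip = desc["Reservations"][0]["Instances"][0]["PublicIpAddress"]
print(f"ssh -i cfd-keypair.pem ubuntu@{ip}")
```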
Cluster-enabled CFD uses software from a variety of libraries available on top of an AWS-based EC2 HPC cluster. MPICH is one such library, and Open MPI is another. Both implement the Message Passing Interface (MPI) standard, which the nodes use to exchange messages between themselves during the CFD calculations.
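The pattern those libraries support looks roughly like the sketch below, which uses the mpi4py bindings to combine partial results across processes. The decomposition and values are purely illustrative; this is not an OpenFOAM solver.

```python
# Illustrative MPI sketch using mpi4py: each rank works on its slice of a
# problem, then all ranks exchange results. Run with, for example:
#   mpirun -np 36 python mpi_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # which process am I?
size = comm.Get_size()        # how many processes in total?

# Pretend each rank computes a partial quantity for its slice of the mesh
local_result = float(rank + 1)

# Combine the partial results across all ranks (collective communication)
total = comm.allreduce(local_result, op=MPI.SUM)

if rank == 0:
    print(f"{size} ranks, combined result: {total}")
```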
Faster networking for CFD data loads
Clustering nodes like this introduces an additional networking overhead for those running CFD computations. They must optimize the connections between the nodes in the cluster to accommodate the heavy I/O requirements that we’ve already described. This happens in several ways.
One tool at their disposal is Amazon EC2 enhanced networking. This increases the packet rate between servers using single root I/O virtualization (SR-IOV), a device virtualization method that delivers higher I/O performance and lower CPU utilization. The administrator for the application would enable this via an Elastic Network Adapter (ENA) within the EC2 instance.
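One plausible way to do that with boto3 is sketched below: stop the instance, flag ENA support on it, and start it again. The instance ID is a placeholder, and the instance’s AMI and drivers must already support ENA.

```python
# Hypothetical sketch: mark an instance as ENA-capable for enhanced networking.
# The instance must be stopped first, and its image must include ENA drivers.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
instance_id = "i-0123456789abcdef0"   # placeholder

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Enable the Elastic Network Adapter attribute on the instance
ec2.modify_instance_attribute(InstanceId=instance_id, EnaSupport={"Value": True})

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

# Confirm the attribute took effect
desc = ec2.describe_instances(InstanceIds=[instance_id])
print(desc["Reservations"][0]["Instances"][0].get("EnaSupport"))
```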
Placement groups are another AWS networking feature that helps nodes in a virtual cloud-based CFD cluster work together more effectively. A cluster placement group places instances as close together as possible, reducing the latency between them and supporting 10Gbit/sec network speeds between nodes.
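Creating a placement group and launching nodes into it takes only a few API calls, as in the sketch below, which reuses the same placeholder AMI and key pair as before.

```python
# Hypothetical sketch: create a cluster placement group and launch the
# CFD nodes inside it so they sit close together on the network.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.create_placement_group(GroupName="cfd-cluster", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-xxxxxxxx",                    # placeholder OpenFOAM image
    InstanceType="c4.8xlarge",
    KeyName="cfd-keypair",
    MinCount=4,                                # e.g. one master plus three workers
    MaxCount=4,
    Placement={"GroupName": "cfd-cluster"},
)
```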
The architects may use enhanced networking and placement groups to solve inter-node communication, but they still have to consider communications between the nodes and shared storage. In AWS, they’ll be using Elastic Block Store (EBS) to hold the data for their CFD application.
EBS comes in several flavours, spanning magnetic hard disk drive (HDD) and solid-state drive (SSD) volumes. CFD Direct says that HDD will handle the lion’s share of CFD workloads. The C4 instances boost throughput from the processing cores, providing between 500Mbps and 2,000Mbps of dedicated throughput to AWS’s EBS resource, depending on instance size. The C5s will boost this to 12Gbps.
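Provisioning that storage might look like the sketch below, which creates a throughput-optimised HDD (st1) volume and attaches it to a running instance. The size, availability zone and IDs are placeholders.

```python
# Hypothetical sketch: create a throughput-optimised HDD (st1) EBS volume
# for CFD case data and attach it to an existing instance.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
instance_id = "i-0123456789abcdef0"            # placeholder

vol = ec2.create_volume(
    AvailabilityZone="eu-west-1a",             # must match the instance's zone
    Size=500,                                  # GiB, illustrative only
    VolumeType="st1",                          # throughput-optimised HDD
)
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId=instance_id,
    Device="/dev/sdf",                         # then format and mount over SSH
)
```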
GPUs and enhanced administration
Another way to manage networking overheads is to reduce the number of CPU cores supporting the job. To this end, the architects running the airflow simulations could add the kind of double-precision floating point throughput that dedicated GPUs provide.
AWS has added these capabilities with its P2 EC2 instances, which use NVIDIA Tesla K80 GPUs. A single instance can include up to 16 of these GPUs in conjunction with up to 64 vCPUs based on v4 (Broadwell) Xeon processors.
The architects may have in-house experts to manage the building airflow simulations, but may not have the administration skills to spin up and manage DIY open source CFD deployments. Perhaps they are working with an engineering consultant who has used an application other than OpenFOAM. In this case, there are alternatives for managing HPC resources in AWS.
Rescale provides a cloud-based HPC platform called ScaleX, available in three configurations suiting individuals or SMBs, software developers, or enterprise customers.
Our architecture firm might use the lower-end version, ScaleX Pro, for its purposes. This gives it the ability to create private clusters for its CFD airflow analysis using a point-and-click interface, and then to customize the hardware configurations.
It can then deploy whichever third-party simulation software it needs, including OpenFOAM or others, to the clusters it has created. ScaleX’s workflow management features will also let the architecture firm chain multiple simulation packages together, should it need to run different kinds of simulation as part of its building project.
Rescale’s visualization software will also let the architecture firm and its engineering team visualize the simulation results over a standard internet connection, so that they can see just how likely shop signs and commuters are to blow over in their building’s little segment of the Square Mile.
Scaling CFD simulations while keeping costs low
How well can HPC in the cloud scale in practice, and what kind of cost savings can companies expect? A lot depends on the cloud, and the charging model.
From a performance perspective, ISVs have shown that running HPC simulations in the cloud can scale at near-ideal rates past 1000 cores, highlighting the cloud’s ability to manage communications between multiple nodes at rates close to those you’d achieve with a local HPC cluster.
Ansys showed that the type and configuration of CFD application is important when considering scalability in the cloud. Its studies found that staying above 50,000 cells (the small volumes into which a CFD mesh is divided) per processing core gave the best clustering performance when using a 10Gbps network.
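Put into numbers, that rule of thumb makes sizing a cluster straightforward. The mesh size in the sketch below is purely illustrative, and it treats each vCPU as a core for simplicity.

```python
# Back-of-the-envelope sizing using the ~50,000 cells per core rule of thumb.
# The 20-million-cell mesh is an illustrative figure, not from the article.
mesh_cells = 20_000_000
cells_per_core = 50_000
vcpus_per_instance = 36            # one c4.8xlarge, treating vCPUs as cores

cores_needed = mesh_cells // cells_per_core                  # 400 cores
instances_needed = -(-cores_needed // vcpus_per_instance)    # ceiling division

print(f"{cores_needed} cores -> about {instances_needed} instances")
```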
Having said that, not all clouds are created equal. Seattle, WA-based engineering services firm TLG Aerospace conducts CFD analyses for clients in the aerospace sector. It was using a cloud provider to handle its simulation jobs, but found that the cost per simulation was high, and that it couldn’t scale its cluster size appropriately.
The company switched to AWS, and took advantage of a pricing model that helps to drive down costs for complex simulations: Spot Instances.
AWS customers can pay for guaranteed EC2 instances using the On-Demand service, but they can also add further instances, if spare capacity is available, by bidding for it on the Spot market. Administrators set a maximum price that they’re willing to pay, along with an instance type and quantity.
When instances become available at or below that price, the administrator’s application begins using them until the market price rises above the stated maximum. At that point, AWS interrupts the Spot instances and reclaims the capacity until the price falls again.
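A request for Spot capacity might look roughly like this boto3 sketch; the bid price, instance count and launch details are placeholders.

```python
# Hypothetical sketch: bid for Spot capacity to run extra CFD nodes.
# Price, count, AMI and placement group are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

ec2.request_spot_instances(
    SpotPrice="0.60",                          # maximum price per instance-hour
    InstanceCount=10,
    Type="one-time",
    LaunchSpecification={
        "ImageId": "ami-xxxxxxxx",             # placeholder OpenFOAM image
        "InstanceType": "c4.8xlarge",
        "KeyName": "cfd-keypair",
        "Placement": {"GroupName": "cfd-cluster"},
    },
)
```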
TLG used more than 60 EC2 Spot instances per simulation case, which helped it to slash CFD simulation costs by 75%, while also allowing it to scale up the computing power available for its simulation beyond 1000 nodes. Its experience will help it to scale even larger projects in the future, it said.
What does this mean for our architecture firm? It could run a variety of simulations, encompassing several building designs at a range of alignments. CFD in the cloud lets it rotate each design through 360 degrees at one-degree resolution, visualizing the airflow through the streets below each time.
If they find themselves falling behind as deadlines loom, they could throw more computing power at their problem by bidding for EC2 instances on the Spot market using Amazon’s Spot Bid Advisor tool, which presents historic bid pricing and helps them set their bid for EC2 instances on the fly.
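They could also pull the recent Spot price history programmatically to sanity-check a bid, along the lines of the sketch below; the region and time window are illustrative.

```python
# Illustrative sketch: check recent Spot prices for c4.8xlarge before bidding.
import datetime
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

history = ec2.describe_spot_price_history(
    InstanceTypes=["c4.8xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(days=7),
    EndTime=datetime.datetime.utcnow(),
)
for point in history["SpotPriceHistory"][:5]:
    print(point["AvailabilityZone"], point["SpotPrice"], point["Timestamp"])
```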
At the end of their simulation, they may find that one of ten building shapes, combined with metal fins near ground level and a seven-degree rotation, would give the least disruptive wind effects on people and property at street level. This would help ease the building through the planning process and enable the client to show due diligence in its design, helping to sell space in the property to those on the lower floors.
This level of computing power simply may not have been available previously to an architecture firm that doesn’t do high-performance computing for a living. By using the elastic capabilities of the cloud, it can do a better job of designing its building for the client, without breaking the bank.
The end result? The client might be blown away, but thankfully the rest of London won’t.