July 17, 2010
A few days ago Amazon announced that it added HPC capabilities to EC2. This is great news indeed for the HPC community, because it further paves the way for HPC to become mainstream, and it indicates that there is cloud money to be made with HPC. Obviously, Amazon did a careful market analysis, certainly got some requests from important users, and perhaps felt some pressure after rumors surfaced that Google’s server farmers are playing with Infiniband.
In more detail, Amazon added so-called Cluster Compute Instances (CCI) to EC2, each consisting of a pair of quad-core Intel X5570 (Nehalem) processors with a total of 33.5 ECU (EC2 Compute Units), 23 GB of RAM, and 1690 GB of local instance storage. CCIs are interconnected using a 10 Gbps Ethernet network. Within this network you can create one or more placement groups of type "cluster" and then launch CCIs within each group. Instances within a placement group of this type benefit from non-blocking bandwidth and low-latency node-to-node communication. First benchmark results from LBNL show that their HPC applications ran 8.5 times faster on CCIs than on the previous (vanilla) EC2 instance types.
So far so good. To me, in this context, there are two aspects which seem interesting: performance and price. Let’s look at performance first:
To achieve high performance, many HPC application programs have been optimized in the past for high execution speed, e.g. through parallelization of numerical algorithms, speeding up communication, overlapping communication with computation, and other sophisticated tricks. Thus an application programmer’s limits are mostly set by physical boundaries, e.g. slow processors or slow interconnects. Now that Amazon has added fast processors and fast Ethernet to its server farm, it is no surprise to see some interesting speedups over the standard EC2 servers and interconnect. But Ethernet is not Infiniband, and you still face the cloud’s virtualization layer: parallel (virtual) processes sitting on different cluster compute instances may still suffer communication delays. To be fair, Amazon admits that the only way to know whether you have a genuine HPC setup for your specific application is to benchmark it, which is general wisdom in HPC anyway. (We should ask Ed Walker to repeat his NAS Parallel Benchmark tests from 2008 on the new CCIs.)
Looking closer at the TOP500 list, Amazon’s Linpack run on 880 CCIs (7040 cores) achieved 41.82 TeraFLOPS, which gives them the 146th position. Other Ethernet-based supercomputers in a similar position have similar numbers of cores, which is no surprise. But those in a similar position with an Infiniband interconnect need only about 4800 cores to achieve the same performance.
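To put that core-count gap into numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in this post; the 4800-core value is the approximate one mentioned above, and the function and variable names are purely illustrative.

```python
# Figures quoted in the post; the Infiniband core count is approximate.
LINPACK_TFLOPS = 41.82
ETHERNET_CORES = 7040      # Amazon's 880 CCIs with 8 cores each
INFINIBAND_CORES = 4800    # "about 4800 cores" for similar performance


def gflops_per_core(tflops: float, cores: int) -> float:
    """Sustained Linpack GFLOPS delivered per core."""
    return tflops * 1000.0 / cores


eth = gflops_per_core(LINPACK_TFLOPS, ETHERNET_CORES)
ib = gflops_per_core(LINPACK_TFLOPS, INFINIBAND_CORES)

# Fraction of additional cores the Ethernet system needs for the same result.
extra_cores = ETHERNET_CORES / INFINIBAND_CORES - 1.0

print(f"Ethernet:   {eth:.2f} GFLOPS per core")
print(f"Infiniband: {ib:.2f} GFLOPS per core")
print(f"Ethernet needs about {extra_cores:.0%} more cores")
```

Keep in mind that the systems being compared may use different processors, so the per-core gap cannot be attributed to the interconnect alone.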
Therefore, my guess is that most of the typical real HPC capability computing applications (e.g. in electronic design automation, automotive applications, or finite-element based material analysis) won’t show a big performance improvement over vanilla EC2 instances, especially those which really demand low latency and high bandwidth. But, fortunately, not all HPC applications have this demand; especially the many under the umbrella of Capacity Computing, with more loosely coupled parallelization (and thus moderate to no communication), might benefit from this improvement. And the fact that the LBNL expert team in Berkeley was involved in the early beta, and certainly in consulting AWS and doing a lot of HPC benchmarking, gives the whole project real credibility. I am sure we will soon see some good results, lessons learned, and recommendations from LBNL; at the latest when Kathy Yelick from LBNL gives her keynote at the ISC Cloud Conference in Frankfurt on October 29.
Another sign that Amazon seems to take HPC seriously now is Cycle Computing’s announcement that it will schedule HPC jobs on AWS compute clusters with the Oracle Grid Engine resource manager, which (as the former Sun Grid Engine) is widely used today on HPC clusters and private clouds in research and industry.
Still, the best solution for the HPC user would be the ability to select between Ethernet and Infiniband, to switch virtualization on and off, to choose between slower and faster CPUs, and to add multi-core optimization software such as MCOpt from eXludus. But building and maintaining such a variable cloud infrastructure for the small HPC community might not be economical, or might become much more expensive for the user than maintaining her own internal HPC cluster.
The second important factor in this scenario is price. Let’s look at Amazon’s Linpack benchmark on its CCIs. Each CCI as described above costs $1.60 per hour. Amazon ran Linpack on 880 CCIs (7040 cores) and measured an overall performance of 41.82 TeraFLOPS, which gives them the 146th position on the TOP500 list. The cost of such a CCI cluster is 1.6*24*30*880 = $1,013,760, or about $1M per month and $12M per year (and if you select Reserved Clusters the price will be $4.3M per year). Not cheap.
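The arithmetic above is easy to replay for other cluster sizes and durations. The following is a minimal sketch, assuming on-demand pricing at the $1.60 per CCI-hour quoted above and a 30-day month; the helper name `cluster_cost` and the derived cost per TeraFLOPS-hour are illustrative additions, not figures from Amazon.

```python
# Figures quoted in the post.
HOURLY_PRICE_USD = 1.60    # on-demand price per CCI per hour
INSTANCES = 880            # CCIs used in Amazon's Linpack run
LINPACK_TFLOPS = 41.82     # measured Linpack performance


def cluster_cost(hours: float, instances: int = INSTANCES,
                 price: float = HOURLY_PRICE_USD) -> float:
    """On-demand cost in USD of running `instances` CCIs for `hours`."""
    return hours * instances * price


per_hour = cluster_cost(1)
per_month = cluster_cost(24 * 30)     # 30-day month, as in the post
per_year = 12 * per_month

# Hypothetical derived metric: dollars per sustained Linpack TeraFLOPS-hour.
per_tflops_hour = per_hour / LINPACK_TFLOPS

print(f"Per hour:  ${per_hour:,.0f}")
print(f"Per month: ${per_month:,.0f}")
print(f"Per year:  ${per_year:,.0f}")
print(f"Per TeraFLOPS-hour: ${per_tflops_hour:.2f}")
```

The same helper can price a short burst, for example `cluster_cost(24 * 14, instances=64)` for 64 CCIs over two weeks, which is closer to how such a cluster would realistically be used than a full year at 880 instances.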
In industry, many HPC infrastructures are so well managed that they run at top utilization almost all the time, and their capacity is tuned to be sufficient for their regular workloads. If capacity needs trend upwards, they adjust.
But there is one use case for which Amazon’s CCIs can be very useful: What if a department has a fire-drill project for which additional resources are needed immediately, if only for a limited period of time? Today, nobody has a simple solution to address this. It takes six months on average to procure, deploy and activate new resources. So they either can't do it, or they delay other work to free up capacity for the urgent project to run. That might work for a project requiring a few hours or days of access to resources, but not for something requiring months. And apparently many companies have those fire-drill problems with some regularity.
And that's exactly where they would like to use clouds. They'd have the ability to come back and say to the user: "The option we can offer you is more expensive and has worse performance, plus you need to be aware of certain security and data privacy issues, but if you're willing to put up with that then we can provide you a solution." And by virtue of existing Cloud Adapter software (for cloud bursting, as exemplified and simplified by the Service Domain Manager for OGE), the end user will actually have the illusion of working inside the company’s regular HPC environment.
Thus, the use of clouds is anticipated to add more options and flexibility to their current IT infrastructure, which by all means will be maintained and will even grow. If you are a large corporation and you have an established and well managed (!) data center, then operating your own infrastructure is more effective. The picture looks different if you are a small or medium-sized enterprise which is just getting its feet wet with such infrastructure, or, for that matter, if you are a larger corporation and you have trouble with your data center as it stands. Instead of going through the learning curve of getting your own infrastructure "right", you might choose just to rely on a service like AWS.
Dr. Wolfgang Gentzsch is the General Chair for ISC Cloud'10, taking place October 28-29, in Frankfurt, Germany. ISC Cloud'10 will focus on practical solutions by bridging the gap between research and industry in cloud computing. Information about the event can be found at the ISC Cloud event website. HPC in the Cloud is a proud media partner of ISC Cloud'10.
Posted by Wolfgang Gentzsch - July 17 @ 2:03PM, Eastern Daylight Time
There is 1 discussion item posted.
Cloud Adapter
Submitted by miha123 on 07/17/2010 - 9:38AM
The Cloud Adapter is part of the Service Domain Manager (SDM) module of Oracle Grid Engine. It allows a cloud to get resources from an external provider (for now Amazon Web Services) in order to meet the Service Level Agreements with the users.
Wolfgang Gentzsch is Advisor to the EU project Distributed European Infrastructure for Supercomputing Applications (DEISA), a member of the Board of Directors of the Open Grid Forum, and a contributing editor to HPC in the Cloud.