HPC in the Cloud


Dedicated to covering high-end cloud computing
in science, industry and the datacenter


Cloud for Academia?


Grid computing was born in academia and was originally designed to support scientific and research computing. In contrast, cloud computing has a business background and is designed to enable the delivery of scalable Web applications.

The BEinGRID project has looked into how Grid is appropriate for business use (and has run several successful business experiments proving this proposition), but what about looking at whether the cloud is useful for academia? Can it be effectively used to run scientific codes, such as those found in climate modelling, fluid dynamics or molecular physics simulations, which have traditionally required the use of supercomputers?

On the face of it, cloud services offer a compelling, simple and relatively cost-effective HPC proposition -- just pay for as many CPUs as you want, when you want them. Of course, the truth isn't quite as simple as that.

Take a look at the Google Cloud offering, App Engine. Users access App Engine through an API, which places quite a lot of restrictions on the code that can be run, including:

  • It must be written in either Python or Java (or use a JVM-based interpreter or compiler) -- meaning any C or Fortran codes have to be ported.
  • It can't start any threads (instead the API is used to start a new task).
  • Any single request/task must complete in 30 seconds.
  • It has to stay within quotas on CPU, bandwidth and storage usage.

For full details, see the App Engine documentation. Note that there are different quotas for the free and billed service, and that it is possible to negotiate increases to the quotas.

This doesn't make scientific computing impossible, but it does put in place a lot of barriers. It would be interesting to see what could be achieved computationally, given the above restrictions, by an academic research group that chose to use App Engine. However, the bottom line is that Google App Engine is more suited to creating dynamic web applications, such as photo and document editing tools, than to processing long-lived scientific computations.
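To make the per-task time limit concrete, here is a minimal sketch of how a long-running calculation could be cut into short, resumable slices. It is plain Python and does not use the App Engine API; the names run_slice and run_to_completion, the toy series being summed, and the 0.01-second budget are all assumptions chosen for illustration. On a real platform each slice would be a separate task and the state would have to be stored between tasks.

```python
import time


def run_slice(state, total_steps, budget_seconds, clock=time.monotonic):
    """Advance the computation until the time budget for one task is used up.

    state is a (next_step, partial_sum) tuple, so a later task can resume
    exactly where this one stopped.
    """
    step, partial_sum = state
    deadline = clock() + budget_seconds
    while step < total_steps:
        # One unit of work: a single term of the series sum of 1/k^2.
        partial_sum += 1.0 / ((step + 1) ** 2)
        step += 1
        # Check the budget after each unit so every slice makes progress.
        if clock() >= deadline:
            break
    return (step, partial_sum)


def run_to_completion(total_steps, budget_seconds, clock=time.monotonic):
    """Chain slices together, as a task queue would, until the work is done."""
    state = (0, 0.0)
    slices = 0
    while state[0] < total_steps:
        state = run_slice(state, total_steps, budget_seconds, clock)
        slices += 1
    return state[1], slices


if __name__ == "__main__":
    # Hypothetical numbers: a tiny budget stands in for the 30-second limit.
    total, slices = run_to_completion(total_steps=200_000, budget_seconds=0.01)
    print(f"sum = {total:.6f} after {slices} slice(s)")
```

The point of the sketch is the shape of the code rather than the arithmetic: every scientific kernel would need an explicit, serialisable state and a natural place to stop, which is exactly the kind of porting effort described above.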

Amazon's offering, EC2, is a lot more promising. EC2 gives users much more access and control over the system through the use of virtualisation. Users are free to install whatever software and applications they need on EC2.

Users provide "virtual images", instances of which can be launched at any time and will normally be running in under 10 minutes. By default, only 20 instances can be launched per region, but users can apply to increase this limit, potentially allowing thousands of instances to be launched. Amazon also supports Hadoop, Condor and OpenMPI for batch/parallel processing. The Data Wrangling blog has some in-depth information on using Amazon to set up an MPI cluster.

For data storage, the Amazon S3 service can be used to store the enormous amounts of data produced by some scientific applications. Access to the data is controlled via Access Control Lists (ACLs) and data is encrypted during transmission using SSL. Users are encouraged to encrypt any sensitive data being stored in S3. It is important to note that Amazon does not guarantee that data will not be lost or compromised (see point 7.2 of the AWS Customer Agreement).
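Since there is no guarantee against loss or corruption, one simple precaution is to record a checksum of each result file before uploading it and to check the checksum again after downloading. The sketch below uses only the Python standard library and does not call any S3 API; the function names and the sample data are assumptions for illustration. Note that a checksum only detects changes to the data. It does not keep the data confidential, so it complements encryption rather than replacing it.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_digest(data: bytes, recorded_hex: str) -> bool:
    """Check downloaded data against the digest recorded before upload."""
    return hmac.compare_digest(sha256_hex(data), recorded_hex)


if __name__ == "__main__":
    # Hypothetical result file contents.
    original = b"timestep,temperature\n0,287.4\n1,287.6\n"
    recorded = sha256_hex(original)  # keep this somewhere other than S3

    downloaded = original  # stand-in for the bytes fetched back later
    print("intact:", matches_recorded_digest(downloaded, recorded))
```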

So, it should be fairly easy to get most scientific computing codes running in parallel on EC2. But what's the performance like? There has been some research and the results are mixed. For a roughly equivalent amount of CPU resources, supercomputer clusters are typically much faster at processing scientific codes, largely because they have a better interconnect (see this article by Edward Walker). However, if we include the time it takes to get the code running (i.e., to request and boot the images on EC2, or to wait in the queue on a supercomputer cluster), EC2 is likely to be faster in many cases, depending on the size of the job and the scheduling policies (as shown by Ian Foster on his blog). In the future, EC2 may offer an even more competitive service if Amazon upgrades its systems.
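The trade-off can be written as simple arithmetic: time to solution is start-up delay plus run time. The sketch below assumes EC2 runs the same job more slowly by some constant factor; that factor, the function names and all the numbers in the usage example are hypothetical and are not taken from the studies mentioned above. Only the 10-minute boot time echoes the figure quoted earlier.

```python
def ec2_total_minutes(boot_minutes, cluster_run_minutes, slowdown):
    """Time to solution on EC2: boot the images, then run more slowly."""
    return boot_minutes + slowdown * cluster_run_minutes


def cluster_total_minutes(queue_wait_minutes, cluster_run_minutes):
    """Time to solution on a supercomputer cluster: queue, then run."""
    return queue_wait_minutes + cluster_run_minutes


def break_even_queue_wait(boot_minutes, cluster_run_minutes, slowdown):
    """Queue wait above which EC2 finishes first under this simple model."""
    return boot_minutes + (slowdown - 1.0) * cluster_run_minutes


if __name__ == "__main__":
    # Hypothetical job: 60 minutes on the cluster, twice as slow on EC2.
    wait = break_even_queue_wait(boot_minutes=10, cluster_run_minutes=60, slowdown=2.0)
    print(f"EC2 wins if the cluster queue wait exceeds {wait:.0f} minutes")
```

Under this model a short job on a busy cluster favours EC2, while a long, communication-heavy job favours the cluster, because the slowdown term grows with run time and the boot time does not.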

I haven't taken into account Microsoft Azure, which is still in "Community Technology Preview" at the time of writing but may be interesting for any .NET-based scientific codes. The offering is similar to EC2, the main difference being that users must use a supplied Windows Server 2008 VM.

With all the money and time being invested in cloud computing, it will be interesting to see the effect it has on HPC resource providers over the next decade. Will the emergence of cloud lead to greater trust in outsourcing compute resources and a direct boost to HPC resource providers? Will there be a level of symbiosis where cloud resources can be built on top of or alongside HPC computing resources? Or will they just be direct competition? One thing's for sure: the rules of the HPC game are being questioned.

For more information on using cloud platforms for scientific computing, see the HPCcloud User Group.

-----

Reprinted with permission of Grid Voices, hosted by IT-tude.com

Discussion

There is 1 discussion item posted.

CloudBerry Explorer - freeware client for Amazon S3
Submitted by cloudberryman on Nov 25, 2009 @ 12:00 AM EST


If you want to explore Amazon S3 storage, upload data or configure ACLs, check out CloudBerry Explorer, which helps to manage S3 on Windows. It is freeware. http://cloudberrylab.com/

Post #1

