Cycle Computing HPC Job Bank
HPC in the Cloud


Dedicated to covering high-end cloud computing
in science, industry and the datacenter

Language Flags

Datanami
Digital Manufacturing Report
HPCwire

Blog: Behind the Cloud

Cloud Control: Outsourcing an HPC Cluster


So, thus far in this series of posts, we have discussed the following issues:

- IT is not a core competency of the business, so we should look to outsource if we can outsource without jeopardizing the business.

- We should look to cloud computing to bring costs under control and to deliver cost efficiencies over time, not as an immediate cost reduction activity.

- In order to outsource IT, we must trust the suppliers and vendors involved, which means developing relationships, not better bludgeoning weapons. And we have already done an extremely similar divestiture in our past, so we have a model to look at that says it can be done successfully

Now we need to talk about what an organization would need to look like in order to properly manage the outsourcing of your HPC cluster. So what would that look like? Well, we should assume that all technical and operational capabilities necessary to execute the infrastructure are included in the outsource. The supplier is expected to provide the entirety of the technical function and carry out all operational duties. That is not to say that the customer is off the hook technically, just the opposite. The customer needs to assemble a small team of technically savvy, business minded (specific to the core product of your company) individuals to measure and manage the outsource. This team needs to be very strong technically in order to vet and gauge any available technologies for potential use as well as identify flaws in solutions or methodology of solutions delivered. The size of the team would be dependent on the size of the company (and therefore the size of the outsource).

Functionally, the outsource management team is the control point for the outsourcing of your infrastructure. Through this group, you maintain control over your infrastructure, and therefore can have full trust in your outsource partner (because you know exactly what you want, and you know how to measure if you are getting it). The intent of this team is to stay abreast of the constantly changing needs of the business, understand the continuously evolving capabilities of technology, and combine the two awareness’ to understand how the company should be leveraging technology to maximize benefit to the business and control costs. With that combined awareness, you now hold the outsource accountable for delivering an appropriate solution to your company’s need.

This is not to say that all responsibility falls to the customer outsource team. The supplier will need to have a disciplined focus in the specific space that your company does business, and be innovating their solution to specifically solve the problems of that industry. If they do not, then they will probably not be a cost competitive, viable supplier long term.

You will see many functions that fall under the customer outsource team. And remember, this team needs to remain small in order to avoid paying too much for your solution. There will be a constant loop for the outsource team to:

1. Quantitatively measure the current solution

2. Analyze cost and benefits of the current solution

3. Assessment of best practices

4. Revision of current solution

5. Loop back to 1

There will be several technical responsibilities that the outsource team will participate in jointly. The supplier should be doing most of this work for the customer, but how do you know if the data they are presenting is 100% accurate or appropriate for your solution. When in doubt, the outsource team will generate their own data, and share that data with the supplier to derive a more accurate solution. In that, the outsource team will do some amount of, but not every facet of:

1. Technical and cost Benchmarking

2. Technical advisory / liaison (IT industry to customer business)

3. Technical architects - Designing architecture of applications and services that are appropriate for the company’s consumption

There are many responsibilities of the outsource team that will fall into the relationship management arena. This team will be the primary point of contact and control between the customer and the supplier, and I can’t say enough how important having a positive relationship with the supplier is to the quality of the product you consumer or the price you pay for that product. The outsource team will be responsible for communicating current and future requirements to the supplier, and many of those will take on the form of Service Level Agreements (SLAs) which we will talk about in a moment. The outsource team will also be responsible for how technology is being consumed by the customer company. The outsource team needs to make sure the company is getting the appropriate solution from the supplier company at an appropriate price with appropriate constraints / limitations / boundaries.

Another very important responsibility of the outsource team will be to maintain flexibility from a quality of solution as well as a cost perspective. In this, staying standards based is very important. It is not an absolute requirement, there may be solutions that are proprietary that solve a problem much more efficiently or cost effectively. What you need to consider in this case is when the vendor thinks they have you locked in, and start raising the price because they think you can’t get out of their solution, what is your plan for defeating them. So, where possible, use industry standards so that you can move from vendor to vendor without losing time, money, or critical features. Where that is not possible, what is the plan for using one vendor’s proprietary solution but being able to migrate to another vendor’s solutions without impact to maintain negotiating position.

Finally, there is the new component to infrastructure management. The outsource team will need to learn how to define and measure service level agreements (SLAs). The definition stage will have several components. What is the service level expectation (defines success and failure criteria)? This will sometimes have many different components for a single solution. An example would be storage: is there enough capacity, do we get enough IOPs, and is there enough throughput. All of these are different measurements, but critical to a storage infrastructure for HPC. How will this service level be measured and how often? We have all seen many improper SLA measurements where IT informs the engineer that they have 99.997% availability of the environment, but the engineer knows that there were several outages that had him or her non-productive for days at a time. So do you measure component level availability or solution level? How frequently are the polls for availability? Is availability the right measure? This is all part of the definition. And then, what happens when a failure criteria is met? This is where there is a lot of work happening in the industry. It is not sufficient to refund the months colo fees when a power outage cost the company 6 weeks worth of work. There is a cost to failure, and that is usually very specific to the industry. An outage on a cluster for an EDA company has different implications than an outage to a scientific computing cluster for a university. The recourse needs to be negotiated based impact. Does this at all sound familiar? Any insurance people reading this?? Well, that is one of the solutions the industry is exploring, is having insurance policies behind the supplier. Finally, we need to look at how service levels re-assessed over time. As the technology evolves, so should the service levels.

The fabless semiconductor industry is fairly mature in it’s process for outsourcing the fabrication function. They have cost models and laws (Rock’s Law for cost of a fab over time) that help decision processes, they have a collaborative (FSA) for arriving at better process, and they have an established track record that this can be accomplished very successfully and with cost benefit. The HPC Cloud industry needs to mature. That will just take time.

 

Posted by Scott Clark - August 05, 2010 @ 9:00 AM, Pacific Daylight Time

Discussion

There are 0 discussion items posted.

Join the Discussion

Join the Discussion

Become a Registered User Today!


Registered Users Log in join the Discussion

Scott Clark

Scott Clark

Scott Clark has been an infrastructure solution provider in the EDA/Semiconductor industry for almost 20 years.

More Scott Clark

Arkeia

Recent Comments

No Recent Blog Comments

Feature Articles

Cloud Services Satisfy a Higher Calling

Higher education involves many collaborative projects that lend themselves to cloud services, however often those services are not tailored to the uniqueness of an academic environment. That's where the Internet2 NET+ project comes in. By partnering with 16 major cloud providers, the networking consortium is seeking to expedite the delivery of cloud services and by doing so advance research and innovation in the United States.
Read more...

Around the Web

NVIDIA Raises Its Game to the Cloud

May 17, 2012 | NVIDIA GeForce GRID, a cloud gaming platform announced at the 2012 GPU Technology Conference (GTC), seeks to reduce the the latency associated with cloud gaming.
Read more...

Breaking the Cloud Barrier

May 15, 2012 | New Microsoft report shows that beyond the expected financial benefits, cloud services may offer more comprehensive security features compared to in-house IT operations.
Read more...

Vendors Demo Next-Gen Sequencing Platforms for Pharma

May 14, 2012 | During the second annual Pistoia Alliance conference, three teams demonstrated their newly-implemented cloud-based next-generation sequencing platforms.
Read more...

Zunicore Offers Bare Metal by the Hour

May 10, 2012 | PEER1's cloud division, Zunicore, will soon be offering GPU-equipped servers on-demand.
Read more...

US Cloud Providers Struggle With Data Privacy Laws

May 08, 2012 | The Patriot Act leads foreign governments to question the security of US cloud services.
Read more...

Sponsored Whitepapers

Appro White Paper: Enabling Performance-per-Watt Gains in HPC

04/05/2012 | Appro | Designed to meet the growing global demand for HPC solutions, Appro's Xtreme-X™ Supercomputer delivers superior performance-per-watt and reduced I/O latency while bringing significant flexibility to HPC workload configurations including capacity, hybrid, data intensive and capability computing.

Exploring the Potential of Heterogeneous Computing

04/02/2012 | AMD | Developers today are just beginning to explore the potential of heterogeneous computing, but the potential for this new paradigm is huge. This brief article reviews how the technology might impact a range of application development areas, including client experiences and cloud-based data management. As platforms like OpenCL continue to evolve, the benefits of heterogeneous computing will become even more accessible. Use this quick article to jump-start your own thinking on heterogeneous computing.

Sponsored Multimedia

Blogs by Topics

Blogs by Author

HPC Blogroll

Intersect360 HPC500

Featured Events









HPC in the Cloud Conferences & Events