Migrating Scientific Experiments to the Cloud

By Daniel de Oliveira, Fernanda Araújo Baião and Marta Mattoso

March 4, 2011

The most important advantage of cloud computing for scientific experiments is that the average scientist can access many types of resources without having to buy or configure the whole infrastructure.

This is a fundamental need for scientists and scientific applications. It is preferable that scientists be isolated from the complexity of configuring and instantiating the whole environment, focusing only on the development of the in silico experiment.

The number of published scientific and industrial papers provides evidence that cloud computing is being considered a definitive paradigm, and it is already being adopted by many scientific projects.

However, many issues have to be analyzed when scientists decide to migrate a scientific experiment to be executed in a cloud environment. The article “Azure Use Case Highlights Challenges for HPC Applications in the Cloud” presents several challenges in HPC support, specifically for the Windows Azure Platform. In this article we discuss some important topics on cloud computing support from a scientific perspective. Some of these topics were organized as a taxonomy in our chapter “Towards a Taxonomy for Cloud Computing from an e-Science Perspective” of the book “Cloud Computing: Principles, Systems and Applications” [11].

Background on e-Science and Scientific Workflows

Over the last decades, the effective use of computational scientific experiments has evolved at a fast pace, leading to what is being called e-Science. These e-Science experiments are also known as in silico experiments [12]. In silico experiments are commonly found in many domains, such as bioinformatics [13] and deep water oil exploitation [14]. An in silico experiment is conducted by a scientist, who is responsible for managing the entire experiment, which comprises composing, executing and analyzing it. Most in silico experiments are composed of a set of programs chained in a coherent flow. This flow of programs aiming at a final scientific goal is commonly named a scientific workflow [12,15].

A scientific workflow may be defined as an abstraction that allows the structured, controlled composition of programs and data as a sequence of operations aiming at a desired result. Scientific workflows are an attractive alternative for modeling pipelines or script-based flows of programs or services that implement solid algorithms and computational methods. Scientific Workflow Management Systems (SWfMS) are responsible for workflow execution, coordinating the invocation of programs either locally or in remote environments. SWfMS need to offer support throughout the whole experiment life cycle, including: (i) designing the workflow through a guided interface (to follow a specific scientific method [16]); (ii) controlling several variations of workflow executions [15]; (iii) executing the workflow in an efficient way (often in parallel); (iv) handling failures; and (v) accessing, storing and managing data.
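To make the chained-programs view concrete, a workflow can be sketched as a small driver that invokes each activity in order, passing outputs forward. This is a minimal illustration with made-up activity names, not the implementation of any real SWfMS, which would add provenance capture, parallelism and failure handling:

```python
# Minimal sketch of a scientific workflow: a coherent flow of programs
# where each activity consumes the previous activity's output.
# Activity names and logic are illustrative only.

def extract_sequences(raw):          # activity 1: prepare the input data
    return [s.strip().upper() for s in raw]

def align(sequences):                # activity 2: a stand-in analysis step
    return {s: len(s) for s in sequences}

def summarize(alignments):           # activity 3: aggregate the final result
    return sum(alignments.values())

workflow = [extract_sequences, align, summarize]

def run(workflow, data):
    """Execute the activities in sequence, chaining outputs to inputs."""
    for activity in workflow:
        data = activity(data)
    return data

result = run(workflow, [" acgt ", "ggcc"])
print(result)  # total length of the cleaned sequences: 8
```

A SWfMS essentially generalizes this driver loop, adding the life-cycle support listed above.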

The combination of this life cycle support with the HPC environment poses many challenges to SWfMS, due to the heterogeneous execution environments of the workflow. When the HPC platform is a cloud, more issues arise, as discussed next.

Cloud check-list before migrating a scientific experiment

We discuss scientific workflow issues related to cloud computing in terms of architectural characteristics, business model, technology infrastructure, privacy, pricing, orientation and access, as shown in Figure 1.

Main issues in clouds for scientific applications

Pricing

Cost is one of the most important characteristics in both scientific and business domains. Since most public clouds adopt the pay-per-use model, it is important to estimate the final price to be paid and to determine how the financial resources available for a scientific experiment will be used. In general, the price paid for using clouds follows three main forms (which have to be analyzed by scientists): free (normally when scientists have their own cloud), pay-per-use (scientists pay a specific value related to their resource utilization, normally per hour) and bill-broken (scientists pay for each component used, independent of usage time). However, this evaluation is far from simple, since the costs saved by the cloud, such as acquiring equipment and hiring support staff, are difficult to calculate.
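As a rough illustration, the three pricing forms can be compared with back-of-the-envelope arithmetic. All rates and resource counts below are made-up placeholders, not any provider's actual prices:

```python
# Hypothetical comparison of the three pricing forms for one experiment.
# All numbers are illustrative placeholders, not real provider rates.

HOURS = 120            # estimated total execution time of the workflow
INSTANCES = 8          # virtual machines used in parallel

def free_cloud():
    # scientists run on their own (private) cloud
    return 0.0

def pay_per_use(rate_per_hour=0.10):
    # pay for resource utilization, normally billed by the hour
    return HOURS * INSTANCES * rate_per_hour

def bill_broken(cpu=25.0, storage=10.0, transfer=5.0):
    # pay for each component independently of how long it was used
    return cpu + storage + transfer

print(free_cloud(), pay_per_use(), bill_broken())  # 0.0 96.0 40.0
```

Even this toy estimate shows why the forms must be compared per experiment: the cheapest option depends on how long, and on how many components, the workflow actually runs.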

Business Model

Clouds may be classified into three main categories [17]: Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), forming a model named SPI [17]. The evaluation of a cloud environment must consider the business model, particularly with respect to scientific data support. In the e-Science field, the generated data is one of the most valuable resources. The SPI model does not consider services that are based on storage or databases. Thus, it is important to check models that provide Storage as a Service and Database as a Service. Storage as a Service provides access to storage facilities that are remotely located. Database as a Service provides the operations and functions of a remotely hosted database management system. Database services are particularly important in scientific experiments for storing provenance data [18], so that it can be queried with controlled access, which storage services do not support.
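The difference matters in practice: a database service lets provenance be queried declaratively, which plain file storage cannot answer directly. A minimal sketch using SQLite as a local stand-in for a hosted database service, with a hypothetical, highly simplified provenance schema:

```python
import sqlite3

# Stand-in for Database as a Service: a queryable provenance store.
# The schema and rows are a hypothetical simplification of provenance data.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE provenance (
    activity TEXT, input_file TEXT, output_file TEXT, status TEXT)""")
db.executemany(
    "INSERT INTO provenance VALUES (?, ?, ?, ?)",
    [("align",     "seq1.fasta", "aln1.out",   "FINISHED"),
     ("align",     "seq2.fasta", "aln2.out",   "FAILED"),
     ("summarize", "aln1.out",   "report.txt", "FINISHED")])

# A question storage services cannot answer without scanning files:
# which activity executions failed, and on which inputs?
failed = db.execute(
    "SELECT activity, input_file FROM provenance WHERE status = 'FAILED'"
).fetchall()
print(failed)  # [('align', 'seq2.fasta')]
```

With a storage-only service, answering the same question would require downloading and parsing log files; a database service also makes it natural to attach access control to such queries.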

Architectural Characteristics

When analyzing the main architectural characteristics of clouds, it is important to check and analyze the support for virtualization, security, resource sharing and scalability. For example, clouds can occasionally relocate applications among hosts and allocate multiple applications on the same host according to resource availability. These moves and instabilities can negatively impact workflow performance, since the workflow imposes a flow of activity executions and of data transfers between them. Ideally, the cloud scheduler should be in sync with the SWfMS so that it is aware of this flow.
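The point about flow-awareness can be quantified with a toy model: given the workflow's data dependencies and a placement of activities onto hosts, the network traffic depends entirely on whether heavily connected activities are co-located. Activity names, hosts and data sizes below are made up for illustration:

```python
# Toy model: data moved between hosts depends on activity placement.
# Edges are (producer, consumer, MB transferred); values are illustrative.
edges = [("A", "B", 500), ("B", "C", 500), ("A", "D", 10)]

def cross_host_traffic(placement):
    """MB moved over the network for a given activity-to-host placement."""
    return sum(mb for src, dst, mb in edges
               if placement[src] != placement[dst])

# A flow-aware scheduler co-locates the heavy A->B->C chain:
flow_aware   = {"A": "host1", "B": "host1", "C": "host1", "D": "host2"}
# A flow-unaware scheduler splits it across hosts:
flow_unaware = {"A": "host1", "B": "host2", "C": "host1", "D": "host1"}

print(cross_host_traffic(flow_aware))    # 10: only the light edge crosses
print(cross_host_traffic(flow_unaware))  # 1000: both heavy edges cross
```

A cloud scheduler that relocates virtual machines without seeing these edges can silently turn the first placement into the second.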

Privacy

Privacy is a fundamental issue in scientific experiments. Many unpublished experiments and results have to remain private during the course of the experiment. We may classify cloud approaches as private, public and hybrid. From the scientist's point of view, and in terms of privacy, the most “secure” approach is to use private clouds. In private clouds, all security control is defined by the scientist (or by a team of computer specialists), which means that external access is more tightly controlled by the scientist. However, hybrid and public clouds usually provide advanced security mechanisms (such as the security policies in Amazon EC2) that aim to guarantee the privacy of data and applications. Scientists have to analyze whether the provided mechanisms meet their expectations.

Access

There are several types of access provided, such as (a non-exhaustive list): browsers, thin clients, mobile clients and APIs. Analyzing the access type provided is important for scientists when choosing a cloud environment to run their experiments. Scientific experiments should be accessible in different ways, such as through web pages and mobile devices; the effective use of different technologies in scientific experiments leads to the need for different types of access. Web browsers are commonly used for accessing cloud services. Using a browser is an intuitive idea, since almost every computer has at least one browser installed that may access cloud services; in addition, some web browsers, such as Google Chrome, are focused on cloud computing. Thin clients and mobile clients are other important types of access, bringing clouds out of the desktop and onto handhelds or mobile phones. Finally, an API is a fundamental way of accessing clouds through programming-language calls (in Java, Python or C, for example). Complex scientific applications usually make use of APIs to access the cloud infrastructure in a native form. In this case, scientists have to analyze the access methods already used by their application and verify whether these methods can be used, or adapted, when migrating to a cloud environment.
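To illustrate the API style of access, the sketch below mimics the kind of calls an IaaS API typically exposes. The class and method names are hypothetical, not any real provider's SDK; an actual application would issue equivalent calls over HTTP to the provider's endpoint:

```python
# Hypothetical, in-memory stand-in for an IaaS API client. Method names
# (run_instance, terminate, describe) are illustrative only; real SDKs
# define their own signatures and perform remote calls.

class CloudClient:
    def __init__(self):
        self._instances = {}
        self._next_id = 0

    def run_instance(self, image):
        """Launch a virtual machine from a given machine image."""
        self._next_id += 1
        vm_id = "i-%04d" % self._next_id
        self._instances[vm_id] = {"image": image, "state": "running"}
        return vm_id

    def terminate(self, vm_id):
        """Shut the virtual machine down."""
        self._instances[vm_id]["state"] = "terminated"

    def describe(self, vm_id):
        """Return the instance's current metadata."""
        return self._instances[vm_id]

client = CloudClient()
vm = client.run_instance("blast-worker-image")
print(client.describe(vm)["state"])   # running
client.terminate(vm)
print(client.describe(vm)["state"])   # terminated
```

A workflow engine driving a cloud natively would wrap calls like these to acquire machines before execution and release them afterwards, which is exactly the kind of integration a browser-based access type cannot automate.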

Cloud Orientation

The cloud orientation differs according to the business model used. In the SaaS model, applications are deployed on the cloud and can only be invoked, i.e., all execution control belongs to the deployed application. We consider this approach task centric: scientists need to transfer control to the application owners instead of keeping it during the course of the experiment. On the other hand, when the infrastructure is provided as a service (IaaS, where virtualized hardware is provided to be configured and controlled), the scientist has full control of the actions: the programs that will execute and the environment configurations are chosen by the scientists. We consider this approach user centric. Scientists have to analyze which approach is more suitable for their needs. If they want to execute only one application, such as the bioinformatics program BLAST, they can choose a task-centric approach. However, if they want to try several programs or change environment configurations, the user-centric approach is more suitable.

Technology Infrastructure

The technological infrastructure defines how a specific cloud approach is implemented. It can be based on grids [19], peer-to-peer networks [20], PC clouds, cluster clouds, or a combination of them. This evaluation may be compromised in public clouds, such as Amazon EC2 [21], because we are not able to know which kind of technology is used to implement the cloud. In private clouds, however, it is possible to obtain this information, which is quite useful because many experiments need a computational cluster or a grid to execute in parallel and produce results in a feasible time.

Conclusions

This article highlighted that, despite the high interest in cloud computing from the scientific community (especially those who need to execute HPC scientific applications), it is still a wide open field. Choosing the best cloud support is a step forward, but there is still a need for services focused on scientific workflow execution to bridge the gap between the cloud and the SWfMS. SciCumulus [22] is an initiative in this direction. Some SWfMS, such as Swift [23] and Pegasus [24], are also incorporating cloud support in their systems.

About the Authors

Daniel de Oliveira is a Ph.D. student at the Department of Computer Science at the COPPE Institute of the Federal University of Rio de Janeiro. He received a B.Sc. degree in 2005 and an M.Sc. degree in 2008, both from the Federal University of Rio de Janeiro, Brazil. He is currently working on his Ph.D. thesis in Computer Science at the same institution. His interests include cloud computing, e-Science, workflow management, data mining, text mining and ontologies. He is also a member of the IEEE, the ACM and the Brazilian Computer Society.

Fernanda Baião has been a Professor in the Department of Applied Informatics of the Federal University of the State of Rio de Janeiro (UNIRIO) since 2004, where she leads the Distributed Databases Research Group. She received the Doctor of Science degree from the Federal University of Rio de Janeiro (UFRJ) in 2001. During the year 2000 she worked as a visiting student at the University of Wisconsin, Madison (USA). Her current research interests include distributed and parallel databases, data management in scientific workflows, conceptual data modeling and machine learning techniques. She participates in research projects in those areas, with funding from several Brazilian government agencies, including CNPq, CAPES and FAPERJ. She serves on the program committees of several national and international conferences and workshops, and is a member of the ACM and of the Brazilian Computer Society.

Marta Mattoso has been a Professor in the Department of Computer Science at the COPPE Institute of the Federal University of Rio de Janeiro (UFRJ) since 1994, where she leads the Distributed Database Research Group. She received the Doctor of Science degree from UFRJ. Dr. Mattoso has been active in the database research community for more than ten years, and her current research interests include distributed and parallel databases and data management aspects of scientific workflows. She is the principal investigator in research projects in those areas, with funding from several Brazilian government agencies, including CNPq, CAPES, FINEP and FAPERJ. She has published over 200 refereed international journal articles and conference papers. She has served on the program committees of international conferences, and is a regular reviewer for several international journals.

References

[1] N. Antonopoulos and L. Gillam, 2010, Cloud Computing: Principles, Systems and Applications. 1 ed. Springer.

[2] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, et al., 2010, A view of cloud computing, Commun. ACM, v. 53, n. 4, p. 50-58.

[3] R. Buyya, C.S. Yeo, and S. Venugopal, 2008, Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities, In: Proceedings of the 2008 10th IEEE International Conference on High Performance Computing and Communications, p. 5-13

[4] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, 2008, The cost of doing science on the cloud: the Montage example, In: SC ’08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, p. 1-12, Austin, Texas.

[5] Y. El-Khamra, H. Kim, S. Jha, and M. Parashar, 2010, Exploring the Performance Fluctuations of HPC Workloads on Clouds, In: Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science, p. 383-387, Washington, DC, USA.

[6] C. Evangelinos and C. Hill, 2008, Cloud Computing for parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon’s EC2, Chicago, IL.

[7] I. Foster, Y. Zhao, I. Raicu, and S. Lu, 2008, Cloud Computing and Grid Computing 360-Degree Compared, In: Grid Computing Environments Workshop, 2008. GCE ’08, p. 1-10.

[8] T. Hey, S. Tansley, and K. Tolle, 2009, The Fourth Paradigm: Data-Intensive Scientific Discovery. Online book, Microsoft Research.

[9] A. Matsunaga, M. Tsugawa, and J. Fortes, 2008, CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications, IEEE eScience 2008, p. 222-229.

[10] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, 2008, On the use of cloud computing for scientific workflows, In: IEEE Fourth International Conference on eScience (eScience 2008), Indianapolis, USA, p. 7–12

[11] D. Oliveira, F. Baião, and M. Mattoso, 2010, “Towards a Taxonomy for Cloud Computing from an e-Science Perspective”, Cloud Computing: Principles, Systems and Applications (to be published), Heidelberg: Springer-Verlag

[12] I.J. Taylor, E. Deelman, D.B. Gannon, and M. Shields (Eds.), 2007, Workflows for e-Science: Scientific Workflows for Grids. 1 ed. Springer.

[13] M. Addis, J. Ferris, M. Greenwood, P. Li, D. Marvin, T. Oinn, and A. Wipat, 2003, Experiences with e-Science workflow specification and enactment in bioinformatics, Proceedings of UK e-Science All Hands Meeting, p. 459–467.
 
[14] W. Martinho, E. Ogasawara, D. Oliveira, F. Chirigati, I. Santos, G. Travassos, and M. Mattoso, 2009, A Conception Process for Abstract Workflows: An Example on Deep Water Oil Exploitation Domain, In: 5th IEEE International Conference on e-Science, Oxford, UK.

[15] M. Mattoso, C. Werner, G.H. Travassos, V. Braganholo, L. Murta, E. Ogasawara, D. Oliveira, S.M.S.D. Cruz, and W. Martinho, 2010, Towards Supporting the Life Cycle of Large Scale Scientific Experiments, International Journal of Business Process Integration and Management, v. 5, n. 1, p. 79-92.

[16] R.D. Jarrard, 2001, Scientific Methods. Online book, Url.: http://emotionalcompetency.com/sci/booktoc.html.

[17] L. Youseff, M. Butrico, and D. Da Silva, 2008, Toward a Unified Ontology of Cloud Computing, In: Grid Computing Environments Workshop, 2008. GCE ’08, p. 1-10.

[18] J. Freire, D. Koop, E. Santos, and C.T. Silva, 2008, Provenance for Computational Tasks: A Survey, Computing in Science and Engineering, v.10, n. 3, p. 11-21.

[19] I. Foster and C. Kesselman, 2004, The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann.

[20] E. Pacitti, P. Valduriez, and M. Mattoso, 2007, Grid Data Management: Open Problems and New Issues, Journal of Grid Computing, v. 5, n. 3, p. 273-281.

[21] Amazon EC2, 2010. Amazon Elastic Compute Cloud (Amazon EC2). Available at: http://aws.amazon.com/ec2/. Accessed: 5 Mar 2010.

[22] D. Oliveira, E. Ogasawara, F. Baião, and M. Mattoso, 2010, SciCumulus: A Lightweight Cloud Middleware to Explore Many Task Computing Paradigm in Scientific Workflows, In: Proc. 3rd IEEE International Conference on Cloud Computing, Miami, FL.

[23] Y. Zhao, M. Hategan, B. Clifford, I. Foster, G. von Laszewski, V. Nefedova, I. Raicu, T. Stef-Praun, and M. Wilde, 2007, Swift: Fast, Reliable, Loosely Coupled Parallel Computation, In: Services 2007, p. 206, 199, Salt Lake City, UT, USA.

[24] E. Deelman, G. Mehta, G. Singh, M. Su, and K. Vahi, 2007, “Pegasus: Mapping Large-Scale Workflows to Distributed Resources”, Workflows for e-Science, Springer, p. 376-394.
 
