March 04, 2011
The most important advantage of cloud computing for scientific experiments is that the average scientist can access many types of resources without having to buy or configure the whole infrastructure.
This addresses a fundamental need of scientists and scientific applications: it is preferable that scientists be isolated from the complexity of configuring and instantiating the whole environment, focusing only on the development of the in silico experiment.
The number of published scientific and industrial papers provides evidence that cloud computing is being considered a definitive paradigm, and it is already being adopted by many scientific projects.
However, many issues have to be analyzed when scientists decide to migrate a scientific experiment to a cloud environment. The article “Azure Use Case Highlights Challenges for HPC Applications in the Cloud” presents several challenges focused on HPC support, specifically for the Windows Azure Platform. In this article we discuss some important topics on cloud computing support from a scientific perspective. Some of these topics were organized as a taxonomy in our chapter “Towards a Taxonomy for Cloud Computing from an e-Science Perspective” of the book “Cloud Computing: Principles, Systems and Applications”.
Background on e-Science and Scientific Workflows
Over the last decades, the use of computational scientific experiments has evolved at a fast pace, leading to what is now called e-Science. e-Science experiments are also known as in silico experiments. In silico experiments are commonly found in many domains, such as bioinformatics and deep water oil exploitation. An in silico experiment is conducted by a scientist, who is responsible for managing the entire experiment, which comprises composing, executing, and analyzing it. Most in silico experiments are composed of a set of programs chained in a coherent flow. This flow of programs aiming at a final scientific goal is commonly named a scientific workflow [12,15].
A scientific workflow may be defined as an abstraction that allows the structured, controlled composition of programs and data as a sequence of operations aiming at a desired result. Scientific workflows are an attractive alternative for modeling pipelines or script-based flows of programs or services that implement solid algorithms and computational methods. Scientific Workflow Management Systems (SWfMS) are responsible for workflow execution, coordinating the invocation of programs either locally or in remote environments. A SWfMS needs to offer support throughout the whole experiment life cycle, including: (i) designing the workflow through a guided interface (to follow a specific scientific method); (ii) controlling several variations of workflow executions; (iii) executing the workflow efficiently (often in parallel); (iv) handling failures; and (v) accessing, storing, and managing data.
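As a minimal illustration of the flow-of-programs idea (the activity names below are hypothetical, not taken from any particular SWfMS), a scientific workflow can be modeled as a directed acyclic graph of activities that a SWfMS executes in dependency order:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each activity lists the activities it depends on.
workflow = {
    "extract": [],                    # no dependencies
    "align":   ["extract"],
    "filter":  ["extract"],
    "analyze": ["align", "filter"],   # joins the two branches
}

def execute(workflow, run_activity):
    """Invoke each activity in an order that respects the data-flow dependencies."""
    order = list(TopologicalSorter(workflow).static_order())
    for activity in order:
        run_activity(activity)
    return order

# A SWfMS would dispatch each activity to a local or remote (cloud) resource;
# here we simply record the invocation order.
log = []
order = execute(workflow, log.append)
```

A real SWfMS layers the life-cycle services listed above (guided design, execution variants, parallelism, failure handling, data management) on top of this basic coordination loop.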
Combining this life-cycle support with an HPC environment poses many challenges to SWfMS due to the heterogeneous execution environments of the workflow. When the HPC platform is a cloud, further issues arise, as discussed next.
Cloud check-list before migrating a scientific experiment
We discuss scientific workflow issues related to cloud computing in terms of architectural characteristics, business model, technology infrastructure, privacy, pricing, orientation and access, as shown in Figure 1.
Main issues in clouds for scientific applications
Cost is one of the most important characteristics in both scientific and business domains. Since most public clouds adopt the pay-per-use model, it is important to estimate the final price to be paid and to determine how the financial resources available for a scientific experiment are used. In general, the price paid for using clouds follows three main models (which have to be analyzed by scientists): free (normally when scientists have their own cloud), pay-per-use (the scientist pays a specific value related to resource utilization, normally per hour) and itemized billing ("bill broken", where scientists pay for each component independently of usage time). However, this evaluation is far from simple, since the costs saved by the cloud, such as acquiring equipment and hiring supporting staff, are difficult to calculate.
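The trade-off between the pricing models can be sketched with a back-of-the-envelope estimate. All rates below are illustrative placeholders, not actual provider prices:

```python
# Illustrative, hypothetical rates -- real providers publish their own price tables.
HOURLY_RATE = 0.68           # pay-per-use: $ per instance-hour
COMPONENT_RATES = {          # itemized ("bill broken"): $ per component used
    "compute_node": 5.00,
    "storage_volume": 2.50,
    "public_ip": 1.00,
}

def pay_per_use_cost(instances, hours, rate=HOURLY_RATE):
    """Pay-per-use: cost grows with the instance-hours consumed."""
    return instances * hours * rate

def itemized_cost(components, rates=COMPONENT_RATES):
    """Itemized billing: each component is paid for independently of usage time."""
    return sum(rates[c] for c in components)

# A 10-instance, 24-hour experiment under each model:
usage = pay_per_use_cost(instances=10, hours=24)           # 10 * 24 * 0.68 = 163.2
items = itemized_cost(["compute_node", "storage_volume"])  # 5.00 + 2.50 = 7.5
```

Even this simple sketch shows why the choice depends on the experiment's shape: long-running, compute-heavy workflows accumulate instance-hours, while short experiments with many configured components may be dominated by per-component charges.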
Clouds may be classified into three main categories: Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), forming a model named SPI. The evaluation of a cloud environment must consider the business model, particularly with respect to scientific data support. In the e-Science field, the generated data is one of the most valuable resources. The SPI model does not cover services based on storage or databases; thus, it is important to check models that provide Storage as a Service and Database as a Service. Storage as a Service provides access to storage facilities that are remotely located. Database as a Service provides the operations and functions of a remotely hosted database management system. Database services are particularly important in scientific experiments for storing provenance data, so that it can be queried with controlled access, which is not supported by storage services.
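The advantage of a queryable database service for provenance can be illustrated with a small relational sketch. The table layout and field names are hypothetical; real provenance systems define their own schemas:

```python
import sqlite3

# In-memory stand-in for a remotely hosted Database-as-a-Service instance.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE provenance (
    activity TEXT, input_file TEXT, output_file TEXT, status TEXT)""")
db.executemany(
    "INSERT INTO provenance VALUES (?, ?, ?, ?)",
    [("align",   "reads.fq",    "aligned.bam", "ok"),
     ("filter",  "aligned.bam", "clean.bam",   "ok"),
     ("analyze", "clean.bam",   None,          "failed")])

# Provenance queries like this one (which activities failed, and on which
# inputs?) are exactly what a storage-only service cannot answer.
failed = db.execute(
    "SELECT activity, input_file FROM provenance WHERE status = 'failed'"
).fetchall()
```

A Storage-as-a-Service offering would hold the same records as opaque files; only a database service lets the scientist ask declarative questions over them with controlled access.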
When analyzing the main architectural characteristics of clouds, it is important to check the support for virtualization, security, resource sharing and scalability. For example, clouds can occasionally relocate applications among hosts and allocate multiple applications on the same host according to resource availability. These moves and instabilities can negatively impact workflow performance due to the flow of activity executions and the data transfers between them. Ideally, the cloud scheduler should be in sync with the SWfMS so that it is aware of the flow.
Privacy is a fundamental issue in scientific experiments. Many unpublished experiments and results have to remain private during the course of the experiment. Cloud approaches may be classified as private, public and hybrid. From the scientist's point of view, and in terms of privacy, the most “secure” approach is to use private clouds. In private clouds, all the security control is defined by the scientist (or a team of computer specialists), which means that external access is more tightly controlled by the scientist. However, hybrid and public clouds usually provide advanced security mechanisms (such as the security policies in Amazon EC2) intended to guarantee the privacy of data and applications. Scientists have to analyze whether the provided mechanisms meet their expectations.
Several types of access are provided, including (in a non-exhaustive list) browsers, thin clients, mobile clients and APIs. Analyzing the provided access types is important for scientists when choosing a cloud environment to run their experiments, since scientific experiments should be accessible in different ways: web pages, mobile devices and so on. The effective use of different technologies in scientific experiments leads to the need for different types of access. Web browsers are commonly used for accessing cloud services; this is an intuitive choice, since almost every computer has at least one browser installed and can reach cloud services, and some browsers, such as Google Chrome, are even focused on cloud computing. Thin clients and mobile clients are other important types of access, taking clouds beyond the desktop to handhelds and mobile phones. Finally, an API is a fundamental way of accessing clouds through programming-language commands (in languages such as Java, Python or C). Complex scientific applications usually use APIs to access the cloud infrastructure natively. In this case, scientists have to analyze the access methods already used by their applications and verify whether those methods can be used, or adapted, when migrating to a cloud environment.
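What API-based access looks like from an application's point of view can be sketched with a small, entirely hypothetical client class. Real clouds ship their own SDKs with signed web-service calls; the class and method names here are invented for illustration:

```python
# Hypothetical client illustrating API-style (native) access to a cloud;
# the names below do not correspond to any real provider SDK.
class CloudClient:
    def __init__(self, credentials):
        self.credentials = credentials
        self.instances = []

    def launch_instance(self, image, instance_type):
        """Request a virtual machine; a real SDK would issue a signed API call."""
        instance_id = f"i-{len(self.instances):04d}"
        self.instances.append({"id": instance_id, "image": image,
                               "type": instance_type, "state": "running"})
        return instance_id

    def terminate_instance(self, instance_id):
        """Release the resource when the experiment no longer needs it."""
        for inst in self.instances:
            if inst["id"] == instance_id:
                inst["state"] = "terminated"

# A scientific application drives the infrastructure programmatically,
# with no browser involved:
client = CloudClient(credentials="...")
vm = client.launch_instance(image="workflow-image", instance_type="hpc-large")
client.terminate_instance(vm)
```

This is the access pattern a SWfMS typically needs: acquiring and releasing resources from inside the workflow engine rather than through an interactive interface.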
The cloud orientation differs according to the business model used. In the SaaS model, applications are deployed on the cloud and can only be invoked, i.e., all execution control belongs to the deployed application. We consider this approach task centric: scientists transfer control to the application owners instead of retaining it during the course of the experiment. On the other hand, when the infrastructure is provided as a service (IaaS, where virtualized hardware is provided to be configured and controlled), the scientist has full control of the actions: the programs that will execute and the environment configurations are chosen by the scientists. We consider this approach user centric. Scientists have to analyze which approach is more suitable for their needs. If they want to execute only one application, such as the bioinformatics tool BLAST, they can choose a task-centric approach. However, if they want to try several programs and change environment configurations, the user-centric approach is more suitable.
The technological infrastructure defines how a specific cloud approach is implemented. It can be based on grids, peer-to-peer networks, PC clouds, cluster clouds, or a combination of them. This evaluation may be compromised in public clouds, such as Amazon EC2, because we are not able to know which kind of technology is used to implement the cloud; in private clouds, however, it is possible to obtain this information. It is quite useful, because many experiments need a computational cluster or a grid to execute in parallel and produce results in a feasible time.
This article highlighted that, despite the high interest in cloud computing from the scientific community (especially those who need to execute HPC scientific applications), it is still a wide-open field. Choosing the best cloud support is a step forward, but there is still a need for services focused on scientific workflow execution to bridge the gap between the cloud and the SWfMS. SciCumulus is an initiative in this direction. Some SWfMS, such as Swift and Pegasus, are also incorporating cloud support into their systems.
About the Authors
Daniel de Oliveira is a Ph.D. student at the Department of Computer Science at the COPPE Institute from Federal University of Rio de Janeiro. He received a B.Sc. degree in 2005 and M.Sc. degree in 2008, both from Federal University of Rio de Janeiro, Brazil. He is currently working on his Ph.D. thesis in Computer Science in the same institution. His interests include Cloud Computing, e-Science, workflow management, data mining, text mining and ontologies. He is also a member of IEEE, ACM and of the Brazilian Computer Society.
Fernanda Baião has been a Professor of the Department of Applied Informatics of the Federal University of the State of Rio de Janeiro (UNIRIO) since 2004, where she leads the Distributed Databases Research Group. She received the Doctor of Science degree from the Federal University of Rio de Janeiro (UFRJ) in 2001. During the year 2000 she worked as a visiting student at the University of Wisconsin, Madison (USA). Her current research interests include distributed and parallel databases, data management in scientific workflows, conceptual data modeling and machine learning techniques. She participates in research projects in those areas, with funding from several Brazilian government agencies, including CNPq, CAPES and FAPERJ. She participates in several program committees of national and international conferences and workshops, and is a member of ACM and of the Brazilian Computer Society.
Marta Mattoso has been a Professor of the Department of Computer Science at the COPPE Institute from Federal University of Rio de Janeiro (UFRJ) since 1994, where she leads the Distributed Database Research Group. She received the Doctor of Science degree from UFRJ. Dr. Mattoso has been active in the database research community for more than ten years, and her current research interests include distributed and parallel databases and data management aspects of scientific workflows. She is the principal investigator in research projects in those areas, with funding from several Brazilian government agencies, including CNPq, CAPES, FINEP and FAPERJ. She has published over 200 refereed international journal articles and conference papers. She has served in program committees of international conferences, and is a regular reviewer of several international journals.
 N. Antonopoulos and L. Gillam, 2010, Cloud Computing: Principles, Systems and Applications. 1 ed. Springer.
 M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, et al., 2010, A view of cloud computing, Commun. ACM, v. 53, n. 4, p. 50-58.
 R. Buyya, C.S. Yeo, and S. Venugopal, 2008, Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities, In: Proceedings of the 2008 10th IEEE International Conference on High Performance Computing and Communications, p. 5-13.
 E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, 2008, The cost of doing science on the cloud: the Montage example, In: SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, p. 1-12, Austin, Texas.
 Y. El-Khamra, H. Kim, S. Jha, and M. Parashar, 2010, Exploring the Performance Fluctuations of HPC Workloads on Clouds, In: Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science, p. 383–387, Washington, DC, USA.
 C. Evangelinos and C. Hill, 2008, Cloud Computing for parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon's EC2, Chicago, IL.
 I. Foster, Y. Zhao, I. Raicu, and S. Lu, 2008, Cloud Computing and Grid Computing 360-Degree Compared, In: Grid Computing Environments Workshop, 2008. GCE '08, p. 1-10.
 T. Hey, S. Tansley, and K. Tolle, 2009, The Fourth Paradigm: Data-Intensive Scientific Discovery. Online book, Microsoft Research.
 A. Matsunaga, M. Tsugawa, and J. Fortes, 2008, CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications, IEEE eScience 2008, p. 222-229.
 C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, 2008, On the use of cloud computing for scientific workflows, In: IEEE Fourth International Conference on eScience (eScience 2008), Indianapolis, USA, p. 7–12.
 D. Oliveira, F. Baião, and M. Mattoso, 2010, "Towards a Taxonomy for Cloud Computing from an e-Science Perspective", Cloud Computing: Principles, Systems and Applications (to be published), Heidelberg: Springer-Verlag
 I.J. Taylor, E. Deelman, D.B. Gannon, and M. Shields (Eds.), 2007, Workflows for e-Science: Scientific Workflows for Grids. 1 ed. Springer.
 M. Addis, J. Ferris, M. Greenwood, P. Li, D. Marvin, T. Oinn, and A. Wipat, 2003, Experiences with e-Science workflow specification and enactment in bioinformatics, Proceedings of UK e-Science All Hands Meeting, p. 459–467.
 W. Martinho, E. Ogasawara, D. Oliveira, F. Chirigati, I. Santos, G. Travassos, and M. Mattoso, 2009, A Conception Process for Abstract Workflows: An Example on Deep Water Oil Exploitation Domain, In: 5th IEEE International Conference on e-Science, Oxford, UK.
 M. Mattoso, C. Werner, G.H. Travassos, V. Braganholo, L. Murta, E. Ogasawara, D. Oliveira, S.M.S.D. Cruz, and W. Martinho, 2010, Towards Supporting the Life Cycle of Large Scale Scientific Experiments, International Journal of Business Process Integration and Management, v. 5, n. 1, p. 79–92.
 R.D. Jarrard, 2001, Scientific Methods. Online book, Url.: http://emotionalcompetency.com/sci/booktoc.html.
 L. Youseff, M. Butrico, and D. Da Silva, 2008, Toward a Unified Ontology of Cloud Computing, In: Grid Computing Environments Workshop, 2008. GCE '08, p. 1-10.
 J. Freire, D. Koop, E. Santos, and C.T. Silva, 2008, Provenance for Computational Tasks: A Survey, Computing in Science and Engineering, v.10, n. 3, p. 11-21.
 I. Foster and C. Kesselman, 2004, The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann.
 E. Pacitti, P. Valduriez, and M. Mattoso, 2007, Grid Data Management: Open Problems and New Issues, Journal of Grid Computing, v. 5, n. 3, p. 273-281.
 Amazon EC2, 2010. Amazon Elastic Compute Cloud (Amazon EC2). Available at: http://aws.amazon.com/ec2/. Accessed: 5 Mar 2010.
 D. Oliveira, E. Ogasawara, F. Baião, and M. Mattoso, 2010, SciCumulus: A Lightweight Cloud Middleware to Explore Many Task Computing Paradigm in Scientific Workflows, In: Proc. 3rd IEEE International Conference on Cloud Computing, Miami, FL.
 Y. Zhao, M. Hategan, B. Clifford, I. Foster, G. von Laszewski, V. Nefedova, I. Raicu, T. Stef-Praun, and M. Wilde, 2007, Swift: Fast, Reliable, Loosely Coupled Parallel Computation, In: Services 2007, p. 199-206, Salt Lake City, UT, USA.
 E. Deelman, G. Mehta, G. Singh, M. Su, and K. Vahi, 2007, "Pegasus: Mapping Large-Scale Workflows to Distributed Resources", Workflows for e-Science, Springer, p. 376-394.