January 07, 2013
Several lectures from the VSCSE Summer School on Science Clouds (July 30, 2012) are now available for viewing on YouTube. The presentations provide a clear and concise overview of the state of cloud and virtualization technologies, with a particular focus on MapReduce.
These free, online lectures are part of the MOOC movement, where MOOC is short for massive open online course. MOOCs are the product of an open education ethic characterized by open access and scalability.
There are currently four "Cloud Computing MOOC" lectures available for viewing. In the first one, Professor Geoffrey Fox introduces the Indiana University Cloud MOOC. In addition to laying out the agenda, Fox provides examples of the applications that are best suited for clouds, most notably those that are "pleasingly parallel." He highlights several science projects that are using cloud-based technologies, FutureGrid among them, but also alludes to a lot of untapped potential.
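To make the term "pleasingly parallel" concrete, here is a minimal Python sketch, not taken from the lecture. It assumes a hypothetical task function, `simulate`, whose runs share no state, so they can be handed to any number of workers and collected in order. The thread pool merely stands in for a set of cloud workers; the names `simulate` and `run_all` are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical independent task: a toy pseudo-random iteration.

    Each call depends only on its own seed, which is what makes the
    workload pleasingly parallel.
    """
    x = seed
    for _ in range(1000):
        x = (1103515245 * x + 12345) % 2**31
    return x


def run_all(seeds, workers=4):
    """Run every task on a pool of workers and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))


results = run_all(range(8))
```

Because no task waits on another, the parallel result matches a plain sequential loop, and adding workers requires no change to the task itself.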
Fox points to some interesting future possibilities. For example, it is projected that 24 billion devices will be connected to the Internet by 2020. This Internet of Things will rely on the cloud for control and management functions. More and more, computing will look like a grid or mesh that touches nearly every aspect of our lives. The ability to offload computational tasks to the cloud will also enable advances in mobile computing devices and robotics.
Life science is another major vertical when it comes to cloud technology. Assistant Prof. Michael Schatz of the Simons Center for Quantitative Biology lectures on the use of cloud computing in genetic sequencing. Schatz is known for having produced some highly sophisticated uses of MapReduce for biology applications. MapReduce was developed at Google for big data computations. It is a proprietary framework, but thanks to a 2004 paper, there are now open source implementations, most notably Hadoop.
Schatz notes that "Google every single day does the equivalent of a year's worth of sequence analysis." Traditional servers are no longer sufficient to handle such enormous data loads, but that's where parallel computing technologies like MapReduce come in. Schatz gives an overview of the benefits and challenges of Hadoop and MapReduce before delving into specific implementations.
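For readers new to the model, the following is a minimal single-process Python sketch of the map, shuffle and reduce phases, applied to counting k-mers (length-k substrings) in DNA reads. It is an illustration under stated assumptions, not Schatz's code and not the Hadoop API: the function names, the sample reads and the choice of k are all hypothetical, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k=3):
    """Map phase: emit a (k-mer, 1) pair for every length-k substring of a read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def count_kmers(reads, k=3):
    """Chain the three phases in one process (illustrative only)."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical sample reads, chosen only for illustration.
reads = ["GATTACA", "TTACAGA"]
counts = count_kmers(reads, k=3)
```

The map and reduce functions never share state, which is what lets a framework such as Hadoop run them on separate nodes and scale with the size of the data.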
In the next video series, Professor J. Hacker argues that there is a growing need for virtualization in HPC. He explains that the motivation for this conclusion is threefold: the clock speed increases following Moore's law have ceased; hardware is going to multicore (for example, Intel MIC); and the memory capacity of systems is increasing (512 GB on systems today). He notes that the traditional approach is to tie a single application to a single server. With 50-plus cores, this approach is no longer effective. Virtualization technology is being used to partition large-scale servers to run many operating systems and VMs independently of each other.
The entire lecture is less than one hour long and provides an overview of virtualization and cloud technology in relation to HPC and then offers some practical advice for leveraging virtual HPC clusters. Hacker refers to cloud computing as the "distributed computing of this decade." He views cloud as a computing utility that provides services over a network that "pushes functionality from devices at the edge (e.g. laptops and mobile phones) to centralized servers."
In the last video series, Jonathan Klinginsmith, a PhD candidate at the School of Informatics and Computing at Indiana University, speaks about virtual clusters, MapReduce and the cloud. He covers such important questions as "Why is cloud interesting?" (hint: scalability, elasticity, utility computing).
While Klinginsmith's main research interest is machine learning and artificial intelligence, he has turned to computer science and information systems to address the problem of growing data sets. He is not alone. Researchers from nearly every scientific endeavor are finding it necessary to attain some degree of computational proficiency.
Klinginsmith aims his talk primarily at these non-computer scientists. Thus his presentation focuses mainly on running applications on top of clusters rather than getting too deep into the nuts and bolts of building and operating clusters. For anyone who is just getting started with Hadoop or MapReduce, this will be a valuable resource. In under an hour, the viewer should acquire a basic understanding of MapReduce, virtual machines, clusters, cloud and virtualization.