September 10, 2012
Sept. 10 — Researchers from North Carolina State University have developed a new software tool to prevent performance disruptions in cloud computing systems by automatically identifying and responding to potential anomalies before they can develop into problems.
Cloud computing enables users to create multiple "virtual machines" that operate independently, even though they are all operating on one large computing platform. However, this approach can cause performance issues when a software bug, or other problem, in one virtual machine disrupts the entire cloud.
Now researchers have designed software that looks at memory usage, network traffic, CPU usage (the amount of computing power in use at any given time) and other system-level data in a cloud computing infrastructure to define the wide range of behaviors that can be considered "normal." The program defines normal behavior for every virtual machine in the cloud, and can then look for deviations and predict anomalies that could affect the system's ability to provide service to users.
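To make the idea of learning "normal" behavior from system-level metrics concrete, here is a minimal Python sketch. It is not the researchers' method: it substitutes a simple per-metric mean and standard deviation for whatever model the tool actually builds, and the metric names, sample values and threshold are all hypothetical.

```python
from statistics import mean, stdev

def learn_baseline(samples):
    """Learn a per-metric (mean, std) baseline from unlabeled samples.

    `samples` is a list of dicts mapping a metric name to its value,
    collected while the virtual machine is running normally.
    """
    baseline = {}
    for metric in samples[0]:
        values = [s[metric] for s in samples]
        baseline[metric] = (mean(values), stdev(values))
    return baseline

def deviation_score(baseline, sample):
    """Return the largest absolute z-score across all metrics."""
    scores = []
    for metric, (mu, sigma) in baseline.items():
        sigma = sigma or 1e-9  # avoid dividing by zero for constant metrics
        scores.append(abs(sample[metric] - mu) / sigma)
    return max(scores)

def is_anomalous(baseline, sample, threshold=3.0):
    """Flag a sample whose worst metric is more than `threshold` stds away."""
    return deviation_score(baseline, sample) > threshold

# Hypothetical history for one virtual machine (values are made up).
history = [
    {"cpu_percent": 20, "memory_mb": 512, "network_kbps": 100},
    {"cpu_percent": 22, "memory_mb": 520, "network_kbps": 110},
    {"cpu_percent": 19, "memory_mb": 508, "network_kbps": 95},
    {"cpu_percent": 21, "memory_mb": 515, "network_kbps": 105},
    {"cpu_percent": 23, "memory_mb": 518, "network_kbps": 102},
    {"cpu_percent": 20, "memory_mb": 510, "network_kbps": 98},
]
vm_baseline = learn_baseline(history)
print(is_anomalous(vm_baseline, {"cpu_percent": 95, "memory_mb": 514, "network_kbps": 101}))
```

No labeled examples of abnormal behavior are needed here: the baseline is built only from what the virtual machine usually does, which is the property the article highlights.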
One advantage of this approach is that it does not require users to provide so-called "training data" about what constitutes abnormal behavior, which is important because training data are often difficult to obtain in production cloud systems. The approach can also predict anomalies that have never been seen before.
If the program spots a virtual machine that is deviating from its normal behavior, it runs a "black box" diagnostic that can determine which metrics – such as CPU usage – may be affected, without exposing user data. This metric data can then be used to trigger the appropriate prevention system, which will address the deviation and prevent it from becoming a problem.
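The step from "which metrics are affected" to "which prevention system to trigger" can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual diagnostic: it ranks metrics by how far they sit from a known baseline, and the action names in `PREVENTION_ACTIONS` are hypothetical placeholders.

```python
def rank_metrics(baseline, sample):
    """Rank metrics by absolute z-score, most deviant first.

    `baseline` maps a metric name to a (mean, std) pair; only system-level
    numbers are inspected, never the contents of user data.
    """
    scored = []
    for name, (mu, sigma) in baseline.items():
        z = abs(sample[name] - mu) / max(sigma, 1e-9)
        scored.append((name, z))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical mapping from the most affected metric to a response.
PREVENTION_ACTIONS = {
    "cpu_percent": "scale_cpu",
    "memory_mb": "scale_memory",
}

def choose_action(baseline, sample, default="migrate_vm"):
    """Pick a response based on the single most deviant metric."""
    top_metric, _ = rank_metrics(baseline, sample)[0]
    return top_metric, PREVENTION_ACTIONS.get(top_metric, default)

# Made-up baseline and reading for one virtual machine.
baseline = {
    "cpu_percent": (20.0, 2.0),
    "memory_mb": (512.0, 10.0),
    "network_kbps": (100.0, 5.0),
}
reading = {"cpu_percent": 22.0, "memory_mb": 700.0, "network_kbps": 104.0}
print(choose_action(baseline, reading))
```

Keying the response on the top-ranked metric keeps the diagnostic a "black box" in the article's sense: it works from metric values alone.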
"If we can identify the initial deviation and launch an automatic response, we can not only prevent a major disturbance, but actually prevent the user from even experiencing any change in system performance," says Dr. Helen Gu, an assistant professor of computer science at NC State and co-author of a paper describing the research. "Also, it's important to note that this program does not access any user's individual information. We're looking only at system-level behavior."
The program is also lightweight, meaning it does not use much of the cloud's computing power to operate. It is able to collect the initial data and define normal behavior much faster than existing approaches. Once it is up and running, it uses less than 1 percent of the CPU load and 16 megabytes of memory.
In benchmark testing, the program identified up to 98 percent of anomalies, which is much higher than the rate found in existing approaches. "It also had a 1.7 percent rate of false positives, meaning it triggered very few false alarms," Gu says. "And because the false alarms resulted in automatic responses, which are easily reversible, the cost of the false alarms is negligible."
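For readers unfamiliar with these two figures, the sketch below shows how a detection rate and a false positive rate are conventionally computed. The formulas are the standard definitions and may not match the paper's exact evaluation; the counts are hypothetical, chosen only so the arithmetic lands on the percentages quoted above.

```python
def detection_rates(true_pos, false_neg, false_pos, true_neg):
    """Return (detection rate, false positive rate) from confusion counts.

    detection rate      = anomalies caught / all real anomalies
    false positive rate = false alarms     / all normal samples
    """
    detection_rate = true_pos / (true_pos + false_neg)
    false_positive_rate = false_pos / (false_pos + true_neg)
    return detection_rate, false_positive_rate

# Hypothetical counts, not taken from the paper: 50 real anomalies of
# which 49 are caught, and 1,000 normal samples of which 17 raise alarms.
rate, fp_rate = detection_rates(true_pos=49, false_neg=1, false_pos=17, true_neg=983)
print(f"detection rate: {rate:.1%}, false positive rate: {fp_rate:.1%}")
```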
Gu says her team's next step is to incorporate more detailed "white box" diagnostic tools into the software, so they can identify the software bugs causing any anomalies and correct them.
The paper, "UBL: Unsupervised Behavior Learning for Predicting Performance Anomalies in Virtualized Cloud Systems," was co-authored by NC State Ph.D. students Daniel Dean and Hiep Nguyen. The paper will be presented Sept. 20 at the 9th Annual ACM International Conference on Autonomic Computing in San Jose, Calif. The research was supported by the National Science Foundation, the U.S. Army Research Office, an IBM faculty award and a Google research award.
"UBL: Unsupervised Behavior Learning for Predicting Performance Anomalies in Virtualized Cloud Systems"
Authors: Daniel J. Dean, Hiep Nguyen and Xiaohui Gu, North Carolina State University
Presented: Sept. 20 at the 9th Annual ACM International Conference on Autonomic Computing in San Jose, Calif.
Abstract: Infrastructure-as-a-Service (IaaS) clouds are prone to performance anomalies due to their complex nature. Although previous work has shown the effectiveness of using statistical learning to detect performance anomalies, existing schemes often assume labeled training data, which requires significant human effort and can only handle previously known anomalies. We present an Unsupervised Behavior Learning (UBL) system for IaaS cloud computing infrastructures. UBL leverages Self-Organizing Maps to capture emergent system behaviors and predict unknown anomalies. For scalability, UBL uses residual resources in the cloud infrastructure for behavior learning and anomaly prediction with little add-on cost. We have implemented a prototype of the UBL system on top of the Xen platform and conducted extensive experiments using a range of distributed systems. Our results show that UBL can predict performance anomalies with high accuracy and achieve sufficient lead time for automatic anomaly prevention. UBL supports large-scale infrastructure-wide behavior learning with negligible overhead.
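The abstract names Self-Organizing Maps as the learning technique. The following is a minimal, generic Self-Organizing Map in Python, offered as a sketch rather than a reproduction of UBL: the grid size, learning rate, neighborhood radius and training data are assumptions, and scoring a sample by its distance to the best matching unit is a common heuristic that may differ from the scoring used in the paper.

```python
import math
import random

class TinySOM:
    """A small rectangular Self-Organizing Map over fixed-length vectors."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        # One weight vector per grid cell, randomly initialized in [0, 1).
        self.weights = {
            (r, c): [rng.random() for _ in range(dim)]
            for r in range(rows)
            for c in range(cols)
        }

    def best_matching_unit(self, x):
        """Return the grid cell whose weight vector is closest to x."""
        return min(self.weights, key=lambda cell: math.dist(self.weights[cell], x))

    def train(self, data, epochs=50, learning_rate=0.5, radius=1.0):
        """Pull the winning cell and its grid neighbors toward each sample."""
        for _ in range(epochs):
            for x in data:
                br, bc = self.best_matching_unit(x)
                for (r, c), w in self.weights.items():
                    grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
                    influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
                    for i in range(len(w)):
                        w[i] += learning_rate * influence * (x[i] - w[i])

    def quantization_error(self, x):
        """Distance from x to its best matching unit; larger means less familiar."""
        return math.dist(self.weights[self.best_matching_unit(x)], x)

# Made-up normalized samples (CPU, memory, network) from normal operation.
rng = random.Random(1)
normal = [
    [0.2 + rng.uniform(-0.05, 0.05),
     0.3 + rng.uniform(-0.05, 0.05),
     0.1 + rng.uniform(-0.05, 0.05)]
    for _ in range(20)
]
som = TinySOM(rows=3, cols=3, dim=3)
som.train(normal)
print(som.quantization_error([0.2, 0.3, 0.1]))     # a familiar reading
print(som.quantization_error([0.95, 0.9, 0.9]))    # an unfamiliar reading
```

Because the map is trained only on unlabeled samples, a reading unlike anything seen during training ends up far from every cell, which is how an unsupervised model can flag an anomaly it has never encountered before.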
Source: North Carolina State University