February 22, 2011
While the economic case for cloud computing is compelling, the security challenges it poses are equally striking. Authors Yanpei Chen and Randy H. Katz, both from the Computer Science Division, EECS Department, at the University of California, Berkeley, survey the full space of cloud-computing security issues, attempting to separate justified concerns from possible over-reactions. The authors examine contemporary and historical perspectives from industry, academia, government, and "black hats".
While many cloud computing security problems have historically come up in one way or another, a great deal of additional research is needed to arrive at satisfactory solutions today.
From our combined contemporary and historical analysis, we distill novel aspects of the cloud computing threat model, and identify mutual auditability as a key research challenge that has yet to receive attention. We hope to advance discussions of cloud computing security beyond confusion, and to some degree fear of the unknown.
For the rest of the feature, we will use the term “cloud computing” per the definition advanced by the U.S. National Institute of Standards and Technology (NIST). According to this definition, key characteristics of cloud computing include on-demand self service, broad network access, resource pooling, rapid elasticity, and metered service similar to a utility.
There are also three main service models—software as a service (SaaS), in which the cloud user controls only application configurations; platform as a service (PaaS), in which the cloud user also controls the hosting environments; and infrastructure as a service (IaaS), in which the cloud user controls everything except the datacenter infrastructure. Further, there are four main deployment models: public clouds, accessible to the general public or a large industry group; community clouds, serving several organizations; private clouds, limited to a single organization; and hybrid clouds, a mix of the others. Ongoing cloud computing programs and standardizing efforts from the U.S. and EU governments appear to be converging on this definition.
Ongoing Threats to Secure Clouds
Arguably, many of the incidents described as "cloud security" reflect traditional web application and data-hosting problems. In industry incidents, many of the underlying issues remain well-established challenges such as phishing, downtime, data loss, weak passwords, and compromised hosts running botnets.
A recent Twitter phishing incident provides a typical example of a traditional web security issue now miscast as a cloud computing issue. Also, recent Amazon botnet incidents highlight that servers in cloud computing currently operate as (in)securely as servers in traditional enterprise datacenters.
In the research community, cloud computing security is seeing the creation of dedicated forums such as the ACM Cloud Computing Security Workshop, as well as dedicated tracks and tutorials at major security conferences such as the ACM Conference on Computer and Communications Security (CCS). To date, most papers published on cloud security reflect continuations of established lines of security research, such as web security, data outsourcing and assurance, and virtual machines. The field primarily manifests as a blend of existing topics, although papers focused exclusively on cloud computing security are emerging.
In the “black hat” community, emerging cloud computing exploits also reflect extensions of existing vulnerabilities, with several examples from a dedicated cloud security track at Black Hat USA 2009. For example, username brute forcers and Debian OpenSSL exploit tools run in the cloud as they do in botnets. Social engineering attacks remain effective—one exploit tries to convince Amazon Elastic Compute Cloud (EC2) users to run malicious virtual machine images simply by giving the image an official-sounding name such as “fedora_core”. Virtual machine vulnerabilities also remain an issue, as does weak random number generation due to lack of sufficient entropy.
Old Threats Amplified
Some established vulnerabilities would be significantly amplified in cloud computing and deserve separate consideration.
For black hats, cloud computing offers a potentially more reliable alternative to botnets. While the recent brute-forcer presentation claimed that using the cloud is presently more expensive than using botnets, Amazon EC2 recently added cluster compute and GPU instances targeting HPC users, which could drastically shift the cloud-botnet cost balance. Prices can already be quite low: we estimate that some exploits amortize to as little as $2 per exploit. That said, botnets in the cloud face two complications.
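The amortization logic above can be sketched as a back-of-the-envelope calculation. All figures below (hourly rate, attempt throughput, attempts needed) are illustrative assumptions, not measurements from the article:

```python
def cost_per_exploit(hourly_rate_usd, attempts_per_hour, attempts_per_exploit):
    """Amortized dollar cost of one successful exploit on a rented instance.

    hourly_rate_usd:      rental price of the instance per hour (assumed)
    attempts_per_hour:    brute-force attempts the instance sustains (assumed)
    attempts_per_exploit: expected attempts until one success (assumed)
    """
    hours_needed = attempts_per_exploit / attempts_per_hour
    return hourly_rate_usd * hours_needed

# Hypothetical: a $2/hour instance that needs one hour of work per success
# amortizes to $2 per exploit; a GPU instance with 10x the attempt rate at
# 4x the price would cut that to $0.80, shifting the cloud-botnet balance.
print(cost_per_exploit(2.0, 1.0e9, 1.0e9))
print(cost_per_exploit(8.0, 1.0e10, 1.0e9))
```

The point of the sketch is that the cost balance depends on throughput per dollar, which is exactly what HPC-oriented instance types change.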
Once found, "cloud bots" are easier to shut down than traditional botnets. However, the transient nature of cloud computing services makes cloud bots hard to detect. One could potentially run a botnet's command-and-control components on reliable but transient cloud computing services, while leaving the bots themselves outside the cloud. In other words, cloud computing could become a botnet capability amplifier.
Also, because cloud computing introduces a shared resource environment, unexpected side channels (passively observing information) and covert channels (actively sending data) can arise. Traditional co-location services faced similar problems. In public, community, and hybrid clouds, the problem becomes much more challenging due to the transient, rather than long-lease, nature of services. Having dedicated machines only partly solves the problem, since the shared network may still yield side channels and covert channels.
One noteworthy research effort developed methods to place an attacker virtual machine (VM) on the same physical machine as a targeted VM, establish a side channel between two VMs on the same physical machine, and conduct an SSH keystroke timing attack to steal passwords. Additional research is needed to establish what constitutes an acceptable level of isolation to limit the risk of side channels and covert channels.
Another issue comes from reputation fate-sharing, which has mixed consequences. On the plus side, cloud users can potentially benefit from a concentration of security expertise at major cloud providers, ensuring that the entire ecosystem employs security best practices. On the other hand, a single subverter can disrupt many users. For example, spammers subverted EC2 and caused Spamhaus to blacklist a large fraction of EC2's IP addresses, causing major service disruptions. Thereafter, anyone who wants to send email from EC2 must fill out a form, provide a list of (static) EC2 addresses to authorize for sending, and document their use case. Upon approval, Amazon forwards the EC2 addresses to Spamhaus for whitelisting. These issues are more severe than traditional fate-sharing, which describes acceptable correlated failure scenarios; in cloud computing, the reputations of different cloud users clearly should not be correlated.
Lessons from Time-Sharing Systems
While cloud computing has taken off as today's computing utility, the concept of the "computing utility" originated as early as 1965 with time-sharing systems such as Multics.
Thus, it comes as no surprise that a survey of historical work yields counterparts to contemporary cloud security problems. Multics highlights several concerns worth re-emphasizing today.
A striking aspect of Multics was its security design principles. First, Multics used permission-based protection mechanisms rather than exclusion-based ones: every access to every object checked current authority. Second, Multics embodied a form of Kerckhoffs' principle, maintaining an open design for its mechanisms with only the protection keys kept secret. Third, the design explicitly recognized the importance of human usability and implemented security mechanisms accordingly. These principles remain relevant today. For example, the EC2 spam incident revealed the problems of an exclusion-based design, and the adopted solution represents a permission-based system. Similarly, insufficient attention to usability issues facilitated the social engineering attacks. In contrast, it is somewhat unrealistic to expect completely open designs today, since both cloud providers and users want to restrict access to their system designs to preserve their competitive advantage.
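The permission-based vs. exclusion-based distinction can be made concrete with a minimal sketch (class and method names are hypothetical, chosen for illustration). A permission-based design defaults to deny and re-checks authority on every access; an exclusion-based design defaults to allow and blocks only known-bad principals:

```python
class PermissionBased:
    """Default-deny: only explicitly granted (principal, object) pairs pass."""
    def __init__(self):
        self.allowed = set()

    def grant(self, who, obj):
        self.allowed.add((who, obj))

    def revoke(self, who, obj):
        self.allowed.discard((who, obj))

    def check(self, who, obj):
        # Complete mediation: current authority is consulted on every access.
        return (who, obj) in self.allowed


class ExclusionBased:
    """Default-allow: anyone not on the blacklist passes."""
    def __init__(self):
        self.blocked = set()

    def block(self, who):
        self.blocked.add(who)

    def check(self, who, obj):
        # A previously unseen attacker passes until someone blacklists them.
        return who not in self.blocked


perm = PermissionBased()
excl = ExclusionBased()
perm.grant("alice", "mailer")

# An unknown principal is denied under the permission model but allowed
# under the exclusion model -- the root of the EC2 spam-blacklist problem.
print(perm.check("mallory", "mailer"))  # False
print(excl.check("mallory", "mailer"))  # True
```

This is the structural difference behind the EC2 email fix: the form-and-whitelist process converted a default-allow (blacklist) arrangement into a default-deny (grant-on-request) one.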
Multics security design also recognized the importance of preventing system administrators from becoming decision bottlenecks. Otherwise, users will bypass administrators by habit (in modern terminology, a form of "satisficing") and compromise protection mechanisms. The contemporary counterpart is again the Amazon EC2 spam blacklist incident, where the solution imposed email limits that require administrator intervention to increase; this mechanism may become unscalable if the number of EC2 users who wish to send email grows significantly.
Multics offered a “spectrum of security” by allowing users to build subsystems that reflect a range of different security needs. Today, different cloud computing users also have different security needs, and a good design would offer a choice of security levels and subsystem boundaries with reasonable defaults. We believe this flexibility could prove to be a major improvement if done well. One possible approach would be to formulate the security primitives around defending different stakeholders against different threat models. An additional feature might support “plug-and-play” services readily compliant with common standards such as those of HIPAA or Payment Card Industry.
Despite these parallel concerns, we note that a number of Multics security mechanisms, state-of-the-art at the time, remain prevalent today even though they do not work as well for modern computing environments. These mechanisms include access control lists (ACLs), machine-generated passwords, and weak encryption of the password file. Thus, while historical work can provide valuable insights into modern cloud security issues, naturally we must temper our assessment with due consideration to how computing has significantly changed over time.
Intended Isolation vs. Complexity
We find early work on virtual machine monitors (VMMs) noteworthy because different kinds of virtualization constitute a major facet of cloud computing. Here, we review the original argument of why VMMs are more secure than ordinary computing systems to highlight that the core assumptions of this argument no longer hold.
The original secure VMM argument has several parts. First, lower levels of multiprogramming (i.e., concurrent execution) lead to lower risks of security failures. Second, even if the level of multiprogramming is the same, VMMs are more secure because they are simpler and easier to debug. Third, for a guest OS that runs on a VMM that in turn runs on bare metal, a security violation occurs only when there is a simultaneous security failure in both the guest OS and the VMM. Thus, a VMM running k guest OSs with each OS running n programs should experience security failures much less frequently than an OS running k × n programs. Fourth, the failure of each program is independent, and hence the failure probabilities are multiplicative.
Overall, any one program on a VMM running k guest OSs, with each OS running n programs, should experience failures much less frequently than the same program on an OS running k × n programs. The multiplicative effect amplifies the reduction in each failure probability.
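The argument can be restated numerically. In this toy model, every parameter is an illustrative assumption: q is the per-program probability of subverting its host OS, and p_vmm is the assumed probability that the (smaller, simpler) VMM is also subverted. Under the original argument's independence assumption, a breach of the virtualized system requires both a guest-OS failure and a VMM failure:

```python
def p_os_breach(q, m):
    """P(at least one of m programs subverts its OS),
    assuming each succeeds independently with probability q."""
    return 1 - (1 - q) ** m

# Illustrative assumptions, not measured values.
q = 0.01        # per-program subversion probability
k, n = 4, 10    # 4 guest OSs, each running 10 programs
p_vmm = 0.02    # assumed probability the VMM itself is subverted

monolithic = p_os_breach(q, k * n)        # one OS running all k*n programs
virtualized = p_os_breach(q, n) * p_vmm   # needs guest-OS AND VMM failure

print(monolithic, virtualized)
```

With these numbers the virtualized breach probability is orders of magnitude smaller, which is exactly the multiplicative effect the original argument relies on; the discussion below shows why its three underlying assumptions no longer hold for modern VMMs.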
The argument makes three crucial assumptions. First, VMMs are simple. Second, guest OSs have a lower multiprogramming level. Third, the VMM and guest OS fail independently. Modern VMMs violate all three. Modern VMMs are no longer "small" in an absolute sense. For example, Xen has ~150,000 lines of code, considerably smaller than ~12 million lines for Linux 2.6.32, but comparable to ~180,000 lines of code for Linux 1.0, which was already a feature-rich operating system. Additionally, users use guest OSs the same way they would use a native OS, undermining the assumption that guest OSs have lower multiprogramming levels. Further, some recent VMMs have the guest OS running on a VMM that in turn runs on a host OS. Clearly, the VMM is as (in)secure as the host OS, and the host OS significantly enlarges the trusted code base.
As cloud computing providers continue to offer finer granularity and hierarchy of virtualization (e.g. virtual machines, networks, racks, datacenters, clouds, applications), it becomes crucial to verify that each additional level of intended isolation does not undermine security elsewhere. The VMM discussion here highlights one example of the tradeoff.
New Threat Models
Combining the contemporary and historical viewpoints, we arrive at the position that while many cloud computing security problems have historically come up one way or another, we need much additional research to arrive at satisfactory solutions today. We argue that the cloud computing threat model includes several novel elements, grouped into two major categories.
New Assets Worth Protecting
Data and software are not the only assets worth protecting. Activity patterns also need to be protected. Sharing of resources means that the activity of one cloud user might appear visible to other cloud users using the same resources, potentially leading to the construction of covert and side channels. Activity patterns may also themselves constitute confidential business information, if divulging them could lead to reverse-engineering of customer base, revenue flows, and the like.
Business reputation also merits protection, a concern for both cloud providers and cloud users. When using shared resources, it becomes harder to attribute malicious or unethical activity. Even if there are ways to clearly identify the culprits and attribute blame, bad publicity still creates uncertainty that can tarnish a long established reputation.
Cloud computing inevitably has a longer trust chain. For example, an application end-user could potentially use an application built by a SaaS provider, with the application running on a platform offered by a PaaS provider, which in turn runs on the infrastructure of an IaaS provider. While to our knowledge this extreme example cannot occur in practice today due to a lack of sufficient cross-provider APIs, it illustrates that with any model of cloud computing, stakeholders can find themselves in relationships considerably more complicated than a simple provider-user relationship. Some stakeholders could be subverters, who maintain the appearance of a regular cloud user or cloud provider but in fact perpetrate cybercrime or other cyber attacks. Examples include cloud users who run brute forcers, botnets, or spam campaigns from the cloud, or cloud providers who scan cloud users' data and sell confidential information to the highest bidder.
A different, far more dangerous kind of subverter could be doing, say, nuclear simulations and ballistic computations in public clouds, or doing DNA analysis to attempt to create a biological super-virus, or simply high-school students operating a password brute-forcing service. It would be a major challenge to identify these kinds of subverters without an intrusive examination of user-supplied code.
Furthermore, competitive businesses can operate within the same cloud computing ecosystem: using the same cloud, or ending up in a provider-user relationship. This can lead to strong conflicts of interest, generating additional motives to access the confidential information of a competitor and thus creating another kind of potential adversary.
Finally, a subtle difficulty with understanding cloud computing threats arises from potentially inaccurate mental models of cloud computing as an always-available service. This viewpoint—which arises from the general paradigm of drawing upon a commodity service with much the flavor of a utility—can create a false sense of security, leading to the neglect of security good practices, e.g., omitting regular data backups across multiple cloud providers. As a result, cloud users can become their own adversaries, suffering more severe consequences when clouds do fail.
Existing contemporary works already explore many pertinent research topics. One important area that has yet to receive much attention is mutual auditability.
Auditability is already a requirement for health care, banking, and similar systems. What is new to cloud computing is mutual auditability. Because the system includes stakeholders with potentially conflicting interests, cloud users and providers both need reassurance that the other operates in a fashion that is both benign and correct. In other words, trusted stakeholders must prove that they are in fact trustworthy. By auditability we mean more than just billing accuracy; it also encompasses the ability to detect and attribute malicious activity.
Such mutual auditability can have major benefits. First and foremost, it enables the attribution of blame. This capability acts as a deterrent in itself: attackers can no longer count on escaping detection, and even if they escape identification, their malicious software can be removed. In incident response, real-time mutual audit information would allow both providers and users to take timely damage-containment measures.
Without the “mutual” part of mutual auditability, the burden to act rests entirely with the provider. Also, in search and seizure incidents, cloud providers can demonstrate to law enforcement that they have turned over all relevant evidence, and prove to users that they turned over only the necessary evidence and nothing more. Without the “mutual” part of mutual auditability, users would have a hard time verifying that cloud providers turned over only the necessary evidence.
Data from mutual audits can also trigger legal and contractual processes built into cloud computing user-provider agreements. Arguably, SLAs and compliance requirements would be toothless if stakeholders cannot demonstrate that the SLAs and requirements are violated. Enhanced mutual audit capabilities would facilitate the creation of new kinds of SLAs. Whether such SLAs are created would then depend on the engineering feasibility of meeting the requirements, rather than being held up by the inability to verify violations.
One complication in implementing mutual auditability is that the auditor fundamentally needs to be an independent third party, since the two primary parties in an audit relationship are potential adversaries. Note that this is also why naive encryption fails to solve the problem: the endpoints of the encrypted channel do not trust each other. A third-party auditor requires a setup quite different from today's practice, in which cloud providers record and maintain all the audit logs.
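One building block often proposed for audit logs that neither party fully controls is a tamper-evident hash chain. The sketch below is a hypothetical design, not a mechanism from the article: each entry's hash covers the previous entry's hash, so once the latest hash has been shared with an independent third-party auditor, neither the provider nor the user can silently rewrite earlier history:

```python
import hashlib
import json

class AuditLog:
    """Tamper-evident, append-only log: each entry hash chains to the previous."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []          # list of (record, hash-at-that-point)
        self.head = self.GENESIS   # hash of the latest entry

    def append(self, record):
        # Canonical serialization so both parties compute identical hashes.
        payload = json.dumps({"prev": self.head, "record": record},
                             sort_keys=True)
        self.head = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append((record, self.head))

    def verify(self):
        """Recompute the whole chain; True iff no stored entry was altered."""
        h = self.GENESIS
        for record, stored in self.entries:
            payload = json.dumps({"prev": h, "record": record},
                                 sort_keys=True)
            h = hashlib.sha256(payload.encode()).hexdigest()
            if h != stored:
                return False
        return True

log = AuditLog()
log.append({"user": "alice", "action": "launch_vm"})
log.append({"user": "alice", "action": "terminate_vm"})
print(log.verify())   # True

# Retroactively altering an early record breaks the chain.
log.entries[0] = ({"user": "alice", "action": "nothing"}, log.entries[0][1])
print(log.verify())   # False
```

A real deployment would add much more (signatures binding each party to its entries, periodic head-hash escrow with the auditor, privacy controls over record contents), but the hash chain illustrates why an independent third party changes the trust structure: holding only the head hash lets the auditor detect rewriting without storing the log itself.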
Recent work notes that implementing thorough auditing is not a simple matter even for straightforward web services. In cloud computing, it remains an open challenge to achieve thorough mutual auditing without impairing performance and violating the privacy of all stakeholders. Not all cloud users require the most sophisticated form of mutual auditability, but achieving some measure of it robustly would constitute an important security advance.
We find it interesting to contemplate whether security will become a significant cloud computing business differentiator. To some degree, the economics of cloud computing security have yet to play out. If security is indeed a business differentiator, it would be an unusual one, in that adequately addressing security concerns may not create a competitive advantage, but addressing security poorly will result in a significant disadvantage.
That said, the history of commercial Internet offerings repeatedly shows that time-to-market and undercutting prices can greatly sway customers even in the absence of sound security underpinnings.
We believe the situation would be somewhat different this time around, however, given that much of cloud computing targets customers who have extensive business reasons (and scars from the past) leading them to treat security as an elevated priority. Nonetheless, history also teaches us that developing security architectures early in the process can pay off greatly as systems evolve and accrue more disparate security requirements. The challenge is to achieve some measure of adequate and affordable security without undermining the economic advantages of cloud computing.
To summarize, we believe many established threats would translate to cloud computing, including phishing, downtime, data loss, and password weaknesses. However, cloud computing would significantly amplify other threats, such as botnets, side channels and covert channels, and reputation fate-sharing.
Historical work suggests many starting points for contemporary solutions, including applying good design principles, preventing administrator bottlenecks, offering a range of security options, quantifying side-channel and covert-channel bit rates, and examining the tradeoff between isolation and complexity. Novel aspects of the cloud computing threat model include new assets worth protecting, as well as new adversaries. Mutual auditability would be a major advance if achieved robustly. The economic considerations of cloud computing security have yet to fully play out.
We conclude our discussion by highlighting that breaking real clouds makes them stronger. Such studies involve obvious ethical issues, but provide much more compelling results than breaking hypothetical clouds. For example, the recent EC2 side channel study at CCS 2009 triggered a highly visible security effort by Amazon Web Services. Such coupled attack-and-defense approaches serve as a model for future academic work, for government cloud security projects, and for providers' internal adversarial efforts to discover vulnerabilities. Black-hat perspectives will continue to provide valuable insight, and research partnerships between different types of stakeholders will likely prove very beneficial to advancing the field.
About the Authors
Yanpei Chen is a fourth-year Ph.D. student working with Professor Randy Katz at UC Berkeley. He received B.S. and M.S. degrees from the University of California, Berkeley. His main research focus is data center workload characterization and performance improvements, with a side interest in network security. He is a National Science Foundation Graduate Research Fellow, and a member of the Reliable, Adaptable, and Distributed Systems Laboratory (RAD Lab).
Randy Howard Katz received his M.S. and Ph.D. degrees from the University of California, Berkeley. He joined the Berkeley faculty in 1983, where since 1996 he has been the United Microelectronics Corporation Distinguished Professor in Electrical Engineering and Computer Science. He is a Fellow of the ACM and the IEEE, and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. His current research interests are the architecture of Internet Datacenters, particularly frameworks for datacenter-scale instrumentation and resource management. He is a member of the Reliable, Adaptable, and Distributed Systems Laboratory (RAD Lab), and is a co-author of the paper "Above the Clouds: A Berkeley View of Cloud Computing."