October 05, 2011
SAN DIEGO, Calif., October 5 -- Successfully managing, preserving, and sharing large amounts of digital data has become more of an economic challenge than a technical one, as researchers must meet a new National Science Foundation (NSF) policy requiring them to submit a data management plan as part of their funding requests, said Michael Norman, director of the San Diego Supercomputer Center (SDSC) at the University of California, San Diego.
“Data management has become an even more challenging discipline than high-performance computing,” Norman said during remarks this week at the 50th anniversary meeting of the Association of Independent Research Institutes (AIRI) in La Jolla, California. “The question used to be ‘what’s the essential technology?’ but is now ‘what’s the sustainable cost model?’”
The revised NSF policy, which went into effect early this year, asks researchers to submit a two-page data management plan on how they will archive and share their data. According to the policy, “investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants.”
Norman said this revised policy was one of the key drivers that shaped SDSC’s planning for a new Web-based, 100 percent disk data storage system called the SDSC Cloud, which was announced late last month. Believed to be the largest academic-based cloud storage system in the U.S. to date, the SDSC Cloud is primarily designed for researchers, students, and other academics requiring stable, secure, and cost-effective storage and sharing of digital information, including extremely large data sets. While SDSC’s primary motivation to create its own data cloud was to provide an affordable resource for UC San Diego researchers to preserve and share their data, the resource is being made available to all academic researchers.
“Whatever we want to call it – the data deluge, the data tsunami, or the data explosion – the fact is that we are now in the era of data-intensive computing and SDSC has been working to solve a major challenge for a whole collection of scientific disciplines: the cost of data storage and sharing,” he said.
Standard “on-demand” storage costs for UC researchers on the SDSC Cloud start at only $3.25 a month per 100GB (gigabytes) of storage. A “condo” option, which allows users to make a cost-effective, long-term investment in hardware that becomes part of the SDSC Cloud, is also available. Full details can be found at https://cloud.sdsc.edu/hp/index.php.
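At the advertised on-demand rate, projecting storage spend is simple arithmetic. The sketch below takes only the $3.25 per 100GB per month figure from the article; the example dataset sizes are hypothetical.

```python
# Estimate SDSC Cloud "on-demand" storage costs at the advertised rate
# of $3.25 per month per 100 GB. The rate is from the article; the
# dataset sizes below are hypothetical illustrations.

RATE_PER_100GB_MONTH = 3.25  # USD

def monthly_cost(gigabytes):
    """Monthly on-demand cost in USD for a dataset of the given size."""
    return gigabytes / 100.0 * RATE_PER_100GB_MONTH

for size_gb in (500, 10_000, 100_000):  # 0.5 TB, 10 TB, 100 TB
    print(f"{size_gb:>7} GB: ${monthly_cost(size_gb):>9,.2f}/month, "
          f"${monthly_cost(size_gb) * 12:>11,.2f}/year")
```

At that rate, even a 100 TB collection comes to a few thousand dollars a month, which is the kind of figure a data management plan budget can state directly.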
Historically, data management has been a project-related cost for major research facilities, which traditionally have been funded to cover the cost of preservation and access, said Norman. “The NSF’s Office of Cyberinfrastructure (OCI) historically focused on high-performance computing, while data management was secondary and consisted of archiving in a tape-based silo,” he said.
“We now call that a ‘bit cemetery’ because use and retrieval rates were so low,” Norman told AIRI attendees. “In fact, many researchers were writing their data once and reading it never. That’s not data management – that’s data burial, and not what active researchers need.”
Over the last few years, however, the research infrastructure for data-enabled science has been widely discussed at the NSF, leading to the new data management and sharing policy. The document charting the course is the Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21).
“This document works across all the NSF Directorates and finally makes data-enabled science a first-class citizen,” said Norman. “And during the last year and a half, the NSF has been moving from vision to policy to action.”
Still, Norman warned that researchers will likely never be able to afford to save all their data, and should focus on saving and sharing only what is intellectually valuable, while creating a sustainable business model. He referenced the Keeping Research Data Safe (KRDS) report, which found that the cost of long-term data stewardship is as much as six times the cost of bit preservation. “So it’s not the cost of storing the bits – it’s the cost of hosting the hardware, all of the administration costs, and the costs of migrating the data.”
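The KRDS ratio makes Norman’s point concrete: if full stewardship runs up to six times the cost of merely keeping the bits, raw storage is a small fraction of the total. A minimal sketch, in which only the 6x multiplier comes from the article and the budget figure is hypothetical:

```python
# Illustrate the KRDS finding that long-term data stewardship can cost
# up to six times bit preservation alone. The 6x multiplier is from the
# report as cited in the article; the budget figure is hypothetical.

STEWARDSHIP_MULTIPLIER = 6  # upper bound cited from the KRDS report

def total_stewardship_cost(bit_preservation_cost,
                           multiplier=STEWARDSHIP_MULTIPLIER):
    """Total stewardship cost given the cost of only storing the bits."""
    return bit_preservation_cost * multiplier

storage_only = 10_000  # hypothetical annual bit-preservation budget, USD
total = total_stewardship_cost(storage_only)
print(f"Bits alone: ${storage_only:,} -> full stewardship: up to ${total:,}")
print(f"Hosting, administration, and migration: up to ${total - storage_only:,}")
```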
In late 2007, the Blue Ribbon Task Force on Sustainable Digital Preservation and Access was commissioned by the NSF and The Andrew W. Mellon Foundation to study the economic sustainability challenge of digital preservation and access. The Task Force, which worked in partnership with the Library of Congress, the Joint Information Systems Committee of the United Kingdom, the Council on Library and Information Resources, and the National Archives and Records Administration, published both an Interim and Final report, which can be found at http://brtf.sdsc.edu/.
“SDSC, like other data resource centers, has a long-term obligation to steward that data, and maintenance costs are needed to keep that data persistent,” said Norman. “It’s like real estate. You can either rent out your rooms or sell your condos, but if you’re not recovering costs as a landlord, you go out of business.”
As an Organized Research Unit of UC San Diego, SDSC is considered a leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community including industry and academia. Cyberinfrastructure refers to an accessible and integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from earth sciences to biology to astrophysics to bioinformatics and health IT. With its two newest supercomputer systems, Trestles and the soon-to-be-launched Gordon, SDSC is a partner in XSEDE (Extreme Science and Engineering Discovery Environment), the most advanced collection of integrated digital resources and services in the world.
Source: San Diego Supercomputer Center