HPC in the Cloud


Dedicated to covering high-end cloud computing
in science, industry and the datacenter

New Application Allows Scientists Easy Access to Important Government Data


Troy, N.Y., December 13, 2010 – Government agencies around the world make billions of bits of raw data available to the public each day, but this data is often in difficult formats or so widely scattered around the Web that it is virtually unusable to the public and to the scientists who seek to use this valuable information in their research.

Computer scientists within the Tetherless World Research Constellation at Rensselaer Polytechnic Institute have developed an application to help solve the problem. A collaboration with scientific publisher Elsevier, the application utilizes the U.S. government data warehouse, Data.gov, to provide scientists with easy and direct access to government data sets relevant to their research.

For Rensselaer, the work is the latest example of the renowned Web Science research group's efforts to enhance the hundreds of thousands of raw government datasets available on the Data.gov website with advanced Semantic Web technology. Their work is bringing scientists and the public usable, relevant, searchable, and easily replicable datasets on topics from climate change to public safety to the federal deficit.

The new application, called US Government Dataset Search, lives on Elsevier's SciVerse websites. SciVerse provides the global scientific research community with searchable access to the world's largest source of peer-reviewed scientific content. Such access is a vital component of the modern scientific process as scientists develop new discoveries by building off the findings of previous peer-reviewed publications.

"There is a growing movement to make data and content more open and accessible on the Web," said Tetherless World Research Constellation Professor James Hendler. "Elsevier's tool-based systems show a new way for publishers to join this movement without sacrificing copyrights. It should serve as a starting place to be emulated by others around the world."

Once a SciVerse user selects it from the application gallery, the new application displays a customized list of the government datasets most relevant to the topics of that user's article search. For example, a climatologist searching SciVerse for peer-reviewed articles on climate change would see a list of all relevant government data on Data.gov, ranging from the National Oceanic and Atmospheric Administration's massive collaborative weather observation networks to historical climate diaries and journals from the National Archives. Scientists can then use this free and relevant data to advance their research, often in totally new and unexpected ways, according to the application's developers.

In addition to providing direct access to raw government datasets, the application simultaneously searches the Linking Open Government Data (LOGD) portal at Rensselaer's Tetherless World Research Constellation. The portal hosts Data.gov datasets that have been converted and enhanced with Semantic Web technologies. Semantic enhancements to the datasets make them much more usable and searchable to a variety of applications, enabling multiple data sets to be linked even when the underlying structure or format of each is different. Completely unseen to the average user, this semantic technology resides below the surface of the Web, augmenting rather than replacing traditional search engines. Computer scientists and developers can also take the semantic coding and utilize and enhance it independently.
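
The idea of linking datasets whose structure differs can be sketched in a few lines. The following is a minimal illustration only, not the LOGD portal's actual conversion code: the record layouts, the `ex:` identifiers, the predicate names, and all values are hypothetical placeholders chosen for the example. Each dataset is converted into (subject, predicate, object) triples that share a common identifier for the same entity, after which the two sources can be merged without either one adopting the other's format.

```python
# Hypothetical records from two differently structured datasets.
# All values are made-up placeholders, not real government data.
climate_rows = [
    {"state_name": "New York", "avg_temp": 10.0},
    {"state_name": "Colorado", "avg_temp": 20.0},
]
population_rows = [
    ("NY", 100),
    ("CO", 200),
]

# Assumed mapping from each dataset's local label to one shared identifier.
STATE_URI = {
    "New York": "ex:state/NY", "NY": "ex:state/NY",
    "Colorado": "ex:state/CO", "CO": "ex:state/CO",
}

def climate_to_triples(rows):
    """Convert dict-style climate records to (subject, predicate, object)."""
    return [(STATE_URI[r["state_name"]], "ex:avgTemp", r["avg_temp"]) for r in rows]

def population_to_triples(rows):
    """Convert tuple-style population records to (subject, predicate, object)."""
    return [(STATE_URI[code], "ex:population", count) for code, count in rows]

def link(triples):
    """Group triples by subject so facts from both sources sit side by side."""
    merged = {}
    for subject, predicate, obj in triples:
        merged.setdefault(subject, {})[predicate] = obj
    return merged

linked = link(climate_to_triples(climate_rows) + population_to_triples(population_rows))
for subject in sorted(linked):
    print(subject, linked[subject])
```

Because both converters emit the same triple shape and the same subject identifiers, adding a third source in this sketch only requires one more converter function; the existing data and the `link` step stay unchanged.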

"When we enhance data with semantics, we make it much more usable to a researcher than raw data," said the project lead for the application and Rensselaer research engineer John Erickson. "Through this application and others developed within the Tetherless World, we are empowering researchers with new tools for the basic practice of science by introducing semantics into the exploration of data."

Erickson was joined in the research by research scientist Li Ding and graduate student Dominic DiFranzo, as well as the professors who lead the research group: Deborah McGuinness, Hendler, and Peter Fox.

"Using Semantic Web technologies, Tetherless World Research Constellation at Rensselaer has built innovative solutions leveraging open government datasets from Data.gov," said Vice President of Product Management for Elsevier's Application Marketplace and Developer Network Rafael Sidi. "We are delighted to partner with them to bring government datasets to our users. The Dataset Search application built by Rensselaer illustrates how collaboration with the research community can lead to innovative applications that enhance scientists' productivity."

-----

Source: Rensselaer Polytechnic Institute



