December 04, 2008
SEATTLE, Dec. 4 -- Amazon Web Services LLC (AWS), a subsidiary of Amazon.com, Inc., today launched “Public Data Sets on AWS,” providing access to a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community; as with all AWS services, users pay only for the compute and storage their own applications consume. Data sets already available include various U.S. Census databases from the U.S. Census Bureau, 3-D chemical structures provided by Indiana University, and an annotated form of the Human Genome from Ensembl. More data sets will be available soon, including a wide range of economic statistics from the Bureau of Economic Analysis and additional scientific data sets.
Previously, large data sets such as the Human Genome and U.S. Census data required many hours to locate, download and customize. Now, anyone can access these large data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. By growing the number of people with access to important and useful data, and making it easy to compute on that data with cost-efficient services such as Amazon EC2, AWS hopes to fuel innovation and further accelerate the pace of new discoveries.
“For over five years, AWS has been working to lower the barriers to entry, level the playing field, and make it possible for our customers to be successful based on their ideas, not on their resources,” said Adam Selipsky, vice president of product management and developer relations for Amazon Web Services. “Public Data Sets on AWS is the latest of these efforts, and we can’t wait to see the discoveries and innovations that could stem from this ecosystem.”
Select public data sets are hosted on Amazon EC2 for free as Amazon Elastic Block Store (Amazon EBS) snapshots. Amazon EC2 customers can access this data by creating their own personal Amazon EBS volumes, using the public data set snapshots as a starting point. They can then access, modify and perform computation on these volumes directly from their Amazon EC2 instances, paying only for the compute and storage resources that they use. Where available, researchers can also use pre-configured Amazon Machine Images (AMIs) with tools such as Inquiry by BioTeam to perform their analysis.
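The snapshot-to-volume workflow described above can be sketched with the modern AWS CLI (the 2008-era equivalents were the ec2-api-tools); the snapshot, volume, and instance IDs below are hypothetical placeholders, not identifiers from any actual public data set:

```shell
# Create a personal EBS volume from a public data set snapshot.
# The snapshot ID and availability zone are hypothetical placeholders.
aws ec2 create-volume \
    --snapshot-id snap-0123456789abcdef0 \
    --availability-zone us-east-1a

# Attach the new volume to a running EC2 instance as device /dev/sdf.
aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/sdf

# On the instance itself: mount the volume and start computing on the data.
sudo mkdir -p /data
sudo mount /dev/sdf /data
```

Under this model the user pays only for the EBS storage of their own volume and the EC2 compute time they use, matching the pricing described above; the snapshot hosting itself is free.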
“Public Data Sets on AWS will enable me and many of my colleagues to collaborate with each other by sharing our commonly used data sets, research environments and tools,” said Dr. Peter Tonellato from the Harvard Medical School. “We can set up a controlled environment in minutes, run our computational analysis for a couple of hours, and shut down the environment. Our results are completely repeatable. I only pay for the compute time I use, and more importantly I can spend more time focusing on research, not downloading and setting up computational infrastructure.”
“Bioinformatics is a hugely exciting area which is providing much insight into our understanding of biology and, particularly, the genetic basis of many human diseases like cancer and diabetes. The genome is a complex thing, however; it presents us with a potential source of invaluable information but also with great challenges in how to store, analyze and annotate it, and how to make both the raw genomic information and our annotations available to as many people as possible,” said Dr. Glenn Proctor, Ensembl software coordinator at the EBI. “Ensembl's approach has always been to try to lower the barriers to entry so that a researcher using a desktop PC in a lab or a laptop in an airport departure lounge has access to high-quality, up-to-the-minute genetic information that they can use in their work. Amazon EC2 allows us to go even further and make all our data available in a robust, scalable and flexible form that anyone with an AWS account can use.”
For more information about the Public Data Sets on AWS, to get started using a data set, or to submit a data set, visit aws.amazon.com/publicdatasets.
Amazon.com, Inc., a Fortune 500 company based in Seattle, opened on the World Wide Web in July 1995 and today offers Earth's Biggest Selection. Amazon.com, Inc., seeks to be Earth's most customer-centric company, where customers can find and discover anything they might want to buy online, and endeavors to offer its customers the lowest possible prices. Amazon.com and other sellers offer millions of unique new, refurbished and used items in categories such as books, movies, music & games, digital downloads, electronics & computers, home & garden, toys, kids & baby, grocery, apparel, shoes & jewelry, health & beauty, sports & outdoors, and tools, auto & industrial.
Amazon Web Services provides Amazon's developer customers with access to in-the-cloud infrastructure services based on Amazon's own back-end technology platform, which developers can use to enable virtually any type of business. Examples of the services offered by Amazon Web Services are Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon SimpleDB, Amazon Simple Queue Service (Amazon SQS), Amazon Flexible Payments Service (Amazon FPS), and Amazon Mechanical Turk.
Amazon and its affiliates operate websites, including www.amazon.com, www.amazon.co.uk, www.amazon.de, www.amazon.co.jp, www.amazon.fr, www.amazon.ca, and the Joyo Amazon Web sites at www.joyo.cn and www.amazon.cn.