December 04, 2012
Last week, Amazon Web Services kicked off its first annual customer and partner conference, AWS re: Invent. The cloud giant welcomed nearly 6,000 participants (at $1,099 a pop) to the Venetian Hotel in Las Vegas for three days of cloud-themed activities and keynotes (for $500 more, attendees could take part in AWS Technical Bootcamps).
The Wednesday morning keynote session with AWS Senior Vice President Andrew Jassy revealed some big announcements, namely a cloud-based data warehouse service, AWS Redshift, and another price cut for the S3 storage service.
During his keynote, Jassy stated that "traditional old-guard data warehousing is really expensive and really complicated to manage." Responding to these points and to customer requests, AWS set out to design a solution that was scalable, with pay-as-you-go pricing and no upfront costs, with an emphasis on high performance, low cost, and flexible integration. Jassy explained that their new data warehouse service, AWS RedShift, meets this profile at a tenth of the cost of traditional data warehousing products. The service automates deployment and administration and works with popular business intelligence tools, like MicroStrategy, Japsersoft, SAP BusinessObjects, and Cognos.
Amazon Redshift is available in two sizes, the hs1.xlarge and hs1.8xlarge instance, which hold 2 TB and 16 TB of compressed data respectively. An Amazon Redshift cluster can be created with 32 hs1.xlarge nodes for up to 64 TB of storage or 100 hs1.8xlarge nodes for up to 1.6 PB of storage. The default maximum cluster size is 40 nodes (640 TB of storage), but users have the option to request a limit increase. The data is stored in columnar format which means the I/O moves much more quickly, improving query times, notes Jassy.
Amazon is no stranger to traditional data warehousing solutions. The company's retail division operates a warehouse cluster with 32 nodes, 4.2 TB of RAM and 1.6 PB of disk space that costs the company several million dollars per year. As a test case, AWS took 2 billion rows of data and six of their most complex queries and compared the two systems. Using 2 16 TB nodes on Redshift for $3.65/hour – or $32,000 a year – they achieved at least 10x faster performance.
On-demand pricing for the new service will start at $.85 per hour, but with a three-year reserved instance, the cost is less than $.23 an hour. Jassy points to analyst data that puts the cost of a typical data warehouse at $19,000-$25,000 per TB/year, while under Amazon's reserved pricing model, Redshift costs less than $1,000 per TB/year. The service is currently offered in limited preview with a full launch expected in early 2013.

Jassy also declared that AWS is lowering the price of its Simple Storage Service (S3) by 24-28 percent in the US Standard Region and enacting similar price reductions in all nine regions worldwide. This is the 24th time AWS has reduced its pricing. For the first TB, the service now starts at 9.5 cents per GB/month, down from 12.5 cents per GB/month.
Amazon has the lead time and economies of scale that allow it to basically outbid the competition on price, but that doesn't mean that its public cloud rivals are giving up or going home. The day before the re: Invent conference started, Google lowered its storage pricing by 20 percent. And then, on Thursday, presumably in response to the AWS decrease, Google lowered its prices again, by an additional 10 percent. Currently Google's storage pricing sits at .85 per GB/month (for the first TB), 10 cents cheaper than Amazon's. Google's reduced availability storage service is also cheaper than Amazon's. However Google does not have an equivalent to the low-cost Amazon Glacier service.
On Thursday, Amazon.com CTO Werner Vogels revealed two super-sized EC2 Instance Types, and another new service, the AWS Data Pipeline. Amazon's new Cluster High Memory and High Storage Instances are larger than the new compute instances that Google debuted earlier in the week. The Cluster High Memory instance (cr1.8xlarge), which comes with 240 GB of RAM and 2 x 120 GB solid-state storage drives, is intended for large scale in-memory analytics applications. High Storage instances, available in only one size, High Storage Eight Extra Large (hs1.8xlarge), comes with 117 GB of RAM sitting on 48 TB of disk space provided by 24 2 TB hard drives, connected with 10 Gigabit Ethernet networking.
The AWS Data Pipeline is a lightweight orchestration service for data-driven workflows, aimed at extracting data's "captive intelligence." Currently in a limited private beta, the service lets users move data through several processing steps to a targeted destination. It is pre-integrated with AWS data sources and also connects with 3rd party and on-premises sources.
The conference was named re:Invent, and the final day offered attendees a fireside chat with CEO Jeff Bezos and CTO Werner Vogels, in which the pair discuss several keys to invention, one of which is a willingness to fail or be misunderstood. "This allows you ramp up your rate of experimentation," notes Bezos. The discussion naturally comes around to cloud as the duo discuss how AWS has enabled companies to experiment faster and to be more innovative.
In closing Bezos offers some advice to aspiring entrepreneurs: 1) don't chase fads ("wait for the wave and pick something you're passionate about"); and 2) be customer-centric ("start with the customer and working backwards"). "Those two things will take you a long way, Bezos concludes.
Even after the conference has concluded, AWS news continues to roll in. Today the company announced that its AWS Marketplace now supports Windows-based software, including products from Parallels Software, MicroStrategy, and Quest Software. In addition, the AWS Marketplace now includes a Big Data category "for customers who want to analyze large amounts of data and are looking for ways to quickly solve Big Data problems."
May 23, 2013 |
The study of climate change is one of those scientific problems where it is almost essential to model the entire Earth to attain accurate results and make worthwhile predictions. In an attempt to make climate science more accessible to smaller research facilities, NASA introduced what they call ‘Climate in a Box,’ a system they note acts as a desktop supercomputer.
Read more...
May 16, 2013 |
When it comes to cloud, long distances mean unacceptably high latencies. Researchers from the University of Bonn in Germany examined those latency issues of doing CFD modeling in the cloud by utilizing a common CFD and its utilization in HPC instance types including both CPU and GPU cores of Amazon EC2.
Read more...
05/10/2013 | Cleversafe, Cray, DDN, NetApp, & Panasas | From Wall Street to Hollywood, drug discovery to homeland security, companies and organizations of all sizes and stripes are coming face to face with the challenges – and opportunities – afforded by Big Data. Before anyone can utilize these extraordinary data repositories, however, they must first harness and manage their data stores, and do so utilizing technologies that underscore affordability, security, and scalability.
04/02/2012 | AMD | Developers today are just beginning to explore the potential of heterogeneous computing, but the potential for this new paradigm is huge. This brief article reviews how the technology might impact a range of application development areas, including client experiences and cloud-based data management. As platforms like OpenCL continue to evolve, the benefits of heterogeneous computing will become even more accessible. Use this quick article to jump-start your own thinking on heterogeneous computing.