A New Face in Data Caching

By Derrick Harris

December 3, 2007

In this Q&A, ScaleOut Software founder and CEO William Bain discusses his company’s distributed data caching solutions, including how they are being adopted by e-commerce and financial services customers, as well as how ScaleOut’s products differentiate themselves from the competition.

-----

GRIDtoday: First, can you give an explanation of ScaleOut Software’s distributed caching technology? How does it work and what are the important distinctions between distributed caching and traditional database-driven architectures?

WILLIAM BAIN: ScaleOut StateServer (SOSS) creates a distributed, in-memory object cache that spans the servers in a grid or server farm. Stored objects are globally accessible across the grid using intuitive application programming interfaces (APIs). SOSS employs a highly integrated architecture that combines automatic data partitioning and dynamic load balancing for scalability, transparent local caching for fast access, and intelligent data replication for high availability. For fast deployment and simplified management, caching servers automatically discover and join the distributed cache, which self-heals after server or network outages.

SOSS automatically partitions all of the distributed cache’s stored objects across the grid and simultaneously processes access requests on all servers. This reduces access times and scales the overall throughput of the distributed cache. It also avoids “hot spots” that can arise if objects are stored on the servers where they are created.

[Figure 1]

As servers are added to the grid, SOSS automatically repartitions and rebalances the storage workload to scale throughput. Likewise, if servers are removed, SOSS coalesces stored objects on the surviving servers and rebalances the storage workload as necessary.

Two levels of internal caching ensure the fastest possible access times. These caches are integrated into SOSS to accelerate performance automatically, without involving the developer in configuring and coordinating multiple caches. One level of internal caching holds objects within the StateServer service process on each server; this speeds up repeated accesses to these objects by avoiding the networking overhead of copying them from remote hosts. A second level of internal caching holds deserialized objects within SOSS's client libraries; this cache sidesteps the CPU overhead of deserializing objects each time they are retrieved from the distributed cache.
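The two-level idea, a client-side cache of deserialized objects in front of a store holding serialized bytes, can be illustrated with a toy sketch. This is not SOSS code; the class and the dictionary standing in for the remote store are hypothetical, and keeping the near cache coherent with the distributed cache (which a real product must do) is omitted here:

```python
import pickle

class TwoLevelCache:
    """Illustrative sketch: a local dictionary of deserialized objects
    in front of a (simulated) remote store of serialized bytes. Repeat
    reads skip both the network hop and the deserialization cost."""
    def __init__(self, remote_store: dict):
        self.remote = remote_store   # stands in for the distributed cache
        self.near = {}               # deserialized local copies
        self.near_hits = 0

    def put(self, key, obj):
        self.remote[key] = pickle.dumps(obj)
        self.near[key] = obj

    def get(self, key):
        if key in self.near:         # fast path: no network, no deserialize
            self.near_hits += 1
            return self.near[key]
        obj = pickle.loads(self.remote[key])
        self.near[key] = obj         # cache the deserialized copy
        return obj
```

The first read from a given client pays the deserialization cost once; subsequent reads are served from the near cache.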

SOSS ensures that cached data is never lost, even if a server in the grid fails, by replicating all cached objects on up to two additional servers. If a server goes offline or loses network connectivity, SOSS retrieves its objects from replicas stored on other servers in the grid, and it creates new replicas to maintain redundant storage as part of its "self-healing" process. SOSS uses a patent-pending, scalable, point-to-point heartbeat architecture that efficiently detects failures without flooding the server grid's network with multicast heartbeat packets. Heartbeat failures automatically trigger SOSS's self-healing technology, which quickly restores access to cache partitions and dynamically rebalances the storage load across the grid.
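The heartbeat-and-promotion mechanism can be sketched as follows. This is an illustrative Python sketch under assumed names; real failure detection runs over the network, and the self-healing step of creating fresh replicas on surviving hosts is omitted:

```python
import time

class HeartbeatMonitor:
    """Point-to-point failure detection sketch: record the last heartbeat
    seen from each peer; a peer silent longer than `timeout` seconds is
    declared failed, which would trigger replica promotion."""
    def __init__(self, peers, timeout=5.0, clock=time.monotonic):
        self.clock = clock
        self.timeout = timeout
        self.last_seen = {p: clock() for p in peers}

    def heartbeat(self, peer):
        self.last_seen[peer] = self.clock()

    def failed_peers(self):
        now = self.clock()
        return [p for p, t in self.last_seen.items() if now - t > self.timeout]

def promote_replicas(primaries: dict, replicas: dict, failed: str) -> dict:
    """On failure, serve each partition the failed host owned from that
    partition's replica host instead."""
    healed = dict(primaries)
    for part, host in primaries.items():
        if host == failed:
            healed[part] = replicas[part]
    return healed
```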

Using a distributed cache in place of a database server (DBMS) to store data has the dual advantages of very high performance and essentially unlimited scalability. The most common data stored in a distributed cache is mission-critical but relatively short-lived data, called "workload data." This type of data includes session state, shopping carts, cached database results, SOAP requests, financial data, grid computing results, and other rapidly accessed, fast-changing application data. To provide global accessibility, applications historically have stored workload data in a centralized, back-end DBMS so that it can be retrieved from any server and preserved in case of server outages. However, database servers are designed to handle long-term, line-of-business data, such as inventory, purchase orders, billing records, and other long-lived business data. As the following table illustrates, workload data has different characteristics that make it poorly suited for storage in database servers:

Characteristic           Line-of-Business Data   Workload Data
-----------------------  ----------------------  --------------------------
Volume                   High                    Low
Lifetime/turnover        Long/slow               Short/fast
Access queries           Complex                 Simple
Data preservation        Critical                Critical, but reproducible
Access:update ratio      >4:1                    ~1:1
Fast access and update   Less important          Very important

In addition, database servers can be costly (especially if clustering is used), and traffic to and from the data storage tier creates a bottleneck that impacts both performance and scalability. Database caches alone aren’t the answer because they can’t accelerate updates to fast-changing workload data. Distributed, in-memory caching solves these problems.


Gt: What business problems are driving the need for distributed caching solutions? In what markets are these needs most pressing?

BAIN: There are two general types of business problems driving the adoption of distributed caching. One is the need of e-commerce sites running on server farms to scale while remaining highly available as their traffic increases. Without distributed caching, developers have to choose between high availability (using a DBMS) and scalability (using in-process storage). Distributed caching solves both of these issues simultaneously.

The second driver is the grid computing market, especially the financial services vertical, where there are extreme pressures to wring out every microsecond of latency and to maximize application throughput. Over the past few years, the decline in exponential growth of CPU speed has stimulated a resurgence in the use of grid-based computing, which has provided important performance gains. However, data access technology continues to lag far behind. Most data used in grid computing today is either maintained in a database until needed by the grid or delivered sequentially to the grid by a master control node, and even interim results are frequently stored in a database. These techniques drastically and unnecessarily lengthen the overall compute time. Distributed caching avoids these limitations by reliably hosting application data in memory within the compute grid’s servers, making it simultaneously available to all compute nodes.

Gt: What does ScaleOut’s product line look like? How do its different solutions address different needs?

BAIN: ScaleOut Software’s flagship product, StateServer, provides a distributed cache as described earlier. SOSS includes comprehensive APIs for storing data objects, and it also transparently stores ASP.NET session state on ASP.NET server farms. Some customers only need a solution for ASP.NET session-state, and to meet that need, we provide ScaleOut SessionServer, which transparently stores ASP.NET session-state but does not include the APIs.
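An application's use of a distributed-cache API typically follows a create/retrieve/update/remove shape. The client class and method names below are hypothetical stand-ins (backed by a plain dictionary), not the actual SOSS APIs:

```python
class CacheClient:
    """Hypothetical stand-in for a distributed-cache client API,
    illustrating the common shape of object-storage calls."""
    def __init__(self):
        self._store = {}

    def create(self, key, obj, timeout_secs=None):
        # a real client would serialize obj and apply the object timeout
        self._store[key] = obj

    def retrieve(self, key):
        return self._store.get(key)

    def update(self, key, obj):
        if key not in self._store:
            raise KeyError(key)
        self._store[key] = obj

    def remove(self, key):
        self._store.pop(key, None)

# Example: storing and updating a session object
cache = CacheClient()
cache.create("session:alice", {"cart": ["sku-123"]})
session = cache.retrieve("session:alice")
session["cart"].append("sku-456")
cache.update("session:alice", session)
```

A session-state module built on such an API can make this storage transparent to the web application, which is what the SessionServer product does for ASP.NET.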

ScaleOut Software also has released two other important products. ScaleOut GeoServer replicates stored objects between SOSS caches running on server farms at different sites, enabling multiple datacenters to stay fully protected against site-wide failures. GeoServer's capabilities help IT managers meet the stringent performance and uptime needs of high-end Web sites and other mission-critical applications. ScaleOut Remote Client lets client applications running on networked computers remotely access an SOSS distributed cache. In many situations, it is more convenient to deploy the SOSS distributed cache on its own dedicated server farm instead of co-locating it on a Web server farm or compute grid; Remote Client adds this flexibility by allowing the distributed cache to be hosted on a server farm tailored for caching and accessed by numerous remote clients.

ScaleOut Software’s currently released products support .NET/Windows environments. In early 2008, we also plan to offer a Java/Linux version of our products. This version will be completely interoperable between .NET/Windows and Java/Linux, allowing any combination of clients and servers to coexist and use a single distributed cache storage environment.

Gt: How has ScaleOut approached the area of distributed caching differently than other providers in this space? What about distributed database solutions?

BAIN: First, it is important to distinguish SOSS from an object-oriented database like Objectivity/DB or GemStone. These products were designed for long-term storage of object-oriented data, not for distributed data caching that scales application performance with high availability.

[Oracle] Coherence, [GemStone] GemFire EDF and GigaSpaces all provide competitive distributed caching products. These products generally have a Linux/Unix/Java heritage, although they provide interoperability with Windows/.NET. In contrast, ScaleOut StateServer was designed from the outset to be portable across Windows and Linux environments and to deliver fully native performance (instead of just interoperability) in both environments. Also, we have focused on .NET deployments to date, giving us a strong, long-term presence in .NET that these competitors cannot match.

From an architectural viewpoint, SOSS is distinguished from its competitors in these key aspects:

  • SOSS was designed from the ground up for scalable performance and high availability. We did not extend a single-server cache into a distributed environment. Instead, SOSS was conceived as a scalable, distributed cache from its inception in 2003. The benefit of doing this is that each and every feature, such as distributed locking and remote access, fits into a coherent architecture that is both scalable and highly available.
  • To maximize performance with the simplest possible deployment model, SOSS was designed as an integrated caching solution rather than a set of building blocks the user must assemble. For example, cache partitioning is internally managed and integrated with a dynamic load-balancer, which maximizes scalability and automatically handles changes to the server membership. Likewise, data replication to a subset of servers is "baked" into the product for high availability and is also integrated with the load-balancer, giving the user the simplest possible view of the distributed cache. SOSS also includes transparent local caching that is automatically kept coherent with the distributed cache. The benefit of all this is that the user does not have to architect and then tune a distributed caching solution out of building blocks; we have made sure that it all works together seamlessly and delivers the highest possible performance.
  • SOSS is the easiest caching product to deploy and manage. Our goal was to make SOSS as easy as possible to configure and run by automating almost all management actions. When installing SOSS, the user only has to select the network subnet to get the cache running. At that point, the SOSS service automatically discovers the other caching hosts and joins the distributed cache. If a failure occurs, SOSS detects the surviving hosts and self-heals to restore its redundancy. The use of multicast for self-discovery keeps this process fully automatic (easier than managing configuration files) with very low network overhead.
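The multicast self-discovery described in the last bullet can be modeled without real sockets: a shared channel on which a joining host announces itself and learns the current membership in return. The classes below are a hypothetical simulation of that exchange, not SOSS's actual protocol:

```python
class Host:
    """A caching host that tracks its view of the grid membership."""
    def __init__(self, name):
        self.name = name
        self.members = set()

class DiscoveryBus:
    """Stand-in for a multicast channel: a join announcement reaches
    every current member, and the newcomer learns the full membership,
    with no per-host configuration files to edit."""
    def __init__(self):
        self.hosts = []

    def join(self, host):
        for member in self.hosts:
            member.members.add(host.name)   # existing members learn the newcomer
            host.members.add(member.name)   # newcomer learns existing members
        host.members.add(host.name)
        self.hosts.append(host)
```

After every host joins, all members converge on the same view of the grid.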


Gt: Can you speak a little about your customer base? What does it look like in terms of numbers, specific users/use cases, major industries, etc.?

BAIN: ScaleOut Software's products are running on thousands of servers at nearly 150 customers across a wide range of industries. Initial adoption of SOSS came from customers with e-commerce applications needing to transparently store session-state information on Web server farms. In late 2005, we observed strong interest in the use of our APIs to cache many types of application data, and this trend has quickly grown. More recently, the financial services segment has rapidly ramped up with its need to handle very large computational loads over decreasing time frames. Most of these companies have deployed grid computing environments running tens to hundreds of servers in each compute grid. They have turned to distributed caching to boost performance and scalability while maintaining high availability for stored data.

Gt: How would you rate the overall demand for distributed caching solutions right now, and how do you see the level of demand changing in the years, or even months, to come?

BAIN: Both the demand for distributed caching and the range of applications are rapidly growing. Developers have discovered that data access is a bottleneck which limits the performance and scalability of their grid-based applications. Distributed caching is the key technology that can address these challenges. At the same time, distributed caching solutions have evolved in both their functionality and ease of use, opening up this technology to an ever widening group of architects and application developers who lack specialized training in distributed computing. In fact, we are seeing department-level organizations increasingly taking advantage of distributed caching, thereby creating a much larger market opportunity.

Especially in financial services, we see an exploding interest in distributed caching because the volume of daily transactions that need to be processed is growing exponentially, and the market is extremely competitive. Application developers need effective, easy to use tools that can extract the highest possible performance with the least amount of programming effort. Distributed caching fills an important need in this market, and ScaleOut Software’s focus on high integration and ease of use keeps the learning curve for developers to a minimum.

Gt: Speaking of the future, do you see distributed databases (whether in-memory caches or simply data grids) becoming the norm at any point? Are traditional approaches just too slow for today's increasing needs in terms of low latency, etc.?

BAIN: With the tail-off in the historic, exponential growth rate of CPU clock rates, multi-core and multi-server architectures will increase in dominance. This also will accelerate the adoption of server virtualization. Distributed caching provides the “glue” which ties all of these elements together to form a scalable processing platform for tomorrow’s applications.

As with many other technologies, a tipping point of adoption occurs when a critical mass of understanding, experience and a positive and proven cost-benefit ratio is reached. Starting with applications where there is pressure to remove latency and increase performance, we expect distributed caching to emerge as an essential component of application development platforms within the next few years.

Gt: Is there anything else you'd like to add about ScaleOut Software's technology or products, or about the distributed data market in general?

BAIN: ScaleOut Software is focused on playing a leadership role in this exciting market. Our architectural approach is based on almost 30 years of experience developing parallel computing solutions for scientific and commercial applications. This experience shows us that distributed caching is the foundation of a distributed computing platform that can dramatically reduce development time and boost application performance. At the core, distributed computing platforms need to be easy to target by application developers, and they need to be easy to deploy and manage. You can expect ScaleOut Software to roll out new technologies over time that further advance our technology leadership in distributed computing.
