The compute Grid is well understood today -- but much less time has
been devoted to getting data where it needs to be, when it needs to be
there, as well as to how this effort is managed.
Enterprise users say the ability to manage data on Grids is a key
requirement for accelerating Grid deployments within their IT
organizations. Some large enterprises are finding that limitations in
data management capability mean they must hold off on evolving their
Grid deployments. Those that have moved forward have usually done so
either through in-house development or through customized or
cutting-edge offerings from vendors. Most early adopters have long-term plans to
extend their activities from initial beachheads to multi-application
and cross-organizational Grids. But without proper data management
tools in place, applications will not perform well on top of a Grid
architecture, and the cost and performance advantages of implementing a
Grid will not be realized.
Commercial enterprise Grids require a data management infrastructure
that allows end users and applications to share information, regardless
of where it resides, and provides secure access to heterogeneous
databases, middleware, file systems and storage systems. Traditional
data management techniques are well established, but they were designed
to run on centralized mainframe or client/server architectures and need
to be adapted and extended for Grid architectures. If Grids are to
progress into mainstream commercial use, a model for transactional
Grids is needed that can support the kind of transactions that underpin
commercial organizations. Increasing the availability of commercial
applications for use on Grids is seen as key to driving accelerated
adoption. Some combination of caching, data streaming, replication,
global resource namespaces, data movement, data transformation, data
quality and storage volume virtualization may be required, depending on
the application and system architecture. As it stands now, no single
approach, vendor or group has a leadership position -- the one point of
agreement is that a virtualized environment is necessary -- and
no one can address data management on every part of the stack. The
challenge has been characterized variously as creating the data Grid,
storage Grid, information Grid or integration Grid. The 451 Group
believes the ability to manage data on Grids is the key to all of these.
Many enterprises, vendors and users have identified the transformation
to a service-oriented architecture (SOA) as a strategic, long-term goal
that can better align business with IT and improve responsiveness to
changing conditions. Financial services companies, for example, see
Grids as the underpinning for SOAs, which cannot be implemented without
sophisticated data management techniques. An SOA uses short transactions
and large volumes of associated data elements. For many organizations,
SOA is the future for their enterprise IT environments. Grid computing
is seen as the infrastructure model, and SOA as the application model.
But supporting SOA is not the only role for Grid technology, which is also
regarded as the underpinning for utility computing, a service delivery
model. Equally important is how Grid technology relates to event-driven
services, messaging, database systems, networking systems and legacy
assets.
Grid Vendors
Grid middleware/scheduling vendors themselves have not ignored data
management issues, especially as they seek to address a broader piece
of the Grid "stack," penetrate new markets and move beyond
high-performance computing Grids. The problem is that data management
is not part of their core skill set.
Platform Computing has some rudimentary caching capabilities in
Symphony, but it typically partners for data management functions.
DataSynapse has added data management virtualization functions to
GridServer, which incorporates parts of its distributed GridCache. The
result is that more scalable, transactional applications that were once
unsuitable for Grid deployment can now be run on GridServer. United
Devices, perhaps the most unashamedly "compute-oriented" of this group,
says it will partner for most data management functions and build some
of its own, but it has yet to expand on this plan.
Major Players
The major vendors -- as always -- can have a huge influence over
technology directions, although they are not always working at the
cutting edge of technology. Oracle's marketing of its database as 10g
(the "g" is for "Grid") has raised awareness of Grid technology to new
levels, despite the fact that much of what Oracle is actually shipping
is not viewed by many as a "real" Grid implementation; it is really
database clustering. Oracle's view is that clustering is a good way to
implement Grid capabilities -- without heterogeneity. It sees customers
moving on from infrastructure consolidation projects to Grids.
Typical offerings rely on federating access to other resources rather
than integrating them, although Oracle's answer is that customers should
put all of their data in an Oracle database, at which point these concerns go away.
However, Oracle has done a good job of highlighting the lack of
support in Grids for transactional workloads, with their associated
low-latency, high-volume data requirements, while other companies have yet to face
the challenge -- although startups such as CipherGrid are beginning to
address this issue.
Until recently, Microsoft had not talked much about its plans for Grid
technology, and it has confined the work it has done to the
high-performance computing sector. But the signs are that Grid
technology is about to enter the mainstream at the company. The launches
of SQL Server 2005 later this year and Windows Server Compute Cluster
Edition next year are the key events to look out for. In the longer
term, Microsoft is aiming to provide what it calls the "unified Grid,"
virtualizing the resources customers have in a heterogeneous
environment.
The addition of Tony Hey, former UK e-Science initiative director, as a
corporate vice president with Microsoft's technical committee is
another important indicator of Microsoft's future with Grid computing.
With the Global Grid Forum having agreed on a vision for the Open Grid
Services Architecture that does not mandate either IBM's WS-Resource
Framework or Microsoft's WS framework for its implementation, Hey
believes there is now a way for Microsoft to participate in the Grid
industry. He expects Microsoft will play a role in open Grid standards
and will then implement them in Windows.
The big server vendors all have plans for on-demand, utility or
datacenter automation frameworks that will incorporate Grid technology
as their underlying infrastructure. IBM views data integration as an
important part of the Grid environment, relying heavily on information
virtualization. IBM's vision is to virtualize all of the information in
a distributed system -- data sources from local or distributed file
systems or databases/representations. To accomplish this, it would
wrap a global name around each data source and assign it a policy for
how it is to appear. It is an aggressive technology direction, but IBM has invented
or been responsible for many of the most advanced data management
techniques.
Hewlett-Packard has plenty of Grid activities under way but hasn't made
much noise about them, partly because it decided early on that Grid
computing is going to be a horizontal, transformational technology that
will need time to mature because of the profound organizational and
cultural challenges it implies, and because of the time it will take to
turn everything into a service.
Sun's Grid Engine distributed resource manager and workload scheduler
remains the mainstay of its Grid middleware offering. More
sophisticated data management capabilities, including additional
caching and replication technologies, are scheduled to be included in
phase three of its Sun Grid Storage Utility buildout, and Sun will
either partner or acquire to obtain these.
Summary
A mixture of approaches -- data movement, replication and data
federation -- will be necessary to handle the growing number of
disparate data sources, including those outside of the database, as
well as the growing number of devices that need to access them within
enterprise IT environments. The 451 Group expects to see broader use of
federated data access, distributed main memory and local disk caching
techniques in future products. The 451 Group also expects to see
increasing data management support for Grids embedded within
application servers and databases. But because this implies a return to
a more centralized application-server approach that does not fit easily
into Grid architectures, an alternative, development-platform-oriented
model supporting multiple programming language interfaces will also
continue to be a requirement.
For more information about this topic, please visit
www.the451group.com/intake/gridtoday-aug05.
About William Fellows
William Fellows is a principal analyst at New York-based The 451 Group
-- an independent technology industry analyst company focused on the
business of enterprise IT innovation.