How do we know whether a new technology is the real thing or just a fad?
Furthermore, how do we value the significance of a new technology, and
when is new technology a tactical or a strategic decision? In this
article, I will discuss why Grid and SOA are here to stay. I will also
describe the technology "product stack" in order to distinguish the
strategic from the tactical, and I will propose best-practice techniques
for securing ROI and building resilience to change and early-adoption risks.
Some would say that Grid and SOA are not revolutionary concepts, but
rather evolutionary steps of enterprise distributed computing. Make no
mistake, though: together, the technologies have the potential and the
power to bring about a computing revolution. Grid and SOA may seem
unrelated, but they are complementary notions with fundamentally the
same technology underpinning and common business goals. They are
service-based entities supporting the adaptive enterprise.
So, let's talk about the adaptive, or agile, enterprise and its
characteristics. The only constant in today's business models is
change. The way of doing business shifts constantly, either because
the company refocuses or because of new competitive pressures: today
we are product-focused, tomorrow we are
client-centric. Re-engineering the enterprise is no longer a final
state, but an ongoing effort. Consider Six Sigma and the
Business Process Management (BPM) initiatives. Integration is not an
afterthought anymore; most systems are built with integration as a hard
requirement. There are changes of underlying technology that are
apparent across all infrastructures and applications. The fact that new
hardware delivers more power for less money proves that Moore's law is
still valid. And last, but most challenging, are the varying
requirements of processing compute power. Clearly, over-provisioning
can only lead to underutilization and overspending, both undesirable
outcomes.
Information systems have to support the adaptive enterprise. As David
Taylor wrote in his book Business Engineering with Object Technology:
"Information systems, like the business models they support, must be
adaptive in nature." Simply put, information systems have two layers:
software and hardware supporting and facilitating business requirements.
SOA decouples business requirements and presentation (user interface)
from the core application, thus shielding the end user from
incremental changes and vice versa: localizing the effect of code
changes when requirements adapt to new business conditions.
Grid software decouples computing needs from hardware capacity. It
inserts the necessary abstraction layer that not only protects the
application from hardware change, but also provides horizontal
scalability, predictability with guaranteed SLAs, fault tolerance by
design and maximum CPU utilization.
SOA gave rise to the notion of the enterprise service bus, which can
transform a portfolio of monolithic applications into a pool of highly
parameterized, service-based components. A new business application can
be designed by orchestrating a set of Web services already in
production. Time to market for a new application can be reduced by
orders of magnitude. Grid services virtualize compute silos suffering
from under-performance or under-utilization and turn them into
well-balanced, fully utilized enterprise compute backbones.
SOA provides an optimal path for a minimum cost re-engineering or
integration effort for a legacy system. In many cases, legacy systems
gain longevity by replacing a hard-wired interface with a Web services
layer. The Grid toolkit can turn a legacy application that has hit the
performance boundaries of a large SMP box into an HPC application
running on a farm of high-powered, low-cost commodity hardware.
Consider a small to medium enterprise with three or four vertical lines
of business (LOBs), each requiring a few turnkey applications. The
traditional approach would be to look at the requirements of each
application in isolation, design the code and deploy on hardware
managed by the LOB. What is wrong with that approach? Well, lines of
business most certainly share a good number of requirements, which
means the enterprise spends money doing many of the same things
multiple times. And what about addressing computing demands to run the
dozen or so applications? Each LOB has to do its own capacity
planning.
Keeping a business unit happy is a tightrope walk between
under-provisioning and over-spending. SOA is an architectural blueprint
that delivers on its promise of application reuse and interoperability.
It provides a top-to-bottom approach to developing and maintaining
applications. In this case, small domains of business requirements turn
into code and are made available to the rest of the enterprise as a
service.
Grid, on the other hand, is the ultimate cost-saving strategic tool. It
can dynamically allocate the right amount of compute fabric to the LOB
that needs it the most. In Grid's simplest form, the risk and analytics
group can have near-time response to complex "what if" market scenarios
during the day, and the back office can meet the critical global
economy requirements by using most of the compute fabric during the
night window, which is getting smaller and smaller.
Next, let's review the product stack. First, I need to make a
distinction between High Performance Computing (HPC) and Grid. HPC is
all about making applications compute fast -- and one application at
a time, I might add. Grid software, at large, orchestrates application
execution and manages the available hardware resource or the compute
fabric. There is a further distinction based on the geographic
co-location of the compute resources (i.e., desktop computers,
workgroup, cluster and Grid). Grid virtualizes one or more clusters,
whether they are located on the same floor or half way around the
world. In all cases, hardware can be heterogeneous and with different
configurations.
In this article, I refer to the available compute fabric as the Grid at
large. HPC applications started on super computers, vector computers
and SMP boxes. Today, Grid offers a very compelling alternative for
executing HPC applications. By taking a serially executing application
and chunking it into smaller components that can run simultaneously on
multiple nodes of the compute fabric, you can potentially improve the
performance of an application by a factor of N, where N is the number
of CPUs available on the compute fabric. Not bad at all, but admittedly
there is a catch. Finding the parallelization opportunity or chunking
is not always a trivial task and may require major re-engineering. That
sounds invasive and costly, and the last thing one wants is to make
logic changes to an existing application, adopt a new programming
paradigm, hire expensive niche expertise and embark on one-off
development cycles that take time away from core business competence.
The good news is that several HPC design patterns are emerging. In
short, there are three high-level parallelization patterns: domain
decomposition, functional decomposition and algorithmic
parallelization. Domain decomposition, also known as "same
instructions, different data" or "loop level parallelization," provides
a simple Grid-enablement process. It requires that the application be
adapted to run on smaller chunks of data (e.g., if you have a loop that
iterates 1 million times doing the same computation on different data,
the adapter can chunk the loop into, say, 1,000 ranges and do the same
computation using 1,000 CPUs in parallel). OpenMP's
"#pragma omp parallel" is a pre-compiler adapter supporting domain
decomposition.
Functional decomposition comes in many flavors. The most obvious flavor
is probably running in your back-office batch cycle: a set of
independent executables readily available to run from the command line.
In its more complex variety, it might require minimum instrumentation
or adaptation of the serial code.
Algorithmic parallelization is left for very specific domain problems
and usually combines functional and domain decomposition techniques.
Such examples include HPC solvers for partial differential equations,
recombining trees for stochastic models and global unconstrained
optimization required for a variety of business problems.
So, here is the first and top layer of the product stack: the
adaptation layer. Applications need a non-invasive way to run on a
Grid. This layer provides means that map the serial code to parallel
executing components. A number of toolkits with available APIs are
coming to market with a varying degree of abstraction and integration
effort. Clearly, different types of algorithms and applications might
need a different approach. Therefore, a tactical solution may be
required. Whatever the approach, you want to avoid logic change of
existing code and use a high level paradigm that encapsulates the
rigors of parallelization. In addition, you should look for a toolkit
that comes with a repeatable best practices process.
To introduce the next two layers, consider the requirements for sharing
data and communicating results among the decomposed chunks of work.
Shared data can be either static or intermediate computed results. In
the case of static data, a simple NFS type of solution or a database
access will suffice. But if the parallel workers need to exchange data,
distributed data shared memory services might be required. So, the next
layer going down the stack provides data transparency and data
virtualization across the Grid. Clearly, it is a strategic piece of the
puzzle, and high performance and scalability are critical for the few
applications that need these qualities of service.
Communication among workers gives way to the classic middleware layer.
One word of advice: make sure that your application is not exposed to
any direct calls of the middleware, unless, of course, you have time to
develop and debug low level messaging code. Better yet, make sure you
don't have anything to do with middleware calls and that the
application stack provides you with a much higher API abstraction.
So, you've developed your SOA HPC applications and all the LOBs are
lining-up to use the compute fabric. How do you make sure that
applications compute in a predictable fashion and within
predetermined timelines? How do you assure horizontal scalability,
reliability and high availability? This brings us to the most important
part of the stack -- the Grid software. The Grid software provides all
the quality of services that make the product stack industrial-strength
and mission-critical-ready: workload and resource management; SLA based
ownership of resources; fail-over; cost-accounting; operational
monitoring for 24x7 enterprises; horizontal scalability; and maximum
use of compute capacity. The core of this layer implements an open
policy-driven distributed scheduler.
A word of caution: resist the temptation to roll your own solution.
Just answer this: If you were to implement a J2EE application, would
you write your own application server? A last word of advice: as
rapidly as standards are evolving and products are maturing, it is
important to pick your vendors wisely. Get a vendor that will be around
tomorrow and that has the technical expertise your enterprise will need
to extend the product and support your 24x7 operations.
Technologies cannot exist without real business benefits -- we've tried
this back in the dot.bom days, right? Clearly, SOA and the Grid
software stack are mature, address real, tangible business benefits,
and fully support the adaptive enterprise and the pragmatic reality of
change. The beauty of a Grid and SOA implementation is that it does not
have to be a big-bang approach to bring benefits. Start with your batch
cycle, the time-consuming custom-built market risk application or the
Excel spreadsheet running at the trader desk that takes 12 hours to
complete. Then, instrument your first HPC application and take advantage of idle
CPU cycles, or transition an application from an expensive SMP machine
to commodity hardware. You will immediately see ROI and business
benefits. Be prepared for the unpredictable volume spikes that business
growth opportunities bring with them.
Until next time: get the Grids crunching.
About Labro Dimitriou
Labro Dimitriou is a subject matter expert in HPC and Grid. He has been
in the fields of distributed computing, applied mathematics and
operations research for over 23 years, and has developed commercial
software for trading, engineering and geosciences. Dimitriou has spent
the last four years designing enterprise HPC and Grid solutions in
finance and life science. He can be reached via e-mail at LDimitriou@platform.com