A Computing Fabric consists of nodes--packages of processors, memory
and peripherals--that are linked together by an interconnect. Within
the Fabric are regions of nodes and interconnections that are so
tightly coupled they appear to be a single node. These are cells.

Tight coupling within a cell is achieved with hardware, software or both,
although the performance of the resulting cell varies significantly
with the coupling implementation.

Cells in the Fabric are then loosely coupled with one another--a loose
coupling of cells does not present itself as a single node. The
Fabric as a whole--or each cell within--can grow or shrink in a
modular fashion, meaning nodes and links can be added and removed.
Nodes from the Fabric surrounding a cell may join that cell, and
nodes within a cell may leave that cell and join the surrounding
Fabric. Cells can divide as well as fuse.
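
These membership dynamics are easy to picture as a toy model. The sketch
below is our own illustration, with invented class and method names--not
any vendor's software--that tracks nodes joining and leaving cells and
cells dividing and fusing.

```python
# Toy model of a Computing Fabric: nodes join/leave cells, cells divide and fuse.
# Illustrative only -- the Cell class and its methods are invented for this sketch.

class Cell:
    def __init__(self, nodes=None):
        self.nodes = set(nodes or [])        # node IDs tightly coupled in this cell

    def join(self, node):
        self.nodes.add(node)                 # a node from the surrounding Fabric joins

    def leave(self, node):
        self.nodes.discard(node)             # a node returns to the loosely coupled Fabric

    def divide(self):
        ordered = list(self.nodes)
        half = len(ordered) // 2
        return Cell(ordered[:half]), Cell(ordered[half:])   # one cell becomes two

    @staticmethod
    def fuse(a, b):
        return Cell(a.nodes | b.nodes)       # two cells become one

fabric = [Cell(range(0, 8)), Cell(range(8, 16))]   # a Fabric: loosely coupled cells
fabric[0].join(16)                                  # boundaries are fluid
merged = Cell.fuse(fabric[0], fabric[1])
print(len(merged.nodes))                            # 17 nodes now act as one cell
```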

These characteristics can be summed up by saying that the boundaries of
a Fabric, and the cells within, are potentially fluid.

Sophisticated software and hardware will be required to support the cellular characteristics
of Computing Fabrics. Initially, the combination of a distributed
operating system running on a distributed shared-memory architecture
implemented using a modularly scalable interconnect will produce
the desired results.

In time, as these three technologies--and a fourth, transparent automated
object distribution--descend into the commodity space, clusters and
networks will take on these cellular characteristics and become
Computing Fabrics.

Distributed shared-memory architectures

Each cell of a Computing Fabric must present the image of a single system,
even though it can consist of many nodes, because this greatly eases
programming and system management. SSI (Single System Image) is
the reason that symmetric multiprocessors have become so popular
among the many parallel processing architectures.

However, all processors in an SMP (symmetric multiprocessing) system,
whether a two-way Pentium Pro desktop or a 64-processor Sun Ultra
Enterprise 10000 server, share and have uniform access to centralized
system memory and secondary storage. Costs rise rapidly as processors
are added, because each new processor requires the same symmetric
access.

The largest SMPs have no more than 64 processors because of this, although
that number could double within the next two years. By dropping
the requirement of uniform access, CC-NUMA (Cache Coherent--Non-Uniform
Memory Access) systems, such as the SGI/Cray Origin 2000, can distribute
memory throughout a system rather than centralize memory as SMPs
do, and they can still provide each processor with access to all
the memory in the system, although now nonuniformly.

CC-NUMA is a type of distributed shared memory. Nonuniform access means
that, theoretically, memory local to a processor can be addressed
far faster than the memory of a remote processor. Reality, however,
is not so clear-cut. An SGI Origin's worst-case latency of 800 nanoseconds
is significantly better (shorter) than the several-thousand-nanosecond
latency of most SMP systems. The net result is greater scalability
for CC-NUMA, with current implementations, such as the Origin, reaching
128 processors. That number is expected to climb to 1,024 processors
within two years. And CC-NUMA retains the single system image and
single address space of an SMP system, easing programming and management.
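
A back-of-the-envelope calculation shows why locality still matters even
with these favorable figures. In the sketch below, the 800-nanosecond
number is the Origin worst case cited above; the local latency and the
local/remote access mixes are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope average memory latency for a CC-NUMA node.
# 800 ns is the worst-case remote latency cited for the SGI Origin;
# the local latency and access mixes below are assumed for illustration.

def effective_latency_ns(local_ns, remote_ns, local_fraction):
    """Weighted average latency given the fraction of references served locally."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

local_ns = 300          # assumed local-memory latency
remote_ns = 800         # worst-case remote latency cited above
for local_fraction in (0.5, 0.9, 0.99):
    avg = effective_latency_ns(local_ns, remote_ns, local_fraction)
    print(f"{local_fraction:.0%} local -> {avg:.0f} ns average")
```

The better the operating system and applications keep data local, the
closer the machine behaves to a uniform-memory system, which is why
NUMA-aware placement matters.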

Although clusters are beginning to offer a single point for system management,
they don't support a true single system image and a single address
space, as do CC-NUMA and SMP designs. Bottom line: CC-NUMA, as a
distributed shared-memory architecture, enables an easily programmed
single system image across multiple processors and is compatible
with a modularly scalable interconnect, making it an ideal architecture
for Computing Fabrics.

Modularly scalable interconnects

A Computing Fabric must support modular growth (and shrinkage). Most
multiprocessors lack modularity: A system that's expected to grow to 32
or 64 processors must be ordered with the infrastructure to handle that
maximum, even if it's delivered with only eight processors to begin
with.

The culprit is the system bus or crossbar switch that most multiprocessors
use to interconnect processors, memory and peripherals. The bus
or switch must be sufficiently big and fast to support the maximum
number of processors regardless of how few processors the system
has when ordered.

In contrast, an SGI Origin can start out with eight processors and grow
to 128 without requiring the upfront purchase of the entire
infrastructure. The Origin accomplishes this with a modular
interconnect, called CrayLink, that can be expanded in the field as
processors are added to the system. In other words, the Fabric grows
with the system--made possible by a proprietary SGI application-specific
integrated circuit called the Spider chip--and system cost scales
linearly with system size.

The modularity of the Fabric also means it doesn't present a single
point of failure, as buses and centralized crossbar switches do, and the
modified hypercube connectivity automatically routes around failed or
saturated links. A current 128-processor Origin 2000 provides 20.5G bps
of bisection bandwidth--the bandwidth available between processors on
opposite halves of the system.
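
A simplified model shows why this figure grows with the machine. In a
pure hypercube of N nodes, N/2 links cross any cut that splits the
system in half, so bisection bandwidth scales linearly with node count.
The sketch below uses that pure-hypercube assumption and an assumed
per-link bandwidth; the Origin's modified hypercube differs in detail.

```python
# Simplified bisection bandwidth of a pure hypercube interconnect.
# In a hypercube of N = 2**d nodes, exactly N/2 links cross the cut that
# splits the machine in half. The per-link figure below is an assumed
# number for illustration, not an SGI specification.

def hypercube_bisection(nodes, link_bandwidth_gbps):
    assert nodes & (nodes - 1) == 0, "node count must be a power of two"
    return (nodes // 2) * link_bandwidth_gbps   # links crossing the bisection

link_gbps = 1.6   # assumed per-link bandwidth
for nodes in (8, 32, 128):
    bw = hypercube_bisection(nodes, link_gbps)
    print(f"{nodes:4d} nodes -> {bw:.1f} Gbps bisection bandwidth")
```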

Modular interconnects are not an invention of SGI, however. They've been
integral to massively parallel systems for well over a decade--found,
for example, in the nCube2, Thinking Machines Corp.'s Connection
Machine, the Alpha-based Avalon A12 and others. The news is SGI's
forthcoming implementation of rich modular interconnects in Intel
space, coupled with CC-NUMA.

Distributed operating systems

The cellular boundaries of a Computing Fabric must be easily reconfigurable.
In SGI's Origin 2000, power domains--units of processors, memory
and switching components that share a power supply--form a minimal
cell and run a copy of Cellular Irix. One or more cells can be taken
out of service, repaired or replaced, then returned to service--all
while others remain operational.

Assemblies of power domains can similarly be treated as cells. SGI's Cellular
Irix enables massively distributed operation while improving availability
and serviceability, giving Unix a dramatic edge over Windows NT.

Hypernetworks

Cells within Computing Fabrics will also need to be loosely coupled while
maintaining high performance. Gigabyte system networks, the next
generation of SANs (system area networks), will layer the Scheduled
Transfer protocol atop HIPPI-6400 and its successors for VIA (Virtual
Interface Architecture) compatibility.

HIPPI-6400 switches will accept network interface cards for Gigabit
Ethernet, Fibre Channel and other protocols, supporting bandwidth
aggregation (e.g., 900-port Gigabit Ethernet switches). These next-generation
SANs may ultimately merge with modularly scalable interconnects
to form hypernetworks.

Transparent automated object distribution

Tight coupling, necessary to form cells in a Computing Fabric, can also
be achieved in software, with the operating system distributing objects,
rather than memory pages, across coupled machines. This frees
programmers from low-level details, allowing them to work at a high
level of abstraction while the system optimizes performance under
varying conditions.

The Millennium project at Microsoft Research supports transparent distribution
of COM (Component Object Model) and COM+ objects. To speed performance
across a SAN that is tightly coupled in this fashion, DCOM (Distributed
COM) is being layered atop VIA, providing a huge reduction in the
time it takes for objects and method invocations to travel between
nodes.
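
The idea behind such transparency can be boiled down to a proxy: callers
invoke methods on a local stand-in, and the runtime forwards each call
to wherever the object actually lives. The sketch below is a bare-bones
illustration of that pattern, not Millennium's or DCOM's actual
machinery; the registry and class names are invented for the example.

```python
# Minimal illustration of transparent object distribution via a proxy.
# The "remote" call here is a local dictionary lookup standing in for a
# network hop; real systems (e.g., DCOM over VIA) marshal the call and
# ship it to another node. All names below are invented for this sketch.

class RemoteRegistry:
    """Stands in for the runtime that knows where each object lives."""
    def __init__(self):
        self.objects = {}

    def register(self, object_id, obj):
        self.objects[object_id] = obj

    def invoke(self, object_id, method, *args):
        return getattr(self.objects[object_id], method)(*args)   # "ships" the call

class Proxy:
    """Local stand-in: calls are forwarded, so callers never see the distribution."""
    def __init__(self, registry, object_id):
        self._registry = registry
        self._object_id = object_id

    def __getattr__(self, method):
        return lambda *args: self._registry.invoke(self._object_id, method, *args)

class Counter:
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n
        return self.value

registry = RemoteRegistry()
registry.register("counter-1", Counter())    # lives on some node in the cell
counter = Proxy(registry, "counter-1")       # the caller holds only the proxy
print(counter.add(5))                        # 5 -- looks like a plain local call
```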

Interoperability is critical to this vision if Computing Fabrics are to be fully
exploited. Either a standard modular interconnect must be adopted
by all--a dubious prospect--or a standardized interface between
nonstandard interconnects must be available, which is a more likely
scenario.

An example of the latter is the Scheduled Transfer protocol for supporting
heterogeneous VIA clusters, now in the standardization process.

Seamless connectivity is only half the equation. Although support for heterogeneous
Fabrics is primarily a political problem, creating heterogeneous
cells is a technical issue, likely to require significant R&D.
Two or more Fabrics running different distributed operating systems
could loosely couple as a cluster, but it would be superior if a
variety of distributed operating systems could tightly couple to
form a cell, collaborating in support of a single system image and
a single object space.

Microsoft's Millennium has the potential to accomplish this with software. Funding
such efforts should not be a problem if Computing Fabrics take off,
as we anticipate they will.

Erick Von Schweber and Linda Von Schweber are principals of Infomaniacs,
a think tank specializing in technology convergence. They can be
reached at thinktank@infomaniacs.com
or www.infomaniacs.com.