Q:
Can Computing Fabrics really operate over the WAN and
broadband?
A: Broadband will serve as an interconnect for "loosely coupling" cells, clusters of nodes that are themselves local and tightly coupled. Top performance will require predictable latencies, but that can be achieved through QoS measures debuting over the next 5 years. Systems will not use hardware approaches to tightly couple over broadband connections. It is the essence of fabrics that they employ both types of coupling, tight and loose, but in a more flexible manner than current networks and clusters.
Q:
Won't the additional cost of Computing Fabric nodes
and interconnects, compared with desktop PCs and Ethernet,
slow their adoption?
A: Employees throughout corporate America may have little say about which desktop gets purchased for them – even less if the functional equivalent, a fabric node, no longer sits on their desk! TCO was picked up and embraced quickly, and if Computing Fabrics bear out, the cost savings they accrue through more effective use of ambient cycles will justify purchasing the machines that save the most money in the final analysis, not just the cheapest machines.
Besides this, some cells
will be formed using Automated Transparent Object Distribution
(the strategy behind Microsoft's Millennium), requiring
little more than today's desktops.
Q:
How will security be handled on Computing Fabrics?
A: This is a big issue
with many sub-issues. For example, what does it mean
to secure a remote memory read or write? How would you
deal with the overhead and still maintain ultra-low
latencies and preserve cache coherence? However, there
are plenty of benefits here to justify the R&D to
attack these problems, as well as sufficient time and
dollars to do so. Tight coupling using software at the
object level will incur fewer security issues than tight
coupling with hardware at the memory page level.
Q:
Will Computing Fabrics support legacy compatibility?
A: Existing multithreaded applications should run on Computing Fabrics, even if they do not fully exploit the fabrics' unique properties of dynamic reconfigurability and fluid system boundaries. Beyond this there are issues such as incorporating into the fabric equipment currently owned or purchased over the next several years (which will be legacy within Computing Fabrics' evolutionary time frame). Cellular IRIX from SGI will support both CORBA and DCOM, making fabrics employing IRIX interoperable with most legacy systems.
Q:
The enterprise seems a likely place for Computing Fabrics
to take hold, but are fabrics really likely to take
off in the consumer space?
A: The enterprise is where fabrics will initially take off, as an outgrowth of the current interest in clustering, especially since the problems clustering targets have been so poorly addressed by most vendors.
Moving beyond the enterprise into the consumer realm, one finds what telcos are planning for residential neighborhoods: distributing massive processing power through them, heavily utilizing distributed services.
Now add to that certain
technologies that impact the human interface for consumers,
including 3D, media integration, and the convergence
of devices such as game consoles, set tops, and PCs.
These will be a strong motivator for Computing Fabrics
outside of the enterprise because they significantly
advance the user experience, and they eat up lots of
computational resources, many of which need to be near
the end user because of the speed of light.
Visualize the progression of Computing Fabrics as beginning with many smaller fabrics,
first in the enterprise, that in time join up to become
fewer, larger fabrics. Neighborhoods of processors and
memory within these fabrics exhibit a single system
image through tight coupling. These neighborhoods of
tight coupling are themselves loosely coupled with each
other. And due to the distributed OSes and interconnects
these fabrics will employ, the boundaries of the neighborhoods
will not be rigid but fluid. Feeding and supporting
all this will be economies of scale and reuse far greater
than today’s. These same principles will in time apply
to processors distributed throughout residential areas.
Q:
Are Computing Fabrics just about scalability?
A: Computing Fabrics address
scalability but that’s only one small slice of a wide
panorama. Improvements in scalability are largely a
quantitative change. Computing Fabrics will be a qualitative
change (as well as quantitative) in that we’re no longer
talking about networking fixed-architecture systems
but seeing system architecture converge with network
architecture, with architecture itself ultimately becoming
a dynamic variable - Architecture On Demand.
Q:
Are Computing Fabrics really "in sight"?
A: When is a new era of
technology within sight? Technologies often first enter
into classified "black" projects long before the public
even gets a whiff of them. Then they enter unclassified
military usage, then on into academic research, on into
the high-end of the commercial marketplace, and only
after many years do they enter the mainstream. At which
point are they "near us", "within sight", or "upon us"?
The intelligence community would say the next era is
upon us while many in education are nowhere near client/server!
The technologies of Computing Fabrics have been and
continue to be proven and are also being scaled. Within
two years they will be implemented with mass-market
microprocessors.
Q:
Isn't it a well established fact that NUMA machines
have higher latencies than SMP machines do, and always
will?
A: This is not just an architectural issue; it also involves engineering and implementation factors. The normative latency of some SMPs exceeds the worst-case latencies of some NUMA implementations. The team responsible for Craylink at SGI/Cray Research has a project to extend Craylink with three major milestones. Within 2 years they will expand the system size that Craylink can support by extending the reach of Craylink cables to encompass a very large room. In 3 years (from the present, late 1998) they will expand the bandwidth of Craylink. And in 5 years SGI will extend the range again, this time dramatically, to support distribution throughout a building and beyond.
One possible implementation
to achieve this is to add a cache coherency protocol
on top of the follow-on to SuperHIPPI, but that is just
one direction being considered. These three extensions
to Craylink are being pursued in parallel by SGI.
Another approach, being
pursued by Microsoft in their Millennium research project,
automates the distribution of COM+ objects around a
network (unlike today where programmers must decide
the location of client and server objects). Furthermore,
they are layering DCOM over VI Architecture links to
achieve very low latencies for remote method invocations.
HIPPI-6400 will support distances up to 10km. With VIA
running over HIPPI-6400 and DCOM over VIA, and Millennium's
Continuum automating the distribution of COM+ objects,
a distributed object space (as contrasted with distributed
shared memory) could reach out over far more than a
quarter of a mile.
Q:
Will special programming tools or paradigms be required,
like MPI or PVM?
A: Programming a Computing
Fabric will most likely resemble programming a distributed
shared memory machine. This involves great thread packages and great compilers, just as programming centralized shared memory machines (SMPs) does, although there are differences in how they are applied on these two architectures.
MPI (Message Passing Interface)
and PVM (Parallel Virtual Machine) are not required.
MPI and PVM are used by programmers of massively parallel
machines (and on networks of workstations where supported
by a utility) to obtain portability between parallel
architectures and implementations. They explicitly support
programming distributed "Non-Shared" memory computing,
where each processor has its own memory with its own
address space and message passing is used to coordinate
function invocations, reads, writes, etc. amongst the
ensemble. PVM was developed at Oak Ridge National Lab
in ‘89 to run across a network of UNIX boxes. MPI began
at a workshop on message passing in ’92 and the first
version was published a little over a year later, making
its big debut at Supercomputing ’93 in November of that
year.
Computing Fabrics are not network computing, which primarily means distributed file systems and, more recently, distributed objects.
What we’re moving into is a convergence of the loosely
coupled programming model of networks and MPP (distributed
processing) with the more abstract, transparent programming
of SMPs, where the programmer is shielded from many
of the details of the underlying distribution, since
all memory is shared. It should be pointed out that
the SGI Origin 2000, a precursor of a Computing Fabric,
can also be programmed with a parallel library, such
as MPI, when message passing best suits the needs of
the developer.
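To make the contrast concrete, here is a minimal, purely illustrative sketch (in C with MPI, not code from any fabric implementation) of the explicit message-passing style that MPI supports: each process owns its own address space, works only on its own slice of the data, and partial results must be combined through explicit communication. Under a shared memory model, the same accumulation would simply be performed by threads reading and writing a single address space.

    /* Illustrative only: a minimal MPI sketch of the explicit
     * message-passing model, where each process owns its own address
     * space and data must be combined through explicit communication. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double local_sum = 0.0, global_sum = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many in total?  */

        /* Each process works only on the slice of data it owns. */
        for (int i = rank; i < 1000; i += size)
            local_sum += (double)i;

        /* Partial results are combined via explicit communication. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f\n", global_sum);

        MPI_Finalize();
        return 0;
    }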
Q:
I haven't heard anyone talk about dynamic reconfiguration, especially of interconnects, since that could disrupt the adaptive routing algorithms used by these types of systems, causing all sorts of hotspots and contention, especially if the applications didn't have the same requirements, which they most probably wouldn't.
A: The architecture of
the SGI Origin2000 and its routing system was designed
so that hardware makes routing decisions while software
can reconfigure the network. Right now this reconfiguration
is limited to avoiding faults and better using resources,
but this will be changing. Microsoft is also working
on this as part of the Millennium project at Microsoft
Research, very much focused on creating fluid system
boundaries. These elements differentiate Computing Fabrics
from mere networks of bus- and switch-based multiprocessor
systems with their inherently rigid system boundaries.
Q:
Don't Beowulf systems have a great deal of similarity
to Computing Fabrics? I understand these systems have
even enabled screensavers that can detect idle times
automatically and log your machine into the "fabric"
and begin work on a shared problem.
A: Beowulf, which is focused on Linux, is a great project, but like Linux it is not commercially backed, which limits its applicability in the enterprise. There is even a 1,000 processor Beowulf system under construction using Alphas. However, Beowulf systems do not utilize a low-latency link between systems, because they are intended to exploit technology that is commodity "today". Also, software tends to follow hardware, often by significantly long stretches. Today we're seeing the beginnings of the hardware for Computing Fabrics; it will motivate the development of the software. Beowulf is a software solution that provides some measure of distributed processing on today's commodity hardware.
Q:
How much will it cost to put a scalable interconnect
on each desktop in order to create a Computing Fabric
across an enterprise? If it is cheaper to use a non-scalable
one I suspect that one will win. And do we really need
a hypercube on every desk?
A: Despite the fact that in the mid-eighties Danny Hillis suspected the Connection Machine would spawn desktop hypercubes within 10 years, no, we do not need a hypercube on the desktop; that would be a perversion of the direction technology is headed. Rather, the machines that replace desktops will participate in a fabric that has hypercube regions to its topology. At first, say 3-5 years off, it will
indeed be pricey to use cache coherent scalable interconnects.
Now, although NCs haven't been a big hit (and rightly so), TCO has caught on, and so too will TCC, Total Cost
of Cycles. This could well make widespread use of modularly
scalable interconnects very attractive as a way of exploiting
an organization’s cycles. Besides this, there are several
human interface directions at work that mandate significant
power (processing cycles and cache) out near the users
but not dedicated to a specific user 24x7x365. These
will weigh heavily as motivation towards distributed
shared processing and Computing Fabrics.
Q:
Is all this really necessary to run Word?
A: Ever increasing cycles
will be needed to enhance the human-computer interface,
and that power better be "near" the user though it needn’t
be on their desktop. Desktops themselves may disappear (literally the desk tops; the computers will depart with them, as more workers become peripatetic and mobile, working from home and in the field). The
first place fabrics will catch on is in the heart of
the enterprise, an evolution of server farms. But financial
pressure will likely cause the assimilation of whatever
succeeds the desktop. Two kinds of connectivity will
be present (at least). Information appliances and personal
interface devices will connect in using wireless RF
and scattered IR (and other technologies from DARPA
projects). These will certainly not support cache coherent
SSI but only loosely coupled distributed processing.
It's what's behind the walls that's likely to become
tightly coupled but distributed so as to remain relatively
close to the users.
Q:
Will Computing Fabrics create a totally homogeneous
computer architecture from the bottom of the industry
to the top?
A: There will still be
layering in the industry and technological innovations
specific to the realm of problems being solved. But
the similarities will be greater than the differences, in that fabric clusters in academia operating at teraflops will be roughly equivalent to small fabrics in the enterprise; the superfabrics will use processor variants with bigger caches, perhaps extended superscalar architectures, and the latest, fastest SuperDuperHIPPI, while small fabrics at the department level will use last year's
variants. The point is that for the first time these
technologies are variants of one another, not altogether
different beasts. It means that one year’s superfabric
technology can directly become a commodity fabric technology
within years, not decades.
Q:
How do Computing Fabrics in residential neighborhoods
help manufacturers get closer to the consumer?
A: When the consumer is
literally embedded in massive processing many new things
become possible. The heart and soul of a company, caught
in its 3D multimedia knowledgebase, can be locally instantiated
for the consumer, allowing them to configure the products
and services based on the production and support capabilities
of the vendors or a cooperative of vendors. Such vendors
will virtually "fuse" their demo, warehousing, and production
spaces with the consumers’ space, sometimes multiple
vendors at a time in a comparison shopping and bid situation.
This is "getting close" to the consumer and demands
lots of power. Do you need this power to send the customer
an invoice? No, but that’s not the kind of cozying up
being considered.
Q:
Why should business begin making plans now for Computing
Fabrics when they’re still years away?
A: First, infrastructure, tooling, training, corporate structure, and capital investment cost big, very big, and can chain a company so tightly to the past that it can't get free of it. Second, businesses should consider fabrics in their business models (analyses and planning, not taking immediate action), which do look out beyond next Christmas. Technology
vendors will be the first who will need to come to grips
with fabrics so that they can exploit the trend rather
than become a casualty of it.
In retrospect, would organizations
have wanted to become aware of the PC back in the 70’s,
would minicomputer vendors have wanted to know about
distributed processing based on commodity microprocessors,
and would businesses have wanted an advance warning
of the coming web? The answers are trivially easy: yes,
yes, and yes.
Q:
Won't Computing Fabrics run into memory locality problems
and contention that their distributed shared memory
architecture can't address?
A: The Computing Fabrics
landscape will likely have very large ensembles of processors,
probably making today’s 64 and 128 processor machines
seem diminutive. So in this future let’s compare a cluster
of 16 Sun Starfire SMP servers, each with 64 processors,
to a single SGI Origin with 1,024 processors. If you’ve
got a problem that fits in the address space of a single
one of the Sun servers then you can claim uniform latencies.
But many of these huge arrays will handle huge problems
that only work nicely in a contiguous address space.
First, such problems are not going to run on the Sun cluster as is; the problem must be broken up into parts that can be distributed to each node (of 64 processors) in the cluster.
So let’s say you do just that. What about the latency
of travelling from a processor on one SMP to a processor
on a different SMP in the cluster? Is this latency going
to be uniform with the latency across the crossbar switch
in a single Sun server? No, it won't. In fact, it's likely
to far exceed the end-to-end latency in the 1,024 processor
Origin. So, if we say this is bad for the Origin then
we’ll have to say it’s doubly bad for the Sun, which
carries over to all loosely coupled clusters of SMPs.
All architectures make design decisions that involve tradeoffs. The question is not whether a particular architecture has deficiencies (they all do), but whether it makes wise tradeoffs given the kinds of applications it is known in advance to be used for, as well as the many areas to which it will ultimately be applied. In
pursuit of modularly extendable fabrics SMP just doesn’t
cut it, except at the nodes of the architecture, with
hardware and software used to maintain cache coherence
between these SMPs. This is fine because today's clusters of small SMPs are likely to evolve with fabric technology to become exactly what the SGI machines are becoming – commodity processors and all, but lots cheaper. That's one of the main points here: as the Origin architecture goes Intel, similar functionality will come to clusters of commodity machines. This is big news for the future of computing, as clusters become systems and systems become clusters, and the distinction disappears. What's the source of this revolution? It's the three key technologies identified herein – the distributed shared memory architecture, the rich modularly scalable interconnect, and the Cellular OS – on their way into Intel space.
Concerning non-parallelized code, if you mean code that does not take advantage of a multithreaded package, it won't take advantage of the multiple processors of an SMP either. Effective multithreading is all that's "minimally"
required to utilize an SMP, a CC-NUMA machine, or a
Computing Fabric. Now if by parallelization you mean
instead rearchitecting software to explicitly use MIMD,
SIMD, or vector parallelization of loops – that’s what
almost everyone is trying to avoid and why a single
address space, whether in an SMP or in CC-NUMA, is so
desired. Programming to a message passing model is vastly
different than programming to a shared memory model,
whether that memory is in reality centralized or distributed.
Programs based on message passing can be run sub-optimally
on an SMP, run less sub-optimally on some NUMA architectures,
and run very well on clusters and MPPs. Programs explicitly
written to an SMP model can run sub-optimally on CC-NUMA
but not at all on a cluster or MPP. Since programs such
as RDBMSes are for the most part (but not entirely)
programmed to the SMP model they will be better supported
and more easily ported to a fabric that supports distributed
shared memory (e.g., ccNUMA) than a message passing
cluster or MPP (note Oracle’s multi-year painful experience
in porting to the nCUBE2 MPP from a code base optimized
for Sequent SMP). Lastly, on contention and hotspotting,
these are minimized by the Cellular OS as well as the
hardware that provides cache coherency. To reiterate,
no architecture solves all problems free of side effects.
SMPs don’t. ccNUMA doesn’t. MPP sure doesn’t. Moving
to a loosely coupled approach using distributed objects
doesn't either; in fact it incurs multiple passes through the entire protocol stack unless layered on ST or VIA – talk about contention!
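For contrast with message passing, here is a minimal, purely illustrative sketch of the same kind of reduction written to the shared memory model using POSIX threads (an assumed example, not from any vendor's code). The threads simply read and write one address space, with a mutex around the shared accumulator, which is why effectively multithreaded code carries over to an SMP, a CC-NUMA machine, or a distributed shared memory fabric, while a message-passing port requires rearchitecting.

    /* Illustrative only: the same reduction written to the shared-memory
     * model with POSIX threads. All threads see one address space; the
     * only explicit coordination is a mutex around the accumulator. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static double total = 0.0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        long id = (long)arg;
        double local = 0.0;

        /* Each thread handles a strided slice, but reads and writes
         * the same global memory as every other thread. */
        for (int i = (int)id; i < 1000; i += NTHREADS)
            local += (double)i;

        pthread_mutex_lock(&lock);
        total += local;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("sum = %f\n", total);
        return 0;
    }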
Q:
How does the SGI Spider chip enable Craylink to function
without contention and without running into an N-squared
increase in system cost and complexity?
A: The Spider chip itself is non-blocking. Since each of its 6 ports supports 4 virtual channels, there can be contention in that a packet of recent age may wait while the on-chip arbiter enables an older packet to progress through the crossbar. This is a design decision, and alternatives are being explored for the next version of the Spider chip, which will also offer an increased number of ports (exact increase not yet disclosed). The fact that a totally non-blocking NxN switch requires on the order of N-squared Spider chips is not relevant to the design of the SGI Origin, as it does not use this architecture - it employs a variant of a hypercube.
The cost of the system
does scale close to linearly with a single discontinuity
when expanding from 64 to 128 processors, then the number
of Spider chips required resumes a linear increase.
In terms of backplane connections between Spider chips
these too increase in a direct linear relationship to
the number of processors. Only the number of Craylink
cables between Spider chips departs from strict linearity,
but only slightly so, in that doubling the number of
processors with this architecture from 128 to 256 ups
the required Craylink count from 112 to 236 rather than
224 – a minor departure. While bandwidth per processor
remains constant as the system is grown it is true that
latency increases between widely separated processors
– this is still NUMA after all and that is a design
tradeoff. However, actual measured latencies for widely separated processors in a large Origin 2000 are less than the "constant" latency between processors and memory in many SMPs.
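As a quick arithmetic check, taking the figures quoted above at face value (they are not independently verified here), the Craylink cables per processor rise only modestly when the machine doubles from 128 to 256 processors:

    /* Quick arithmetic check of the Craylink figures quoted above
     * (taken at face value from the text): cable count per processor
     * rises only modestly when the system doubles in size. */
    #include <stdio.h>

    int main(void)
    {
        int procs_small = 128, cables_small = 112;  /* figures from the text */
        int procs_large = 256, cables_large = 236;

        printf("128 procs: %.3f cables/processor\n",
               (double)cables_small / procs_small);        /* 0.875 */
        printf("256 procs: %.3f cables/processor\n",
               (double)cables_large / procs_large);        /* 0.922 */
        printf("departure from strict doubling: %.1f%%\n",
               100.0 * (cables_large - 2 * cables_small)
                     / (2.0 * cables_small));              /* ~5.4% */
        return 0;
    }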
Q:
Don’t hypercubes of the SGI Origin design decrease system
bandwidth with increasing size, meaning a fabric based
on this design cannot adequately scale?
A: The hierarchical fat
hypercube topology does not "scale down" as nodes are
added, provided that the metarouters used by the system
between hypercubes increase in dimension as the system
grows – precisely the pattern followed by SGI engineers.
For example, taking a 1,024 processor system built with
this architecture and cutting it in two (bisection)
yields 8 processors per connection, exactly the same
bisection bandwidth as a similarly connected system
of 16 or 32 processors. The key is to increase the dimensionality
of the metarouter as the system grows. For example,
a 128 processor Origin uses 8 2D metarouters (essentially
rings), but a 1,024 processor Origin will use 8 5D Hypercubical
metarouters. It should be added that this use of metarouters will be advantageous when the interconnects are distributed through a building, though that's 5 years off.
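To see why the bisection bandwidth per processor stays constant, here is a simplified, illustrative calculation that models the machine as a plain hypercube of routers with a fixed bristle factor of 4, ignoring the metarouter hierarchy and the change in bristle factor discussed below; the real Origin topology differs, but the constancy argument is the same.

    /* Illustrative simplification, not SGI's actual topology: model the
     * machine as a plain hypercube of routers, each "bristled" with 4
     * processors. Bisecting a d-dimensional hypercube of 2^d routers
     * cuts 2^(d-1) links, so processors per bisection link stays at
     * 2 * bristle (here 8) no matter how large the system grows. */
    #include <stdio.h>

    int main(void)
    {
        const int bristle = 4;                   /* processors per router  */

        for (int d = 2; d <= 8; d++) {           /* hypercube dimension    */
            int routers         = 1 << d;
            int processors      = routers * bristle;
            int bisection_links = 1 << (d - 1);  /* links cut by bisection */

            printf("%5d processors: %4d bisection links, %d procs per link\n",
                   processors, bisection_links, processors / bisection_links);
        }
        return 0;
    }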
As for bristling (or the
similar topology of cube connected cycles) this does
indeed impact bandwidth per processor, hence it is wise
to decide on the bristling factor before making design
decisions on the network. It appears that SGI has indeed
done their homework here. The bristle factor (the number
of processors supported by each router) begins at 4
processors per router for configurations of the Origin2000
up to and including a 64 processor configuration. But
beginning with 128 processors (and up once they begin
shipping) the bristling decreases to on average 2 processors
per router. Why? Because the metarouters proportionately
expand, with Spider chips in the metarouter joining
up with Spider chips on node boards to form virtual
super routers of the fat hierarchical hypercube. Bottom
line: bandwidth scales linearly with system size (with
one discontinuity), latency is non-uniform by architectural
choice, and cost scales so close to linearly with size
that it deserves to be called linear scaling.