Information Unbound Archive 1998 - 2000
An On-line Column by Erick Von Schweber

# 1: Microsoft's Millennium approach to Computing Fabrics presents weighty challenge to CORBA and Java

# 2: Computing Fabrics compared with distributed and parallel software technologies

# 3: Computing Fabrics - The BIGGER Picture
(a 4 part series)

# 3-1: Computing Fabrics extend the cell phone paradigm
# 3-2: Virtual Personal Supercomputers (VPSCs)
# 3-3: Computing Fabrics and next generation Interface Devices
# 3-4: Hyper-Reality - A Wild and Crazy GUI
# 4: Is History Repeating Itself?
The low-level procedurality of the past returns to haunt IT.
# 5: Object Based Semantic Networks™
Toward a Fusion of Knowledge Representation and the Relational Model

Information UNBOUND # 4
 

Is history repeating itself?
The low-level procedurality of the past returns to haunt IT

November 1, 1999

Executive Summary: A serious data processing issue, first observed in the sixties and later remedied by relational databases, has resurfaced in the nineties. The explosion of the Internet and the need to inter-network systems provoked this reappearance by necessitating low level procedural approaches to support systems integration and distribution and to handle complexity. This return to low level procedural approaches has once again severely cut IT productivity and growth, with effects cascading to practically every industry that relies heavily on IT. A renewed application of relational principles to contemporary networked and distributed systems could go a long way towards a resolution, but even more is needed. We introduce Active Models as a potential solution.

The Ghost of sixties DP

Cobol, JCL, punch card readers, and the backlog – not all remembrances of the sixties are fond nostalgia. Unfortunately, a serious issue, first felt in sixties data processing circles, has once again reared its ugly head. The most pernicious problem with sixties DP was the need to resort to low level, procedural programming approaches to accomplish, well, nearly anything.

What do I mean by low level procedural approaches? I mean the need to specify not just what results you want of an operation but also how the results are to be achieved, often by way of an iterative or recursive procedure, and typically described at a low level of abstraction, e.g., explicitly moving bits to and from registers when the goal is to copy a file between devices. Low level procedural methods stand in stark contrast to high level declarative methods where you describe what the result should be at a level of abstraction appropriate to the problem, e.g., a SQL statement like “Select clone_date, generation_num, sex from Workers where worker_id = ‘THX1138’ ”.
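
To make the contrast concrete, consider the same lookup written both ways. The first fragment below is procedural, spelled out in Oracle-style PL/SQL purely for illustration; the second is the declarative equivalent. The Workers table and its columns come from the example above; everything else is assumed.

    -- Low level and procedural: the programmer spells out HOW to get the
    -- result, scanning every row and testing it in application logic.
    -- (Illustrative PL/SQL sketch; assumes the Workers table above.)
    BEGIN
      FOR r IN (SELECT clone_date, generation_num, sex, worker_id FROM Workers) LOOP
        IF r.worker_id = 'THX1138' THEN
          DBMS_OUTPUT.PUT_LINE(r.clone_date || ' ' || r.generation_num || ' ' || r.sex);
        END IF;
      END LOOP;
    END;
    /

    -- High level and declarative: state WHAT is wanted and let the
    -- optimizer decide how to retrieve it.
    SELECT clone_date, generation_num, sex
    FROM   Workers
    WHERE  worker_id = 'THX1138';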

Back to the sixties. The need to employ low level procedural methods for every task large and small in turn meant that any function that had not already been implemented and scheduled to run simply had to wait – a week, a month, a year - sometimes an eternity. Ted Codd, then at IBM, realized that low level, procedural approaches were at the root of the applications and maintenance backlog, and devised the relational model of data to address the issue.

Ted was phenomenally successful, both in his analysis of the problem and in the development of a solution. Today, with a point and a click, we can construct complex reports drawing on relational data spanning numerous physical devices with little or no awareness of the complex, low level procedural gyrations performed beneath the covers that make the magic happen. This is good. Even better is that we can fuse data from multiple departmental systems (provided the systems are of like kind from the same vendor – more on this in a moment), say for a new enterprise application, with little more trouble than the previously considered report.

But watch what happens when we need to tie disparate systems from one company (or department!) together with another’s, as in automating the supply chain for business-to-business e-commerce, or even fusing systems within an enterprise, as in EAI. Before you know it someone’s brought in the middleware - TP monitors, distributed objects, messaging, perhaps all of them - and we’re right back to low level procedural coding (this time, it must be admitted, with an object twist that eases the maintenance tasks that will ultimately be required).

This is not so good. Even with great tools, low level procedural coding demands skilled programmers who are both costly and in short supply. Additionally, the use of object or messaging middleware requires talented software architects who are even more costly and in even shorter supply. Then on top of all this we frequently add new data types, including media, and complex data structures, found for example in the nested parts assemblies needed to support mass customization. These accelerate the return to low level procedural methods (I was tempted to say “accelerate the return to the Dark Side” but I restrained myself :-). The table below provides a point-by-point summary of then versus now.

Many of the data processing issues of the sixties that inspired Ted Codd to create the relational model of data have reappeared in the 90’s in a new guise.

Then: Need to interconnect the systems of disparate departments (to create greater efficiencies and address new opportunities)
Now: Need to interconnect the systems of disparate enterprises (to support business-to-business e-commerce, EAI, and new opportunities)

Then: Interconnecting departmental systems requires 3rd generation methods and coding (Cobol, PL/1)
Now: Interconnecting enterprise systems requires 3rd generation methods and coding (TP, distributed objects, messaging middleware, Java)

Then: 3rd generation methods and coding are procedural and operate at a low level of abstraction – they take too long and require skilled analysts and programmers
Now: New 3rd generation methods and coding are procedural and operate at a low level of abstraction – they take too long and require skilled architects, analysts, and programmers

Then: Need to develop and deploy applications faster and at lower cost (to relieve the maintenance backlog)
Now: Need to develop and deploy applications faster and at lower cost (to keep up with competitors running on Internet time)

Then: Interactive time sharing opens up systems to non-expert users requiring a higher level interface (than users of batch systems)
Now: Web and Intranet access opens up systems to untrained users requiring a higher level interface and greater intelligence on the part of the systems (than trained users of single purpose enterprise systems)

Then: On-line applications must support more varied data structures requiring a data dictionary for metadata management
Now: Web and Intranet applications introduce complex data, knowledge management, and media support requirements

So, nearly every time we seek to integrate disparate information resources, support real world distribution of data, redistribute processing, cater to untrained users, manage knowledge, or deliver adequate performance in the face of complexity, we end up back in bed with expensive low level procedural approaches. The ghosts of sixties DP haunt us still.

Companies are in a massive race today to exploit information resources, mine opportunities for partnering, alliances, mergers, and convergence, and approach new markets typically filled with totally naïve users. The systems we want to integrate are far larger, more complex, and more numerous than the corporate and departmental systems of the past. While low level procedural methods remain warranted for software engineering and for specific applications (e.g., embedded systems that must exploit a special purpose processor), generally what we need is a high level, non-procedural alternative for the bulk of new development including e-commerce, EAI, ERP, and CRM.

If Ted Codd was right in his assessment in the sixties - and he was very right judging by the unparalleled success of relational database and tool vendors and corporate America’s massive buy in - then this same assessment is even more valid today as we prepare to enter the next millennium.

A reapplication of relational principles to the distributed and extended applications of today would go a long way towards addressing the problem

Imagine for a moment that all the world ran the same vendor’s DBMS (Just pick your favorite here. After all, this is a fantasy!). Need an application at your company to talk to an application at your new customer or supplier? No problem. You wouldn’t have to write any 3GL code (not even Java), no ODBC, no middleware, no XML – just use SQL to declaratively update your data dictionary and your trading partner’s to create a distributed database (add appropriate security privileges and appropriate relational views to provide the required semantic and syntactic conversions) and you’re in e-business. What a lovely dream. But then you wake up to find that distributed database operations receive only marginal support when even two vendors’ DBMSs are involved in the distribution scheme.
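
For the sake of the daydream only, here is roughly how that might look rendered in Oracle-flavored SQL: a declarative entry in the data dictionary, a view to perform the semantic and syntactic conversion, and a grant. The link, table, account, and application names are invented for illustration.

    -- Tell the data dictionary where the partner data lives
    -- (hypothetical link name, account, and connect string).
    CREATE DATABASE LINK partner_erp
      CONNECT TO trade_user IDENTIFIED BY secret
      USING 'partner_tns_alias';

    -- A relational view supplies the semantic and syntactic conversion
    -- between the two schemas.
    CREATE VIEW open_purchase_orders AS
      SELECT po_no     AS order_id,
             po_date   AS ordered_on,
             total_amt AS order_total
      FROM   orders@partner_erp
      WHERE  status = 'OPEN';

    -- Appropriate security privileges complete the picture.
    GRANT SELECT ON open_purchase_orders TO supply_chain_app;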

Despite this limitation our dreamy excursion does serve to make an important point: the relational model provides a high level declarative interface to data and functionality, effectively hiding the low level procedural details below, even in distributed systems where WANs and LANs may separate physical data sources and numerous connectivity schemes may be at work on multiple levels to give the impression that the whole affair is seamless. While distributed object infrastructures like CORBA hide the details of many layers of network protocols beneath an object interface, the declarative, relational interface hides even the procedural details associated with object operations. So the relational approach does display advantages, even if those advantages are theoretical at this point in our discussion (due to the limits of shipping relational DBMSs).

A less pie-in-the-sky but nonetheless appealing approach is to push vendors to open up their DBMS architectures and support a wide selection of today’s connectivity tools (e.g., middleware) under the covers (meaning beneath the relational interface), thereby supporting heterogeneous distributed databases as Codd intended. You’d never have to wake up from your sweet fantasy. Obviously much more is involved here, including QoS issues, but I’ll ignore those for now since we’re dreaming again - though this time we’re not so deeply asleep. From a study of Oracle’s architectural evolution over the past three years we can speculate that this architectural transmogrification may already be in progress at Oracle Corporation, though likely for vastly different reasons.

It’s also possible that a new kind of software, we can call it systemware for the moment, could employ middleware in a modular fashion to fuse heterogeneous software subsystems (like DBMSs from multiple vendors, TP monitors, Java virtual machines, application servers, etc.) producing a distributed system that responds to high level declarative commands. Plenty of attention would have to be paid “under the covers” to implementations of systemware - issues like data caching and command optimization would be critical to deliver acceptable performance. (One can alternatively think of this approach as creating a meta-DBMS across the constituent DBMSs and other components – like a federated database but with a high level single system image.)
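
No such systemware exists, so the sketch below shows only the interface it might present. Assume customer records sit in one vendor’s DBMS, order history in another’s, and shipment events arrive via messaging middleware exposed as a virtual table; all the table and column names are hypothetical. The application issues a single declarative statement, and the systemware would be responsible for decomposing it, routing the pieces through the appropriate middleware, and handling the caching and optimization underneath.

    -- One declarative statement against the single system image.
    -- customers, orders, and shipment_events are virtual tables that the
    -- hypothetical systemware maps onto heterogeneous back ends.
    SELECT c.customer_name,
           o.order_id,
           s.delivered_on
    FROM   customers       c
    JOIN   orders          o ON o.customer_id = c.customer_id
    JOIN   shipment_events s ON s.order_id    = o.order_id
    WHERE  c.region = 'EMEA'
      AND  s.delivered_on > DATE '1999-10-01';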

Compared with the previous two approaches, which only address the issue of distributed data management, the systemware approach covers this base and in addition addresses the issue of high level declarative system integration. System integration remains largely untouched by the relational approach, as do high level declarative techniques for dealing with complex data while maintaining high performance, and support for knowledge management.

More is needed to exorcise these ghosts and put them to rest once and for all

Applications that practically write themselves: Application development is still too time consuming and too demanding of talent that is difficult to find and retain. Relational normalization breaks up application-level data structures into elemental data atoms (a process vital to ensuring data integrity) but unfortunately fails to provide the means to reassemble these back into the data molecules and higher level non-relational structures (technically speaking, non-first normal forms) needed at the application level. The conceptual layer of a relational DBMS - the layer where relational views reside - provides a hint of what could be an answer, as do Oracle’s Object Views, but despite scattered research this has never been fully developed nor commercialized. At Infomaniacs we’ve performed R&D in this area under the auspices of “Relational Synthesis” and consequently see an extended view mechanism as fertile ground for supporting applications that practically write themselves (via high level declarative techniques, of course).
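
As a rough illustration of the flavor of mechanism we mean (not Relational Synthesis itself), Oracle 8i-style object types and collection expressions can reassemble normalized header and detail rows into a nested, application level structure through a view. The table, type, and column names below are invented.

    -- Normalized "atoms": order headers and their line items (hypothetical).
    CREATE TYPE line_item_t AS OBJECT (part_no VARCHAR2(20), qty NUMBER);
    /
    CREATE TYPE line_item_tab_t AS TABLE OF line_item_t;
    /
    -- The view reassembles the atoms into an application level "molecule",
    -- a nested, non-first-normal-form structure, without touching the
    -- underlying normalized tables.
    CREATE VIEW order_molecule_v AS
      SELECT o.order_id,
             o.customer_id,
             CAST(MULTISET(SELECT l.part_no, l.qty
                           FROM   order_lines l
                           WHERE  l.order_id = o.order_id)
                  AS line_item_tab_t) AS line_items
      FROM   orders o;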

Knowledge Management: Our systems need to manage more than a simple layer of metadata. Consumer oriented e-commerce requires managing complex knowledge thickets to enable searchers to easily find, customize, and purchase what they want (and conversely for businesses to find and target the customers they want). Business-to-Business e-commerce demands managing complex knowledge for inter-enterprise systems integration that must be deployed at an ever increasing rate (How quickly must this eventually happen? In time B2B integration should be dynamic, based on an automated process of self-discovery, or so we think. That’s fast!). EAI creates similar demands. The relational model of data was never intended to manage complex knowledge and, lacking modification, presents serious deficiencies and inefficiencies for doing so (one could argue that a whole new variety of data normalization techniques is needed to identify functional dependencies and remove data redundancies related to inheritance hierarchies – a core component of knowledge management). Can sophisticated artificial intelligence and knowledge representation systems be hybridized with the relational model of data? We know they can because we’ve done just that since the ’80s.
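
As one concrete illustration of the mismatch, consider how an inheritance (“is-a”) hierarchy might be stored relationally today: an adjacency list plus a vendor-specific traversal, since the classic relational model offers no declarative closure operator. The table, column, and concept names are invented, and the CONNECT BY syntax shown is Oracle-specific.

    -- A concept taxonomy stored as an adjacency list (hypothetical names).
    CREATE TABLE concept_isa (
      concept VARCHAR2(64),
      parent  VARCHAR2(64)
    );

    -- Walking the hierarchy ("find every ancestor class of espresso_machine")
    -- requires a vendor-specific extension or a procedural traversal;
    -- standard relational SQL of the day has no declarative way to say it.
    SELECT concept, parent
    FROM   concept_isa
    START WITH concept = 'espresso_machine'
    CONNECT BY concept = PRIOR parent;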

Performance and Ease of Use: Many developers have been forced into trading ease of use for performance, returning to hierarchical and network data management, now in the guise of object databases designed to manage complex data. We need the performance of object databases coupled with the non-procedural access and the ease of use of relational systems. This is possible but requires thinking outside of both the relational and object database boxes, something that both camps appear unwilling or unable to do in earnest. It is a common misconception that Codd’s rules of relational databases forbid the use of linked object structures, such as hierarchies, which can improve the performance of certain query and update operations. In fact, fully relational DBMSs can exploit numerous non-first normal form data structures at the physical level to achieve performance, e.g., nested structures where detail tuples are clustered with their corresponding header tuple. What the relational rules forbid is exposing these physical details “at the logical level” because this would have the consequence of forcing users and programs back into a low level procedural mode to do their job (that’s the job of the relational optimizer).
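
Oracle-style index clusters are one real-world example of this separation: detail rows are physically stored alongside their header row, yet the tables, and the SQL that queries them, are unchanged at the logical level. The sketch below uses invented invoice tables.

    -- Physical level: header and detail rows share storage in a cluster,
    -- so an invoice and its lines are fetched together.
    CREATE CLUSTER invoice_cluster (invoice_id NUMBER);
    CREATE INDEX invoice_cluster_idx ON CLUSTER invoice_cluster;

    CREATE TABLE invoices (
      invoice_id   NUMBER PRIMARY KEY,
      invoice_date DATE
    ) CLUSTER invoice_cluster (invoice_id);

    CREATE TABLE invoice_lines (
      invoice_id NUMBER,
      line_no    NUMBER,
      amount     NUMBER
    ) CLUSTER invoice_cluster (invoice_id);

    -- Logical level: applications still say WHAT they want, not where it
    -- lives on disk; the clustering never shows through.
    SELECT i.invoice_id, l.line_no, l.amount
    FROM   invoices i, invoice_lines l
    WHERE  l.invoice_id = i.invoice_id
    AND    i.invoice_id = 42;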

Taking an Active approach

The toughest challenge facing us now is to realize (and acknowledge) that in our rush to embrace the new opportunities presented by this inter-networked world we’ve forgotten a hard won lesson. Today, many are just plowing ahead with object orientation, distributed objects, Java, message oriented middleware, XML, and a list of tools too long to present here. These technologies have their place – they are powerful software engineering tools for the infrastructure, to join the ranks of optimizing compilers, dynamic linking, and inference engines. But none is suitable to serve as a general purpose application architecture.

The lesson is simple and straightforward:  We need to develop, operate, and maintain distributed, heterogeneous systems by way of high level declarative methods. It’s not time to throw away the relational philosophy, or even relegate it to but one subsystem among many (the database). It’s time to push relational for all it’s worth, leveraging our great big shiny new toolbox in the process. Indeed, without this powerful toolbox we would be unable to proceed more than a few steps in our newly charted direction.

Whence do we begin this journey? Clearly we need a new paradigm for applications, a new application architecture that can integrate the required categories of components and provide the required services, all while presenting a system image that can be interrogated and manipulated through high level declarative methods. At Infomaniacs we’ve been conceptualizing such a beast. We call it Active Models. Think of a componentized, distributed Object/Relational DBMS that has evolved to manage behavior as well as state and employs middleware in the same way today’s DBMSs support a wide range of network protocols.

We recently applied Active Models in an architectural study for a client to support real-time multimedia collaboration among teams distributed across the globe, and Active Models proved very effective (compared with client/server and n-tier architectures). Active Models has the potential to manage information across products from multiple vendors, dynamically redistribute data and processing over the network, deliver applications that practically write themselves, support knowledge management, and manage complex data with high performance and ease of use – all from a high level declarative interface. These being the very goals we established for ourselves at the outset, we consider Active Models worthy of further research and development.

The ghosts of sixties data processing won’t vanish by themselves and they’re costing us more every day we let them spook us. If we’ve piqued your interest and your enterprise could benefit from early involvement in the development of Active Models please contact Infomaniacs to discuss partnerships or engagements. If history does in fact repeat itself - and by all appearances it is doing so presently - then Active Models could become a major force in IT.

Erick Von Schweber

Information UNBOUND is produced by Infomaniacs.
(C) Infomaniacs 1998. All Rights Reserved.
