The Ghost of Sixties DP

COBOL,
JCL, punch card readers, and the backlog – not all remembrances
of the sixties are fond nostalgia. Unfortunately, a serious issue,
first felt in sixties data processing circles, has once again reared
its ugly head. The most pernicious problem with sixties DP was the
need to resort to low level, procedural programming approaches to
accomplish, well, nearly anything.
What
do I mean by low level procedural approaches? I mean the need to
specify not just what
results you want from an operation but also how
the results are to be achieved, often by way of an iterative or
recursive procedure, and typically described at a low level of abstraction,
e.g., explicitly moving bits to and from registers when the goal
is to copy a file between devices. Low level procedural methods
stand in stark contrast to high level declarative methods where
you describe what the result should be at a level of abstraction
appropriate to the problem, e.g., a SQL statement like “Select clone_date,
generation_num, sex from Workers where worker_id = ‘THX1138’ ”.
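To make the contrast concrete, note that the declarative statement above says nothing about access paths. In the sketch below (the table definition and index name are our own illustration, not from any particular system), the same SELECT is issued whether or not an index exists; choosing between a sequential scan and an index probe is the DBMS's job, not the requester's.

  CREATE TABLE Workers (
    worker_id      CHAR(7),
    clone_date     DATE,
    generation_num INTEGER,
    sex            CHAR(1)
  );

  -- The request states only WHAT is wanted.
  SELECT clone_date, generation_num, sex
  FROM   Workers
  WHERE  worker_id = 'THX1138';

  -- Adding (or dropping) this index changes HOW the result is produced;
  -- the SELECT above is never rewritten.
  CREATE INDEX workers_by_id ON Workers (worker_id);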
Back
to the sixties. The need to employ low level procedural methods
for every task large and small in turn meant that any function that
had not already been implemented and scheduled to run simply had
to wait – a week, a month, a year - sometimes an eternity. Ted Codd,
then at IBM, realized that low level, procedural approaches were
at the root of the applications and maintenance backlog, and devised
the relational model of data to address the issue.
Ted
was phenomenally successful, both in his analysis of the problem
and in the development of a solution. Today, with a point and a
click, we can construct complex reports drawing on relational data
spanning numerous physical devices with little or no awareness of
the complex, low level procedural gyrations performed beneath the
covers that make the magic happen. This is good. Even better is
that we can fuse data from multiple departmental systems (provided
the systems are of like kind from the same vendor – more on this
in a moment), say for a new enterprise application, with little
more trouble than the report just considered.
But
watch what happens when we need to tie disparate systems from one
company (or department!) together with another’s, as in automating
the supply chain for business-to-business e-commerce, or even fusing
systems within an enterprise, as in EAI. Before you know it someone’s
brought in the middleware - TP monitors, distributed objects, messaging,
perhaps all of them - and we’re right back to low level procedural
coding (this time, it must be admitted, with an object twist that
eases the maintenance tasks that will ultimately be required).
This
is not so good. Even with great tools, low level procedural coding
demands skilled programmers who are both costly and in short supply.
Additionally, the use of object or messaging middleware requires
talented software architects who are even more costly and in even
shorter supply. Then on top of all this we frequently add new data
types, including media, and complex data structures, found for example
in the nested parts assemblies needed to support mass customization.
These accelerate the return to low level procedural methods (I was
tempted to say “accelerate the return to the Dark Side” but I restrained
myself :-). The table below provides a point-by-point summary of
then versus now.
[Table (then versus now): Many of the data processing issues of the sixties that inspired Ted Codd to create the relational model of data have reappeared in the '90s in a new guise.]
So,
nearly every time we seek to integrate disparate information resources,
support real world distribution of data, redistribute processing,
cater to untrained users, manage knowledge, or deliver adequate
performance in the face of complexity, we end up back in bed with
expensive low level procedural approaches. The ghosts of sixties
DP haunt us still.
Companies
are in a massive race today to exploit information resources, mine
opportunities for partnering, alliances, mergers, and convergence,
and approach new markets typically filled with totally naïve users.
The systems we want to integrate are far larger, more complex, and
more numerous than the corporate and departmental systems of the
past. While low level procedural methods remain warranted for software
engineering and for specific applications (e.g., embedded systems
that must exploit a special purpose processor), generally what we
need is a high level, non-procedural alternative for the bulk of
new development including e-commerce, EAI, ERP, and CRM.
If
Ted Codd was right in his assessment in the sixties - and he was
very right judging by the unparalleled success of relational database
and tool vendors and corporate America's massive buy-in - then this
same assessment is even more valid today as we prepare to enter
the next millennium.
A reapplication
of relational principles to the distributed and extended applications
of today would go a long way towards addressing the problem
Imagine
for a moment that all the world ran the same vendor’s DBMS (Just
pick your favorite here. After all, this is a fantasy!). Need an application at
your company to talk to an application at your new customer or supplier?
No problem. There'd be no 3GL code to write (not even Java),
no ODBC, no middleware, no XML – just use SQL to declaratively update
your data dictionary and your trading partner’s to create a distributed
database (add appropriate security privileges and appropriate relational
views to provide the required semantic and syntactic conversions)
and you’re in e-business. What a lovely dream. But then you wake
up to find that distributed database operations receive only marginal
support when even two vendors' DBMSs are involved in the distribution
scheme.
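For the record, here is roughly what that dictionary-level, declarative setup might look like in the single-vendor fantasy. The sketch uses Oracle-flavored syntax for database links, and every object, account, and link name is hypothetical:

  -- Register the trading partner's database in our data dictionary.
  CREATE DATABASE LINK partner_link
    CONNECT TO trading_acct IDENTIFIED BY secret
    USING 'partner_db';

  -- A relational view supplies the semantic and syntactic conversion:
  -- our notion of an order is assembled from the partner's tables.
  CREATE VIEW partner_orders AS
    SELECT o.order_id,
           o.order_date              AS placed_on,
           l.part_no,
           l.qty                     AS quantity
    FROM   orders@partner_link       o,
           order_lines@partner_link  l
    WHERE  o.order_id = l.order_id;

  -- Appropriate security privileges complete the arrangement.
  GRANT SELECT ON partner_orders TO purchasing_app;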
Despite
this limitation, our dreamy excursion does serve to make an important
point: the relational model provides a high level declarative interface
to data and functionality, effectively hiding the low level procedural
details below, even in distributed systems where WANs and LANs may
separate physical data sources and numerous connectivity schemes
may be at work on multiple levels to give the impression that the
whole affair is seamless. While distributed object infrastructures
like CORBA hide the details of many layers of network protocols
beneath an object interface, the declarative, relational interface
hides even the procedural details associated with object operations.
So the relational approach does display advantages, even if those
advantages are theoretical at this point in our discussion (due
to the limits of shipping relational DBMSs).
A
less pie-in-the-sky but nonetheless appealing approach is to push
vendors to open up their DBMS architectures and support a wide selection
of today’s connectivity tools (e.g., middleware) under the covers (meaning beneath the relational interface), thereby
supporting heterogeneous distributed databases as Codd intended.
You’d never have to wake up from your sweet fantasy. Obviously much
more is involved here, including QoS issues, but I’ll ignore those
for now since we're dreaming again, though this time we're not so deeply
asleep. From a study of Oracle’s architectural evolution over the
past three years we can speculate that this architectural transmogrification
may already be in process at Oracle Corporation, but likely for
vastly different reasons.
It's
also possible that a new kind of software - call it systemware
for the moment - could employ middleware in a modular fashion to
fuse heterogeneous software subsystems (like DBMSs from multiple
vendors, TP monitors, Java virtual machines, application servers,
etc.) producing a distributed system that responds to high level
declarative commands. Plenty of attention would have to be paid
“under the covers” to implementations of systemware - issues like
data caching and command optimization would be critical to deliver
acceptable performance. (One can alternatively think of this approach
as creating a meta-DBMS across the constituent DBMSs and other components
– like a federated database but with a high level single system
image.)
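To give a feel for what a single system image could mean in practice, a hypothetical systemware session might accept an ordinary declarative statement like the one below even though the two tables live in DBMSs from different vendors (the schema and table names are invented for illustration; no shipping product accepts exactly this today):

  -- One declarative request; beneath it, systemware routes sub-queries
  -- through middleware, caches intermediate results, and optimizes the
  -- composite command.
  SELECT c.customer_name, o.status
  FROM   crm.customers  c,     -- held in vendor A's DBMS
         scm.orders     o      -- held in vendor B's DBMS
  WHERE  c.customer_id = o.customer_id;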
Compared
with the previous two approaches, which only address the issue of
distributed data management, the systemware approach covers this
base and in addition addresses the issue of high level declarative
system integration. This pretty much remains untouched by the relational
approach, as do high level declarative techniques for dealing with
complex data while maintaining high performance and support for
knowledge management.
More is needed
to exorcise these ghosts and put them to rest once and for all.
Applications
that practically write themselves: Application
development is still too time-consuming and too demanding of talent
that is difficult to find and retain. Relational normalization breaks up application-level
data structures into elemental data atoms (a process vital to ensuring
data integrity) but unfortunately fails to provide the means to
reassemble these back into the data molecules and higher level non-relational
structures (technically speaking, non-first normal forms) needed
at the application level. The conceptual layer of a relational DBMS
- the layer where relational views reside - provides a hint of what
could be an answer, as do Oracle's Object Views, but despite scattered
research this has never been fully developed or commercialized.
At Infomaniacs we’ve performed R&D in this area under the auspices
of “Relational Synthesis”
and consequently see an extended view mechanism as fertile ground
for supporting applications that practically write themselves (via
high level declarative techniques of course).
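As a taste of what such an extended view mechanism might build on, here is a rough sketch in the style of Oracle 8's object views (syntax from memory; the types, tables, and names are purely illustrative): normalized order-header and order-line tables are reassembled into the nested, application-level structure a program actually wants.

  CREATE TYPE line_item_t AS OBJECT (part_no VARCHAR2(20), qty NUMBER);
  CREATE TYPE line_item_list_t AS TABLE OF line_item_t;
  CREATE TYPE order_t AS OBJECT (
    order_id   NUMBER,
    order_date DATE,
    items      line_item_list_t
  );

  -- Reassemble the data "atoms" into an application-level "molecule":
  -- each order object carries its nested collection of line items.
  CREATE VIEW orders_ov OF order_t WITH OBJECT IDENTIFIER (order_id) AS
    SELECT o.order_id,
           o.order_date,
           CAST(MULTISET(SELECT line_item_t(l.part_no, l.qty)
                         FROM   order_lines l
                         WHERE  l.order_id = o.order_id)
                AS line_item_list_t)
    FROM   orders o;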
Knowledge
Management: Our systems need to manage more than a simple layer
of metadata. Consumer oriented e-commerce requires managing complex
knowledge thickets to enable searchers to easily find, customize,
and purchase what they want (and conversely for businesses to find
and target the customers they want). Business-to-Business e-commerce
demands managing complex knowledge for inter-enterprise systems
integration that must be deployed at an ever increasing rate (How
quickly must this eventually happen? In time B2B integration should
be dynamic, based on an automated process of self-discovery, or
so we think. That’s fast!). EAI creates similar demands. The relational
model of data was never intended to manage complex knowledge and,
lacking modification, presents serious deficiencies and inefficiencies
for doing so (one could argue that a whole new variety of data normalization
techniques are needed to identify functional dependencies and remove
data redundancies related to inheritance hierarchies – a core component
of knowledge management). Can sophisticated artificial intelligence
and knowledge representation systems be hybridized with the relational
model of data? We know they can because we’ve done just that since
the ‘80s.
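A small sketch shows the shape of the problem (Oracle's hierarchical CONNECT BY syntax; the table and names are invented for illustration). A class hierarchy can certainly be stored relationally, but every attribute defined on an ancestor class must either be copied into descendant rows (redundancy) or recovered by walking the hierarchy on each access:

  CREATE TABLE product_class (
    class_id     NUMBER PRIMARY KEY,
    class_name   VARCHAR2(40),
    parent_class NUMBER REFERENCES product_class (class_id)
  );

  -- Collect a class and all of its ancestors, from which inherited
  -- attributes and constraints must be assembled at query time.
  SELECT class_id, class_name
  FROM   product_class
  START WITH class_name = 'Network Router'
  CONNECT BY PRIOR parent_class = class_id;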
Performance
and Ease of Use: Many developers have been forced into trading
ease of use for performance, returning to hierarchical and network
data management, now in the guise of object databases designed to
manage complex data. We need the performance of object databases
coupled with the non-procedural access and the ease of use of relational
systems. This is possible but requires thinking outside of both
the relational and object database boxes, something that both camps
appear unwilling or unable to do in earnest. It is a common misconception
that Codd’s rules of relational databases forbid the use of linked
object structures, such as hierarchies, which can improve the performance
of certain query and update operations. In fact, fully relational
DBMSs can exploit numerous non-first normal form data structures
at the physical level to achieve performance, e.g., nested structures
where detail tuples are clustered with their corresponding header
tuple. What the relational rules forbid is exposing these physical
details “at the logical level” because this would have the consequence
of forcing users and programs back into a low level procedural mode
to do their job (that’s the job of the relational optimizer).
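A concrete, if hypothetical, example of this separation (Oracle-style clustered tables; all names are ours): header and detail rows are co-located on disk, yet applications continue to issue ordinary declarative SQL and the optimizer decides how to exploit the physical arrangement.

  -- Physical level: store each order's detail rows with their header row.
  CREATE CLUSTER order_cluster (order_id NUMBER);
  CREATE INDEX order_cluster_idx ON CLUSTER order_cluster;

  CREATE TABLE orders (
    order_id   NUMBER,
    order_date DATE
  ) CLUSTER order_cluster (order_id);

  CREATE TABLE order_lines (
    order_id NUMBER,
    part_no  VARCHAR2(20),
    qty      NUMBER
  ) CLUSTER order_cluster (order_id);

  -- Logical level: nothing changes for users or programs.
  SELECT o.order_date, l.part_no, l.qty
  FROM   orders o, order_lines l
  WHERE  o.order_id = l.order_id
  AND    o.order_id = 1138;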
Taking an Active
approach
The
toughest challenge facing us now is to realize (and acknowledge)
that in our rush to embrace the new opportunities presented by this
inter-networked world we've forgotten a hard-won lesson. Today,
many are just plowing ahead with object orientation, distributed
objects, Java, message-oriented middleware, XML, and a list of tools
too long to present here. These technologies have their place –
they are powerful software engineering tools for the infrastructure,
to join the ranks of optimizing compilers, dynamic linking, and
inference engines. But none is suitable to serve as a general-purpose
application architecture.
The
lesson is simple and straightforward: We need to develop, operate, and maintain distributed,
heterogeneous systems by way of high level declarative methods.
It’s not time to throw away the relational philosophy, or even relegate
it to but one subsystem among many (the database). It’s time to
push relational for all it’s worth, leveraging our great big shiny
new toolbox in the process. Indeed, without this powerful toolbox
we would be unable to proceed more than a few steps in our newly
charted direction.
Where
do we begin this journey? Clearly we need a new paradigm
for applications, a new application architecture that can integrate
the required categories of components and provide the required services
but do this while presenting a system image that can be interrogated
and manipulated through high level declarative methods. At Infomaniacs
we’ve been conceptualizing such a beast. We call it Active Models.
Think of a componentized, distributed Object/Relational DBMS that
has evolved to manage behavior as well as state and employs middleware
in the same way today’s DBMSs support a wide range of network protocols.
We
recently applied Active Models in an architectural study for a client
to support real-time multimedia collaboration of teams distributed
across the globe, and Active Models proved very effective (compared
with client/server and n-tier architectures). Active Models has
the potential to manage information across products from multiple
vendors, dynamically redistribute data and processing over the network,
deliver applications that practically write themselves, support
knowledge management, and manage complex data with high performance
and ease of use – all from a high level declarative interface. These
being the very goals we established for ourselves at the outset,
we consider Active Models worthy of further research and development.
The
ghosts of sixties data processing won’t vanish by themselves and
they’re costing us more every day we let them spook us. If we’ve
piqued your interest and your enterprise could benefit from early
involvement in the development of Active Models, please contact Infomaniacs
to discuss partnerships or engagements. If history does in fact
repeat itself - and by all appearances it is doing so presently
- then Active Models could become a major force in IT.
Erick Von Schweber

Information UNBOUND is produced by Infomaniacs.
(C) Infomaniacs 1998. All Rights Reserved.