Object Based Semantic Networks™
Toward a Fusion of Knowledge
Representation and the Relational Model
September 21, 2000
Executive Summary
Organizations today are experiencing an accelerating need to capture more
of the meaning present in data, be it in data warehousing, data
mining, web personalization, knowledge management, or intelligent
user interfaces catering to a new and rapidly expanding community
of users. The task of capturing meaning is typically left to
the application developer without support from underlying systems.
We consider what it will take for the approaches to knowledge
representation and inferencing, first developed in AI and robotics,
to be fused with the Relational Model of data at a fundamental
level. The solution, which we call Object Based Semantic Networks,
bridges and resolves a deep dichotomy of the relational philosophy,
namely that between data and metadata. Applying this approach
could lead to system software that directly supports enterprise-class
management of semantic information and knowledge.
Introduction
Researchers
in artificial intelligence have proposed and developed laboratory
and “point” solutions for capturing and representing real-world
knowledge for over three decades, yet these technologies have been
glacially slow to infiltrate the world of commercial products. This is
surprising considering the ongoing need to capture ever more knowledge
of the enterprise - its entities, attributes, processes, and actions -
all within a robust, industry-standard framework that is accessible to
numerous and multifarious applications and that employs commodity
off-the-shelf software and systems. Ted Codd observed this trend in the
late 1970s, acknowledging that the quest to capture more meaning is
never ending, and responded with RM/T, the extended Relational Model.
Today,
the need to extract and manage knowledge on a global scale is accelerating,
with nary a fundamental solution in sight. This author is of the
opinion that Ted was absolutely correct in his choice of strategy
- to push more of the knowledge down into the system technology
that manages our data, namely the RDBMS.
In the early to mid-1980s we began an assessment of AI knowledge
representation systems, probing their strengths and weaknesses.
As our project progressed we discovered significant limitations
to these systems, of both a design and implementation nature. It
became clear that properties foundational to the Relational Model,
and of great value, were totally absent from the AI-inspired schemes.
An excellent example of one such foundational property is the Relational
Model’s basis in values for relation variables - and nothing but
values. Contrast this with the fundamental dichotomy imposed by
semantic networks and frame systems - nodes for concepts, instances,
and values vs. edges for relationships and properties.
As
this endeavor progressed we sought strategies for abstracting the
best qualities and principles of AI knowledge representation from
their research-oriented implementations, in order to reinstantiate these qualities using Relational
principles. This is not to say that we considered Relational to
be flawless - it too had its dichotomies and tradeoffs, e.g., the
distinction between data and metadata and the caste system this
created in terms of tools and roles (how far apart be the DBA and
the data entry clerk, but I digress).
So, off to
find synergy!
It
was gratifying to find that we could design and implement semantic
networks on top of a Relational footing, even supporting very advanced
features such as inheritance with exceptions using non-monotonic
reasoning and disambiguation of requests stated in natural language.
We first called these systems Value Based Semantic Networks, later
generalized to Object Based Semantic Networks. This, however, required
sophisticated application logic that was not an integral part of
the DBMS - rather it was supported by the application managing the
knowledge, relying on the RDBMS to implement set theory and carry
out its operations. This also meant that specific optimizations
could not be made because the RDBMS had no awareness that it was
managing data and knowledge-data (as opposed to only data).
On the bright side, we were able to implement all of the required
application logic declaratively in SQL (yes, in SQL, despite its
shortcomings).
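To give a flavor of what that looked like, here is a minimal,
hypothetical sketch (the Assertions relation and its column names are
my own illustration, not the original schema, and the sample facts
anticipate the elephant example developed later in this paper):
concepts, instances, and relationships all live as plain values in a
single relation, and one step of inference is nothing more than a join.

    CREATE TABLE Assertions (
        subject   VARCHAR(64) NOT NULL,   -- a concept, an instance, or a value
        predicate VARCHAR(64) NOT NULL,   -- e.g. 'is_a', 'instance_of', 'color'
        object    VARCHAR(64) NOT NULL,   -- likewise a concept, instance, or value
        PRIMARY KEY (subject, predicate, object)
    );

    INSERT INTO Assertions VALUES ('Elephant', 'color',       'gray');
    INSERT INTO Assertions VALUES ('Clyde',    'instance_of', 'Elephant');

    -- One step of inference: the properties Clyde acquires from his type.
    SELECT inst.subject AS individual, prop.predicate, prop.object
    FROM   Assertions AS inst
    JOIN   Assertions AS prop ON prop.subject = inst.object
    WHERE  inst.subject   = 'Clyde'
    AND    inst.predicate = 'instance_of';

Chaining such joins to arbitrary depth is precisely where the
sophisticated application logic came in: the RDBMS evaluated each set
operation faithfully, but the application had to drive the iteration.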
But
what would it take to achieve these same ends within the RDBMS,
as opposed to atop it?
Objective
Researchers in knowledge representation
have identified many very powerful concepts. Below, I present three
of these ideas, all intimately related, by way of example. I then
outline how one approach to infusing the Relational Model with more
meaning, Codd’s RM/T, prevents the realization of these ideas directly
within the Relational context. Lastly, I sketch a direction to instantiate
these concepts using Relational principles. Consider it a demonstration
that such is indeed possible (not that this is the only way to go).
I encourage educated analysis and critique of this effort. Our hope
is to get us moving effectively and surely down this road without
further delay.
Three valuable
interrelated ideas from knowledge representation
Data unified
with metadata
A
semantic network is a level playing field for abstract concepts
and concrete instances - both can be attributed, enter into relationships,
and be created, modified, and managed with the very same tools and
interfaces. In contrast to RDBMSes (and importantly the tools and
culture that developed around them), which enforce a distinction
between relations of application data and the catalog relations
that manage data about the application data, semantic networks are
a flat structure where data, metadata, meta-metadata, and so forth,
form one continuous braid. An AI system may initially identify an
entity as an instance and only later, having expanded its context,
reclassify the entity as an abstraction. For example, a visual artifact
may at first be classified by an autonomous mobile robot only as
an entity. Then, as the robot’s context expands through exploration
of its environment, the entity is reclassified as a blockade.
Then with further exploration, furniture. Then a table.
Then a coffee table. This evolution presents no problem for
a semantic network - its levels of abstraction are fluid, not rigid.
In all fairness I must add that in the Relational Model data
and metadata, while divided and so labeled, are both managed as
relations. But as we shall see later in this paper this treatment
does not in itself accomplish our objective.
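Continuing the hypothetical sketch introduced earlier, the robot's
evolving classification is nothing more exotic than inserts and an
update against the very same relation that holds the conceptual
hierarchy itself:

    INSERT INTO Assertions VALUES ('artifact-17', 'instance_of', 'Entity');
    INSERT INTO Assertions VALUES ('CoffeeTable', 'is_a',        'Table');
    INSERT INTO Assertions VALUES ('Table',       'is_a',        'Furniture');

    -- The robot's context expands: the artifact is reclassified with an
    -- ordinary update, not a schema change.
    UPDATE Assertions
    SET    object = 'CoffeeTable'
    WHERE  subject = 'artifact-17'
    AND    predicate = 'instance_of';

Data about instances, data about concepts, and data about the
relationships among concepts all receive the same treatment.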
Abstract concepts
can be attributed and enter into relationships (treated uniformly,
as the equal of concrete instances)
A
conceptual abstraction, represented by a concept node in a semantic
network, can be directly attributed. For example, a concept for
elephant can be attributed with the property that an elephant
has an anatomical part called a trunk. This is accomplished
with the same straightforwardness as attributing a concrete instance
with a property, such as that Clyde was born in December of 1970.
Once the concept is attributed all of its instances are inferred
to possess the same attribute. So if Clyde is an instance of the
concept elephant, and elephants have the property of having
a trunk, then Clyde is inferred to have a trunk, without
the need to explicitly represent that Clyde has a trunk. Furthermore,
it follows that Clyde's elephant friend Bonnie possesses a trunk, and
that Clyde's father and mother have trunks, and so on.
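In the terms of the earlier sketch (the predicate names remain my own
illustrative assumptions), the property is asserted once, on the
concept, and a join surfaces it for every instance:

    INSERT INTO Assertions VALUES ('Elephant', 'has_part',    'trunk');
    INSERT INTO Assertions VALUES ('Bonnie',   'instance_of', 'Elephant');

    -- Every instance of Elephant is inferred to have a trunk; nothing is
    -- stored per individual.
    SELECT inst.subject AS individual, prop.object AS part
    FROM   Assertions AS inst
    JOIN   Assertions AS prop ON prop.subject = inst.object
                             AND prop.predicate = 'has_part'
    WHERE  inst.predicate = 'instance_of';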
Efficient inheritance
with support for exceptions
Let’s
add a bit of additional detail to the previous example. Suppose
we wish to represent that Clyde’s body, as an elephant, carries
out the process of cellular respiration. Rather than associate this
property directly with Clyde, a concrete instance, we observe that
elephants are mammals, which in turn are vertebrates, and so on
up the phylogenetic tree, until we come to animals, which are a
form of living organism. We note that every living organism maintains
a metabolic process, but in the case of animals this process is
cellular respiration (as opposed to, say, photosynthesis for plant-based
life or fermentation for some forms of single-celled organisms).
This
achieves a logical simplicity and captures the functional dependency,
animal → metabolic activity = cellular respiration, once
and only once. From a Relational perspective one can think of this
as a form of conceptual normalization.
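As a sketch of this conceptual normalization, and leaning on the
recursive queries available in modern SQL (the hierarchy and predicate
names below are again my own assumptions), the metabolic fact is stated
once, on Animal, yet reaches Clyde by walking the is_a chain:

    INSERT INTO Assertions VALUES ('Elephant',   'is_a', 'Mammal');
    INSERT INTO Assertions VALUES ('Mammal',     'is_a', 'Vertebrate');
    INSERT INTO Assertions VALUES ('Vertebrate', 'is_a', 'Animal');
    INSERT INTO Assertions VALUES ('Animal',     'is_a', 'LivingOrganism');
    INSERT INTO Assertions VALUES ('Animal',     'metabolic_process',
                                   'cellular respiration');

    -- Everything Clyde inherits, gathered by walking up the is_a chain.
    WITH RECURSIVE ancestors(concept) AS (
        SELECT object FROM Assertions
        WHERE  subject = 'Clyde' AND predicate = 'instance_of'
        UNION
        SELECT a.object
        FROM   Assertions AS a
        JOIN   ancestors  AS anc ON a.subject = anc.concept
        WHERE  a.predicate = 'is_a'
    )
    SELECT p.subject AS stated_on, p.predicate, p.object
    FROM   Assertions AS p
    JOIN   ancestors  AS anc ON p.subject = anc.concept
    WHERE  p.predicate NOT IN ('is_a', 'instance_of');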
But
something is lacking. What if Clyde is unique, the product of a
fiendish experiment by a mad elephant scientist, named Dr. Frankenphant,
who has genetically altered Clyde’s biochemistry and transformed
him into a Cyber-elephant, employing cold fusion in each of Clyde’s
mitochondria, thereby displacing respiration? How do we override
the inference that Clyde respires? As it happens we do not need
to go to such bizarre extremes with our example to discover this
commonplace type of problem.
Let's
say that Clyde is not a run-of-the-mill elephant. No, he's not the
product of Dr. Frankenphant's twisted genius. Rather, Clyde is a
royal elephant. And royal elephants are white. So while it is generally
correct to infer that an elephant is a gray thing, it would be
incorrect to infer that Clyde is a gray thing, as he's a royal
elephant and royal elephants are white things. (As an aside, notice
that this takes us into the fields of AI research known as default
reasoning and prototypes.) A general-purpose semantic network can
be expanded to support exceptions to default inference, as the
inferential paths enumerated below illustrate.
In
this extended network the inference engine discovers multiple
inferential paths:
- One
path concludes that Clyde is a gray thing, reasoning that Clyde
is a royal elephant, and royal elephants are elephants,
and elephants possess the property that they are gray,
and this path has an inference length of 3.
- Another
path concludes that Clyde is white, reasoning that Clyde is
a royal elephant, and royal elephants possess the property
that they are white, and this path has an inference length
of 2.
- Still
another path concludes that Clyde is not a gray thing, reasoning
that Clyde is a royal elephant, and royal elephants
do not possess the property that they are of color gray
(via an exception), and this path has an inference length of 2.
In
this fashion we retain the ability to effect conceptual normalization
- representing an assertion at the most general level possible -
while handling the oddball exceptions to the rule.
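A rough sketch of this shortest-path style of preemption, continuing
the earlier hypothetical schema (the not_color encoding of the
exception is my own convention, not a fixture of the model):

    INSERT INTO Assertions VALUES ('RoyalElephant', 'is_a',      'Elephant');
    INSERT INTO Assertions VALUES ('RoyalElephant', 'color',     'white');
    INSERT INTO Assertions VALUES ('RoyalElephant', 'not_color', 'gray');

    -- Clyde turns out to be a royal elephant after all.
    UPDATE Assertions
    SET    object = 'RoyalElephant'
    WHERE  subject = 'Clyde' AND predicate = 'instance_of';

    -- Rank each conclusion about Clyde's color by its inference length.
    WITH RECURSIVE reachable(concept, steps) AS (
        SELECT object, 1 FROM Assertions
        WHERE  subject = 'Clyde' AND predicate = 'instance_of'
        UNION
        SELECT a.object, r.steps + 1
        FROM   Assertions AS a
        JOIN   reachable  AS r ON a.subject = r.concept
        WHERE  a.predicate = 'is_a'
    )
    SELECT p.predicate, p.object, r.steps + 1 AS inference_length
    FROM   Assertions AS p
    JOIN   reachable  AS r ON p.subject = r.concept
    WHERE  p.predicate IN ('color', 'not_color')
    ORDER  BY inference_length;

The query merely enumerates and ranks the competing conclusions - white
and not-gray at length 2, gray at length 3; deciding that the shorter
paths defeat the longer one is still left to logic outside the query.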
Taken
together, these three ideas from knowledge representation make for
a powerful system for modeling our knowledge of the world. But how
are we to take advantage of such techniques across commercial systems?
The next section examines just a couple of the limitations that emerge
when we attempt to utilize the facilities of an extended form of
the Relational Model.
Aside: There
is indeed much more to this story, including numerous complexities
for handling the inheritance of general relationships with support
for multiple inheritance and exceptions. An excellent reference
is David Touretzky’s “The Mathematics of Inheritance Systems”, a
formalization of Scott Fahlman’s NETL system for representing and using
real-world knowledge (it is interesting to note that it was Scott Fahlman’s
work that inspired Danny Hillis to design and build the Connection
Machine).
Incompatibilities
between knowledge representation and an extended Relational Model
Codd
intended for RM/T to explicitly capture more of the meaning present
in the data in the database (as opposed to capturing it in
an application). One of the mechanisms provided to support this
is entity supertypes/subtypes. In RM/T an entity is represented
by the occurrence of exactly one tuple in a relation that represents
the entity’s type. For example, the entity in question, say a salesman
named Mike, could be represented by a tuple in a relation named
Salespeople that represents the entity type for salesperson.
But this is just the beginning. Perhaps we wish to capture that
salespeople are a specific kind of employee. Since employees constitute
an entity type, and are consequently represented by a relation named
Employees, it is possible in RM/T to define the entity type
Salespeople as a subtype of the entity type Employees
(and conversely that the entity type Employees is a supertype
of the entity type Salespeople).
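A rough sketch of how this reads in conventional SQL terms (the
relation and column definitions are illustrative, not drawn from RM/T
itself): the one entity, Mike, surfaces as a tuple at each level of the
type chain.

    CREATE TABLE Employees   (emp_id INTEGER PRIMARY KEY,
                              name   VARCHAR(64));
    CREATE TABLE Salespeople (emp_id INTEGER PRIMARY KEY
                                     REFERENCES Employees(emp_id),
                              quota  DECIMAL(10,2));

    INSERT INTO Employees   VALUES (101, 'Mike');
    INSERT INTO Salespeople VALUES (101, 25000.00);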
Note:
This begins to come perilously close to the supertable/subtable
concept that Chris Date has convincingly argued is both questionable
and unnecessary, but in the case of RM/T it is the entity types
that are in a super/sub relationship and not tables per se.
Were
we to attempt to reinstantiate the Clyde example in RM/T, we would
find a tuple for Clyde in every relation for every entity type,
all the way up to living organism (and beyond). Further, we could
not simply note the fact that animals employ cellular respiration
as their metabolism once and once only, but would likely need to
represent this fact as an attribute value, once for each and every
animal, each represented by a tuple, in a relation that represents
the entity type living organism. That’s a lot of repetition!
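Sketched with hypothetical relations, the repetition looks like this:
Clyde reappears as a tuple at each level, and the metabolic fact is
restated for every animal rather than stated once.

    CREATE TABLE Mammals         (name VARCHAR(64) PRIMARY KEY);
    CREATE TABLE LivingOrganisms (name VARCHAR(64) PRIMARY KEY,
                                  metabolic_process VARCHAR(32));

    INSERT INTO Mammals         VALUES ('Clyde');
    INSERT INTO LivingOrganisms VALUES ('Clyde',  'cellular respiration');
    INSERT INTO LivingOrganisms VALUES ('Bonnie', 'cellular respiration');
    -- ... and so on, one restatement per animal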
Going
further, this author does not see how it would be possible to override
the explicit representation of properties, such as Clyde’s color.
Consider the two following options to handle this situation in RM/T.
Option 1:
Clyde is an entity of type Elephant, which would be a subtype
of type Gray Thing. A tuple representing Clyde then belongs
to a relation named Elephants and a related tuple, also
representing Clyde, belongs to a relation named GrayThings.
Option 2:
Clyde is an entity of type Royal Elephant, which would
be a subtype of the entity type WhiteThing. A tuple representing
Clyde then must belong to a relation named RoyalElephants
and a related tuple, also representing Clyde, belongs to a relation
named WhiteThings.
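If both type relationships hold at once - royal elephants as white
things and elephants as gray things - Clyde lands in both relations,
with nothing in the structure to let the more specific assertion defeat
the more general one. A hypothetical sketch of where that leaves us:

    CREATE TABLE GrayThings  (thing VARCHAR(64) PRIMARY KEY);
    CREATE TABLE WhiteThings (thing VARCHAR(64) PRIMARY KEY);

    -- Option 1: Clyde, as an elephant, lands among the gray things.
    INSERT INTO GrayThings  VALUES ('Clyde');
    -- Option 2: Clyde, as a royal elephant, lands among the white things.
    INSERT INTO WhiteThings VALUES ('Clyde');

Both memberships stand as plain, explicit facts; neither option offers
a way to override the inherited color.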
Object Based Semantic Networks™ - A New Direction Compatible with
Relational Principles
Clearly, RM/T, and the Relational
Model in general, are based on classical logic on which classical
set theory rests, and these models of data exist unaware of default
logics and non-monotonic reasoning. But this author questions whether
the structural component of a data model must mandate the logic,
which may be more appropriately enforced by the manipulative and
integrity constraint components of the model. Perhaps the structural
component can be sufficiently general so as to support non-classical
logic, and therefore extended forms of inference and reasoning.
Consider, then, giving every entity two representations:
- Representation 1 - entity-as-type: a relation whose tuples are
instances of the entity type.
- Representation 2 - entity-as-instance: one (or more) tuples where
each tuple belongs to a relation representing a supertype of the
entity.
Applied to the Clyde example:
- entity-as-type: Royal Elephant would be a relation that contains
tuples representing instances of royal elephants (including Clyde).
- entity-as-instance: Royal Elephant would also be a tuple that
belongs to a relation representing the entity type Elephant.
- entity-as-type: Elephant would be a relation that contains tuples
representing subtypes of elephants.
- entity-as-instance: Elephant would also be a tuple that represents
the elephant as a subtype of mammal (by belonging to a relation that
is an entity-as-type representation of mammal).
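A minimal, hypothetical sketch of this dual representation (all names
are assumptions made for illustration): Royal Elephant appears both as
a relation over its instances and as a tuple in the relation for
Elephant, while Elephant appears both as a relation over its subtypes
and as a tuple in the relation for Mammal.

    CREATE TABLE Mammal        (subtype_name  VARCHAR(64) PRIMARY KEY);
    CREATE TABLE Elephant      (subtype_name  VARCHAR(64) PRIMARY KEY);
    CREATE TABLE RoyalElephant (instance_name VARCHAR(64) PRIMARY KEY);

    INSERT INTO Mammal        VALUES ('Elephant');       -- Elephant as entity-as-instance
    INSERT INTO Elephant      VALUES ('RoyalElephant');  -- Royal Elephant as entity-as-instance
    INSERT INTO RoyalElephant VALUES ('Clyde');          -- Royal Elephant as entity-as-type

Whether the entity-as-type and entity-as-instance representations are
kept consistent falls naturally to the manipulative and integrity
constraint components of the model, which is exactly the division of
labor suggested above.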
Conclusion
Information UNBOUND is produced by Infomaniacs.
© Infomaniacs 1998. All Rights Reserved.