At the same time, within the database community, it
was recognized that there were several shortcomings
with commercial implementations of relational databases.
These included (but were not limited to) the following.
To get a better grasp of the advantages offered by OODBMSs and ORDBMSs
we must first understand the "problems" in SQL implementations of
the relational model.
Let's redraw the ACME tuple given above as it might be represented
in a data structure. The arrows indicate pointers.
So each Supplier has a Name and a parts field which is a pointer
to a Parts-list value. The Parts-list value is in turn a
list of pointers, each pointer points to an individual Parts
tuple, which is a structure of two atomic fields: PartName and Place.
Supplier Parts-list Parts
-------- --------- ------------------
|ACME | | part1 ------> | liquor | Paris |
| | | | ------------------
| parts-----------> | | ------------------
| | | part2 ------> | engine | Rome |
| | | | ------------------
-------- ---------
So current implemenations of RDBs (there is some debate as to
whether or not this restriction of simple domains applies to
the relational model itself)
restrict attribute values to simple domains.
To store the above information in a RDB, we would have to
un-nest the nonsimple attribute values, perhaps as follows.
If SQL supported abstract data types, the user could directly model
data values with complicated structure and operations.
Other issues to consider when adding an ADT are the following.
The following composite types are supported in SQL3.
The usual operators are supported on these types.
For instance setof permits intersection, union, difference, etc.
Also, aggregate operations: COUNT, SUM, MAX, etc.
are supported.
Finally, generators, are supported.
A generator is a 'looping constructs' that generates
each item in a collection.
So how do we use row?
First, we have to define what a row looks like.
We can use the ROW in one of two ways. First we could
useby setting up a column that contains ROWs directly.
In SQL3, to create a table with OID support we use the following
syntax.
History of OODBMS
Object-oriented database management systems (OODBMSs)
first started to be researched in the 1980s. At that time,
there were a few Object-Oriented Programming languages,
such as Smalltalk and C++. By the late 1980s and early
1990s it was recognized that O-O languages had some
very strong advantages over non-O-O languages for many
programming applications.
Primarly these were
But programmers also want to make objects persistent,
that is, object lifetimes should extend beyond the
single execution of a program.
OODBMS
Hellerstein (very insightfully) suggests that
an Object-Oriented Database Management System (OODBMS) can be seen as
trying "to add DBMS capabilities to an O-O language", in effect making
it a persistent O-O language. Consequently application programmers
who use OODBMSs typically write programs in a native O-O language
such as Java or C++, and the language has
some kind of Persistent class, Database class,
Database Interface, or Database API that provides DBMS functionality
as, effectively, an extension of the O-O language.
There are several OODBMS vendors, but persistent versions of O-O languages
currently have limited popularity.
ORDBMS
Hellerstein characterises
an Object-Relational Database Management System (ORDBMS) as
a system that "extends a relational database with O-O features".
In particular ORDBMSs offer
There are several ORDBMS vendors.
non-1NF Data
Consider the table shown below.
supplier | partToPlace
---------------------
ACME | part | place
| ------|------
| liquor| Paris
| engine| Paris
|
----------|----------------
DELCO | part | place
| ------|------
| engine| Rome
....
This table could not be stored directly in a RDB because the
partToPlace attribute has values which are from a nonsimple
domain, that is, a domain of composite values.
You should think of a value in a nonsimple domain as being
composed of smaller, distinct values. Common nonsimple domains
values include
lists, sets, arrays, bags, tuples, etc. In contrast values from
simple domains are atomic. Common simple domain values include
integers, chars, strings, floats, etc.
The problems, from a relational database perspective, with this
kind of composite value are listed below.
supplier | part | place
--------------------------
ACME | liquor | Paris
ACME | engine | Paris
DELCO | engine | Rome
....
But this is a less than ideal situation since we have lost the
"natural" structure in the data, and we have (needlessly) duplicated
some values, e.g., ACME appears twice in the above table.
Need for Abstract Data Types
A second problem with current RDBs is that they have a very limited
and restrictive type model. Stonebraker gives an excellent example
to illustrate this point. Assume that we want to store a RECTANGLE
in a database. We could store the RECTANGLE by storing the following
four (integer) coordinates: upperLeftX, upperLeftY,
lowerRightX, lowerRightY.
Given this storage structure suppose we want to query whether
a pair of rectangles, one each from relation R and S,
overlap.
After a little thinking we come up with
the following SQL query.
SELECT *
FROM R, S
WHERE NOT (R.upperLeftX > S.lowerRightX OR
S.upperLeftX > R.lowerRightY OR
R.upperLeftY < S.lowerRightY OR
S.upperLeftY < R.lowerRightY);
From a programming point of view,
it would be cleaner if we could write the following program instead.
SELECT *
FROM R, S
WHERE overlaps(R, S);
The latter program is also more amenable to query optimization,
and supports code reuse.
For example, suppose R and S were POLYGONS or CIRCLES rather
than RECTANGLES. The former query would break, but not the latter.
O-O features in SQL3
Assume that we have our standard student world, where students
are enrolled in various subjects and we decide that we would
like to have a photo of each student.
Below we give an SQL3 statement for creating the table and
we then discuss the nonSQL2 aspects of this statement.
CREATE TABLE Student
(name VARCHAR(30),
image GIF,
phone setof{INTEGER},
);
The two nonSQL2 features are
Let's consider each of these in turn.
ADTs
An ADT is a data structure and a set of available operations on that
structure.
To create an ADT the user first creates a new ABSTRACT TYPE.
CREATE ABSTRACT DATA TYPE <ADTname>
ADTS must support the following three methods which are part of the
ADT definition.
(internalLength = <size>, input = <import>, output = <export>);
Only the operations are visible outside of the ADT.
Users register these operations with the DBMS using the following
SQL3 syntax.
CREATE FUNCTION <function name>(<list of parameter types>) RETURNS
<type>
This statment informs the DBMS that
the function, <function name>,
AS EXTERNAL NAME <object library>;
My guess is that only the DBA can define new ADTs and will only load
`trusted' code to implement those ADTs.
A GIF ADT:
First let's let the DBMS know that there is a new ADT called GIF.
CREATE ABSTRACT DATA TYPE GIF (
internalLength = VARIABLE,
input = serializeGIF,
output = unserializeGIF,
);
Next, let's register some GIF operations with the DBMS
CREATE FUNCTION size(GIF) RETURNS INTEGER
AS EXTERNAL NAME '/usr/lib/giflibrary.o';
CREATE FUNCTION hasBrownHair(GIF) RETURNS BOOLEAN
AS EXTERNAL NAME '/usr/lib/giflibrary.o';
CREATE FUNCTION sameHairStyle(GIF, GIF) RETURNS BOOLEAN
AS EXTERNAL NAME '/usr/lib/giflibrary.o';
The size operation returns the size of a GIF, hasBrownHair
returns whether or not the image is of a brown-haired person, while
sameHairStyle automagically determines whether two images have
the same hair style (OK, the last two are pretty magical operations).
We assume here that all the operations on a GIF are implemented in
the object code library /usr/lib/giflibrary.o (if not, we'll presume
the DBMS will warn us). All this operation does is let the DBMS know
what operation is available on a GIF, what the parameters are, and
what kind of value it returns. We could now use the 'size' operation
in a query as follows (find students who have brown hair).
SELECT *
FROM student
WHERE hasBrownHair(image);
non-1NF
Only the latter really need explanation. A table can be viewed as
a list, set, or bag of rows. The row type established a field
that points to, or ranges over rows in some other table.
We will see below how to use row.
Which students share a phone number?:
List the student names and their shared phone numbers.
SELECT A.name, B.name, A.phone INTERSECT B.phone
FROM student AS A, student AS B
WHERE A.phone INTERSECT B.phone AND A.name <> B.name;
Retrieve students that have more than two phone numbers:
SELECT *
FROM student
WHERE COUNT(phone) > 2;
A 1NF table of students and their phone numbers:
We generate each phone number. This query uses a set_gen
generator.
SELECT name, set_gen(phone)
FROM student;
CREATE ROW TYPE subjectRow
(code CHAR(6),
lecturer VARCHAR(20),
phone SETOF(INTEGER));
The ROW definition can be thought of as a record or struct
definition. Or we could think of is as the structure
of each tuple in some relation, where a relation is a
(multi)set of ROWs.
CREATE TABLE enrolledIn
(studentId INTEGER,
subject SETOF(ROW(subjectRow))
);
Alternatively we could set up an attribute that REFerences
ROWs in some other relation, because really a ROW is just
the structure of a single tuple.
Let's set up an attribute that REFerences ROWs in another table.
CREATE TABLE enrolledIn
(studentId INTEGER,
subject ROW(subjectRow) REF subjectTable
);
Effectively, what this does is store the OIDs for objects in the
subjectTable in the subject attribute. In other words, it is
almost the same as if the subject attribute is a foreign key into
the subjectTable, it is really just that it is implemented using
OIDs. To get values for subject we would have to dereference
the subject using the familar notation to get fields from a record.
We use the OID like we would use a pointer in C or C++!
Who is in cp2003?:
SELECT name
FROM student
WHERE student.deref(subject).code = 'cp2003';
or
SELECT name
FROM student
WHERE student.subject->code = 'cp2003';
OIDs
An object identifier (OID) is a persistent handle or name for
a particular object. Generally, OIDs are 32 or 64 bit integers that
are managed by the DBMS (32 bit ints can be swizzled on most
machines without wasting space). When a new object is created the
DBMSs table of allocated OIDs is updated with the new association between
the OID and the object (and where it resides on disk!). When an object
is deleted the table is updated to reflect that an OID has become free
for reuse (reference counts are commonly used to support reclamantion
of unused OIDs).
CREATE TABLE <table name> WITH IDENTITY ...
Equality
Since individual attribute values have structure, there are two
notions of equality.
Note that shallow equality implies deep equality but not vice-versa.