CP3020 Lecture Notes - Object-Oriented Databases


Table of Contents


History of OODBMS

Object-oriented database management systems (OODBMSs) first started to be researched in the 1980s. At that time, there were a few Object-Oriented Programming languages, such as Smalltalk and C++. By the late 1980s and early 1990s it was recognized that O-O languages had some very strong advantages over non-O-O languages for many programming applications. Primarly these were But programmers also want to make objects persistent, that is, object lifetimes should extend beyond the single execution of a program.

At the same time, within the database community, it was recognized that there were several shortcomings with commercial implementations of relational databases. These included (but were not limited to) the following.

OODBMS

Hellerstein (very insightfully) suggests that an Object-Oriented Database Management System (OODBMS) can be seen as trying "to add DBMS capabilities to an O-O language", in effect making it a persistent O-O language. Consequently application programmers who use OODBMSs typically write programs in a native O-O language such as Java or C++, and the language has some kind of Persistent class, Database class, Database Interface, or Database API that provides DBMS functionality as, effectively, an extension of the O-O language. There are several OODBMS vendors, but persistent versions of O-O languages currently have limited popularity.

ORDBMS

Hellerstein characterises an Object-Relational Database Management System (ORDBMS) as a system that "extends a relational database with O-O features". In particular ORDBMSs offer There are several ORDBMS vendors.

To get a better grasp of the advantages offered by OODBMSs and ORDBMSs we must first understand the "problems" in SQL implementations of the relational model.

non-1NF Data

Consider the table shown below.

  supplier | partToPlace
  ---------------------
   ACME    |  part | place
           | ------|------
           | liquor| Paris
           | engine| Paris
           |                 
 ----------|----------------
   DELCO   |  part | place
           | ------|------
           | engine| Rome 
   ....
This table could not be stored directly in a RDB because the partToPlace attribute has values which are from a nonsimple domain, that is, a domain of composite values. You should think of a value in a nonsimple domain as being composed of smaller, distinct values. Common nonsimple domains values include lists, sets, arrays, bags, tuples, etc. In contrast values from simple domains are atomic. Common simple domain values include integers, chars, strings, floats, etc.

Let's redraw the ACME tuple given above as it might be represented in a data structure. The arrows indicate pointers. So each Supplier has a Name and a parts field which is a pointer to a Parts-list value. The Parts-list value is in turn a list of pointers, each pointer points to an individual Parts tuple, which is a structure of two atomic fields: PartName and Place.


Supplier Parts-list Parts -------- --------- ------------------ |ACME | | part1 ------> | liquor | Paris | | | | | ------------------ | parts-----------> | | ------------------ | | | part2 ------> | engine | Rome | | | | | ------------------ -------- ---------

The problems, from a relational database perspective, with this kind of composite value are listed below.

So current implemenations of RDBs (there is some debate as to whether or not this restriction of simple domains applies to the relational model itself) restrict attribute values to simple domains. To store the above information in a RDB, we would have to un-nest the nonsimple attribute values, perhaps as follows.


  supplier | part   | place
  --------------------------
   ACME    | liquor | Paris
   ACME    | engine | Paris
   DELCO   | engine | Rome 
   ....
But this is a less than ideal situation since we have lost the "natural" structure in the data, and we have (needlessly) duplicated some values, e.g., ACME appears twice in the above table.

Need for Abstract Data Types

A second problem with current RDBs is that they have a very limited and restrictive type model. Stonebraker gives an excellent example to illustrate this point. Assume that we want to store a RECTANGLE in a database. We could store the RECTANGLE by storing the following four (integer) coordinates: upperLeftX, upperLeftY, lowerRightX, lowerRightY. Given this storage structure suppose we want to query whether a pair of rectangles, one each from relation R and S, overlap. After a little thinking we come up with the following SQL query.

    SELECT *
    FROM R, S
    WHERE NOT (R.upperLeftX > S.lowerRightX OR
               S.upperLeftX > R.lowerRightY OR
               R.upperLeftY < S.lowerRightY OR
               S.upperLeftY < R.lowerRightY);
From a programming point of view, it would be cleaner if we could write the following program instead.

    SELECT *
    FROM R, S
    WHERE overlaps(R, S);
The latter program is also more amenable to query optimization, and supports code reuse. For example, suppose R and S were POLYGONS or CIRCLES rather than RECTANGLES. The former query would break, but not the latter.

If SQL supported abstract data types, the user could directly model data values with complicated structure and operations.

O-O features in SQL3

Assume that we have our standard student world, where students are enrolled in various subjects and we decide that we would like to have a photo of each student. Below we give an SQL3 statement for creating the table and we then discuss the nonSQL2 aspects of this statement.

    CREATE TABLE Student 
       (name VARCHAR(30), 
        image GIF,
        phone setof{INTEGER},
       );
The two nonSQL2 features are
  1. GIF is an user-defined abstract data type, and
  2. phone is a set of integers, a non-1NF value.
Let's consider each of these in turn.

ADTs

An ADT is a data structure and a set of available operations on that structure. To create an ADT the user first creates a new ABSTRACT TYPE. ADTS must support the following three methods which are part of the ADT definition.
  1. size - how big it the object, it is stored in internalLength and could be VARIABLE if of variable size
  2. export - serialize the object, this is a method to create the object/ADT instance from text
  3. import - unserialize the object, convert the object into a string, say, for printing
Only the operations are visible outside of the ADT. Users register these operations with the DBMS using the following SQL3 syntax. This statment informs the DBMS that the function, <function name>, My guess is that only the DBA can define new ADTs and will only load `trusted' code to implement those ADTs.

A GIF ADT: First let's let the DBMS know that there is a new ADT called GIF.

    CREATE ABSTRACT DATA TYPE GIF ( 
       internalLength = VARIABLE, 
       input = serializeGIF,
       output = unserializeGIF,
       );
Next, let's register some GIF operations with the DBMS

    CREATE FUNCTION size(GIF) RETURNS INTEGER
       AS EXTERNAL NAME '/usr/lib/giflibrary.o';
    CREATE FUNCTION hasBrownHair(GIF) RETURNS BOOLEAN 
       AS EXTERNAL NAME '/usr/lib/giflibrary.o';
    CREATE FUNCTION sameHairStyle(GIF, GIF) RETURNS BOOLEAN 
       AS EXTERNAL NAME '/usr/lib/giflibrary.o';
The size operation returns the size of a GIF, hasBrownHair returns whether or not the image is of a brown-haired person, while sameHairStyle automagically determines whether two images have the same hair style (OK, the last two are pretty magical operations). We assume here that all the operations on a GIF are implemented in the object code library /usr/lib/giflibrary.o (if not, we'll presume the DBMS will warn us). All this operation does is let the DBMS know what operation is available on a GIF, what the parameters are, and what kind of value it returns. We could now use the 'size' operation in a query as follows (find students who have brown hair).

    SELECT *
    FROM student
    WHERE hasBrownHair(image);

Other issues to consider when adding an ADT are the following.

non-1NF

The following composite types are supported in SQL3.

Only the latter really need explanation. A table can be viewed as a list, set, or bag of rows. The row type established a field that points to, or ranges over rows in some other table. We will see below how to use row.

The usual operators are supported on these types. For instance setof permits intersection, union, difference, etc. Also, aggregate operations: COUNT, SUM, MAX, etc. are supported. Finally, generators, are supported. A generator is a 'looping constructs' that generates each item in a collection.

Which students share a phone number?: List the student names and their shared phone numbers.

  SELECT A.name, B.name, A.phone INTERSECT B.phone
  FROM student AS A, student AS B
  WHERE A.phone INTERSECT B.phone AND A.name <> B.name;

Retrieve students that have more than two phone numbers:

  SELECT *
  FROM student
  WHERE COUNT(phone) > 2;

A 1NF table of students and their phone numbers: We generate each phone number. This query uses a set_gen generator.

  SELECT name, set_gen(phone)
  FROM student;

So how do we use row? First, we have to define what a row looks like.


  CREATE ROW TYPE subjectRow
    (code  CHAR(6),
     lecturer VARCHAR(20),
     phone SETOF(INTEGER));
The ROW definition can be thought of as a record or struct definition. Or we could think of is as the structure of each tuple in some relation, where a relation is a (multi)set of ROWs.

We can use the ROW in one of two ways. First we could useby setting up a column that contains ROWs directly.


  CREATE TABLE enrolledIn 
    (studentId INTEGER,
     subject SETOF(ROW(subjectRow))
    );
Alternatively we could set up an attribute that REFerences ROWs in some other relation, because really a ROW is just the structure of a single tuple. Let's set up an attribute that REFerences ROWs in another table.

  CREATE TABLE enrolledIn 
    (studentId INTEGER,
     subject ROW(subjectRow) REF subjectTable
    );
Effectively, what this does is store the OIDs for objects in the subjectTable in the subject attribute. In other words, it is almost the same as if the subject attribute is a foreign key into the subjectTable, it is really just that it is implemented using OIDs. To get values for subject we would have to dereference the subject using the familar notation to get fields from a record.
Who is in cp2003?:

  SELECT name
  FROM student
  WHERE student.deref(subject).code = 'cp2003';
or

  SELECT name
  FROM student
  WHERE student.subject->code = 'cp2003';
We use the OID like we would use a pointer in C or C++!

OIDs

An object identifier (OID) is a persistent handle or name for a particular object. Generally, OIDs are 32 or 64 bit integers that are managed by the DBMS (32 bit ints can be swizzled on most machines without wasting space). When a new object is created the DBMSs table of allocated OIDs is updated with the new association between the OID and the object (and where it resides on disk!). When an object is deleted the table is updated to reflect that an OID has become free for reuse (reference counts are commonly used to support reclamantion of unused OIDs).

In SQL3, to create a table with OID support we use the following syntax.

Equality

Since individual attribute values have structure, there are two notions of equality. Note that shallow equality implies deep equality but not vice-versa.

Swizzling

objects in memory have pointers to sub-objects

Comparing Modelling Capabilities

Consider the situation where we want the database to maintain information on students and the subjects in which they are enrolled. There are several ways we can model this information.


Copyright © 1998 Curtis Dyreson. All rights reserved.