CP5010 : Codd Discussion

Pagona (Paula) Mavridoglou's 1st Question:


     When we normalise, we remove non-simple domains. However, in doing so, we
     create a lot of duplication. I was under the impression that we should try to
     minimise duplication, as it makes data hard to keep consistent and it also
     wastes space. So what is so special about normalising?

The first advantage of normalising is that it makes every value atomic, which simplifies all data structures: a huge advantage for storage and communication purposes. It is true that normalisation (in the sense of removing non-simple domains) introduces duplication, because the key of the parent relation is repeated in each relation that is split off from it. However, this step is at the very heart of the relational model, i.e., atomic units in n-column homogeneous arrays, so a small element of redundant information is accepted in exchange for this advantage. Redundancy can be reduced further by the other degrees of normalisation.
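
As a rough illustration of removing a non-simple domain, the sketch below flattens a nested employee record into two flat relations. The layout loosely follows the employee and job-history example in Codd's paper, but the column names (`man_no`, `name`, `job_history`), the sample values and the helper `normalise` are assumptions made up for this sketch.

```python
# Hypothetical un-normalised data: job_history is a non-simple
# (relation-valued) domain nested inside each employee record.
employees = [
    {"man_no": 1, "name": "Smith",
     "job_history": [("1968-01", "clerk"), ("1969-06", "analyst")]},
    {"man_no": 2, "name": "Jones",
     "job_history": [("1967-03", "engineer")]},
]


def normalise(records):
    """Split nested records into two flat relations holding atomic values only."""
    employee = set()
    job_history = set()
    for rec in records:
        employee.add((rec["man_no"], rec["name"]))
        for job_date, title in rec["job_history"]:
            # The parent's key (man_no) is repeated in every child tuple.
            # This is the duplication the question refers to.
            job_history.add((rec["man_no"], job_date, title))
    return employee, job_history


employee, job_history = normalise(employees)
print(sorted(employee))
print(sorted(job_history))
```

Only the key `man_no` is duplicated; every other value is stored once, and every column now holds simple values.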


Chin's 2nd Question:


     We know that indexing dependence can be quite useful to quickly process
     queries and updates. However, Codd states that indexing needs to be removed.
     What can be done instead of using indexed dependence?

Codd asked this question in his paper:
"Can application programs and terminal activities remain invariant as indices come and go?"
Codd observed that programs which use indexes must refer to them by name, and would therefore fail if the indexes were changed or removed. He therefore decided that the relational model would not be index dependent. Indexes can still be used in a relational system, but purely as a performance aid: creating or dropping an index must not change the result of any query.
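
The idea can be tried out with a small sketch using Python's built-in `sqlite3` module. SQL postdates Codd's paper and is used here only as a convenient relational system; the table `supply`, its rows and the index name `supply_part_idx` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supply (supplier TEXT, part TEXT)")
conn.executemany(
    "INSERT INTO supply VALUES (?, ?)",
    [("Alpha", "engine"), ("Beta", "wheel"), ("Alpha", "wheel")],
)

# The query never mentions an index, so it cannot depend on one by name.
QUERY = "SELECT supplier FROM supply WHERE part = 'wheel' ORDER BY supplier"

without_index = conn.execute(QUERY).fetchall()

# An index "comes"...
conn.execute("CREATE INDEX supply_part_idx ON supply (part)")
with_index = conn.execute(QUERY).fetchall()

# ...and "goes" again.
conn.execute("DROP INDEX supply_part_idx")
after_drop = conn.execute(QUERY).fetchall()

print(without_index == with_index == after_drop)
```

The same query text returns the same rows in all three states; only the way the system finds them may differ.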


Rob Fyfe's 1st Question:


     Codd speaks of a concept called "point of ambiguity".
     This refers to when an element of the domain part possesses more than
     one relative under both relations involved in a join. On page 11, Codd
     offers two different results of joins on R and S.
     The second seems in error. Why is this so?

Codd's definition of a join is broader than the join as we know it today. In his definition, a join of two binary relations R and S is any ternary relation which has the following features:

  • a projection of the first two columns will equal R
  • a projection of the last two columns will give S

Given this definition, both relations given on page 11 are possible joins, as both meet the above criteria. The second relation, U, is:

    supplier  part  project
    1         1     2
    2         1     1
    2         2     1

and taking the two projections specified in the definition gives:

    supplier  part          part  project
    1         1             1     2
    2         1             1     1
    2         2             2     1

which are equal to R and S.

(This definition was subsequently tightened, with a join being defined as the result of a Cartesian product of the two relations followed by a selection of the tuples in which the joining attributes match.)
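
The two criteria can be checked mechanically. The sketch below uses the values of R, S and U as read from the listings above (columns taken as supplier, part, project); the helper names `is_join` and `natural_join` are invented for this example.

```python
R = {(1, 1), (2, 1), (2, 2)}            # (supplier, part)
S = {(1, 1), (1, 2), (2, 1)}            # (part, project)
U = {(1, 1, 2), (2, 1, 1), (2, 2, 1)}   # the second join on page 11


def is_join(u, r, s):
    """Codd's criteria: projecting u on columns 1-2 gives r, on columns 2-3 gives s."""
    first_two = {(a, b) for a, b, _ in u}
    last_two = {(b, c) for _, b, c in u}
    return first_two == r and last_two == s


def natural_join(r, s):
    """Every (a, b, c) such that (a, b) is in r and (b, c) is in s."""
    return {(a, b, c) for a, b in r for b2, c in s if b == b2}


print(is_join(U, R, S))                    # True
print(is_join(natural_join(R, S), R, S))   # True
print(U < natural_join(R, S))              # True: U is a proper subset
```

Part 1 is the point of ambiguity here: it has two suppliers in R and two projects in S, which is what leaves room for more than one relation satisfying both criteria.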


Maree Fontana's 3rd Question:


     What is symmetric exploitation?


Symmetric exploitation means that users can access a relation through any of its columns. This is opposed to the asymmetry that might exist if the columns were ordered in a hierarchy or tree structure, so that the access path to the values in a column must proceed by first accessing that column's parent.
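
A minimal sketch of the idea, assuming a relation stored as a set of tuples: the same function answers a question whichever columns are known. The relation `supply`, its rows and the helper `select` are hypothetical and exist only for this illustration.

```python
COLUMNS = ("supplier", "part", "project")

# Hypothetical sample data.
supply = {
    ("Alpha", "engine", "Paris"),
    ("Beta", "wheel", "Rome"),
    ("Alpha", "wheel", "Paris"),
}


def select(relation, **known):
    """Return the tuples matching the given column values.

    Any column, or any combination of columns, can serve as the entry
    point; no column is privileged as a parent of another.
    """
    position = {name: i for i, name in enumerate(COLUMNS)}
    return {
        row for row in relation
        if all(row[position[col]] == value for col, value in known.items())
    }


print(select(supply, supplier="Alpha"))
print(select(supply, project="Rome"))
print(select(supply, part="wheel", project="Paris"))
```

In a tree-structured file with, say, suppliers at the root, the second and third calls would require a traversal from the root; here all three are expressed in the same way.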


Martin Lucas' 1st Question:


     Codd made reference to relations existing of order ~30.
     Wouldn't relations of this order have redundancy problems?


A relation of this order (that is, with around 30 columns) may well display some redundancy. However, a relation of high degree need not exhibit redundancy problems, as the relation could be the result of a view. That is, the data for a wide relation may be sourced from several smaller relations, so the redundancy is never actually stored (assuming the underlying relations are formally normalised).


Alison Gunn's 1st Question:


     What is the difference between relations and relationships?
     Besides the fact that one is ordered and the other un-ordered of course.


Codd based his original idea solidly on mathematical foundations, namely relations. However, there was one main difference between the mathematical notion of a relation and the data representation Codd wanted: in a mathematical relation the domains (columns) are ordered. Therefore Codd, wanting to remain mathematically correct, chose to call his representations relationships, so that their columns could be unordered while they still bore a likeness to relations. However, "relationships" is a bit wordy, so he used "relations" to mean relationships.
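
The difference can be sketched in a few lines, assuming a relation is modelled as a set of positional tuples and a relationship as a set of name/value pairings. The helper `as_relationship` and the sample row are invented for this illustration.

```python
# The same fact stored with the columns in two different orders.
r1 = {("Alpha", "engine")}    # columns in the order (supplier, part)
r2 = {("engine", "Alpha")}    # columns in the order (part, supplier)


def as_relationship(relation, column_names):
    """Attach a name to every value so that column position no longer matters."""
    return {frozenset(zip(column_names, row)) for row in relation}


rs1 = as_relationship(r1, ("supplier", "part"))
rs2 = as_relationship(r2, ("part", "supplier"))

print(r1 == r2)      # False: as ordered relations they differ
print(rs1 == rs2)    # True: as unordered relationships they are the same
```

A user of the relationship only needs to remember column names, not column positions.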


Nghia La's 1st Question:


     Regarding relational composition, what exactly is "the connection trap"?
    

The example Curtis provided is as follows:
There may exist a relation R that contains the entry that Alpha (a supplier) supplies engines (a part). There may also exist a relation S that has an entry linking engines and Paris (a project title), i.e.

    Relation R
    Supplier  Part
    Alpha     engine

    Relation S
    Part      Project
    engine    Paris

Therefore it is true that:

  • Alpha supplies engines and;
  • engines are used by the Paris project

BUT IT DOES NOT FOLLOW THAT:

  • Alpha supplies engines for Paris

Making this assumption would be falling into the "connection trap".
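
The trap can be reproduced in a short sketch. To make the false inference visible, the example assumes a hypothetical ground truth that is not in the original example: a second supplier (Beta) and a second project (Rome). R and S are then derived from that ground truth by projection.

```python
# Hypothetical ground truth: who really supplies which part to which project.
actual = {
    ("Alpha", "engine", "Rome"),
    ("Beta", "engine", "Paris"),
}

# The two binary relations that are actually stored.
R = {(supplier, part) for supplier, part, _ in actual}    # supplier supplies part
S = {(part, project) for _, part, project in actual}      # part is used by project


def compose(r, s):
    """Link every (a, b) in r with every (b, c) in s through the shared value b."""
    return {(a, c) for a, b in r for b2, c in s if b == b2}


inferred = compose(R, S)
true_pairs = {(supplier, project) for supplier, _, project in actual}

print(("Alpha", "engine") in R and ("engine", "Paris") in S)   # True
print(("Alpha", "Paris") in inferred)                          # True
print(("Alpha", "Paris") in true_pairs)                        # False
print(sorted(inferred - true_pairs))
```

Both premises hold for Alpha and Paris, and the composition duly links them, yet under the assumed ground truth Alpha supplies engines only to Rome.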


Steve Nesbitt's 1st Question:


     Why does Codd differentiate between the named set and the expressible set?
     It seems that both sets are used to define the set of relations, but I
     can't follow why he defined the expressible set.


The named set consists of the relations that users of the system have declared under a simple name; these hold all the data entered. The expressible set is the collection of all relations that can be designated by applying the available operations to relations in the named or expressible set, so it includes the named set as a (usually small) subset. The distinction is made so as to identify which base relations actually exist. Of course a relation from the expressible set can become a member of the named set through use of a (materialised) view.


General Question:


     If a relation on which a query is based is split, need the query be
     changed to still return correct results?


As long as a view is constructed which joins the two new relations to recreate the original, the query need not be changed. This shows an advantage of the relational model over a network model, in which access paths are crucial: if the access paths are changed, the queries must be changed to mirror them in order to remain useful.
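
A minimal sketch of this, again using Python's built-in `sqlite3` module as a convenient stand-in for a relational system. The table `employee`, the two tables it is split into (`emp_name`, `emp_dept`) and the sample rows are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Smith", "sales"), (2, "Jones", "research")],
)

# The query is written once, against the name "employee".
QUERY = "SELECT name, dept FROM employee ORDER BY emp_no"
before = conn.execute(QUERY).fetchall()

# Split the relation into two smaller ones that share the key emp_no.
conn.execute("CREATE TABLE emp_name (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp_dept (emp_no INTEGER PRIMARY KEY, dept TEXT)")
conn.execute("INSERT INTO emp_name SELECT emp_no, name FROM employee")
conn.execute("INSERT INTO emp_dept SELECT emp_no, dept FROM employee")
conn.execute("DROP TABLE employee")

# A view with the old name joins the two new relations back together.
conn.execute(
    "CREATE VIEW employee AS "
    "SELECT n.emp_no AS emp_no, n.name AS name, d.dept AS dept "
    "FROM emp_name AS n JOIN emp_dept AS d ON n.emp_no = d.emp_no"
)

after = conn.execute(QUERY).fetchall()
print(before == after)
```

The query text is untouched by the restructuring; only the definition behind the name `employee` has changed.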


guy@cs.jcu.edu.au