Department of Computer Science
James Cook University of North Queensland
CP3020 - Advanced Database Systems

Tutorial 8 - Distributed Database Systems - Solutions


  1. What are the major features of a distributed database?

    Solution: Major features of a DDB are:

  2. Discuss the relative advantages of centralized and distributed database management.

    Solution: Advantages of distributed database systems are:

    Disadvantages of distributed database systems are:

  4. Why is it useful to have replication or fragmentation of data? Discuss.

    Solution: The unit of distribution may be a relation or a fragment; often a fragment is the more suitable unit, since it allows a query to be processed in parallel.

    Replication improves availability since the system would continue to be fully functional even if a site goes down. Replication also allows increased parallelism since several sites could be operating on the same relations at the same time.

    Replication does, however, increase the overhead of updates, since every copy of the data must be updated to keep the copies consistent.
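
    The trade-off between availability and update overhead can be sketched as follows. This is only an illustration under an assumed "read one copy, write all copies" scheme; the class ReplicatedRelation and the site names are hypothetical, and real systems use more elaborate protocols.

```python
# Hypothetical model: every site holds a full copy of one relation,
# stored as a dict from key to value. Assumes "read one, write all".
class ReplicatedRelation:
    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}
        self.up = {name: True for name in site_names}

    def read(self, key):
        # A read succeeds as long as any one site is available.
        for name, copy in self.copies.items():
            if self.up[name]:
                return copy.get(key), name
        raise RuntimeError("no site available")

    def write(self, key, value):
        # A write must reach every copy; return the number of
        # update messages sent, which grows with the number of copies.
        if not all(self.up.values()):
            raise RuntimeError("write-all needs every site to be up")
        for copy in self.copies.values():
            copy[key] = value
        return len(self.copies)


rel = ReplicatedRelation(["site_a", "site_b", "site_c"])
messages = rel.write("e1", "Smith")   # one message per copy
rel.up["site_a"] = False              # site_a goes down
value, served_by = rel.read("e1")     # still answered by another site
```

    With three copies a single update costs three messages, while a read survives the failure of any two sites.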

  5. Discuss the horizontal, vertical and hybrid fragmentation schemes. What properties must all fragmentation schemes satisfy?

    Solution: Fragmentation may be horizontal, vertical or hybrid (or mixed). Horizontal fragmentation splits a relation by assigning each tuple of the relation to a fragment of the relation. Often horizontal fragmentation is based on predicates defined on that relation.
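
    Predicate-based horizontal fragmentation can be sketched as follows. The relation EMP, its attributes and the two predicates are made up for illustration; each tuple goes to the fragment of the first predicate it satisfies.

```python
def horizontal_fragments(relation, predicates):
    """Assign each tuple to the fragment of the first predicate it meets."""
    fragments = {name: [] for name in predicates}
    for row in relation:
        for name, predicate in predicates.items():
            if predicate(row):
                fragments[name].append(row)
                break
        else:
            # No predicate matched: the fragmentation would be incomplete.
            raise ValueError(f"no predicate covers {row}")
    return fragments


# Hypothetical relation and predicates, for illustration only.
EMP = [
    {"eno": 1, "name": "Lee", "region": "north"},
    {"eno": 2, "name": "Kim", "region": "south"},
    {"eno": 3, "name": "Roy", "region": "north"},
]
PREDICATES = {
    "EMP_north": lambda row: row["region"] == "north",
    "EMP_south": lambda row: row["region"] == "south",
}

emp_fragments = horizontal_fragments(EMP, PREDICATES)
```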

    Vertical fragmentation splits a relation by decomposing it into several subsets of its attributes. Relation $R$ produces fragments $R_1, R_2, \ldots, R_n$, each of which contains a subset of the attributes of $R$ as well as the primary key of $R$. The aim of vertical fragmentation is to put together those attributes that are accessed together.
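
    A minimal sketch of vertical fragmentation, assuming a hypothetical relation EMP with primary key eno: each fragment is a projection that keeps the key, and the original relation is recovered by joining the fragments on that key.

```python
def vertical_fragment(relation, key, attrs):
    """Project the relation onto the key plus the given attributes."""
    return [{a: row[a] for a in (key, *attrs)} for row in relation]


def join_on_key(left, right, key):
    """Rebuild wider tuples by matching rows that share the same key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Hypothetical relation, for illustration only.
EMP = [
    {"eno": 1, "name": "Lee", "salary": 50000, "dept": "CS"},
    {"eno": 2, "name": "Kim", "salary": 62000, "dept": "Maths"},
]

emp_personal = vertical_fragment(EMP, "eno", ["name", "dept"])
emp_payroll = vertical_fragment(EMP, "eno", ["salary"])
rebuilt = join_on_key(emp_personal, emp_payroll, "eno")
```

    Duplicating the key in every fragment is what makes the join, and hence the reconstruction, possible.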

    Mixed fragmentation uses both vertical and horizontal fragmentation.

    To obtain a sensible fragmentation design, it is necessary to know some information about the database as well as about applications. It is useful to know the predicates used in the application queries - at least the 'important' ones.

    The aim is for each application to need only one fragment.

    Fragmentation must provide completeness (all information in a relation must be available in the fragments), reconstruction (it must be possible to reconstruct the original relation from the fragments) and disjointness (no information should be stored twice unless absolutely essential; for example, the key needs to be duplicated in vertical fragmentation).
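
    For horizontal fragments the three properties can be checked mechanically. The sketch below assumes tuples are represented as hashable Python tuples and that reconstruction is by union; the function name and sample data are hypothetical.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Report completeness, reconstruction and disjointness."""
    original = set(relation)
    union = set().union(*fragments)
    seen = set()
    disjoint = True
    for fragment in fragments:
        for row in fragment:
            if row in seen:
                disjoint = False   # the same tuple is stored twice
            seen.add(row)
    return {
        "complete": original <= union,         # nothing is lost
        "reconstructible": union == original,  # union gives R back
        "disjoint": disjoint,
    }


# Hypothetical data, for illustration only.
EMP = [(1, "Lee", "north"), (2, "Kim", "south"), (3, "Roy", "north")]
good = [[EMP[0], EMP[2]], [EMP[1]]]
bad = [[EMP[0]], [EMP[0], EMP[1]]]   # tuple 3 missing, tuple 1 repeated

good_report = check_horizontal_fragmentation(EMP, good)
bad_report = check_horizontal_fragmentation(EMP, bad)
```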

  6. What is meant by transparency and autonomy? Why are these concepts desirable in distributed database systems?

    Solution: Transparency means that the user does not have to know how a relation is stored in the DDB; it is the capability of the system to hide the details of data distribution from the user.

    Autonomy is the degree to which a designer or administrator of one site may be independent of the remainder of the distributed system.

  7. What is meant by fragmentation transparency, replication transparency and location transparency?

    Solution: It is clearly undesirable for users to have to know which fragment of a relation they require to process the query that they are posing. Similarly, users should not need to know which copy of a replicated relation or fragment they need to use. It should be up to the system to figure out which fragment or fragments of a relation a query requires, and which copy of each fragment to use in processing the query. These are called fragmentation transparency and replication transparency respectively.

    A user should also not need to know where the data is located, and should be able to refer to a relation by a name that the system then translates into a full name that includes the location of the relation. This is location transparency.
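
    The three kinds of transparency can be pictured as a catalog lookup performed by the system on the user's behalf. The catalog layout, the site names and the "site.fragment" full-name format below are all assumptions made for illustration.

```python
# Hypothetical catalog: relation -> fragments -> sites holding a copy.
CATALOG = {
    "EMP": {
        "EMP_1": ["site_a", "site_b"],   # replicated fragment
        "EMP_2": ["site_b"],
    },
    "DEPT": {"DEPT": ["site_a"]},        # unfragmented, single copy
}


def resolve(relation, available_sites):
    """Translate a user-level relation name into full fragment names."""
    plan = []
    for fragment, sites in CATALOG[relation].items():
        # Replication transparency: the system picks the copy.
        site = next((s for s in sites if s in available_sites), None)
        if site is None:
            raise LookupError(f"no available copy of {fragment}")
        # Location transparency: the full name includes the site.
        plan.append(f"{site}.{fragment}")
    return plan


all_up = resolve("EMP", {"site_a", "site_b"})
a_down = resolve("EMP", {"site_b"})
```

    The user only ever writes EMP; which fragments exist, which copy is read and where it lives are decided inside resolve.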

  8. Why is global query optimization difficult in distributed databases?

    Solution: Global query optimization is complex because of

    Computing the cost can itself be complex, since the cost is a weighted combination of the I/O, CPU and communication costs. Usually one of two cost models is used: one may wish to minimize either the total cost (time) or the response time. Fragmentation and replication add further complexity to finding an optimal query plan.
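
    The difference between the two cost models can be sketched as follows. The weights, the two candidate plans and the assumption that all sites in a plan work fully in parallel are invented for illustration; total cost sums the work at every site, while response time is taken as the cost of the slowest site.

```python
from dataclasses import dataclass


@dataclass
class SiteWork:
    io: int     # disk accesses at this site
    cpu: int    # CPU units at this site
    comm: int   # messages sent by this site


# Assumed weights; communication is taken to be the most expensive.
WEIGHTS = {"io": 10, "cpu": 1, "comm": 50}


def site_cost(work):
    """Weighted combination of I/O, CPU and communication costs."""
    return (WEIGHTS["io"] * work.io
            + WEIGHTS["cpu"] * work.cpu
            + WEIGHTS["comm"] * work.comm)


def total_cost(plan):
    return sum(site_cost(w) for w in plan)


def response_time(plan):
    # Assumes the sites in a plan run fully in parallel.
    return max(site_cost(w) for w in plan)


# Two hypothetical plans for the same query.
PLANS = {
    "single_site": [SiteWork(io=100, cpu=50, comm=0)],
    "three_sites": [SiteWork(io=40, cpu=20, comm=4)] * 3,
}

best_total = min(PLANS, key=lambda name: total_cost(PLANS[name]))
best_response = min(PLANS, key=lambda name: response_time(PLANS[name]))
```

    Here the single-site plan has the lower total cost, while the parallel plan has the lower response time, so the two models choose different plans.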


    Gopal K. Gupta
    May 1997