Solution: Major features of a DDB are:
Solution: Advantages of distributed database systems are:
Solution: The unit of distribution may be a relation or a fragment; a fragment is often the more suitable unit, since it allows a query to be processed in parallel.
Replication improves availability since the system would continue to be fully functional even if a site goes down. Replication also allows increased parallelism since several sites could be operating on the same relations at the same time.
Replication does, however, increase the overhead of updates, since every copy must be kept consistent.
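The trade-off between availability and update overhead can be illustrated with a minimal sketch. It assumes a simple "read any one copy, write every available copy" scheme, which the answer above does not prescribe; the class name, site names and data are hypothetical, and re-synchronising a site after it recovers is left out.

```python
class ReplicatedFragment:
    """Toy model of one fragment stored at several sites (hypothetical)."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}  # one key/value store per site
        self.up = set(sites)                        # sites currently reachable

    def fail(self, site):
        """Mark a site as down."""
        self.up.discard(site)

    def read(self, key):
        """Read from the first available copy; any single copy is enough."""
        for site, store in self.copies.items():
            if site in self.up:
                return site, store.get(key)
        raise RuntimeError("no copy available")

    def write(self, key, value):
        """Update every available copy and return how many were written."""
        written = 0
        for site, store in self.copies.items():
            if site in self.up:
                store[key] = value
                written += 1
        return written


frag = ReplicatedFragment(["site_a", "site_b", "site_c"])
copies_written = frag.write("emp1", "Ann")  # one update touches three sites
frag.fail("site_a")                         # a site goes down
served_by, value = frag.read("emp1")        # the read is still answered
```

A single logical update costs three physical writes here, while a read survives the loss of one site.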
Solution: Fragmentation may be horizontal, vertical or hybrid (or mixed). Horizontal fragmentation splits a relation by assigning each tuple of the relation to a fragment of the relation. Often horizontal fragmentation is based on predicates defined on that relation.
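Predicate-based horizontal fragmentation can be sketched as follows. This is an illustration only: the EMPLOYEE relation, its attributes, and the fragment names are hypothetical, and each tuple is assigned to the first fragment whose predicate it satisfies.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def horizontal_fragment(
    relation: List[Row],
    predicates: Dict[str, Callable[[Row], bool]],
) -> Dict[str, List[Row]]:
    """Assign each tuple to the first fragment whose predicate it satisfies."""
    fragments: Dict[str, List[Row]] = {name: [] for name in predicates}
    for row in relation:
        for name, predicate in predicates.items():
            if predicate(row):
                fragments[name].append(row)
                break
        else:
            # No predicate matched: the fragmentation would lose this tuple.
            raise ValueError(f"no predicate covers tuple {row!r}")
    return fragments


# Hypothetical EMPLOYEE relation.
employees = [
    {"emp_id": 1, "name": "Ann", "city": "Sydney"},
    {"emp_id": 2, "name": "Bob", "city": "Perth"},
    {"emp_id": 3, "name": "Cy", "city": "Sydney"},
]

# Hypothetical fragmentation predicates, one per fragment.
predicates = {
    "EMP_SYDNEY": lambda r: r["city"] == "Sydney",
    "EMP_OTHER": lambda r: r["city"] != "Sydney",
}

fragments = horizontal_fragment(employees, predicates)
```

Because the two predicates are mutually exclusive and together cover every tuple, each tuple lands in exactly one fragment.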
Vertical fragmentation splits a relation by decomposing it into several subsets of its attributes. Relation $R$ produces fragments $R_1, R_2, \ldots, R_n$, each of which contains a subset of the attributes of $R$ as well as the primary key of $R$. The aim of vertical fragmentation is to put together those attributes that are accessed together.
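A minimal sketch of vertical fragmentation, assuming a hypothetical EMPLOYEE relation with primary key `emp_id`: each fragment keeps the key plus one group of attributes, and joining the fragments on the key recovers the original tuples.

```python
def vertical_fragment(relation, key, attribute_groups):
    """Project the relation onto each attribute group, always keeping the key."""
    fragments = {}
    for name, attrs in attribute_groups.items():
        columns = [key] + [a for a in attrs if a != key]
        fragments[name] = [{c: row[c] for c in columns} for row in relation]
    return fragments


def join_on_key(left, right, key):
    """Rebuild wider tuples by matching the two fragments on the primary key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Hypothetical EMPLOYEE relation; emp_id is the primary key.
employees = [
    {"emp_id": 1, "name": "Ann", "city": "Sydney", "salary": 90000},
    {"emp_id": 2, "name": "Bob", "city": "Perth", "salary": 70000},
]

# Attributes that are assumed to be accessed together go in the same group.
groups = {
    "EMP_CONTACT": ["name", "city"],
    "EMP_PAY": ["salary"],
}

fragments = vertical_fragment(employees, "emp_id", groups)
rebuilt = join_on_key(fragments["EMP_CONTACT"], fragments["EMP_PAY"], "emp_id")
```

The key is the only attribute stored twice, and it is what makes the join possible.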
Mixed fragmentation uses both vertical and horizontal fragmentation.
To obtain a sensible fragmentation design, it is necessary to know some information about the database as well as about applications. It is useful to know the predicates used in the application queries - at least the 'important' ones.
The aim is for each application to need only one fragment.
Fragmentation must provide completeness (all information in a relation must be available in the fragments), reconstruction (it must be possible to reconstruct the original relation from the fragments) and disjointness (no information should be stored twice unless absolutely essential; for example, the key needs to be duplicated in vertical fragmentation).
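The three correctness rules can be checked mechanically for a horizontal fragmentation. The sketch below is one possible formulation, assuming every tuple has a unique key and that reconstruction is the union of the fragments; the function name and sample data are hypothetical.

```python
def check_horizontal_fragments(relation, fragments, key):
    """Report whether a horizontal fragmentation is complete,
    reconstructs the relation by union, and is disjoint."""
    original = {row[key]: row for row in relation}
    seen = [row[key] for frag in fragments for row in frag]
    rebuilt = {row[key]: row for frag in fragments for row in frag}  # union
    return {
        "complete": set(original) <= set(seen),   # every tuple appears somewhere
        "reconstructs": rebuilt == original,      # union gives back the relation
        "disjoint": len(seen) == len(set(seen)),  # no tuple stored twice
    }


relation = [
    {"emp_id": 1, "city": "Sydney"},
    {"emp_id": 2, "city": "Perth"},
    {"emp_id": 3, "city": "Sydney"},
]

good = [[relation[0], relation[2]], [relation[1]]]
overlapping = [[relation[0], relation[2]], [relation[1], relation[2]]]
lossy = [[relation[0]], [relation[1]]]

good_result = check_horizontal_fragments(relation, good, "emp_id")
overlap_result = check_horizontal_fragments(relation, overlapping, "emp_id")
lossy_result = check_horizontal_fragments(relation, lossy, "emp_id")
```

The overlapping design still reconstructs the relation but stores tuple 3 twice; the lossy design is disjoint but drops tuple 3, so it fails both completeness and reconstruction.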
Solution: Transparency means that the user does not have to know how a relation is stored in the DDB; it is the system's capability to hide the details of data distribution from the user.
Autonomy is the degree to which a designer or administrator of one site may be independent of the remainder of the distributed system.
Solution: It is clearly undesirable for users to have to know which fragment of a relation is required to process the query they are posing. Similarly, users should not need to know which copy of a replicated relation or fragment to use. It should be up to the system to work out which fragment or fragments of a relation a query requires and which copy of each fragment to use in processing the query. This is called replication and fragmentation transparency.
A user should also not need to know where the data is located, and should be able to refer to a relation by a name that the system then translates into a full name that includes the location of the relation. This is location transparency.
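How a system might hide fragments, copies and locations behind a plain relation name can be sketched with a small catalog lookup. The catalog layout, the site names and the `site.fragment` form of a full name are all assumptions made for illustration, as is the policy of preferring a local copy.

```python
# Hypothetical global catalog: relation -> fragment -> sites holding a copy.
CATALOG = {
    "EMPLOYEE": {
        "EMP_SYDNEY": ["site_a", "site_c"],  # replicated at two sites
        "EMP_OTHER": ["site_b"],
    },
}


def resolve(relation, local_site, catalog=CATALOG):
    """Translate a relation name into one full name per fragment.

    The user supplies only the relation name; the choice of fragments,
    copy and location is made here. A local copy is preferred if one exists.
    """
    full_names = []
    for fragment, sites in catalog[relation].items():
        site = local_site if local_site in sites else sites[0]
        full_names.append(f"{site}.{fragment}")
    return full_names


names_at_c = resolve("EMPLOYEE", local_site="site_c")
names_at_b = resolve("EMPLOYEE", local_site="site_b")
```

The same query text, naming only EMPLOYEE, resolves to different physical copies depending on where it is issued.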
Solution: Global query optimization is complex because of
Computing the cost can itself be complex, since the cost is a weighted combination of the I/O, CPU and communication costs. Usually one of two cost models is used: one may wish to minimize either the total cost (time) or the response time. Fragmentation and replication add further complexity to finding an optimal query plan.
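The difference between the two cost models can be shown with a small sketch. It makes simplifying assumptions that are not part of the answer above: each site's work is summarized by three numbers, the weights default to 1, and all sites work fully in parallel, so the response time is the cost at the slowest site. The plans and figures are invented.

```python
from dataclasses import dataclass


@dataclass
class SiteWork:
    """Work done at one site by a query plan (arbitrary units)."""
    io: float
    cpu: float
    comm: float


def weighted_cost(work, w_io=1.0, w_cpu=1.0, w_comm=1.0):
    """Weighted combination of I/O, CPU and communication cost at one site."""
    return w_io * work.io + w_cpu * work.cpu + w_comm * work.comm


def total_cost(plan, **weights):
    """Sum of the work done at every site."""
    return sum(weighted_cost(site, **weights) for site in plan)


def response_time(plan, **weights):
    """Elapsed time if all sites run in parallel: the slowest site dominates."""
    return max(weighted_cost(site, **weights) for site in plan)


# Plan A: everything at one site. Plan B: split across two sites,
# with extra communication to ship and merge partial results.
plan_a = [SiteWork(io=10, cpu=2, comm=1)]
plan_b = [SiteWork(io=5, cpu=1, comm=3), SiteWork(io=5, cpu=1, comm=3)]

costs = {
    "A": (total_cost(plan_a), response_time(plan_a)),
    "B": (total_cost(plan_b), response_time(plan_b)),
}
```

Under these numbers plan A has the lower total cost (13 against 18) while plan B has the lower response time (9 against 13), so the two models can prefer different plans.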