14.5 Relational Model of Databases
As an example of how data can be organized conceptually, we shall describe the relational data model. In this conceptual model, the data in the database is viewed as being organized into a series of relations or tables of data which are associated in ways defined in the data dictionary. A relation consists of rows of data with columns containing particular attributes. The term "relational" derives from the mathematical theory of relations which provides a theoretical framework for this type of data model. Here, the terms "relation" and data "table" will be used interchangeably. Table 14-2 defines one possible relation to record unit cost data associated with particular activities. Included in the database would be one row (or tuple) for each of the various items involved in construction or other project activities. The unit cost information associated with each item is then stored in the form of the relation defined in Table 14-2.
TABLE 14-2 Illustration of a Relation Description: Unit Price Information Attributes
Using Table 14-2, a typical unit cost entry for an activity in construction might be:
This entry summarizes the unit costs associated with construction of 12" thick brick masonry walls, as indicated by the item DESCRIPTION. The ITEM_CODE is a numerical code identifying a particular activity. This code might identify general categories as well; in this case, 04.2 refers to general masonry work. ITEM_CODE might be based on the MASTERFORMAT or other coding scheme. The CREW_CODE entry identifies the standard crew which would be involved in the activity. The actual composition of the standard crew would be found in a CREW RELATION under the entry 04.2-3, which is the third standard crew involved in masonry work (04.2). This ability to point to other relations reduces the redundancy or duplication of information in the database. In this case, standard crew number 04.2-3 might be used for numerous masonry construction tasks, but the definition of this crew need only appear once.
DESCRIPTION: common brick masonry, 12" thick wall, 19.0 bricks per S.F.
WORK_UNIT: 1000 bricks
WORK_UNIT, OUTPUT and TIME_UNIT summarize the expected output for this task with a standard crew and define the standard unit of measurement for the item. In this case, costs are given per thousand bricks per shift. Finally, material (MATL_UNIT_COST) and installation (INSTCOSTS) costs are recorded along with the date (DATEMCOS and DATEICOS) at which the prices were available and entered in the database. The date of entry is useful to insure that any inflation in costs can be considered during use of the data.
The data recorded in each row could be obtained by survey during bid preparations, from past project experience or from commercial services. For example, the data recorded in the Table 14-2 relation could be obtained as nationwide averages from commercial sources.
An advantage of the relational database model is that the number of attributes and rows in each relation can be expanded as desired. For example, a manager might wish to divide material costs (MATL_UNIT_COST) into attributes for specific materials such as cement, aggregate and other ingredients of concrete in the unit cost relation defined in Table 14-2. As additional items are defined or needed, their associated data can be entered in the database as another row (or tuple) in the unit cost relation. Also, new relations can be defined as the need arises. Hence, the relational model of database organization can be quite flexible in application. In practice, this is a crucial advantage. Application systems can be expected to change radically over time, and a flexible system is highly desirable.
With a relational database, it is straightforward to issue queries for particular data items or to combine data from different relations. For example, a manager might wish to produce a report of the crew composition needed on a site to accomplish a given list of tasks. Assembling this report would require accessing the unit price information to find the standard crew and then combining information about the construction activity or item (eg. quantity desired) with crew information. However, to effectively accomplish this type of manipulation requires the definition of a "key" in each relation.
In Table 14-2, the ITEMCODE provides a unique identifier or key for each row. No other row should have the same ITEMCODE in any one relation. Having a unique key reduces the redundancy of data, since only one row is included in the database for each activity. It also avoids error. For example, suppose one queried the database to find the material cost entered on a particular date. This response might be misleading since more than one material cost could have been entered on the same date. Similarly, if there are multiple rows with the same ITEMCODE value, then a query might give erroneous responses if one of the rows was out of date. Finally, each row has only a single entry for each attribute.
The ability to combine or separate relations into new arrangements permits the definition of alternative views or external models of the information. Since there are usually a number of different users of databases, this can be very useful. For example, the payroll division of an organization would normally desire a quite different organization of information about employees than would a project manager. By explicitly defining the type and organization of information a particular user group or application requires, a specific view or subset of the entire database can be constructed. This organization is illustrated in Fig. 14-1 with the DATA DICTIONARY serving as a translator between the external data models and the database management system.
Behind the operations associated with querying and manipulating relations is an explicit algebraic theory. This algebra defines the various operations that can be performed on relations, such as union (consisting of all rows belonging to one or the other of two relations), intersection (consisting of all rows belonging to both of two relations), minus (consisting of all rows belonging to one relation and not another), or projection (consisting of a subset of the attributes from a relation). The algebraic underpinnings of relational databases permits rigorous definitions and confidence that operations will be accomplished in the desired fashion.
Example 14-3: A Subcontractor Relation
As an illustration of the preceding discussion, consider the problem of developing a database of possible subcontractors for construction projects. This database might be desired by the cost estimation department of a general contractor to identify subcontractors to ask to bid on parts of a project. Appropriate subcontractors appearing in the database could be contacted to prepare bids for specific projects. Table 14-3 lists the various attributes which might be required for such a list and an example entry, including the subcontractor's name, contact person, address, size (large, medium or small), and capabilities.
TABLE 14-3 Subcontractor Relation Example
To use this relation, a cost estimator might be interested in identifying large, electrical subcontractors in the database. A query typed into the DBM such as:
SELECT from SUBCONTRACTORS
where SIZE = Large and ELECTRICAL = Yes
would result in the selection of all large subcontractors performing electrical work in the subcontractor's relation. More specifically, the estimator might want to find subcontractors in a particular state:
SELECT from SUBCONTRACTORS
where SIZE = Large and ELECTRICAL = Yes and STATE = VI.
In addition to providing a list of the desired subcontractors' names and addresses, a utility application program could also be written which would print mailing labels for the selected firms.
Other portions of the general contracting firm might also wish to use this list. For example, the accounting department might use this relation to record the addresses of subcontractors for payment of invoices, thereby avoiding the necessity to maintain duplicate files. In this case, the accounting code number associated with each subcontractor might be entered as an additional attribute in the relation, and the accounting department could find addresses directly.
Example 14-4: Historical Bridge Work Relation
As another simple example of a data table, consider the relation shown in Table 14-0 which might record historical experience with different types of bridges accumulated by a particular agency. The actual instances or rows of data in Table 14-4 are hypothetical. The attributes of this relation are:
These attributes could be used to answer a variety of questions concerning construction experience useful during preliminary planning.
TABLE 14-4 Example of Bridge Work Relation
As an example, suppose that a bridge is to be built with a span of 250 feet, located in Pittsburgh PA, and crossing a river with limestone sub-strata. In initial or preliminary planning, a designer might query the database four separate times as follows:
Each SELECT operation would yield the bridge examples in the database which corresponds to the desired selection criteria. In practice, an input/output interpreter program should be available to translate these inquiries to and from the DBM and an appropriate problem oriented language.
The four queries may represent subsequent thoughts of a designer faced with these problem conditions. He or she may first ask, "What experience have we had with bridges of this span over rivers?" "What experience have we had with bridges of this span with these site conditions? What is our experience with steel girder bridges in Pennsylvania? For bridges of this span, how many and which were erected without a sizable cost overrun? We could pose many more questions of this general type using only the small data table shown in Table 14-4.
14.6 Other Conceptual Models of Databases
While the relational model offers a considerable amount of flexibility and preserves considerable efficiency, there are several alternative models for organizing databases, including network and hierarchical models. The hierarchical model is a tree structure in which information is organized as branches and nodes from a particular base. As an example, Figure 14-2 illustrates a hierarchical structure for rented equipment costs. In this case, each piece of equipment belongs to a particular supplier and has a cost which might vary by the duration of use. To find the cost of a particular piece of equipment from a particular supplier, a query would first find the supplier, then the piece of equipment and then the relevant price.
The hierarchical model has the characteristic that each item has a single predecessor and a variable number of subordinate data items. This structure is natural for many applications, such as the equipment cost information described above. However, it might be necessary to construct similar hierarchies for each project to record the equipment used or for each piece of equipment to record possible suppliers. Otherwise, generating these lists of assignments from the database illustrated in Figure 14-2 would be difficult. For example, finding the least expensive supplier of a crane might involve searching every supplier and every equipment node in the database to find all crane prices.
Figure 14-2 Hierarchical Data Organization
The network model or database organization retains the organization of information on branches and nodes, but does not require a tree of structure such as the one in Figure 14-2. This gives greater flexibility but does not necessarily provide ease of access to all data items. For example, Figure 14-3 shows a portion of a network model database for a building. The structural member shown in the figure is related to four adjoining members, data on the joints designed for each end, an assembly related to a room, and an aggregation for similar members to record member specifications.
Figure 14-3 Example of a Network Data Model
While the early, large databases were based on the hierarchical or network organizations, the relational model is now preferred in many applications due to its flexibility and conceptual simplicity. Relational databases form the kernel for large systems such as ORACLE or SAP. However, databases distributed among numerous servers may have a network structure (as in Figure 14-3), with full relational databases contained at one or more nodes. Similarly, "data warehouse" organizations may contain several different types of databases and information files. For these data warehouses, more complicated search approaches are essential, such as automatic indexing of multi-media files such as photographs.
More recently, some new forms of organized databases have appeared, spurred in part by work in artificial intelligence. For example, Figure 14-4 illustrates a frame data structure used to represent a building design element. This frame describes the location, type, cost, material, scheduled work time, etc. for a particular concrete footing. A frame is a general purpose data representation scheme in which information is arranged in slots within a named frame. Slots may contain lists, values, text, procedural statements (such as calculation rules), pointers or other entities. Frames can be inter-connected so that information may be inherited between slots. Figure 14-5 illustrates a set of inter-connected frames used to describe a building design and construction plan. Object oriented data representation is similar in that very flexible local arrangements of data are permitted. While these types of data storage organizations are active areas of research, commercial database systems based on these organizations are not yet available.
Figure 14-4 Illustration of Data Stored in a Frame
Figure 14-5 Illustration of a Frame Based Data Storage Hierarchy