Database Design and SQL

SQL tutorial

Previous page Previous     TOC Contents     Next pageNext

Lesson 2 - Database design

A short case study

ezconsulting Inc. is a small consulting company offering database design and creation services to a fairly wide range of customers. The company employs about 30 consultants, analysts, programmers, network specialists, who will work in teams on projects for periods of time ranging from a few days to several months.

At any given time there may be 10-12 different projects on the go. Because ressources are scarce a specialist may be called upon to work on several projects simultaneously. In order to keep some control over scheduling and costing, every employee is assigned to a department and reports to only one manager, even when he's working on projects for other departments. Every week every employee must submit a timesheet showing the number of hours spent on each project.

As was the case for the shoemaker's children (they had no shoes because dad was too busy making shoes to sell in order to put bread on the table), this company has no Project Management database,simply because nobody has had the time to set one up. And this is typical in this kind of environment. Do you take an analyst who bills $800/day and put him to work on in-house maintenance? No, you don't. You wait until some bright college student shows up for a co-op work assignment and you give him/her the job. The company hopes that after the basic Project management application is operational other modules such as Employee Skills Management and control of bids and RFP's can be integrated to the database.

Designing the Project Management application

Here is what our first draft of the E-R diagram should look like for the Project Management case:


Fig. 2-1

The diagram contains the information we would have gathered by talking to the client. Notice that the attributes for Employee represent the minimum amount of information we have to keep at this time. We haven't included things like "Home address", "Date of birth" and so on. When we start working with SQL later we will add more information to the table. The same applies for the other entities - we will add attibutes as we develop the model later on.

It is important to make sure at this point that you understand the degrees of the relationships shown.
Department <--> Employee is a one-to-many relationship - a given employee is assigned to one and only one department and a given department contains zero or many employees. This means that every employee in the company will be assigned to a department, even the President who will be in Administration. A department may exist and have no employees assigned to it. For example, we could create a new department in the database and, until it is staffed, it will have zero employees assigned to it.
Employee <--> Project is a many-to-many relationship - a given employee works on one or many projects and a given project may have zero or many employees. In order to keep track of every employee's hours, all the work that is done will be billed to a project. However, projects could be things like "In-house systems development", "Professional development leave" or "Administrative duties". Any project may have no employees working on it.

Now, once you have the E-R diagram down, you go over it one more time with the client to make sure that you have the details down correctly and you are almost ready to start creating the actual database. Notice that I said "almost".


It is possible to start creating the database at this point. It's just a question of creating a new table for every entity identified in the diagram. We'll be using MS-Access to do that shortly. But how do you code the relationships?

There is a formal process to do that in database modeling. It's called normalization. It means applying a set of rules to the data so that you group the attributes in such a way that the relationships work. It's not really that complicated but it is a formula approach. If you prefer to use that approach, get any good book on databases, look-up "normalization" and follow the steps.

We'll do normalization using the intuitive approach - work with the data until it "feels" OK. This could also be called prototyping - create a working model of the database that is close to what you want and keep improving it until it works perfectly, then put it into production.

However, whatever the approach taken, there are some basic rules that have to be adhered to. The rules apply to any relational database and cannot be broken. They can't even be stretched. Think of them as the Prime directives. The rules are:
  1. Every table must have a primary key - an attribute or combination of attributes that uniquely identifies every occurence in the table.

  2. The primary key can never contain an empty or Null value. That makes sense - if you had 2 that were empty, they wouldn't be unique anymore.

  3. Every attribute of every occurence in the table can contain only one value. Think of the Employee table as a grid. Every occurence, or line, represents one employee and every column is an attribute. So, every employee can only have one ID and one First-name and one Last-name, and so on.


The one-to-many relationship

Let's start with the easiest relationship: Employee <--> Department.

First we create a new database and call it ProjMgt.mdb. Then we create the first two tables in the database: call them Employee and Department, and put in the fields from the E-R diagram. Notice that the column-names in the tables are all coded with a prefix: e_ for Employee, d_ for Department and p_ for Project. This is a good habit to get into. It will make your life easier later on. This is what we now have:

Fig. 2-2

Remembering that that is a one-to-many relationship, how do we associate the employee with the department? There are 2 ways it could be done:

  1. Add a column for Employee ID to the Department table. You get this:

    Fig. 2-3

    See the problem? When you start entering data, what do you put into the d_Employee column? Rule 3 says you can have only one value. What if there are 2 employees in the Department? You could try to add another column for d_Second_employee, but what if there are 20 employees, or 200? Obviously this is not going to work. So we scrap this brilliant idea.

  2. Add a column for Department to the Employee table. You get this:

    Fig. 2-4

    Any problem with this? Doesn't seem to be. Since every employee is assigned to only one department, I only have one value to put into the column: employee 101 works for department 10, and that's all.

In summary, to normalize a one-to-many relationship you add a column to the table at the "many" end of the relationship to refer to the primary key at the "one" end.


The many-to-many relationship

The many-to-many Employee <--> Project relationship is a bit trickier.

In the end we want to associate projects and employees, to see who is working on what project. To see how it must not be done we'll go through the exercise of adding columns to the tables. So we add the Project table to the relationships:

Fig. 2-5

To create the relationship we could add a Project_Number column to the Employee table. When we try it we see that we come up with the same problem we had in the previous relationship: when we get to the e_Project column, what do we write? The employee could be working on 7 different projects. Rule 3 says we can only enter one value.

Fig. 2-6

So we try it the other way - add an Employee_Number column to the Project table. Again, when we get to the p_Employee column what do we write? There could be 25 employees working on this project.

Fig. 2-7

Since those two attempts obviously won't work, there has to be something else. It's called a link entity or link table. Most textbooks will just call that table Employee-Project or Project-Employee. But in real life the entity does exist in our system. What is it that links employees with projects? Right! It's the timesheet. The timesheet contains all the information we need. So we add the Timesheet entity to the mix and modify our E-R diagram:


Fig. 2-8

t_Employee is an employee-ID that refers to the Employee table, t_Project is a project_number that refers to the Project table, t_Date is the period_ending date for the timesheet and t_Hours is the number of hours the employee spent on that project. We also specify that every line in the Timesheet table must have one and only one employee_ID and must have one and only one project_number. In other words we cannot create a Timesheet for an employee who doesn't exist or charge for work on a project that doesn't exist. Who would ever think of doing such a thing anyway!

What is the primary key for Timesheet? To get a feel for the key, let's look at the data that will be input:

Fig. 2-9

It's clear that t_Employee or t_Project can't be the primary key because they both repeat; remember: every occurence in a primary key column must be unique. How about a concatenation of t_Employee + t_Project. That looks good so we try it. It works fine for one week. The following week, employee 202 has worked on project S4440 again and we get a duplicate key error!

Fig. 2-10

So we add t_Date to the key and that solves the problem. Now, assuming that the client has said that if an employee works on a project twice in one week he adds-up the hours, the combination of employee + project + date is truly unique.

Conclusion: there is only one way to normalize a many-to-many relationship and that is to create a link table. The link table must contain columns that refer back to the other tables so that the many-to-many relationship becomes two one-to-many relationships.

If you haven't found the database resource you're looking for,
use our Google Search box for more information.



Home | Tutorials | Contact | Sitemap | Add URL | Privacy policy