Database Design and SQL. Joining tables

SQL tutorial

Previous page Previous     TOC Contents     Next pageNext

Lesson 7 - Joining tables

A few more words on modeling

If you've been following this from the beginning, you've been playing around with the ProjectMgt database. It's perfectly OK to have run tests on it, to have added or changed data or to have changed the structure of the tables themselves.

Before continuing, however, we should standardize the database so that we're all on the same wavelength for the rest of the lessons.

Let's review. We started out with this model:

Then we added a few columns to different tables: e_BirthDate in Employee, p_Country in Project and maybe a few more.

But there is still one problem with the design. In fact, the problem is that the database is not normalized to the Third Normal Form (3NF). Uh? Let's look at it in practical terms.

If you have one table for Timesheets, you get one row for each timesheet entry: on a given Friday, an employee who has worked on 2 projects submits his timesheet. You input the timesheet date, the employee-Id, the project number and the hours for the first project, creating one row in the table and then you repeat for the second project, creating another row in the table. Now, if you have an application (in VB, Powerbuilder or Access) that wants to print a timesheet report, it will probably print something like this:


This is very standard form format. It's called a Master/Detail form.

In business applications you will use dozens of these: Orders, Invoices, Purchase Orders, PO Requisitions, etc.

What they all have in common is that there is a Master section which contains information on the transaction as a whole, and a Detail section which contains information on the details of the transaction.

In an Invoice, for example, the invoice date, customer name and address, shipping date are in the Master while items purchased, quantities, prices are in the Detail section.

It is very difficult to produce a Master/Detail form from a single table.

Therefore, what we will do in our ProjectMgt database is normalize the Timesheet table into a Timesheet-Master table and a Timesheet-Detail table. Master will contain the Timesheet number as Pk, the Employee-Id and the Timesheet date (all the information common to all transactions).

Detail will contain the Timesheet number, Project number and Hours-worked for each project.

Since there may be several Project numbers associated with one Timesheet number in Detail, we will assign Timesheet number + Project number as Pk for the Detail table.

You may feel a bit overwhelmed at this point. Take your time.

You should download a new copy of The Project management database now. If you prefer to work with the 97 version, go to the Downloads area - it has several versions of the database. Study it carefully and try to relate the design of the database to the Timesheet form shown above. Remember that a database is not a theoretical concept - it has to be applied to real-life applications.



Let's go back to the ProjectMgt example.

When you have to look at Employee data you do a SELECT from the Employee table. Remember that the Employee table contains the employee's department but as a number only. When you run the SELECT you can't tell what the department's name is from the output.

SELECT e_Id, e_Fname, e_Lname, e_Dept
        FROM Employee;

When you look at data from 2 or more tables in a SQL statement, the operation is called a JOIN. You are in fact joining 2 tables to provide the result needed. However, there is no JOIN clause in SQL - everything is done with the SELECT statement.

In the example above, you want to see the department's name instead of it's number when you look at an employee record. Since the department name is in the Department table and all the other fields are in the Employee table, it is fairly obvious that you will have to open 2 tables in the SELECT. Let's try it:

SELECT e_Id, e_Fname, e_Lname,
        e_Dept, d_DeptNum, d_DeptName
                FROM Employee, Department;

It should be immediately obvious to you that although the query worked, it produced way, way too much data.

And that brings us to talk about how a Join operation works.

When you tell SQL to join 2 tables, it really joins them! In fact, it joins every row in the first table with every row in the second table. If the first table, Employee, contains 5 rows and the second table, Department, contains 3 rows, the result displays 5 x 3 = 15 rows. Which is what happened in the example above. However, since there are only 5 employees, it means that 10 of those rows are meaningless.

The trick to know about joining table is fairly simple yet, absolutely crucial:
The tables you are joining must have common columns. Those columns don't have to have the same name but, they must contain the same kind of data: same datatype and size. e_Dept and d_DeptNum are both Numeric, Long integer and the Dept. numbers assigned to employees exist in the Department table.

The only meaninful information in a JOIN operation is that which occurs when data in the two common columns is the same.

In database jargon, the field that is used as a reference from one table is called a foreign key (Fk) and it must correspond to another field which is a primary key (Pk) in it's table. In our example, e_Dept is a Fk in the Employee table and d_DeptNum is a Pk in the Department table.

The thing to recognize about the result of the query above is that the only good results are the ones where e_Dept and d_DeptNum are the same.

So, we implement the JOIN with a WHERE clause:

SELECT e_Id, e_Fname, e_Lname,
        e_Dept, d_DeptNum, d_DeptName
                FROM Employee, Department
                        WHERE e_Dept = d_DeptNum;


Let's look at more examples:

List all the timesheets, showing the employee's name and phone.
SELECT tm_Num, tm_Date, tm_EmpID,
        e_Fname, e_Lname, e_Tel
                FROM Employee, TS_Master
                        WHERE tm_EmpID = e_ID;

List all the timesheets, showing project titles, start and end dates.
SELECT td_Num, td_ProjNum, td_Hours,
        p_Title, p_StartDate, p_EndDate
                FROM Project, TS_Detail
                        WHERE td_ProjNum = p_Number;

To obtain a particular employee's timesheets, add the condition to the WHERE clause:

SELECT tm_Num, tm_Date, tm_EmpID,
        e_Fname, e_Lname, e_Tel
                FROM Employee, TS_Master
                        WHERE tm_EmpID = e_ID
                         AND tm_EmpID = 'A1111';

You may be able to guess from the previous examples that joining 3 or 4 tables requires that all tables have pairs of common columns.

To obtain data from the Department, Employee and TS_Master tableswe have to know that Dept. Number exists in both Employee and Department and that Employee ID exists in both Employee and Timesheet Master:
SELECT tm_Num, tm_Date, tm_EmpID,
        e_Fname, e_Lname, d_Name
                FROM Employee, TS_Master, Department
                        WHERE tm_EmpID = e_ID
                        AND e_Dept = d_DeptNum;

To list all timesheets, with employee names and project titles, we know that Timesheet Number exists in both Timesheet Master and Timesheet Detail, that Employee Id exists in both Employee and Timesheet Master and finally, that Project Number exists in both Timesheet Detail and Project:

SELECT tm_Num, tm_Date, tm_EmpID, td_ProjNum,
        e_Fname, e_Lname, p_Title
                FROM Employee, TS_Master, TS_Detail, Project
                        WHERE tm_Num = td_Num
                        AND tm_EmpID = e_ID
                        AND td_ProjNum = p_Number;

OK, so it doesn't look all that great! But it works. All you have to do is arrange the column names and use the ORDER clause to sort it in proper order. And again, if you want to see the timesheets relating to a particular project, modify the WHERE clause:

SELECT tm_Num, tm_Date, tm_EmpID, td_ProjNum,
        e_Fname, e_Lname, p_Title
                FROM Employee, TS_Master, TS_Detail, Project
                        WHERE tm_Num = td_Num
                        AND tm_EmpID = e_ID
                        AND td_ProjNum = p_Number
                        AND td_ProjNum = 'C33333';

The 'JOIN' Formula

Joining multiple tables is not difficult as long as the database is designed properly: tables that are to be joined must have columns in common.

The formula is applied in the WHERE clause:
WHERE table1_column_w = table2_column_x AND table2_column_y = table3_column_z AND ...

If 2 tables have no common columns they cannot be joined. For example, if we still had the Products table in our database, we couldn't join Products and Employee or Products and Project because there is no common data in those tables.

The great thing about JOINS is that once you've mastered the technique you can obtain information from anywhere in the database. It may involve 4 or 5 or 10 joins but, so what!

The Boss wants to know which Departments are involved in projects in Germany at the moment.

Follow the joins:

SELECT DISTINCT d_DeptName, td_ProjNum, p_Title, p_Country
       FROM Department, Employee, TS_Master, TS_Detail, Project
               WHERE td_ProjNum = p_Number
               AND tm_Num = td_Num
               AND tm_EmpID = e_ID
               AND e_Dept = d_DeptNum
               AND p_Country LIKE 'Germany*'
               AND DATE( ) BETWEEN p_StartDate AND p_EndDate;

There are several points that should be noted about this query:

  • If a project in Germany has many timesheets submitted on it from one department, each occurence will generate one row - we only want to know the name of the department, not how many times it shows up so, we use the DISTINCT clause.

  • In the WHERE clause, always do the joins first - there are 5 tables involved and therefore, there are 4 joins.

  • Whenever you are comparing to a string or text field, use the LIKE operator - the country could have been mistakenly entered as "Germany " in the project data - the strings "Germany" and "Germany " do not match.

  • The Boss said " Germany at the moment". Listen to the question. That means currently active. You don't want project that are already over or that haven't started yet. If today is between the start and end dates, the project is currently active.

  • If there is an active project in Germany but it hasn't had timesheets submitted for it yet, it won't show up in the list. There is a way to list it in a query and we'll cover that in the next Lesson.

If you haven't found the database resource you're looking for,
use our Google Search box for more information.



Home | Tutorials | Contact | Sitemap | Add URL | Privacy policy