Website Design United States, Website Design California, Website Designing United States, Website Designing California

Why do I Want to Use a JOIN Clause?


Introduction

Recently a friend of mine asked me why he should use a JOIN statement rather than adding extra criteria to the where clause. It's an interesting question, and I thought it was worth investigation. Being that these articles are meant for those new to SQL Server though, let's first discuss the join itself.

What is a Join?

A join is a method of combining data in rows from multiple tables. Joins can be performed in both the FROM and WHERE clauses. There are many types of joins, but we are only going to look at inner joins and left and right outer joins.

Inner Joins

Inner joins limit the returned data to only the data that matches the criteria in both tables.

    Sample Tables:
    employee (Table)

    emp_ID     fname    lname      dept_ID
    -----------    ------    -------    -----------
    1            John           Doe      3
    2            Jane           Doe      2
    3            Ron            Smith    3
    4            Nancy         King      4
    5            James         Monroe  1
    6             Lisa            Jones    0

    department (Table)

    dept_ID            dept_name
    -----------            ------------------------------
    1                     Sales
    2                     Customer Service
    3                     Accounting
    4                     Marketing
    5                     Purchasing

    -- Inner Join of the tables in the FROM clause
    SELECT e.fname, e.lname, d.dept_name
    FROM department d
        INNER JOIN employee e ON d.dept_ID = e.dept_ID

    -- Inner Join of the tables in the WHERE clause
    SELECT e.fname, e.lname, d.dept_name
    FROM department d, employee e
    WHERE d.dept_ID = e.dept_ID


































Both of these inner joins give us the same results:

    fname      lname         dept_name
    ----------- ------------ ------------------------------
    John         Doe             Accounting
    Jane         Doe             Customer Service
    Ron          Smith           Accounting
    Nancy       King             Marketing
    James       Monroe         Sales









These queries join the tables based on matches in the dept_ID field of the tables. It is not required that the two fields have the same name.

Notice that the Purchasing department is not in the returned data because it does not match any records in the employee table. Similarly, Lisa Jones was not returned because she did not link to any department from the department table. Further analysis shows that the accounting department is listed twice because it matched two records in the employee table.

This might be what you were expecting, but what happens if you wanted to see all of the employees, even if they are not yet assigned to a department?

Outer Joins

Outer joins allow you to include information even if they don't have any matching data in the other table. The two outer joins we are concerned with in this article are left outer joins and right outer joins. As their names imply, left outer join returns all rows from the table listed on the left of the join and a right outer join returns all rows from the table to the right of the join. A third outer join you might use is a full outer join, this returns all records from both tables. Outer joins can be performed in both the FROM and WHERE clause, but performing them in the WHERE clause is no longer being supported, and will be removed from SQL Server before long. We'll discuss a bit of why in a minute, but first, let's put our outer joins to work.

    -- A left outer join in the FROM clause
    SELECT e.fname, e.lname, d.dept_name
    FROM department d
      LEFT OUTER JOIN employee e ON d.dept_ID = e.dept_ID

    fname     lname     dept_name
    --------- --------- ------------------------------
    James     Monroe     Sales
    Jane       Doe          Customer Service
    John       Doe          Accounting
    Ron        Smith        Accounting
    Nancy     King         Marketing
    NULL       NULL        Purchasing















By doing a LEFT OUTER JOIN we got all the values in department (the table left of the OUTER JOIN statement). Therefore, you notice Purchasing shows up and has NULL values for every field retrieved from the employee table. Also notice that Accounting still shows up twice as it matched two seperate records in the employee table. But, back to our question, we wanted to see all the employees, not all the departments.

    -- A right outer join in the FROM clause
    SELECT e.fname, e.lname, d.dept_name
    FROM department d
      RIGHT OUTER JOIN employee e ON d.dept_ID = e.dept_ID

    fname    lname    dept_name
    --------- --------- ------------------------------
    John      Doe         Accounting
    Jane      Doe         Customer Service
    Ron       Smith       Accounting
    Nancy    King         Marketing
    James    Monroe     Sales
    Lisa       Jones       NULL

















Same as the left outer join, but now we get all the employees, and not necessarily all the departments. Notice in this case, no employee was assigned to purchasing, so it is not in the return values.

What about Outer Joins in the WHERE Clause?

Before I get started, I would just like to say that if you are reading this to get a basic understanding of JOINS, and are willing to just believe me when I say to always do your joins in the FROM clause, there is no need to read on. If you want to see why though, keep reading.

Why isn't the outer join going to be supported in the WHERE clause anymore? Well, let's see. To perform an outer join in the where clause, you do the same as with an inner join, but instead of an (=) between the values you use a (*=).

    -- Left outer join in the WHERE clause
    SELECT e.fname, e.lname, d.dept_name
    FROM department d, employee e
    WHERE d.dept_ID *= e.dept_ID
    fname     lname     dept_name
    ---------- ---------- ------------------------------
    James     Monroe     Sales
    Jane         Doe         Customer Service
    John        Doe          Accounting
    Ron          Smith       Accounting
    Nancy       King         Marketing
    NULL        NULL         Purchasing

 














Alright, you probably want to know what's p with that right? I mean, the results are the same aren't they? Yeah they are. What's the problem then?

The problem is that queries are rarely that simple. Let's add another clause and see what happens. Let's say we work in HR and want to know what departments don't have any employees. To find this we need to find all the departments where the employee fields are NULL. You could use any field, but to insure a field isn't NULL for another reason, I always check the field I am joining on.

    -- retrieve departments with no employees listed
    SELECT d.dept_name
    FROM department d
      LEFT OUTER JOIN employee e ON d.dept_ID = e.dept_ID WHERE e.dept_ID IS NULL

    dept_name
    ------------------------------
    Purchasing










Exactly what we are looking for. We know the Purchasing department had no employees assigned to it, and that was returned. Let's try it with the join in the WHERE clause now.

    -- the same query but using the WHERE clause for the join SELECT d.dept_name

    FROM department d, employee e
    WHERE d.dept_ID *= e.dept_ID
    AND e.dept_ID IS NULL

    dept_name
    ------------------------------
    Sales
    Customer Service
    Accounting
    Marketing
    Purchasing









 

 

 



What went wrong? These two queries join on the same fields, and use the exact same criteria in their WHERE clause. But, this does not take into consideration one factor: filtering priority.

Filtering Priority

When SQL Server processes a query, there are three sets of criteria it must use to filter the data. The first to be processed is a join in the FROM clause, then criteria in the WHERE clause, and finally criteria in the HAVING clause (we will not discuss the HAVING clause in this article, but I thought I would throw it in for those familiar with it). In our first example above, the FROM clause join was performed, and then the WHERE clause filtered the resulting data. In the second example, the check for NULL values becomes part of the joining action. Because it is an outer join SQL Server provides NULL values for the fields that are not matching the other criteria. Therefore, it returns all the fields. Being that

Of course, this result is something you would never want, and that is why you do not want to do your join in the WHERE clause. That and the fact it is already not supported, and in a future version of SQL Server will be dropped.

Wait a Second, What About Inner Joins?

It's inevitable that you would ask that question, so let's try it. Ok, you can give it a shot on your own if you want, but the result is, the two queries match. If you are familiar with query analyzer, check the execution plan and you will see they will match, but if you check it on the Outer Joins, you will see they are extremely different.

So why use a INNER JOIN clause in the FROM instead of doing the inner join in the WHERE clause? For one, it is the standard, and it's always nice to follow standards. Another reason is that there are many more advanced features of joins, and seem to be more on the way. New features for joins are targeted at the FROM clause. Finally, all new optimization methods are targeted at the join in the FROM clause.

I made an attempt to create a query that did not give the same results, or even broke the execution plans, but I was unable to create a sample with the INNER JOIN.

Conclusion

There are many more advanced options we did not discuss with joins, but are covered elsewhere on the internet. When using joins, stick with the standard, and that is to add your join to the FROM clause.

If you would like to try the samples in this article, following is the script to recreate the tables and database:

    CREATE DATABASE PCsamples
    GO

    USE PCsamples
    GO

    CREATE TABLE department
    (
        dept_ID INT IDENTITY(1,1) PRIMARY KEY,
        dept_name VARCHAR(30)
    )
    go

    INSERT department (dept_name) VALUES ('Sales')
    INSERT department (dept_name) VALUES ('Customer Service')
    INSERT department (dept_name) VALUES ('Accounting')
    INSERT department (dept_name) VALUES ('Marketing')
    INSERT department (dept_name) VALUES ('Purchasing')
    go

    CREATE TABLE employee
    (
        emp_ID INT IDENTITY(1,1) PRIMARY KEY,
        fname VARCHAR(10),    lname VARCHAR(10),
       dept_ID INT
    )
    go

    INSERT employee (fname, lname, dept_ID)
        VALUES ('John', 'Doe', 3)
    INSERT employee (fname, lname, dept_ID)
       VALUES ('Jane', 'Doe', 2)
    INSERT employee (fname, lname, dept_ID)
        VALUES ('Ron', 'Smith', 3)
    INSERT employee (fname, lname, dept_ID)
       VALUES ('Nancy', 'King', 4)
    INSERT employee (fname, lname, dept_ID)
        VALUES ('James', 'Monroe', 1)
    INSERT employee (fname, lname, dept_ID)
       VALUES ('Lisa', 'Jones', 0)



Author Information:

Josh Fruits

http://www.programmers-corner.com

josh@programmerscorner.com

Comments:

Add your comments here.

Name

Comment

You can also send feedback to feedback@programmers-corner.com

There are currently no comments available.

 















 


© 2008-2009 dotnet4all.com