This article discusses thedifferent types of joins in SQL as specified by ANSI SQL.
In the previous article, we worked with a single table to learn the basics of SQL. However, relational databases define relationships between tables, and often queries require gleaning information from two or more tables. Joins allow us to combine rows from multiple tables using columns common between them. In fact, the relations defined amongst tables is what makes relational databases, relational.
To drive the concepts home, we’ll work with two tables that have one column common between them. The two tables are shown below:
The ANSI SQL standard defines five types of joins that we’ll discuss.
This Articles Contents
Cross Joins in SQL
We’ll start with the cross join, which is also known as the Cartesian product. In this case, we pick the first row of Table A and match it with every row of Table B. Next, we pick the second row of Table A and match it with every row of Table B. There’s no condition specified which is tested to determine if a row from Table A should be joined with a row from Table B. The resulting table for a cross join of the two example tables will be as follows:
Inner Joins in SQL
In the case of an inner join, a condition, or multiple conditions, are tested to determine if a row from Table A should be joined with a row from Table B. This condition is called the join predicate. In the case of our example, the two tables share the movie iD column as the common value between them. The movie iD column establishes a relation between the two tables. Using the common column, we can determine if a given movie was screened on any of the theaters we have in our database and if so, then for how many days.
Inset theory, the Venn diagram representation for an inner join is as follows:
Left Outer Joins in SQL
In the case of left join, the result set consists of rows that match the join predicate and also rows from the table specified on the left of the left join clause that doesn’t match the join predicate. Null is inserted for columns from Table B and for rows from table A that didn’t satisfy the join predicate. Said a different way, all rows from the left are always included in the result set and rows from the right are only included if they match the join predicate.
Right Outer Joins in SQL
The right join is the reverse of the left join. In this case, all rows from the right table are always included in the result set and only those rows from the left table make it to the result set that satisfies the join condition. With left and outer joins, we specify which side of the join is allowed to have a row in the result when the join predicate isn’t satisfied.
Full Outer Joins in SQL
In the case of a full join, rows from both the tables are included in the result set. Rows that evaluate true for the join predicate are only included once. Rows that don’t match the predicate have NULL inserted for columns belonging to the other table. Note that MySQL doesn’t support a full join.
In set theory, the Venn diagram representation for a full join is as follows:
Self Joins in SQL
A self join is the result set when a table is joined to itself. If we create a self join of the movie table based on the movie iD the result will be as follows:
Self Inner Joins in SQL
The simplest join one can use is the inner join. Rows from two tables are joined together using a common column between them.
Syntax :
SELECT *
FROM table1
INNER JOIN table1
ON <join condition>;
-- The lesson queries are reproduced below for convenient copy/paste into the terminal.
-- Query 1
SELECT * FROM Actors a INNER JOIN Actors b;
-- Query 2
SELECT * FROM Actors a INNER JOIN Actors b USING(FirstName);
-- Query 3
SELECT * FROM Actors a INNER JOIN Actors b USING(NetWorthInMillions);
SELECT * FROM table a INNER JOIN table b;
It may come as a surprise, but we can also join a table with itself. However, we need to use aliases as the INNER JOIN clause requires the two tables to be unique.
If you run the above query, the result will be a Cartesian product, i.e., each row will join with every other row of the second table. The total number of rows in the resulting table will be 121 because the table has 11 rows. The count for the inner join is shown below:
SELECT * FROM table a INNER JOIN table b USING(column);
We can use the USING clause to specify the column to join the two tables on. For example:
SELECT * FROM Actors a INNER JOIN Actors b USING(FirstName);
Note we have exactly 11 rows now because each row in the first table matches exactly one row in the second table. However, if we change the query and specify the NetWorthInMillions column in the USING clause, we’ll get 13 rows in the result because the two rows with value 240 for the NetWorthInMillions column match twice for a total of four rows.
SELECT * FROM Actors a INNER JOIN Actors b USING(NetWorthInMillions);
Remember, the USING clause defines one or more columns that are in both tables or results and used to join or match rows. Furthermore, if any rows from the two tables don’t match, they aren’t included in the output. This obviously, will not happen in the case of a self-join.
Inner Joins in SQL
We’ll introduce another table called Digital Assets that’ll contain the online public properties such as Twitter, Facebook, and Pinterest belonging to a celebrity. The table structure is shown below:
A few rows from the table are shown below:
Note that the primary key of the table is the URL column as every URL is guaranteed to be unique. The DigitalAssets table is linked with the Actors table with the common column of ID for the actor as shown below. However, note that the column names in the two tables are different.
Syntax :
SELECT *
FROM table1
INNER JOIN table2
ON <join condition>;
-- Query 1
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets ON Actors.Id = DigitalAssets.ActorID;
-- Query 2
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets USING(Id);
-- Query 3
SELECT FirstName, SecondName, AssetType, URL FROM Actors, DigitalAssets WHERE ActorId=Id;
-- Query 4
SELECT FirstName, SecondName, AssetType, URL FROM Actors, DigitalAssets;
-- Query 5
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets;
-- Query 6
-- Makes no sense to join tables on FirstName and URL columns as they aren't related.
SELECT * FROM Actors INNER JOIN DigitalAssets ON URL = FirstName;
-- Query 7
-- Again no sense in combining net worth and actor id. Additionally, one is an int and the ot
SELECT * FROM Actors INNER JOIN DigitalAssets ON NetWorthInMillions = ActorId;
SELECT column1, column2, column3, column4 FROM table1 INNER JOIN table2 ON table1.Id = table2.table1ID;
Using the INNER JOIN, we are now able to answer queries such as listing the Facebook pages for each celebrity. Note that each table in isolation can’t answer this query as the Actors table doesn’t hold the information of the digital asset for each actor and the DigitalAssets table doesn’t hold the names for each actor.
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssetsON Actors.Id = DigitalAssets.ActorID;
SELECT column1, column2, column3, column4 FROM table1 INNER JOIN table2 USING(Id);
If the two tables had the same column name for the actor’s ID then we could have used the alternative syntax with USING clause to make the query slightly less verbose as shown below:
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets USING(Id);
Note that the columns listed in the SELECT clause are unique across the two tables. However, if the two tables had columns with the same names then we would need to disambiguate the two by fully qualifying the column with the table name.
Also notice that celebrities with no digital assets, or assets with no corresponding celebrity entries, in the Actors table aren’t captured with the results of the query. The server picks rows from both tables that have the same value for the two columns. Or you can think of it as an intersection of the two tables based on the IDs of the celebrities.
SELECT column1, column2, column3, column4 FROM table1, table2 WHERE column5=Id;
It’s not necessary to use the INNER JOIN clause to get an inner join between two tables. We can also use the WHERE clause to achieve the same effect as shown below:
SELECT FirstName, SecondName, AssetType, URL FROM Actors, DigitalAssets WHERE ActorId=Id;
There’s no difference in using the WHERE clause or the INNER JOIN clause in query performance, rather it is just a matter of taste.
SELECT column1, column2, column3, column4 FROM table1, table2;
We can also create a Cartesian product between the two tables as we did in the self join section. We can use either the where or the inner join syntax. Both are shown below:
SELECT FirstName, SecondName, AssetType, URL FROM Actors, DigitalAssets;
Or,
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets;
SELECT * FROM table1 INNER JOIN table2 ON column1 = column2;
We can join any two columns from two tables that have the same type, or which can be converted to one another albeit with data loss. For instance, the following two queries are nonsensical, but the tables can still be joined on the columns that appear in the queries.
-- Makes no sense to join tables on FirstName and URL columns a they aren't related.
SELECT * FROM Actors INNER JOIN DigitalAssets ON URL = FirstName;
Or,
-- Again no sense in combining net worth and actor id. Additionally, one is an int and the other a decimal but still comparable.
SELECT * FROM Actors INNER JOIN DigitalAssets ON NetWorthInMillions = ActorId;
Both the queries result in empty sets.
Union in SQL
The UNION clause allows us to combine the results from several queries together. The clause doesn’t join the table but merely clubs the two results together.
Syntax :
<Query1>
UNION
<Query2>
-- Query 1
SELECT FirstName FROM Actors UNION SELECT URL FROM DigitalAssets;
-- Query 2
(SELECT CONCAT(FirstName, ' ', SecondName) AS "Actor Name" FROM ActorsORDER BY NetworthInMillions DESC LIMIT 2) UNION (SELECT CONCAT(FirstName, ' ', SecondName) AS "ThisAliasIsIgnored" FROM Actors ORDER BY Networth InMillions ASCLIMIT 2);
-- Query 3
SELECT FirstName, Id FROM Actors UNION SELECT FirstName FROM Actors;
-- Query 4
SELECT FirstName, Id FROM Actors UNION SELECT FirstName, null FROM Actors;
-- Query 5
SELECT MaritalStatus FROM Actors UNION SELECT Gender FROM Actors;
-- Query 6
SELECT MaritalStatus FROM Actors UNION ALL SELECT Gender FROM Actors;
-- Query 7
(SELECT CONCAT(FirstName, ' ', SecondName) AS "Actor Name" FROM Actors ORDER BY NetworthInMillions DESC LIMIT 2) UNION (SELECT NetworthInMillions FROM Actors ORDER BY NetworthInMillions ASC);
-- Query 8
(SELECT CONCAT(FirstName, ' ', SecondName) AS "Actor Name" FROM Actors ORDER BY NetworthInMillions DESC LIMIT 2) UNION (SELECT NetworthInMillions FROM Actors ORDER BY NetworthInMillions ASC LIMIT 3);
SELECT column1 FROM table1 UNION SELECT column2 FROM table2;
As a contrived example, we’ll write a query that prints all the first names from the Actors table and all the URLs from the DigitalAssets table.
SELECT FirstName FROM Actors UNION SELECT URL FROM DigitalAssets;
(SELECT CONCAT(column1, ‘ ‘, column2) AS “Actor Name” FROM table1 ORDER BY column3 DESC LIMIT 2) UNION (SELECT CONCAT(column2, ‘ ‘, column2) AS “ThisAliasIsIgnored” FROM table1 ORDER BY column4 ASC LIMIT 2);
A more realistic example would be a query where you are required to print the top two richest actors and the least two richest.
(SELECT CONCAT(FirstName, ' ', SecondName) AS "Actor Name" FROM Actors ORDER BY NetworthInMillions DESC LIMIT 2) UNION (SELECT CONCAT(FirstName, ' ', SecondName) AS "ThisAliasIsIgnored" FROM Actors ORDER BY NetworthInMillions ASC LIMIT 2);
Note we have used the various techniques learned so far to enhance the above query. We use the concat function to join first and second names with a space separating them and use the alias “Actor Name” for the resulting column. The alias from the second query is ignored. Furthermore, we wrap the two queries in parentheses which is a requirement when using the order by or limit clause in subqueries of a union query.
SELECT column1, column2 FROM table UNIONSELECT column3 FROM table;
When using the UNION clause, the two result sets being combined should have the same number and order of columns. The columns from the result sets should be of the same type or types that are compatible. For instance, the following query will error out:
SELECT FirstName, Id FROM Actors UNIONSELECT FirstName FROM Actors;
To make the above query work, we can insert a fake column or null as follows:
SELECT FirstName, Id FROM Actors UNION SELECT FirstName, null FROM Actors;
Left and Right Joins
The two joins add additional rows to the result set for one of the tables participating in the join. We can best exemplify the two joins pictorially as follows:
Left Join
Right Join
Syntax for Left Join :
SELECT * FROM table1 LEFT [OUTER] JOIN table2 ON <join condition>
Syntax for Right Join #
SELECT * FROM table1 RIGHT [OUTER] JOIN table2 ON <join condition>
-- Query 1
SELECT FirstName, SecondName, AssetType, URL FROM Actors LEFT JOIN DigitalAssets ON Actors.Id = DigitalAssets.ActorID;
-- Query 2
SELECT FirstName, SecondName, AssetType, URL FROM DigitalAssets LEFT JOIN Actors ON Actors.Id = DigitalAssets.ActorID;
-- Query 3
SELECT FirstName, SecondName, AssetType, URL FROM Actor RIGHT JOIN DigitalAssets ON Actors.Id = DigitalAssets.ActorID;
SELECT column1, column2, column3, column4 FROM table1 LEFT JOIN table2 ON table1.Id = table2.column5;
We’ll start with the query from the inner join lesson that output all the actors with digital assets. If you remember, the inner join query only outputs celebrities who have a digital presence. If we use the LEFT JOIN instead, we’ll get a list of all the actors with or without digital presence. The query is shown below:
SELECT FirstName, SecondName, AssetType, URL FROM Actors LEFT JOIN DigitalAssets ON Actors.Id = DigitalAssets.ActorID;
Note that the output now includes those actors who don’t have a digital presence. The LEFT JOIN includes those rows from the table on its left that don’t match with rows in the table to its right.
SELECT column1, column2, column3, column4 FROM table2 LEFT JOIN table1 ON table1.Id = table2.column5;
Interestingly, if we flip the order of the two tables in the query we get a different result:
SELECT FirstName, SecondName, AssetType, URL FROM DigitalAssets LEFT JOIN Actors ON Actors.Id = DigitalAssets.ActorID;
The outcome makes sense, because the DigitalAssets table doesn’t have any rows that don’t have an owner in the Actors table, so all the rows in the DigitalAssets table match with a row in the Actors table and become part of the output. Note that actors without digitalpresence are left out.
SELECT column1, column2, column3, column4 FROM table1 RIGHT JOIN table2ON table1.Id = table2.column5;
The RIGHT JOIN is very similar to the LEFT JOIN. The only difference is that in the case of the left join, the unmatched rows come from the table specified on the left of the LEFT JOIN clause whereas, in the case of right join, the unmatched rows come from the table specified on the right of the RIGHT JOIN clause. If we use right join in the first query of the lesson, we would not need to flip the tables as we did above.
SELECT FirstName, SecondName, AssetType, URL FROM Actors RIGHT JOIN DigitalAssetsON Actors.Id = DigitalAssets.ActorID;
Note that an alternative syntax for left and right joins is LEFT OUTER JOIN and RIGHT OUTER JOIN respectively, though there’s no difference in functionality if you skip the OUTER keyword.
Natural Join
The clause attempts to find the natural join between participating tables by matching on columns with same name.
Syntax for Natural Join :
SELECT *
FROM table1
NATURAL JOIN table2
-- Query 1
SELECT FirstName, SecondName, AssetType, URL FROM Actors NATURAL JOIN DigitalAssets;
-- Query 2
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets;
-- Query 3
-- Alter the column name
ALTER TABLE DigitalAssets CHANGE ActorId Id INT;
-- rerun the previous query
SELECT FirstName, SecondName, AssetType, URL FROM Actors NATURAL JOIN DigitalAssets;
-- Query 4
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets USING (Id);
-- Query 5
SELECT FirstName, SecondName, AssetType, URL FROM Actors NATURAL LEFT OUTER JOIN DigitalAssets;
SELECT column1, column2, column3, column4 FROM table1 NATURAL JOIN table2;
The NATURAL JOIN performs an inner join of the participating tables essentially without the user having to specify the matching columns. An example is as follows:
SELECT FirstName, SecondName, AssetType, URL FROM Actors NATURAL JOIN DigitalAssets;
Note that since none of the columns in the two tables share the same name, the result is a Cartesian product. The screenshot shows the Cartesian product only partially. The same result can be achieved using the inner join as follows:
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets;
ALTER TABLE table2 CHANGE column1 Id INT;
We’ll execute the above query again, but we’ll alter the column name for the DigitalAssets table from ActorID to ID so that it matches the column name in the Actors table.
-- Alter the column name
ALTER TABLE DigitalAssets CHANGE ActorId Id INT;
-- rerun the previous query
SELECT FirstName, SecondName, AssetType, URL FROM Actors NATURAL JOIN DigitalAssets;
You can observe from the results that the server matched the columns with the same name in both the tables and we get results equivalent to the following inner join query:
SELECT FirstName, SecondName, AssetType, URL FROM Actors INNER JOIN DigitalAssets USING (Id);
Under the hood, a natural join query is translated into an inner join query with matching column names ending up inside the using clause.
SELECT column1, column2, column3, column3 FROM table1 NATURAL LEFT OUTER JOIN table2;
We can also ask for natural left and right joins. As an example, we show a natural left join below:
SELECT FirstName, SecondName, AssetType, URL FROM Actors NATURAL LEFT OUTER JOIN DigitalAssets;
From the output you can see there’s nothing magical about the natural join, it’s just syntactic sugar that implicitly finds the columns to join the tables. Ideally, we should write expressive queries and avoid using the natural join as it hides the columns that’ll be used for the join and can subtly introduce bugs. Imagine a situation where a table is altered to have an additional column that has the same name as a column in another table which is naturally joined with the first table in an existing query. Suddenly, the results from the natural join query will stop to make sense.