In this article we are discussing about the most asked SQL Query in various interviews, So let get started.
The database relationship model is reprinted below for reference.
This Articles Contents
Write a SQL Query that prints the top three movies by box office collection?
This asks for us to print the top three movies and should hint towards sorting. The sort key should be the column CollectionInMillions. However, remember that by default, ORDER BY sorts in ascending order so we’ll need to sort in descending order. The last piece to the puzzle is to apply the LIMIT clause so that we only retrieve the top three rows
SELECT Name FROM Movies ORDER BY CollectionInMillions DESC LIMIT 3;
Can you write a SQL Query to determine if any two actors share the same second name?
The information we want to extract is contained within the Actors table. However, we need a way to compare the second name of the first actor with all the other actors in the table except with itself. How can you make the comparison? The answer is using a self join.
Whenever you hear yourself thinking in terms of picking up a row from a table and comparing it to another row from the same table or another table, you are looking for a join. Without further ado, we’ll perform an inner join of the Actors table.
SELECT * FROM Movies a INNER JOIN Movies b; //like us on facebook : https://facebook.com/onurdesk
Only part of the join is shown below because of limited space.
Notice we didn’t specify any condition for the inner join and the result is a cartesian product. Every row from the table is compared or joined with every other row including itself. Since we are joining on the table itself, we have to specify aliases a and b to refer to the table and its copy to avoid ambiguity. This is what we wanted. Now we’ll apply our criteria or check to narrow down to rows we want.
- We don’t want each row to compare to itself so our condition can be a.Id != b.Id.
- The other condition is to find those actors that have matching second names. We can specify that as a.SecondName = b.SecondName.
Let’s see what we have so far:
SELECT concat(a.FirstName," ",b.SecondName) FROM Actors a INNER JOIN Actors b ON a.SecondName = b.SecondName WHERE a.ID != b.ID;
We are able to print the names of the actors with same second names, however, each name is printed twice. This is because when we find two actors, say X and Y, with matching second names, the matches also take place when we process the row for actor Y. So, one match happens when we process X and Y and the second match happens when we process Y and X. The final piece we need is to add the DISTINCT clause so that all names are printed only once.
SELECT DISTINCT concat(a.FirstName," ",b.SecondName) AS Actors_With_Shared_SecondNames FROM Actors a INNER JOIN Actors b ON a.SecondName = b.SecondName WHERE a.Id != b.Id;
Write a SQL Query to count the number of actors who share the same second name. Print the second name along with the count.
This question builds upon the previous question. We listed all the actors who shared the same last name but now we want to count how many actors share a particular second name. For instance, the second name Kardashian is shared by 3 actors. Asking for the count per shared second name is a hint to use aggregation. We have already done the heavy lifting in the previous question. Let’s GROUP BY second name first and see what we get:
SELECT b.SecondName FROM Actors a INNER JOIN Actors b ON a.SecondName = b.SecondName WHERE a.Id != b.Id GROUP BY b.SecondName;
Remember GROUP BY returns one row per group so we get two rows as there are only two shared second names. Now we need to count how many rows fall into each group. One way is to count the first names in each group as follows:
SELECT a.SecondName, COUNT(DISTINCT a.FirstName) FROM Actors a INNER JOIN Actors b ON a.SecondName = b.SecondName WHERE a.Id != b.Id group by a.SecondName;
The astute reader would immediately realize that in our setup, FirstName isn’t the primary key. The above query will fail when we have two actors with exactly the same first and second names (born on different dates), which is a possibility. We’ll report one less than the actual number of actors who share the same second name in such a scenario. The fix is to count the IDs that fall into each group as follows:
SELECT a.SecondName AS Actors_With_Shared_SecondNames, COUNT(DISTINCT a.Id) AS Count FROM Actors a INNER JOIN Actors b ON a.SecondName = b.SecondName WHERE a.Id != b.Id group by a.SecondName;
Write a SQL Query to display all those actors who have acted in at least one movie?
We want to print the names of all those actors who have acted in at least one movie. Imagine our database also holds aspiring actors who have not yet landed a movie contract.
The data we are interested in resides in the two tables Cast and Actors. If we join the two tables, we’ll get all the actors that have at least one movie to their name. We can then grab the names of the actors from the joined tables and print them. There’s a caveat though; we may print an actor’s name if the actor has multiple movies, therefore, we must use the DISTINCT clause. The query is shown below:
SELECT DISTINCT CONCAT(FirstName, " ", SecondName) AS Actors_Acted_In _Atleast_1_Movies FROM Actors INNER JOIN Cast ON Id = ActorId;
As a corollary to the previous question, can you find the different ways of listing those aspiring actors who haven’t acted in any movie yet?
One of the easiest ways to answer this query is to take IDs of the actors from the previous query and minus those IDs from the entire set of Actors. The remaining IDs will be of actors who don’t have a film to their name yet. The query is shown below:
SELECT Id, CONCAT(FirstName, " ", SecondName) AS Actors_With_No_Movies FROM Actors WHERE Id NOT IN (SELECT Id FROM Actors INNER JOIN Cast ON Id = ActorId);
The subquery returns IDs of those actors who have participated in films. We simply list those actors whose ID doesn’t appear in the result set of the inner subquery.
Yet another way is to use a LEFT JOIN the two tables, with Actor as the left argument and the Cast table as the right argument. Let’s see the result of a left join between the two tables.
SELECT * FROM Actors LEFT JOIN Cast ON Id = ActorId;
You can observe from the joined table that the MovieID column is NULL for those actors who haven’t been part of any movie. Thus, we can use the condition MovieID = NULL in our join query to identify aspiring actors. The query is shown below:
SELECT CONCAT(FirstName, " ", SecondName) FROM Actors LEFT JOIN Cast ON Id = ActorId WHERE MovieId IS NULL;
Write a SQL Query to display all those actors who have acted in 2 or more movies.
In this question we are required to print the names of the actors who have acted in two or more movies. The names of the actors are in the Actors table and the number of movies an actor has appeared in is in theCast table. If we join the two tables, we can get the name of actor and the ID of the movie that the actor has starred in. Let’s see how that looks like:
SELECT Id, FirstName, SecondName, MovieId FROM Actors INNER JOIN Cast ON Id = ActorId;
Each row contains a movie ID in which an actor has starred. We can GROUP BY the result of the above query by ID of each actor so that all the movies that an actor has acted in, fall into the same group. Next, we simply count the rows in each group. So far, we have the following:
SELECT Id, COUNT(*) FROM Actors INNER JOIN Cast ON Id = ActorIdGROUP BY Id;
Note we have removed the columns in the SELECT clause since they don’t participate in the aggregation criteria. Now we’ll apply the restriction to only list those groups which have more than one row to fulfill the requirement to print names of only those actors who have acted in at least two movies.
SELECT Id, COUNT(*) AS MovieCount FROM Actors INNER JOIN Cast ON Id = ActorIdGROUP BY Id HAVING MovieCount > 1;
The last piece is to print the actor’s name. The above query is printing actor Id and the count of movies the actor has been part of. We can join the result of the above query with the Actors table based on the common actor ID column the two tables hold. From the joined result we can extract the actor name and the movie count columns. The complete query appears below:
SELECT CONCAT (FirstName, " ", SecondName) AS Actor_Names, Movie_Count FROM Actors a INNER JOIN (SELECT Id, COUNT(*) AS Movie_Count FROM Actors INNER JOIN Cast ON Id = ActorId GROUP BY Id HAVING Movie_Count > 1) AS tblON tbl.Id = a.Id;
Find the cast of the movie Mr and Mrs. Smith without using joins.
We are given the name of the movie but not its ID. We’ll first need a query to extract the ID of the movie. If we were asked to find the cast of the movie Ocean’s 11, which has two entries in the Movies table since there have been two movies by the same name, we would need furtherinformation to narrow down to exactly one movie. In the case of, Mr. & Mrs. Smith we have only one entry and we can get the ID of the movie as follows:
SELECT Id FROM Movies WHERE Name="Mr & Mrs. Smith";
We can identify the cast of the movie by looking for rows in the Cast table that have the same value for the MovieID column as the ID of the movie Mr. & Mrs. Smith. We’ll use nested query to list the IDs of the cast as follows:
SELECT ActorId FROM Cast WHERE MovieId IN (SELECT Id FROM Movies WHERE Name="Mr & Mrs. Smith");
Once we have the actor IDs, we can find all the names of the actors from the Actors table by matching on the ID. The complete query appears below:
SELECT CONCAT(FirstName, " ", SecondName)AS "Cast Of Mr. & Mrs. Smith" FROM Actors WHERE Id IN ( SELECT ActorId FROM Cast WHERE MovieId IN (SELECT Id FROM Movies WHERE Name="Mr & Mrs. Smith"));
The solution we provided used nested queries, we could also have arrived at the same result using joins. The complete query with joins is as follows:
SELECT CONCAT(FirstName, " ", SecondName) AS "Cast Of Mr. & Mrs. Smith" FROM Actors INNER JOIN (SELECT ActorId FROM Cast INNER JOIN Movies ON MovieId = Id WHERE Name="Mr & Mrs. Smith") AS tbl ON tbl.ActorId = Id;
Print a list of movie and the actor(s) who participated in the movie ordered by movie name.
In this question, we are asked to list the movie and actor name pairs. The Cast table exactly does that; however, it lists integer IDs for actors and movies instead of actual names. We have to expand the IDs into names for both actors and movies. Let’s start by joining the Cast table with the Actors table on movie ID. The resultant derived table will have the movie names. The query is shown below:
SELECT Name, ActorId FROM Movies INNER JOIN Cast On Id = MovieId;
Next, we can join the derived table from the above query with the Actors table based on actor ID and thus extract the actor name. Lastly, don’t forget to order the result by movie name. The complete query is shown below:
SELECT tbl.Name AS Movie_Name, CONCAT(FirstName, " ", SecondName) AS Actor_Name FROM Actors INNER JOIN (SELECT Name, ActorId FROM Movies INNER JOIN Cast On Id = MovieId) AS tbl ON tbl.ActorId = Id ORDER BY tbl.Name ASC;
Print the count of actors in each movie.
The requirement to use count hints towards aggregation. In the previousquery, we were able to print the pair of actor and movie names. The astute reader would immediately realize that if we GROUP BY the results by movie name from the previous query, all the actors who acted in that movie will fall into that bucket. We can then count the number of actors in each bucket/group and report that as the result. However, there’s one catch! If we group by movie name, then actors from two movies with the same name will all fall in one bucket and distort the counts. Therefore, we must GROUP BY movie ID.
Modifying previous query, we get:
SELECT tbl.Name, COUNT(*) FROM Actors INNER JOIN (SELECT Name, ActorId, MovieId FROM Movies INNER JOIN Cast On Id = MovieId) AS tbl ON tbl.ActorId = Id GROUP BY tbl.MovieId;
The above query gets us what we want but is convoluted and unnecessarily complex. Note that the information we require is available in the Cast and the Movies table. We don’t require the actor names in our final result, so we don’t need to join with the Actors table. We can GROUP BY the contents of the Movies table by movie ID and find the counts of actors for each movie as follows:
SELECT MovieId, COUNT(*) FROM Cast GROUP BY MovieId;
Now we can join the derived table from the above query with the Movies table on the movie to infer the movie name. The query is shown as follows:
SELECT Name AS Movie_Name, Actor_Count FROM Movies INNER JOIN (SELECT MovieId, COUNT(*) AS Actor_Count FROM Cast GROUP BY MovieId) AS tbl ON tbl.MovieID = Id;
List the names of Producers who never produced a movie for Tom Cruise.
Information about movie producers resides in the Movies table. We don’t know the ID of Tom Cruise on top of our head, so we’ll need to query the Actors table too. Finally, the Cast table will let us connect the various queries to find all producers who didn’t include Tom Cruise in the cast.Let s start by first finding the ID of Tom Cruise.
SELECT Id FROM Actors WHERE FirstName = "Tom" AND SecondName = "Cruise";
Next, we’ll find all the movies in which Tom Cruise has acted. The producers of these movies shouldn’t be included in our final result as they did cast Tom Cruise in their movies.
SELECT MovieId FROM Cast WHERE ActorId = (SELECT Id FROM Actors WHERE FirstName = "Tom" AND SecondName = "Cruise");
So far, we have been successful in collecting all the movie IDs in which Tom was an actor. Using these IDs we can join on the Movies table and find all those producers who had Tom in their movies.
SELECT DISTINCT Producer FROM Movies WHERE Id IN (SELECT MovieId FROM Cast WHERE ActorId = (SELECT Id FROM Actors WHERE FirstName = "Tom" AND SecondName = "Cruise"));
We may be tempted to add in the NOT IN in the outermost select clause to get all the producers who didn’t have Tom act in their movies and declare it as the final result, however, that is incorrect. The resulting query will be:
SELECT DISTINCT Producer FROM Movies WHERE Id NOT IN (SELECT MovieId FROM Cast WHERE ActorId = (SELECT Id FROM Actors WHERE FirstName = "Tom" AND SecondName = "Cruise"));
You can observe from the result that J. J. Abrams is also returned in the result set even though he did a movie with Tom. A producer with two movies and Tom as an actor in only one of them, is also returned in the result whereas we want only those producers who never worked with Tom. The fix is to find all those producers which have at least one movie with Tom, which we have already done, and then subtract these producers from the set of all the producers. The complete query appears below:
SELECT DISTINCT Producer FROM Movies WHERE Producer NOT IN (SELECT Producer FROM Movies WHERE Id IN (SELECT MovieId FROM Cast WHERE ActorId = (SELECT Id FROM Actors WHERE FirstName = "Tom" AND SecondName = "Cruise")));
We have chosen to use nested queries to get the desired result, however, we could also have used joins.
Write a SQL Query to display all those movie titles whose budget is greater than the average budget of all the movies.
This question also requires flexing MySQL’s aggregation capabilities. First we’ll write a query to calculate the average budget for all the films as follows:
SELECT AVG(BudgetInMillions) FROM Movies;
Now, we can plug the above query as a sub-query and list all the movies whose budget was greater than the average budget across all movies.
SELECT Name FROM Movies WHERE BudgetInMillions > (SELECT AVG(BudgetInMillions) FROM Movies);
Find all those actors who don’t have any digital media presence using a right join statement.
The Actors table has the ID column which is the same as the ActorID column of the DigitalAssets table. In a right join, the table on the right side of the join has all the rows included which don’t satisfy the join criteria. In this case, we want to include all the actors that don’t have their ID present in the DigitalAssets table, so we need to place the Actors table on the right of the RIGHT JOIN clause.
SELECT * FROM DigitalAssets RIGHT JOIN Actors ON Id = ActorId;
The output in the below screenshot is truncated for lack of space.
The result set of the above query sets NULL for columns from DigitalAssets table for those rows from the Actors table that don’t have a corresponding entry in the DigitalAssets table. We can predicate on the column URL being NULL to identify those actors who don’t have a social media presence.
SELECT CONCAT(FirstName, " ", SecondName) AS Actors_With_No_Online_Presence FROM DigitalAssets RIGHT JOIN Actors ON Id = ActorId WHERE URL IS NULL;
Can you rewrite the previous query without a join and using EXISTS operator?
We can grab all the actor IDs that exist in the DigitalAssets table and then select those actors from the Actors table whose ID doesn’t appear in the result set of the subquery.
SELECT CONCAT(FirstName, " ", SecondName) FROM Actors WHERE NOT EXISTS (SELECT ActorId FROM DigitalAssetsWHERE ActorId = Id);
Write a query to print the name of the fifth highest grossing movie at the box office.
It’s trivial to print the list of Movies sorted by how much they made at the box office in descending fashion as follows:
SELECT Name, CollectionInMillions FROM Movies ORDER BY CollectionInMillions DESC;
To print the 5th highest grossing movie, we need to leverage the OFFSET and LIMIT clauses. OFFSET allows us to print results starting from a specific row in the sorted result set. We’ll want to print the row immediately after the first 4 rows, which will be the fifth row in the sorted result and the 5th highest grossing movie. Finally, we’ll use the LIMIT clause set to 1 to print only one row. The final query is shown below:
SELECT Name, CollectionInMillions AS Collection_In_Millions FROM Movies ORDER BY CollectionInMillions DESC LIMIT 1 OFFSET 4;
Alternative syntax would be:
SELECT Name, CollectionInMillions AS Collection_In_Millions FROM Movies ORDER BY CollectionInMillions DESC LIMIT 4, 1;
Find those movies, whose cast latest activity on social media occurred between the span of 5 days before and 5 days after therelease date.
This question is an example of a complex query. We are asked to find the names of those movies whose cast’s latest activity on their digital assets was around the same time when the movie was released. We store the last time an actor updated any of his digital accounts in the DigitalAssets table whereas the release date of the movie is in the Movies table. We’ll need to connect all information from the DigitalAssets table to the Movies using the Cast table step by step. Let’s see how we can do that:
First, we’ll retrieve the latest times for all the actors when they made updates to their online accounts along with their IDs.
SELECT LastUpdatedOn, Id FROM Actors INNER JOIN DigitalAssets ON ActorId = Id;
Next, we’ll join the result of the previous step with the Cast table based on the actor ID. The result of this step will be another derived table whose each row will consist of a movie ID, an actor that was part of the movie, and time of their latest online activity.
SELECT * FROM Cast INNER JOIN (SELECT LastUpdatedOn, Id FROM Actors INNER JOIN DigitalAssets ON ActorId = Id) AS tbl ON tbl.Id = ActorId;
In the third step we can join the derived table of the second step with the Movies table based on the movie ID. The derived table resulting from this joining will have the movie name, movie release date, an actor participating in that movie, and the latest time of that actor’s online activity. By now we have all the necessary columns we need to compare.
SELECT * FROM Movies AS m INNER JOIN (SELECT * FROM Cast INNER JOIN (SELECT LastUpdatedOn, Id FROM Actors INNER JOIN DigitalAssets ON ActorId = Id) AS tbl1 ON tbl1.Id = ActorId) AS tbl2 ON tbl2.MovieId = m.Id;
In the fourth step we’ll add a WHERE clause and set the LastUpdatedColumn to be between the plus/minus 5 days from thedate of the movie release. Also, remember that two actors cast in the same movie could have posted about their upcoming movie, but we want to print the name of the movie only once, therefore, we also add the DISTINCT clause.
SELECT DISTINCT Name AS Actors_Posting_Online_Within_Five_Days_Of_Movie_Release FROM Movies AS m INNER JOIN (SELECT * FROM Cast INNER JOIN (SELECT LastUpdatedOn, Id FROM Actors INNER JOIN DigitalAssets ON ActorId = Id) AS tbl1 ON tbl1.Id = ActorId) AS tbl2 ON tbl2.MovieId = m.Id WHERE ADDDATE(ReleaseDate, INTERVAL -5 Day) <= LastUpdatedOn AND ADDDATE(ReleaseDate, INTERVAL +5 Day) >= LastUpdatedOn;
Write a query to display the average collection in millions of producers who have produced more than one movie.
To answer this question, we need to shortlist only those producers whose name appears more than once in the Movies table. This is hinting towards grouping the results by producers:
SELECT Producer FROM Movies GROUP BY Producer;
Since we are only interested in producers who have produced more than one movie, we will add a restriction on the number of times a producer’s name appears in the Movies table. The HAVING clause used with the GROUP BY clause will give us the desired result.
SELECT Producer FROM Movies GROUP BY Producer HAVING COUNT(Producer) > 1;
We are now left with two produces who have multiple movies. The laststep is to find the average collection in millions of the films by these two producers:
SELECT Producer AS Producer_Name, AVG(CollectionInMillions) AS Average_Collection_In_Millions FROM Movies GROUP BY Producer HAVING COUNT(Producer) > 1;
Find all those actors who have not worked with producer Ryan Seacrest.
Approach 1: Joining three tables in a single SQL query.
The information on actors, movies and producers is scattered in three tables; Actors, Cast and Movies. We need to join the tables together to find the answer. Joining three tables in a single SQL query can be a tricky concept. The first table is related to the second table and the second table is related to the third table. In our case the Actors table is related to the Cast table and the Cast table is related to the Movies table. We want the names of the actors from the Actors table and information on producers from the Movies table.
The Cast table joins the two tables Actors and Movies together and contains the primary key of both tables. The primary key (PK) of the Actors table is a foreign key (FK) in the Cast table. Similarly, the primary key of the Movies table is the foreign key in the Cast table. Understanding these table relationships is the key to joining multiple tables in a single MySQL query.
The basic syntax of joining three tables in MySQL is as follows:
SELECT table1.col, table3.col FROM table1 JOIN table2 ON table1.PK = table2.FK JOIN table3 ON table2.FK = table3.PK
First we join table1 and table2 and then the combined data of the two tables is joined with table3. Here table2 is the table that connects table1 and table3. In our case this will translate to the following query:
SELECT CONCAT (FirstName, " ", SecondName) AS Actors, MovieId, Producer FROM Actors JOIN Cast ON Actors.Id = Cast.ActorId JOIN Movies ON Cast.MovieId = Movies.Id;
This query joins the Actors and Movies table using the connecting Cast table. We need only those actors who have not worked with producer Ryan Seacrest. Elinimation of the unwanted rows can be accomplished by adding a condition at the end of our multiple join query as:
SELECT CONCAT (FirstName, " ", SecondName) AS Actors, MovieId, Producer FROM Actors JOIN Cast ON Actors.Id = Cast.ActorId JOIN Movies ON Cast.MovieId = Movies.Id AND Producer <> 'Ryan Seacrest';
The query now gives us 10 rows after removing three rows with Ryan Seacrest as producer. A glance at the results shows that some actors have worked in multiple movies and thus their name appears more than once. Since we are only interested in the names of those actors who have not worked with a particular producer, we need to apply DISTINCT clause. The final query looks like this:
SELECT DISTINCT(CONCAT (FirstName, " ", SecondName)) AS Actors_Who_Have_Not_Worked_with_Ryan_Seacrest FROM Actors JOIN Cast ON Actors.Id = Cast.ActorId JOIN Movies ON Cast.MovieId = Movies.Id AND Producer <> 'Ryan Seacrest';
Approach 2: Without using joins.
The information required to answer this question is scattered across three tables; Actors, Cast and Movies. First we will find the producers of the movies by combining data from Cast and Movies tables on the common column, movie ID.
SELECT c.ActorID, c.MovieId, m.Producer FROM Cast c, Movies m WHERE c.MovieId = m.Id;
The query adds the producers to each row of the Cast table. Next step is to find those actors who have not worked with producer Ryan Seacrest. This can be done by adding a condition Producer <> ‘Ryan Seacrest’.
SELECT c.ActorID, c.MovieId, m.Producer FROM Cast c, Movies m WHERE c.MovieId = m.IdAND m.Producer <> 'Ryan Seacrest';
We are left with 10 rows now and it can be seen that some actors are appearing multiple times having acted in more than one movie. The last step is to display the names of the actors after removing duplicate entries. The DISTINCT clause will remove duplicates. For the actor names, we will add columns from Actors table to the query created so far:
SELECT DISTINCT(CONCAT (a.FirstName, " ", a.SecondName)) AS Actors_Who_Have_Not_Worked_with_Ryan_Seacrest FROM Cast c, Movies m, Actors a WHERE c.MovieId = m.Id AND m.Producer <> 'Ryan Seacrest' AND c.ActorId = a.Id;
Populate a table DigitalActivityTrack with the last digital activity of each actor along with the asset type on which the activity occurred.
Every actor has multiple digital assets which they update at different times. We need to find the digital asset which was last updated by the actor.
We need to track the update time of the digital assets. This information is contained in the DigitalAssets table. Since an actor has multiple digital assets, we need to group the rows of the DigitalAssets table according to ActorId.
SELECT ActorId, AssetType, LastUpdatedOn FROM DigitalAssetsORDER BY ActorId ASC, LastUpdatedOn DESC;
This query orders the table according to actors. We have sorted the LastUpdatedOn column in descending order because we want to track the last digital activity. Since we need the time when an actor last updated a digital asset, we are interested in the first row for each actor from the output shown above. We will use the MAX function to limit the output so that only the first row of each group is displayed.
SELECT ActorId, MAX(LastUpdatedOn) FROM DigitalAssets GROUP BY ActorId;
Here we find the latest digital activity of the actors but we cannot display the asset type when using the aggregate function in the SELECT clause. For this purpose we need to use the IN operator to move the aggregate function to the WHERE clause as follows:
SELECT ActorId, AssetType, LastUpdatedOn FROM DigitalAssets WHERE (ActorId, LastUpdatedOn) IN (SELECT ActorId, MAX(LastUpdatedOn) FROM DigitalAssets GROUP BY ActorID);
The output shows that we need an ORDER BY clause to sort the output based on LastUpdatedOn. Now that we have the required columns, we need to create table DigitalActivityTrack that contains the actor name, the name of the digital asset and the time it was last updated.
CREATE TABLE DigitalActivityTrack ( Id INT AUTO_INCREMENT NOT NULL PRIMARY KEY, Actor_Id INT NOT NULL, Digital_Asset VARCHAR(20) NOT NULL, Last_Updated_At DATETIME Not NULL DEFAULT NOW());
To populate the DigitalAcitvityTrack table with the query created above, we will use the INSERT INTO SELECT statement as follows:
INSERT INTO DigitalActivityTrack (Actor_Id, Digital_Asset, Last_Updated_At) SELECT ActorId, AssetType, LastUpdatedOn FROM DigitalAssets WHERE (ActorId, LastUpdatedOn) In (SELECT ActorId, MAX(LastUpdatedOn) FROM DigitalAssets GROUP BY ActorID) ORDER BY LastUpdatedOn DESC;
As a final step, we will display the results with actor names:
SELECT CONCAT(a.FirstName, " ", a.SecondName) AS Actor_Name, Digital_Asset, Last_Updated_At FROM Actors a, DigitalActivityTrack WHERE a.Id = Actor_Id;
Find the actor with the third lowest Net Worth in Millions without using the LIMIT clause.
This is practice for a widely asked interview question where the candidate is required to find the nth highest value of a column where n could be any number greater than 1. So for example, you could be asked to find the 3rd highest salary of an employee. We have twisted this question to find the 3rd lowest net worth. The simplest way to solve this problem is by using the IN clause but that solution does not scale up well as the value of n increases. A more efficient way to solve this question by using a correlated sub-query.A correlated sub-query is much like a recursive function where the inner query is executed for every row of the outer query. This is because the data returned by the inner query is compared to the row from the outer query at each step. This makes correlated queries slow and in general, they are avoided unless there is no better way to solve the problem.
Mapping this concept of correlated queries to our problem, we need to compare the NetWorthInMillions returned by the outer query with other values of net worth to find out exactly how many values are lower than this value.
To generalize this concept, we can use the following syntax to find the nth lowest value:
SELECT col1, col2 From table_name t1 WHERE N – 1 = ( SELECT COUNT ( DISTINCT (col2)) FROM table_name t2 WHERE t2.col2 < t1.col2 )
If we are finding the 2nd lowest value, then N will be 2. If we are asked t find the 4th lowest value then N will be 4.The first row of the Actors table has NetWorthInMillions equal to 240. This value is plugged in the correlated sub-query and the sub-query is then evaluated to find out how many values are lower than 240. The sub-query becomes:
SELECT COUNT(DISTINCT (NetWorthInMillions)) FROM Actors t2 WHERE t2.NetWorthInMillions < 240
The DISTINCT keyword will remove any duplicate net worth values. If COUNT returns 2, that means there are two values lower than 240 which makes it the third lowest value. If the answer is not 2, then we move on the second row of the Actors table and repeat the same process till wefind a value for NetWorthInMillions which has exactly two values lower than it. This explains why we have used N-1 in the WHERE clause.
Based on the above discussion, to find the actor with the third lowest net worth we will use the following query:
SELECT CONCAT (FirstName, " ", SecondName) AS Actor_Name, NetWorthInMillions AS 3rd_Lowest_Net_Worth_In_Millions From Actors a1 WHERE 2 = (SELECT COUNT(DISTINCT (NetWorthInMillions)) FROM Actors a2 WHERE a2. NetWorthInMillions < a1. NetWorthInMillions);
If we examine the Actors table we can see that there are two actors with the third lowest NetWorth namely Kourtney Kardashian and Abhishek Bachan. Let’s execute our query and see if it gives the same result: This is a generic solution but it is slow because the inner query is processed for each row of the outer query.
Write a query to display actors along with a comma separated list of their digital assets.
Actors have multiple digital assets and this information is contained in the DigitalAssets table. We need to find duplicate rows based on the the ActorId column. In the DigitalAssets table, the column ActorId has duplicate values but when ActorId is coupled with AssetType, the value is unique. In MySQL the GROUP BY clause is used to identify duplicates.
SELECT ActorID, COUNT(ActorId) FROM DigitalAssets GROUP BY ActorId;
This query has found duplicate entries in the DigitalAssets table based on ActorId. Instead of a count of digital assets that the actor has, we are interested in a comma separated list. The GROUP_CONCAT() function comes to our aid as it is built specifically for the purpose of concatenating the results into a comma separated list.
SELECT ActorID, GROUP_CONCAT(AssetType) FROM DigitalAssets GROUP BY ActorId;
The last step is to join this table with the Actors table to get the names of the actors.
SELECT CONCAT (FirstName, " ", SecondName) AS Actor_Name, GROUP_CONCAT(AssetType) AS Digital_Assets FROM Actors INNER JOIN DigitalAssets ON Actors.Id = DigitalAssets.ActorId GROUP BY Id;
The result set is missing actors without any digital assets. So we will change the join type to LEFT JOIN.
SELECT CONCAT (FirstName, " ", SecondName) AS Actor_Name, GROUP_CONCAT(AssetType) AS Digital_Assets FROM Actors LEFT JOIN DigitalAssets ON Actors.Id = DigitalAssets.ActorId GROUP BY Id;